[Haskell-cafe] External Sort: Sort a 10-million integer file with
just 256M of ram.
Thomas Hartman
tphyahoo at gmail.com
Thu Oct 23 12:41:04 EDT 2008
Now on hackage: cabal install external-sort
demo (included in distribution):
thartman at thartman-laptop:~/external-sort>cat demo.hs
{-# LANGUAGE PatternSignatures #-}
import Algorithms.ExternalSort
import Data.List
import System.IO
import System.Environment (getArgs)
import System.Time
import HSH
-- to do: compare speed against unix sort util on a 10 million line file.
-- pure in-memory prelude sort will crash your computer when the list
gets over a million elements or so
-- externalsort caches the sublists used in the sort algorithm on your
hard drive, so you can sort a much larger list.
{-
The behavior below was on a demo executable, compiled. (In ghci, even
last on a 10 million element list
caused an out of memory error.) The test computer had 256M physical
ram and was ulimited to 256M cache.
*Main>:! ulimit -v
262144
For 10 million element list:
*Main> :! time ./demo preludesort 7
demo: out of memory (requested 1048576 bytes)
Command exited with non-zero status 1
4.88user 0.68system 0:21.11elapsed 26%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+64817minor)pagefaults 0swaps
*Main> :! time ./demo externalsort 7
10000000
73.87user 1.96system 1:24.25elapsed 90%CPU (0avgtext+0avgdata 0maxresident)k
792inputs+156280outputs (6major+16739minor)pagefaults 0swaps
./demo unixsort 7
wrote bigfile, time: Mon Oct 20 15:25:26 CEST 2008
demo: out of memory (requested 1048576 bytes)
For 100 million element list, external sort failed. Can it be made to
work? maybe in some future version.
time ./demo externalsort 8
demo: out of memory (requested 1048576 bytes)
real 10m14.061s
user 8m26.712s
sys 0m11.793s
thartman at thartman-laptop:~/external-sort>ls -lh ExternalSort.bin
-rw-r--r-- 1 thartman thartman 764M Oct 20 15:50 ExternalSort.bin
The problem is not fitting a 10^8 element list in memory, the
following works fine
(when compiled, though not in ghci):
t = putStrLn . show . last $ [1..10^8::Int]
Maybe think about this more later.
-}
main = do
[s,e] <- getArgs
let exp = read e
case s of
"preludesort" -> sortwith exp $ return . sort
"externalsort" -> sortwith exp externalSort
"unixsort" -> unixsort exp
_ -> let msg = "usage: ./demo preludesort 7 or ./demo externalsort
7 or ./demo unixsort 7 \
\(sort 10 million element list)"
in fail msg
sortwith exp s = putStrLn =<< return . show . last =<< s ([1..10^exp ::Int])
unixsort exp = do
let fn = "bigfile"
withFile fn AppendMode (\h -> (mapM_ (hPutStrLn h . show)
([1..10^exp::Int]) ) )
putStrLn . ( ("wrote " ++ fn ++ ", time: ") ++ ) . show =<< getClockTime
run $ "time tail -n1 | sort " ++fn :: IO String
return ()
t = putStrLn . show . last $ [1..10^8::Int]
More information about the Haskell-Cafe
mailing list