[Haskell-cafe] Elegant external sorting
Bulat Ziganshin
bulat.ziganshin at gmail.com
Thu Jan 22 12:17:30 EST 2009
Hello haskell-cafe,
SPJ asked us a few years ago about examples of simple and elegant
haskell programs. i want to propose this external sorting program: it
reads stdin in chunks of 5000 lines, writes sorted chunks into files
and then merges files together:
import Data.List
import Control.Monad
import Control.Arrow
main = do -- (Lazily) create sorted sublists
sublists <- fmap chop_and_sort getContents
-- Write the sublists into files named 1..n
n <- fmap length $ zipWithM writeFile (map show [1..]) sublists
-- Read (again lazily) contents of these files
sublists <- mapM (fmap lines.readFile.show) [1..n]
-- Merge them and write result to stdout
putStr (unlines (merge_lists compare sublists))
-- Split input list into 5000-line chunks and sort them
chop_and_sort = lines >>> recursive (splitAt 5000) >>> map (sort>>>unlines)
-- unfoldr variant that stops on empty list
recursive splitFunc = unfoldr (\xs -> guard (xs>[]) >> Just (splitFunc xs))
-- Merging sorted lists (kidnapped from Data.List)
merge_lists :: (a -> a -> Ordering) -> [[a]] -> [a]
merge_lists cmp [] = []
merge_lists cmp [xs] = xs
merge_lists cmp xss = merge_lists cmp (merge_pairs cmp xss)
merge_pairs :: (a -> a -> Ordering) -> [[a]] -> [[a]]
merge_pairs cmp [] = []
merge_pairs cmp [xs] = [xs]
merge_pairs cmp (xs:ys:xss) = merge cmp xs ys : merge_pairs cmp xss
merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a]
merge cmp xs [] = xs
merge cmp [] ys = ys
merge cmp (x:xs) (y:ys)
= case x `cmp` y of
GT -> y : merge cmp (x:xs) ys
_ -> x : merge cmp xs (y:ys)
--
Best regards,
Bulat mailto:Bulat.Ziganshin at gmail.com
More information about the Haskell-Cafe
mailing list