[Haskell-cafe] Elegant external sorting

Bulat Ziganshin bulat.ziganshin at gmail.com
Thu Jan 22 12:17:30 EST 2009


Hello haskell-cafe,

SPJ asked us a few years ago about examples of simple and elegant
haskell programs. i want to propose this external sorting program: it
reads stdin in chunks of 5000 lines, writes sorted chunks into files
and then merges files together:

import Data.List
import Control.Monad
import Control.Arrow

main = do -- (Lazily) create sorted sublists
          sublists <- fmap chop_and_sort getContents
          -- Write the sublists into files named 1..n
          n <- fmap length $ zipWithM writeFile (map show [1..]) sublists
          -- Read (again lazily) contents of these files
          sublists <- mapM (fmap lines.readFile.show) [1..n]
          -- Merge them and write result to stdout
          putStr (unlines (merge_lists compare sublists))

-- Split input list into 5000-line chunks and sort them
chop_and_sort = lines >>> recursive (splitAt 5000) >>> map (sort>>>unlines)

-- unfoldr variant that stops on empty list
recursive splitFunc  =  unfoldr (\xs -> guard (xs>[]) >> Just (splitFunc xs))


-- Merging sorted lists (kidnapped from Data.List)
merge_lists :: (a -> a -> Ordering) -> [[a]] -> [a]
merge_lists cmp [] = []
merge_lists cmp [xs] = xs
merge_lists cmp xss = merge_lists cmp (merge_pairs cmp xss)

merge_pairs :: (a -> a -> Ordering) -> [[a]] -> [[a]]
merge_pairs cmp [] = []
merge_pairs cmp [xs] = [xs]
merge_pairs cmp (xs:ys:xss) = merge cmp xs ys : merge_pairs cmp xss

merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a]
merge cmp xs [] = xs
merge cmp [] ys = ys
merge cmp (x:xs) (y:ys)
 = case x `cmp` y of
        GT -> y : merge cmp (x:xs)   ys
        _  -> x : merge cmp    xs (y:ys)

  

-- 
Best regards,
 Bulat                          mailto:Bulat.Ziganshin at gmail.com



More information about the Haskell-Cafe mailing list