[Haskell-cafe] Re: Space Efficiency When Sorting a List of Many
Lists
Heinrich Apfelmus
apfelmus at quantentunnel.de
Thu Dec 31 11:27:27 EST 2009
Felipe Lessa wrote:
> Luke Palmer wrote:
>> But if you're serious, you can probably do better than just generating
>> them all and passing them to sort. I get the impression that there is
>> some structure here that can be taken advantage of.
>
> Isn't what he wants a trie? In particular, a Patricia trie?
Peter, this is a very nice problem. Is this a programming exercise or
did you encounter it in the "real world"?
There is indeed a structure that can be taken advantage of and it
involves tries.
The key point is that thanks to the lexicographic ordering, you can
*interleave* exploding and sorting the rows. In other word, we can
exploit the fact that for example
(sort . cartesian) ([8,12]:[11]:[7,13]:[10]:[])
= [8 : (sort . cartesian) ([11]:[7,13]:[10]:[]) ]
++ [12 : (sort . cartesian) ([11]:[7,13]:[10]:[]) ]
where cartesian denotes the cartesian product. The code will mainly
work with functions like
type Row a = [a]
headsIn :: [Row a] -> [(a, [Row a])]
which groups rows by their first element. The result type is best
understood as a finite map from a to [Row a]
headsIn :: [Row a] -> Map a [Row a]
And unsurprisingly, the fixed point of the (Map a) functor is the
trie for [a] .
Without much explanation, here the full formulation in terms of
catamorphisms and anamorphisms.
{-# LANGUAGE NoMonomorphismRestriction #-}
import qualified Data.Map
import Control.Arrow (second)
-- the underlying structure
newtype Map a b = Map { unMap :: [(a,b)] } deriving (Eq, Show)
-- category theory: bananas and lenses
instance Functor (Map a) where
fmap f = Map . (map . second) f . unMap
newtype Fix f = In { out :: f (Fix f) }
cata f = In . fmap (cata f) . f
ana f = f . fmap (ana f) . out
-- very useful type synonym to keep track of rows and colums
type Row a = [a]
-- grouping and "ungrouping" by the first elements of each row
headsIn :: [Row a] -> Map a [Row a]
headsIn xss = Map [(x,[xs]) | x:xs <- xss]
headsOut :: Map a [Row a] -> [Row a]
headsOut (Map []) = [[]]
headsOut xxs = [x:xs | (x,xss) <- unMap xxs, xs <- xss]
-- cartesian product
cartesian1 :: Row [a] -> Map a (Row [a])
cartesian1 [] = Map []
cartesian1 (xs:xss) = Map [(x,xss) | x <- xs]
cartesian = ana headsOut . cata cartesian1
-- sorting
sort1 :: Ord a => Map a [b] -> Map a [b]
sort1 = Map . Data.Map.toList . Data.Map.fromListWith (++) . unMap
sortRows = ana headsOut . cata (sort1 . headsIn)
-- and cold fusion!
-- sortCartesian = sortRows . cartesian
-- best written as hylomorphism
sortCartesian = ana headsOut . cata (sort1 . cartesian1)
This is readily extended to handle the explode function as well. And
thanks to lazy evaluation, I expect this to run with a much better
memory footprint.
Regards,
Heinrich Apfelmus
--
http://apfelmus.nfshost.com
More information about the Haskell-Cafe
mailing list