[Haskell-beginners] Unexpected Space Leak (despite using
external-sort)
Stephen Blackheath [to Haskell-Beginners]
mutilating.cauliflowers.stephen at blacksapphire.com
Mon Jan 4 00:41:56 EST 2010
Peter,
See below...
> Thanks Steve. I think I understand your point. It seems like
> external-sort needs patching to make block comparison more efficient for
> everyone and that my case is just particularly extreme/degenerate. Just
> right now, I'm not yet confident enough about my Haskell abilities to go
> and create a new type for 'block' with more efficient Ord and then
> modify the rest of the external-sort code accordingly. Posted program
> was the 2nd Haskell prog I've every written. I'll get there in time,
> though :).
Laziness and space analysis make up a significant part of Haskell's
learning curve, I believe, so you're doing well.
Here's how I'm suggesting you change kMerge
> kMerge :: (Ord a) => [[a]] -> [a]
> kMerge [] = []
> kMerge l =
> let h = Splay.fromSeq l in
> kM (Splay.minElem h) (Splay.deleteMin h)
> where
> kM :: (Ord a) => [a] -> Splay.Heap [a] -> [a]
> kM l h
> | h == Splay.empty = l
> | otherwise =
> let next = Splay.minElem h
> (f, b) = span (\x -> x <= head next) l
> in
> f ++ (kM next (if null b then Splay.deleteMin h
> else (Splay.insert b $ Splay.deleteMin h)))
becomes something like... (with lots of wrapping and unwrapping)...
> newtype HeadCompList a = HeadCompList [a]
>
> instance Ord a => Ord (HeadCompList a) where
> HeadCompList (x:_) `compare` HeadCompList (y:_) = x `compare` y
-- compare head elts only
>
> kMerge :: (Ord a) => [[a]] -> [a]
> kMerge [] = []
> kMerge l =
> let wrappedL = map HeadCompList l
> h = Splay.fromSeq wrappedL in
> kM (Splay.minElem h) (Splay.deleteMin h)
> where
> kM :: (Ord a) => HeadCompList a -> Splay.Heap (HeadCompList a) -> [a]
> kM (HeadCompList l) h
> | h == Splay.empty = l
> | otherwise =
> let (HeadCompList next) = Splay.minElem h
> (f, b) = span (\x -> x <= head next) l
> in
> f ++ (kM next (if null b then Splay.deleteMin h
> else (Splay.insert (HeadCompList b) $
Splay.deleteMin h)))
> I wonder if there is another way of making my code work better with the
> existing externalSort function:
>
> I am sorting [[Int]], an example of which might be:
>
> [[8, 11, 7, 10],
> [8, 11, 13, 10],
> [12, 11, 7, 10],
> [12, 11, 13, 10],
> [1, 1, 3, 4],
> .
> .
> [3, 9, 6, 4]]
>
> I need numerical lex sorting, so I can't just map directly to ByteString
> as this would result in "12" collating before "1", etc.
>
> I know that each sublist I am sorting never contains an integer greater
> than 20. I could encode each sublist as a ByteString where 1 maps to a,
> 2 maps to B, 3, maps to C, etc.
>
> So would have [1,1,3,4] -> "aacd" (ByteString), etc.
>
> Hopefully Ord comparisons between ByteStrings are much less expensive
> than comparisons between instances of [Int] and I will see nicer
> behaviour since external-sort does work as advertised on lists of
> millions of Ints?
Well, if I'm right, then the problem is that kMerge is evaluating too
much of each of its ... in your case .. [[Int]] values. You're
proposing to change it to [ByteString]. Again, if I'm right, that would
improve it by exactly the factor of space usage by ByteString / [Int],
without solving the fundamental problem.
> So I could sort these, then map back to underlying Int data when I
> output. Schwartzian transform type stuff.
>
> Anyway, I'll give it a go.
I may be wrong!
Steve
More information about the Beginners
mailing list