[Haskell-cafe] sort and lazyness (?)

Bayley, Alistair Alistair.Bayley at invesco.com
Fri Dec 19 09:41:49 EST 2008


> Currently, I'm experiencing what I would call "strange behaviour":
> 
> I've got a data-type
> 
> data Fraction = Fraction Int Int
> 
> to hold rational numbers (maybe there's already some built-in 
> type for this in Haskell,

http://haskell.org/ghc/docs/latest/html/libraries/base/Data-Ratio.html


> This list has up to 3 million elements.  If I do
> 
> main = print $ length $ points
> 
> main = print $ length $ map (\(x, _) -> x == Fraction 1 2) points
> 
> main = print $ length $ reverse points
> 
> However, trying to do
> 
> import List
> main = print $ length $ sort points
> 
> makes memory usage go up and the program does not finish in 15m, also 
> spending most time waiting for swapped out memory.  What am I doing 
> wrong, why is sort this expensive in this case?  I would assume that 
> computing and holding the whole list does not take too much memory, 
> given its size and data type; doing the very same calculation in C 
> should be straight forward.  And sort should be O(n * log n) for time 
> and also not much more expensive in memory, right?

Not having looked at your code, I think you are benefiting from
fusion/deforestation in the first three main functions. In this case,
although you may appear to be evaluating the entire list, in fact the
list elements can be discarded as they are generated, so functions like
length and reverse can run using constant space, rather than O(n) space.

The sort function, however, requires that the entire list is retained,
hence greater memory usage. I also think you are optimistic in the
memory requirements of your 3 million element list. A list of Ints will
take a lot more than 4 bytes per element (on 32-bit machines) because
there's overhead for the list pointers, plus possibly the boxes for the
Ints themselves. I think there are 3 machine words for each list entry
(pointer to this element, pointer to next element, info-table pointer),
and 2 words for each Int, but I'm just guessing:
http://hackage.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/HeapObje
cts

You might get some mileage by suggesting to GHC that your Fraction type
is strict e.g.

> data Fraction = Fraction !Int !Int

which might persuade it to unbox the Ints, giving some space savings.

Alistair
*****************************************************************
Confidentiality Note: The information contained in this message,
and any attachments, may contain confidential and/or privileged
material. It is intended solely for the person(s) or entity to
which it is addressed. Any review, retransmission, dissemination,
or taking of any action in reliance upon this information by
persons or entities other than the intended recipient(s) is
prohibited. If you received this in error, please contact the
sender and delete the material from any computer.
*****************************************************************



More information about the Haskell-Cafe mailing list