[Haskell-cafe] Re: sort and lazyness (?)
Daniel Kraft
d at domob.eu
Fri Dec 19 09:58:52 EST 2008
Bayley, Alistair wrote:
>> Currently, I'm experiencing what I would call "strange behaviour":
>>
>> I've got a data-type
>>
>> data Fraction = Fraction Int Int
>>
>> to hold rational numbers (maybe there's already some built-in
>> type for this in Haskell,
>
> http://haskell.org/ghc/docs/latest/html/libraries/base/Data-Ratio.html
Thanks for the pointer, I knew there would be something already there :)
>> This list has up to 3 million elements. If I do
>>
>> main = print $ length $ points
>>
>> main = print $ length $ map (\(x, _) -> x == Fraction 1 2) points
>>
>> main = print $ length $ reverse points
>>
>> However, trying to do
>>
>> import List
>> main = print $ length $ sort points
>>
>> makes memory usage go up and the program does not finish in 15m, also
>> spending most time waiting for swapped out memory. What am I doing
>> wrong, why is sort this expensive in this case? I would assume that
>> computing and holding the whole list does not take too much memory,
>> given its size and data type; doing the very same calculation in C
>> should be straight forward. And sort should be O(n * log n) for time
>> and also not much more expensive in memory, right?
>
> Not having looked at your code, I think you are benefiting from
> fusion/deforestation in the first three main functions. In this case,
> although you may appear to be evaluating the entire list, in fact the
> list elements can be discarded as they are generated, so functions like
> length and reverse can run using constant space, rather than O(n) space.
How does reverse work in constant space? At the moment I can't imagine
it doing so; that's why I tried it, but of course you could be right.
> The sort function, however, requires that the entire list is retained,
> hence greater memory usage. I also think you are optimistic in the
> memory requirements of your 3 million element list. A list of Ints will
> take a lot more than 4 bytes per element (on 32-bit machines) because
> there's overhead for the list pointers, plus possibly the boxes for the
> Ints themselves. I think there are 3 machine words for each list entry
> (pointer to this element, pointer to next element, info-table pointer),
> and 2 words for each Int, but I'm just guessing:
> http://hackage.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/HeapObje
> cts
Of course that's the case, but the list being 3 million elements, and
not, say 100 (which would still fit into memory for a simple C array of
ints) I thought would make it possible. Otherwise, how can one handle
such amounts in data anyway? Only using arrays?
> You might get some mileage by suggesting to GHC that your Fraction type
> is strict e.g.
>
>> data Fraction = Fraction !Int !Int
>
> which might persuade it to unbox the Ints, giving some space savings.
I already tried so, but this doesn't change anything to the performance.
I will however try now to use the provided rational type, maybe this
helps.
Thanks for the answers,
Daniel
More information about the Haskell-Cafe
mailing list