[Haskell-cafe] memory-efficient data type for Netflix data - UArray Int Int vs UArray Int Word8

Manlio Perillo manlio_perillo at libero.it
Thu Feb 26 16:14:56 EST 2009


Kenneth Hoste wrote:
> [...]
> However, as I posted yesterday, I've been able to circumvent the issue 
> by rethinking my data type, i.e. using
> the ~18K movie IDs as key instead of the 480K user IDs, which radically 
> limits the overhead...

But what if you really need the original data structure (keyed by user
ID) for better data processing?
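
For reference, here is a minimal sketch of what a movie-keyed layout might
look like. It is only a guess at the idea, not Kenneth's actual code: the
names (MovieRatings, fromTriples, ratingsFor) are made up, and it assumes
ratings are small integers that fit in a Word8, stored in unboxed arrays
from the array package with an IntMap from containers on top.

```haskell
import qualified Data.IntMap.Strict as IM
import Data.Array.Unboxed (UArray, listArray, elems)
import Data.Word (Word8)

-- Hypothetical type names, for illustration only.
type MovieId = Int
type UserId  = Int

-- One entry per movie: two parallel unboxed arrays.
-- Assumption: a rating fits in a single byte, hence Word8.
data MovieRatings = MovieRatings
  { mrUsers   :: !(UArray Int Int)    -- user IDs who rated this movie
  , mrRatings :: !(UArray Int Word8)  -- the rating given by each user
  }

-- Keyed by movie ID, so there are far fewer map entries
-- than with a map keyed by user ID.
type RatingsByMovie = IM.IntMap MovieRatings

-- Build the structure from (movie, user, rating) triples.
-- Grouping through lists is simple, not memory-efficient;
-- the order of ratings within a movie is not preserved.
fromTriples :: [(MovieId, UserId, Int)] -> RatingsByMovie
fromTriples triples = IM.map pack grouped
  where
    grouped = IM.fromListWith (++) [ (m, [(u, r)]) | (m, u, r) <- triples ]
    pack urs =
      let n = length urs
      in MovieRatings
           (listArray (0, n - 1) (map fst urs))
           (listArray (0, n - 1) (map (fromIntegral . snd) urs))

-- All (user, rating) pairs for one movie; empty if the movie is unknown.
ratingsFor :: MovieId -> RatingsByMovie -> [(UserId, Word8)]
ratingsFor m db = case IM.lookup m db of
  Nothing                   -> []
  Just (MovieRatings us rs) -> zip (elems us) (elems rs)
```

The drawback is the one I mean above: looking up everything a given user
has rated now requires scanning every movie entry.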

> That way, I'm able to fit the data set in <700M of memory, without 
> having to reorganize the raw data.
> 
>> The uvector package implements vectors of unboxed types, and has a
>> snocU operation to append an element to the array.
>>
>> I don't know how efficient it is, however.
> 
> 
>> By the way, about uvector: it has a Stream data type, and you can 
>> build a vector from a stream.
> 
> Thanks for letting me know, I'll keep this in mind.
> 

Let me know if there are performance improvements.

Arrays are one of the few things I dislike in Haskell, and I find all the
available array/vector packages somewhat confusing.




Regards   Manlio

