[Haskell-cafe] memory-efficient data type for Netflix data -
UArray Int Int vs UArray Int Word8
Manlio Perillo
manlio_perillo at libero.it
Thu Feb 26 16:14:56 EST 2009
Kenneth Hoste wrote:
> [...]
> However, as I posted yesterday, I've been able to circumvent the issue
> by rethinking my data type, i.e. using
> the ~18K movie IDs as key instead of the 480K user IDs, which radically
> limits the overhead...
Well, but what if you really need the original data structure (keyed by
user ID) for better data processing?
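
A minimal sketch of the movie-keyed layout discussed above, using the array
and containers packages. All names here (MovieRatings, mkMovieRatings,
fromTriples, meanRating) are hypothetical and not taken from Kenneth's code;
the only assumption is that a rating fits in a Word8. The point it
illustrates is the one in the subject line: a UArray Int Word8 stores one
byte per element, while a UArray Int Int stores one machine word per element.

```haskell
import Data.Array.Unboxed (UArray, elems, listArray)
import qualified Data.IntMap as IM
import Data.Word (Word8)

-- Hypothetical type names, for illustration only.
type MovieId = Int
type UserId  = Int

-- Ratings for one movie: two parallel unboxed arrays indexed 0..n-1.
data MovieRatings = MovieRatings
  { mrUsers   :: !(UArray Int UserId)  -- one machine word per element
  , mrRatings :: !(UArray Int Word8)   -- one byte per element
  }

-- One map entry per movie instead of one per user.
type RatingsByMovie = IM.IntMap MovieRatings

-- Pack a list of (user, rating) pairs into the two unboxed arrays.
mkMovieRatings :: [(UserId, Word8)] -> MovieRatings
mkMovieRatings prs = MovieRatings
  (listArray (0, n - 1) (map fst prs))
  (listArray (0, n - 1) (map snd prs))
  where
    n = length prs

-- Group (movie, user, rating) triples by movie, keeping input order.
-- The list append is quadratic per movie; chosen for clarity, not speed.
fromTriples :: [(MovieId, UserId, Word8)] -> RatingsByMovie
fromTriples ts = IM.map mkMovieRatings grouped
  where
    grouped = IM.fromListWith (flip (++)) [ (m, [(u, r)]) | (m, u, r) <- ts ]

-- Mean rating of a movie; widens to Int before summing so Word8 cannot overflow.
meanRating :: MovieRatings -> Maybe Double
meanRating mr
  | null rs   = Nothing
  | otherwise = Just (fromIntegral (sum rs) / fromIntegral (length rs))
  where
    rs = map fromIntegral (elems (mrRatings mr)) :: [Int]
```

Loaded in GHCi, something like
fmap meanRating (IM.lookup 1 (fromTriples [(1, 10, 5), (1, 11, 3)]))
looks up movie 1 and averages its ratings. A user-keyed variant would have
the same shape with the roles of MovieId and UserId swapped, at the cost of
many more (and much smaller) arrays.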
> That way, I'm able to fit the data set in <700M of memory, without
> having to reorganize the raw data.
>
>> The uvector package implements a vector of unboxed types, and has a
>> snocU operation to append an element to the array.
>>
>> I don't know how efficient it is, however.
>
>
>> By the way, about uvector: it has a Stream data type, and you can
>> build a vector from a stream.
>
> Thanks for letting me know, I'll keep this in mind.
>
Let me know if there are performance improvements.
Arrays are one of the few things I dislike in Haskell, and all the
available array/vector packages cause me some confusion.
Regards,
Manlio