[Haskell-cafe] newbie question about list performance
Jules Bean
jules at jellybean.co.uk
Mon Oct 29 03:08:38 EDT 2007
John Lato wrote:
> I'm working
> with moderate-sized files (tens to hundreds of MBs) that have some
> ascii header data followed by a bunch of 32-bit ints.
> but I don't know if [Int32] is actually the best choice. It seems to me
> that something like a lazy list of strict arrays (analogous to a lazy
> bytestring) would be better.
Depends on your data access pattern. If you access the words strictly
linearly, from the beginning of the file to the end, and that's all,
then [Int32] is absolutely fine. A list is a data-structure equivalent
of a for loop; it's the correct structure if you are dealing with things
linearly or nearly-linearly. If you were using adjacent words together,
that would be fine too (as in, e.g., zip xs (tail xs)).
If your data access pattern is more scattered or random-access in style,
then [Int32] does not scale well to 10s of MBs. If you keep the data
around, the overhead for [] is inappropriate (around 600-800% memory
usage overhead on [Int32]) and its performance guarantees are not good
either, for random access. In this case, as a first approximation, I
would be inclined to try a library which simple backended onto lazy
bytestring. For example the 'index' operation to fetch a single word
would fetch four bytes and bit-twiddle them into a word. If that doesn't
give the high speed you're after, then perhaps something *like* LBS,
i.e. foreignptr behind the scenes, but directly accessing word-at-a-time.
Jules
More information about the Haskell-Cafe
mailing list