[Haskell-cafe] newbie question about list performance

Jules Bean jules at jellybean.co.uk
Mon Oct 29 03:08:38 EDT 2007


John Lato wrote:
> I'm working
> with moderate-sized files (tens to hundreds of MBs) that have some
> ascii header data followed by a bunch of 32-bit ints.

> but I don't know if [Int32] is actually the best choice.  It seems to me
> that something like a lazy list of strict arrays (analogous to a lazy
> bytestring) would be better.  

Depends on your data access pattern. If you access the words strictly 
linearly, from the beginning of the file to the end, and that's all, 
then [Int32] is absolutely fine. A list is a data-structure equivalent 
of a for loop; it's the correct structure if you are dealing with things 
linearly or nearly-linearly. If you were using adjacent words together, 
that would be fine too (as in, e.g., zip xs (tail xs)).

If your data access pattern is more scattered or random-access in style, 
then [Int32] does not scale well to 10s of MBs. If you keep the data 
around, the overhead for [] is inappropriate (around 600-800% memory 
usage overhead on [Int32]) and its performance guarantees are not good 
either, for random access. In this case, as a first approximation, I 
would be inclined to try a library which simple backended onto lazy 
bytestring. For example the 'index' operation to fetch a single word 
would fetch four bytes and bit-twiddle them into a word. If that doesn't 
give the high speed you're after, then perhaps something *like* LBS, 
i.e. foreignptr behind the scenes, but directly accessing word-at-a-time.

Jules


More information about the Haskell-Cafe mailing list