[Haskell-cafe] Space leak

Mon Mar 15 05:16:42 EDT 2010

Arnoldo Muller <arnoldomuller at gmail.com> writes:

> I am trying to use haskell in the analysis of bio data. One of the main
> reasons I wanted to use haskell is because lazy I/O allows you to see a
> large bio-sequence as if it was a string in memory.

Funny you should mention it.  I've written a bioinformatics library¹ that
(naturally) supports reading and writing various file formats for
sequences and alignments and stuff.

Some of these files can be substantial in size (i.e., larger than my
laptop's memory), so most potentially large files (Fasta, BLAST XML
output, 454 SFF files...) are read lazily, and large Fasta sequences
are read as lazy bytestrings.
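
To make "read lazily" concrete, here is a minimal sketch (not the
library's actual code) that streams through a Fasta file as a lazy
bytestring and counts the records.  The names countRecords and countFile
are made up for illustration, and the only assumption about the format
is that a header line starts with '>'.

```haskell
import Data.List (foldl')
import qualified Data.ByteString.Lazy.Char8 as L

-- Count Fasta records by counting header lines (lines starting with '>').
-- foldl' forces the counter at every step, so no chain of thunks builds
-- up while the input is consumed chunk by chunk.
countRecords :: L.ByteString -> Int
countRecords = foldl' step 0 . L.lines
  where
    step n l
      | isHeader l = n + 1
      | otherwise  = n
    isHeader l = not (L.null l) && L.head l == '>'

-- Hypothetical driver: L.readFile reads the file lazily, so only the
-- chunks currently being examined need to be in memory.
countFile :: FilePath -> IO Int
countFile path = countRecords <$> L.readFile path
```

Since nothing holds on to the start of the input, chunks that have been
consumed can be garbage collected, which is what lets this kind of
traversal handle files larger than memory.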

This works nicely for a lot of use cases (well, my use cases, at any
rate, which quite often boil down to streaming through the data).  One
thing to look out for is O(n) indexed access to lazy bytestrings, so
there's a defragment operation that converts a sequence to a single
chunk (which gives O(1) access, but of course must fit into memory).
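
A rough sketch of what such a defragment step could look like on a plain
lazy bytestring (the library's own operation works on its sequence type
and may well be implemented differently; defragment and chunkCount here
are illustrative names only):

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy.Char8 as L

-- Copy all chunks into one strict bytestring and wrap it up again as a
-- lazy bytestring with (at most) a single chunk.  This forces the whole
-- sequence into memory.
defragment :: L.ByteString -> L.ByteString
defragment s = L.fromChunks [B.concat (L.toChunks s)]

-- Number of chunks; L.index has to walk past the chunks in front of the
-- requested position, so fewer chunks means cheaper indexing.
chunkCount :: L.ByteString -> Int
chunkCount = length . L.toChunks
```

After defragmenting, L.index only ever has to look inside one chunk, at
the price of losing the streaming behaviour for that sequence.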

I guess the most annoying thing about laziness is that small test cases
always work; you need Real Data to stress test your programs for
excessive memory use.

Lazy IO has always worked well for me, so although I feel I should look
more deeply into "real" solutions, like Iteratee, my half-hearted
attempts to do so have only resulted in the conclusion that it was more
complicated, and thus postponed for some rainy day... lazy IO for lazy
programmers, I guess.

-k

¹ Stuff's on Hackage in the bioinformatics section and also on
http://blog.malde.org and http://malde.org/~ketil/bioinformatics.
-- 
If I haven't seen further, it is by standing in the footprints of giants