[Haskell-cafe] Interesting folds over bytestring lists?

Thu Sep 20 17:29:36 EDT 2007

On 9/20/07, Justin Bailey <jgbailey at gmail.com> wrote:
> Your email makes me think I should work directly with a list of strict
> bytestrings, but in that case what happens when I want to take a large
> chunk of strings out of the middle of the list? Would that be an O(n)
> operation?

I used a list of strict bytestrings for this task and got pretty good
performance (40+k iterations per second, around 40 seconds to run
endo.dna)

A strict bytestring looks like this:

import qualified Data.ByteString.Base as B
B.PS fptr offset length
(PS means Packed String)

fptr :: ForeignPtr Word8
   fptr holds a safe pointer an immutable block of memory allocated
for a bytestring.  This buffer can be shared between multiple
bytestrings.

offset :: Int
    offset is the offset from the beginning of the bytestring.

length :: Int
    length is the length of the bytestring (in bytes)

This makes the take and drop functions on bytestrings extremely fast:

take n s@(B.PS fptr offset length)
  | n <= 0 = B.empty
  | n >= length = s
  | otherwise = B.PS fptr offset n

drop n s@(B.PS fptr offset length)
  | n <= 0 = s
  | n >= length = B.empty
  | otherwise = B.PS fptr (offset + n) (length - n)

(spoiler follows)

...

...

...

...

...

...

What you will find, however, is that a list of bytestrings isn't
-quite- good enough; the first thing the DNA does is create a huge
number of tiny copies of a single base.  You can solve this by setting
a threshold number of chunks in the DNA list, and "garbage collecting"
the result into a single bytestring every time the number of chunks
grows too high.

  -- ryan