[Haskell-cafe] Interesting folds over bytestring lists?
Ryan Ingram
ryani.spam at gmail.com
Thu Sep 20 17:29:36 EDT 2007
On 9/20/07, Justin Bailey <jgbailey at gmail.com> wrote:
> Your email makes me think I should work directly with a list of strict
> bytestrings, but in that case what happens when I want to take a large
> chunk of strings out of the middle of the list? Would that be an O(n)
> operation?
I used a list of strict bytestrings for this task and got pretty good
performance (40+k iterations per second, around 40 seconds to run
endo.dna)
A strict bytestring looks like this:
import qualified Data.ByteString.Base as B
B.PS fptr offset length
(PS means Packed String)
fptr :: ForeignPtr Word8
fptr holds a safe pointer an immutable block of memory allocated
for a bytestring. This buffer can be shared between multiple
bytestrings.
offset :: Int
offset is the offset from the beginning of the bytestring.
length :: Int
length is the length of the bytestring (in bytes)
This makes the take and drop functions on bytestrings extremely fast:
take n s@(B.PS fptr offset length)
| n <= 0 = B.empty
| n >= length = s
| otherwise = B.PS fptr offset n
drop n s@(B.PS fptr offset length)
| n <= 0 = s
| n >= length = B.empty
| otherwise = B.PS fptr (offset + n) (length - n)
(spoiler follows)
...
...
...
...
...
...
What you will find, however, is that a list of bytestrings isn't
-quite- good enough; the first thing the DNA does is create a huge
number of tiny copies of a single base. You can solve this by setting
a threshold number of chunks in the DNA list, and "garbage collecting"
the result into a single bytestring every time the number of chunks
grows too high.
-- ryan
More information about the Haskell-Cafe
mailing list