[Haskell-cafe] Mixing Unboxed Mutable Vectors and Parsers

Sat Apr 7 23:35:17 CEST 2012

CC: Maintainers of STMonadTrans, Vector, and JuicyPixels

Hello,
I am writing a Haskell Attoparsec parser which will modify 2-d arrays
of small values (Word8, Int8, etc.).

My first idea was to simply parse all the deltas, and later apply them
to the input list. However, I can't do that because the value of the
deltas depend on the value they're modifying.

My first pass at this program used a function of the form:

p :: [[Word8]] -> Parser [[Word8]]

This approach works, however, the program uses far too much memory.
Some quick testing shows that lists of Word8s are ~52.6x larger than
unboxed vectors of Word8s, and boxed vectors of Word8s are ~7.5x
larger than unboxed vectors of Word8s. A better approach would be to
use Data.Vector.Unboxed.Mutable and do the mutations inline with the
parser. Because mutable vectors require a monad in PrimMonad to do the
mutations inside of, I'd have to use a monad transformer to combine
Parser and something in PrimMonad. Attoparsec doesn't support being
used as a monad transformer, so I can't say something like

p :: (PrimMonad m, UM.Unbox a) => M.MVector (PrimState m) (UM.MVector
(PrimState m) a) -> ParserT m ()

I can't use Parsec (instead of Attoparsec) because I require streaming
semantics -- eventually I'm going to hook this up to Data.Conduit and
parse directly from the net.

There is STT (in the package STMonadTrans), however, so I could
potentially make the function result in STT Parser (). However, STT
doesn't work with Data.Vector.Mutable or Data.Vector.Unboxed.Mutable,
because STT isn't a member of the PrimMonad class (as far as I can
tell). STT itself doesn't define unboxed mutable vectors (only boxed
mutable vectors), but I feel that giving up unboxing isn't really an
option because of the memory footprint.

As a general observation, it seems silly to have two different mutable
vector implementations, one for STT and the other for PrimMonad.

So here are my questions:
1. Is it possible to implement PrimMonad with STT? I looked around for
a little while, but couldn't find anything that did this.
2. Otherwise, is it reasonable to try to implement unboxed mutable
vectors in STT? I feel this is probably going down the wrong path.
3. Are there any parsers that support streaming semantics and being
used as a monad transformer? This would require rewriting my whole
program to use this new parser, but if that's what I have to do, then
so be it.

Thanks,
Myles C. Maxfield