[Haskell-cafe] Re: Abstraction leak

Donald Bruce Stewart dons at cse.unsw.edu.au
Wed Jul 4 07:32:52 EDT 2007


drtomc:
> >Anyone trying to do any of this?
> 
> I've done some work in this area. I'm particularly interested in
> manipulating ASN.1 in haskell. Actually, my first use of Parsec was an
> ASN.1 parser. I'd done one previously in Spirit (the Boost C++ rip-off
> of parsec), but semantic actions were horrible in the extreme. Mmmm
> Parsec.
> 
> In the indexing system I'm currently building in Haskell for my day
> job, I'm serializing several data structures, and using Data.Bits and
> Data.ByteString heavily.
> 
> I was using HaXml, but I found it was very slow. So instead, I'm using
> an internal (within the indexing system) representation that is more
> akin to WBXML:
> 
> import Data.ByteString as ByteString
> import Data.List as List
> import Data.Sequence as Seq
> 
> data DocTree
>    = DocElem ByteString [(ByteString,ByteString)] [DocTree]
>    | DocText ByteString
> 
> serialize tree = ByteString.concat $ Seq.toList $ execState
> (serialize' tree) Seq.empty
> serialize' (DocText txt) = do
>    stuff <- get
>    put (stuff |> pack [0])
>    putStr txt
> serialize' (DocElem name attrs kids) = do
>    stuff <- get
>    put (stuff |> pack [1])
>    putStr name
>    putNum (List.length attrs)
>    mapM_ (putPair putStr putStr) attrs
>    putNum (List.length kids)
>    mapM_ serialize' kids
> 
> putStr ....
> 
> You get the idea. Actually, the *real* code is trickier - it grovels
> first to find all the element names and numbers them. Likewise with
> attribute names (per element). The extra grovel is well worth it - it
> takes a little longer to serialize, but is more compact and
> deserializes quicker.
> 
> Also worth noting - whether you compile a dictionary of element names
> or not, the result is much much much more space efficient than using
> HaXml, since it can all be decoded out of a single ByteString
> containing the document tree, with no actual string copying at all.
> That's the kind of [de]serialization I like. :-) Mind you, I still
> have to use HaXml when I first read documents into the system, and a
> very nice job it does too.

Can we do a cheap bytestring binding to libxml, to avoid any initial
String processing?

-- Don


More information about the Haskell-Cafe mailing list