[Haskell-cafe] Re: Abstraction leak
Donald Bruce Stewart
dons at cse.unsw.edu.au
Wed Jul 4 07:32:52 EDT 2007
drtomc:
> >Anyone trying to do any of this?
>
> I've done some work in this area. I'm particularly interested in
> manipulating ASN.1 in haskell. Actually, my first use of Parsec was an
> ASN.1 parser. I'd done one previously in Spirit (the Boost C++ rip-off
> of parsec), but semantic actions were horrible in the extreme. Mmmm
> Parsec.
>
> In the indexing system I'm currently building in Haskell for my day
> job, I'm serializing several data structures, and using Data.Bits and
> Data.ByteString heavily.
>
> I was using HaXml, but I found it was very slow. So instead, I'm using
> an internal (within the indexing system) representation that is more
> akin to WBXML:
>
> import Data.ByteString as ByteString
> import Data.List as List
> import Data.Sequence as Seq
>
> data DocTree
> = DocElem ByteString [(ByteString,ByteString)] [DocTree]
> | DocText ByteString
>
> serialize tree = ByteString.concat $ Seq.toList $ execState
> (serialize' tree) Seq.empty
> serialize' (DocText txt) = do
> stuff <- get
> put (stuff |> pack [0])
> putStr txt
> serialize' (DocElem name attrs kids) = do
> stuff <- get
> put (stuff |> pack [1])
> putStr name
> putNum (List.length attrs)
> mapM_ (putPair putStr putStr) attrs
> putNum (List.length kids)
> mapM_ serialize' kids
>
> putStr ....
>
> You get the idea. Actually, the *real* code is trickier - it grovels
> first to find all the element names and numbers them. Likewise with
> attribute names (per element). The extra grovel is well worth it - it
> takes a little longer to serialize, but is more compact and
> deserializes quicker.
>
> Also worth noting - whether you compile a dictionary of element names
> or not, the result is much much much more space efficient than using
> HaXml, since it can all be decoded out of a single ByteString
> containing the document tree, with no actual string copying at all.
> That's the kind of [de]serialization I like. :-) Mind you, I still
> have to use HaXml when I first read documents into the system, and a
> very nice job it does too.
Can we do a cheap bytestring binding to libxml, to avoid any initial
String processing?
-- Don
More information about the Haskell-Cafe
mailing list