[Haskell-cafe] No copy XML parser (rough idea only)

John Millikin jmillikin at gmail.com
Fri May 14 11:57:42 EDT 2010


The primary problem I see with this is that XML content is
fundamentally text, not bytes. Using your types, two XML documents
with identical content but different encodings will have different
Haskell values, so instances such as Eq and Ord will give the wrong
answer for them.
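
To make the encoding point concrete, here is a minimal sketch. It
assumes the "text" and "bytestring" packages, and it uses a tiny
hand-encoded fragment with no XML declaration or byte order mark; the
names utf8Doc, utf16Doc, bytesEqual and textEqual are made up for
illustration. The same characters compare unequal as bytes and equal
once decoded to text.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text.Encoding as TE

-- The fragment <a>é</a> encoded as UTF-8 (é is the two bytes C3 A9).
utf8Doc :: B.ByteString
utf8Doc = B.pack [0x3C, 0x61, 0x3E, 0xC3, 0xA9, 0x3C, 0x2F, 0x61, 0x3E]

-- The same fragment encoded as UTF-16, little endian, without a BOM.
utf16Doc :: B.ByteString
utf16Doc = B.pack
  [ 0x3C, 0x00, 0x61, 0x00, 0x3E, 0x00, 0xE9, 0x00
  , 0x3C, 0x00, 0x2F, 0x00, 0x61, 0x00, 0x3E, 0x00 ]

-- Comparing the raw bytes treats the two documents as different.
bytesEqual :: Bool
bytesEqual = utf8Doc == utf16Doc

-- Comparing after decoding treats them as the same text.
textEqual :: Bool
textEqual = TE.decodeUtf8 utf8Doc == TE.decodeUtf16LE utf16Doc
```

A real parser would pick the decoder from the BOM or the encoding
declaration rather than hard-coding it as done here.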

Additionally, since the original bytestring is shared in your types,
potentially very large buffers could be locked in memory due to
references held by only a small portion of the document. Chopping a
document up into events or nodes creates some overhead due to the
extra pointers, but allows unneeded portions to be freed.
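
A rough sketch of the sharing issue, assuming strict ByteStrings from
the "bytestring" package (sampleDoc, sharedName and copiedName are
hypothetical names, and the offsets are hard-coded for this one input).
A slice made with take/drop points into the original buffer, while
B.copy allocates a fresh buffer holding only the slice.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

-- Stand-in for a very large input buffer.
sampleDoc :: B.ByteString
sampleDoc = BC.pack "<root>imagine many megabytes of content here</root>"

-- A slice of the input. It shares the underlying buffer, so holding
-- on to this small value keeps all of the input alive.
sharedName :: B.ByteString -> B.ByteString
sharedName = B.take 4 . B.drop 1

-- The same bytes copied into their own buffer. Once the original is
-- no longer referenced elsewhere, it can be garbage collected.
copiedName :: B.ByteString -> B.ByteString
copiedName = B.copy . sharedName
```

Both functions return equal values; the difference is only in which
buffer the result keeps alive.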

If you'd like memory-efficient text storage, using Bryan O'Sullivan's
"text" package[1] is probably the best option. It uses packed Word16
buffers to store text as UTF-16. That's probably not as compact as a
type backed by UTF-8, but it's much, much better than String.

I know the data types you specified are just examples, but they're
leaving out some important XML features -- namespaces, entity
references, etc. Consider either reading the XML spec or using my
package "xml-types"[2] as a starting point for designing your type
hierarchy.
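
As a rough illustration of what such a hierarchy has to carry, here is
a hypothetical sketch built on Text. The type and constructor names
(QName, Content, Node and so on) are invented for this example and are
not the API of "xml-types" or of any other package.

```haskell
import qualified Data.Text as T

-- A namespace-aware name: the local part plus an optional namespace
-- URI. The prefix is deliberately left out, so two names that differ
-- only in prefix compare equal.
data QName = QName
  { qnLocal     :: T.Text
  , qnNamespace :: Maybe T.Text
  } deriving (Eq, Ord, Show)

-- Character data is either literal text or an unresolved entity
-- reference such as &nbsp;, stored by name.
data Content
  = TextContent T.Text
  | EntityRef T.Text
  deriving (Eq, Ord, Show)

-- An element has a name, attributes and children. Attribute values
-- can contain entity references as well, hence [Content].
data Node
  = ElementNode QName [(QName, [Content])] [Node]
  | ContentNode Content
  deriving (Eq, Ord, Show)

-- Example value for: <p xmlns="http://www.w3.org/1999/xhtml">a&nbsp;b</p>
exampleNode :: Node
exampleNode =
  ElementNode
    (QName (T.pack "p") (Just (T.pack "http://www.w3.org/1999/xhtml")))
    []
    [ ContentNode (TextContent (T.pack "a"))
    , ContentNode (EntityRef (T.pack "nbsp"))
    , ContentNode (TextContent (T.pack "b"))
    ]
```

Because every field is Text rather than a slice of the input bytes,
the derived Eq and Ord instances do not depend on the encoding of the
source document.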

[1] http://hackage.haskell.org/package/text
[2] http://hackage.haskell.org/package/xml-types

