[Haskell-cafe] ANN: dtd-text DTD parser, V0.1.0.0

Yitzchak Gale gale at sefer.org
Sun Jun 5 19:13:31 CEST 2011

The dtd-text package[1] provides a parser for XML DTDs. It implements
most of the parts of the W3C XML specification relating to DTDs,
and is compatible with versions 1.0 and 1.1 of the specification.[2]

The result of the parse is a Haskell DTD object from the
dtd-types[3] package. This first preliminary version of dtd-text,
version 0.1.0.0, requires at least version of dtd-types.


  -- Parse a DTD from a Data.Text.Lazy:
  dtdParse :: L.Text -> DTD

That should usually be all you need.

  -- Or, for advanced users, if the DTD contains external
  -- parameter entities and you want to supply their values:
  dtdParseWithExtern :: SymTable -> L.Text -> DTD

  -- where type SymTable = M.Map Text L.Text
  -- (M is Data.Map, L is Data.Text.Lazy)
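For the advanced case, the SymTable is an ordinary Data.Map, so you can
build it with the usual Map functions. The sketch below defines the
SymTable synonym locally (instead of importing it from dtd-text) so that
it compiles on its own. It assumes, which the announcement does not
state, that the keys are the names of the external parameter entities;
the entity names, their contents, and symTableFromList are made up for
illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch only: building a symbol table for dtdParseWithExtern.
-- ASSUMPTION: keys are parameter-entity names and values are the
-- replacement text you fetched yourself; check the dtd-text docs.
import qualified Data.Map as M
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Lazy as L

-- Defined locally so this compiles without dtd-text; same shape as
-- the SymTable synonym in the announcement.
type SymTable = M.Map Text L.Text

-- Made-up contents for two hypothetical external parameter entities.
externs :: SymTable
externs = M.fromList
  [ ("common.attrs", "<!ATTLIST para id ID #IMPLIED>")
  , ("inline.elems", "<!ELEMENT em (#PCDATA)>")
  ]

-- Hypothetical helper: convert (name, contents) pairs held as plain
-- Strings, e.g. files you fetched yourself, into a SymTable.
symTableFromList :: [(String, String)] -> SymTable
symTableFromList = M.fromList . map (\(k, v) -> (T.pack k, L.pack v))
```

With dtd-text installed, a table like externs would then be passed as
the first argument of dtdParseWithExtern, followed by the lazy Text of
the DTD itself.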

I really should have edited the Cabal description of this package
before I uploaded it. It promises an attoparsec-text parser
and blaze-builder renderer for DTDs. First of all, the renderer
is vaporware: I haven't written it yet. The parser alone was quite
a bit of work, so I decided to release it before even starting on
the renderer.

Second, although dtd-text does use attoparsec-text, and does
export parsers for all of the significant components of a DTD,
those parsers are of limited usefulness on their own. It turns out
that in order to support the full algorithm the spec gives for
parameter entity resolution, which is rather imperative in nature,
two layers of parsing are necessary. That is why dtd-text also has
some internal plumbing, which lets it present a simple interface.
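To make the "two layers" idea a little more concrete, here is a toy
sketch of what a first layer might do: expand parameter-entity
references such as %name; in the raw text before the declarations
themselves are parsed. This is not dtd-text's implementation, which
follows the spec's more detailed rules about where references are
recognized; PETable and expandPE are hypothetical names. Unknown or
cyclic references are simply left in place, in the spirit of returning
partial results and avoiding looping references.

```haskell
-- Toy sketch of a first parsing layer: textual expansion of
-- parameter-entity references (%name;). NOT dtd-text's code.
import Data.Char (isAlphaNum)
import qualified Data.Map as M
import qualified Data.Set as S

-- Hypothetical table of parameter-entity replacement texts.
type PETable = M.Map String String

-- Expand every %name; reference. Unknown names are left as written
-- (a partial result), and a name that is already being expanded is
-- not expanded again, so cyclic definitions cannot loop forever.
expandPE :: PETable -> String -> String
expandPE table = go S.empty
  where
    go :: S.Set String -> String -> String
    go _ [] = []
    go active ('%' : rest)
      | (name, ';' : after) <- span isNameChar rest
      , not (null name)
      = case M.lookup name table of
          Just val
            | not (name `S.member` active) ->
                -- expand the value with this name marked as active
                go (S.insert name active) val ++ go active after
          _ -> '%' : name ++ ";" ++ go active after
    -- any other character (including a bare '%') is copied through
    go active (c : rest) = c : go active rest

    -- simplified stand-in for the XML Name production
    isNameChar ch = isAlphaNum ch || ch `elem` "._-:"
```

Tracking the set of entities currently being expanded, rather than
imposing a depth limit, is what guarantees termination here: each
nested expansion adds a distinct name, and the table is finite.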

This is a very preliminary alpha release. All I can say so far is that
it compiles on my machine (GHC 7.0.2 on 64-bit Linux), and that
I tested it against a huge, extremely complicated DTD, and it seems
to have done the Right Thing. Since there are likely to be bugs that
I will need to fix soon, I will wait until then to fix the package
description.

More about external parameter entities, for advanced users:

As mentioned above, this parser does not attempt to go out
and fetch the values of external references for you from files
and URLs. If you need to extract information from the DTD, such as
system IDs and public IDs, before you fetch the external entities
yourself, you might be able to get it by applying dtdParse to all or
part of the DTD as an initial parse. The parser tries very hard
to give partial results when things are missing, while still doing
its best to avoid problems like looping references. However, if your
DTD has many deeply intertwined external parameter entities, you may
need many rounds of parsing and fetching, and this parser may not be
very practical for you; on the other hand, I personally have never
seen such a DTD in the wild.
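The initial parse itself is best done with dtdParse as suggested above.
Purely to illustrate what information you would be after, the following
naive scanner (base only, no dtd-text) lists the external parameter
entities declared in a DTD together with their public and system IDs.
ExternalPE and externalPEs are hypothetical names, and the scanner is a
rough stand-in, not a real parser: it does not handle comments,
conditional sections, declarations produced by entity expansion, or
quoted literals containing '>'.

```haskell
-- Naive sketch: list the external parameter entities a DTD declares,
-- so you can fetch them yourself. A toy scanner, NOT dtd-text's
-- parser: it ignores comments and assumes literals contain no '>'.
import Data.List (isPrefixOf)

-- Hypothetical record of one entity that needs fetching.
data ExternalPE = ExternalPE
  { peName   :: String
  , publicId :: Maybe String
  , systemId :: String
  } deriving (Eq, Show)

externalPEs :: String -> [ExternalPE]
externalPEs [] = []
externalPEs s@(_ : rest)
  | "<!ENTITY" `isPrefixOf` s =
      -- take the declaration body up to the first '>'
      let (decl, after) = break (== '>') (drop 8 s)
      in maybe id (:) (parseDecl (tokens decl)) (externalPEs after)
  | otherwise = externalPEs rest
  where
    -- only "% name SYSTEM sys" and "% name PUBLIC pub sys" qualify
    parseDecl ["%", n, "SYSTEM", sys]      = Just (ExternalPE n Nothing sys)
    parseDecl ["%", n, "PUBLIC", pub, sys] = Just (ExternalPE n (Just pub) sys)
    parseDecl _                            = Nothing

-- Split on whitespace, keeping "..." or '...' literals as one token
-- (with the quotes removed).
tokens :: String -> [String]
tokens [] = []
tokens (c : cs)
  | c `elem` " \t\r\n" = tokens cs
  | c == '"' || c == '\'' =
      let (lit, rest) = break (== c) cs in lit : tokens (drop 1 rest)
  | otherwise =
      let (tok, rest) = break (`elem` " \t\r\n\"'") (c : cs)
      in tok : tokens rest
```

The system IDs found this way are what you would fetch before handing
their contents to dtdParseWithExtern.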

A final caveat: this version of dtd-text does not yet support
conditional sections.


[1] http://hackage.haskell.org/package/dtd-text
[2] http://www.w3.org/TR/2008/REC-xml-20081126/
[3] http://hackage.haskell.org/package/dtd-types
