[Haskell-cafe] ANN: vxml (validating xml lib) - proof of concept -
need some guidance - bad type level performance (ghc)
Marc Weber
marco-oweber at gmx.de
Fri Aug 29 21:58:46 EDT 2008
Hi @ll.
This shouldn't have been an announce yet.. it's more crying for help to
get to know how / wether to continue :-)
I had an idea: a xml generating library validating the result against a
given dtd (to be used in web frameworks etc .. )
After one week of coding I got the result
git clone git://mawercer.de/vxml (1) (see README)
git hash as of writing : 4dc53
A minimal example looks like this (taken from test.hs):
import Text.XML.Validated.TH
$( dtdToTypes "dtds/xhtml1-20020801/DTD/xhtml1-strict_onefile.dtd"
(XmlIds (Just "-//W3C//DTD XHTML 1.0 Strict//EN")
(Just "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd") ) )
main = putStrLn $ xml $ ((html << (head << (title <<< "vxml hello world")))
<< (body << (div <<< "PCDATA text" )) )
Which takes about 7 secs to compile on my (fast) 2.5 Ghz machine.
Before Oleg telling me about how to use the TypeEq implementation found
in HList it took several hours (switch the cabal configure flag to see
experience the incredible slow TypeToNat implementation)
However type checking the simple small document (2) already takes 35
seconds, duplicating the tags within the body raises compilation
time to 74 seconds. (Have a look at test/XmlToQ.hs to automatically
create the code generating such a given xml file (its used in
run-testcases.hs)
The dtd representation used is very easy to understand:
(3a) shows allowed subelements of the head tag
Seq = sequence
Star = zero or more
...
(3b)
shows the list of allowed attributes
for a more commonly used tag such as div the element list (3a) is 324 lines
long and there are approximately 15 attributes.
Use a line such as
putStrLn $ xml $ debugEl div
to get this view.
All the work is done by
class Consume st el r | st el -> r -- result is on of C,CS,R,F
class Retry elType st el st' | st el -> st'
st = state as given in (3a)
el = child to be added (Elem Html_T) or PCDATA
C = element consumed, end
CS a = element consumed, continue with state a
R a = retry with given state (can happen after removing a Star = ()*
on a no match )
F a = no match, show failure a
instance Retry elType C el C
instance Retry elType (CS st) el st
instance ( -- retry
Consume st el r
, Retry elType r el st'
) => Retry elType (R st) el st'
Maybe you knowing much more about ghc internals have some more
ideas how to optimize? I only came up with
a) implement
xml HTrue $ xmldoc -- type checking variant
xml HFalse $ xmldoc -- unchecked but faster variant
so you can run xml HTrue once a day only while having lunch..
( doubt Maybe you have to take lunch two or more times on intermediate
web projects .. )
b) change (A attrType) to attrType only
and (Elem e) to e
which does not work for the same reason as
class TypeEq a b c | a b -> c
instance TypeEq a a HTrue
instance TypeEq a b HFalse
doesn't.
Of course a a will match a b as well, but the result is totally
different..
So would it be possible to either tell ghc to ignore this happily
taking the result of the better matching instance?
Or to explicitely tell ghc when an instance matches the way it does in
current implementations make it a no match if some constraints can or
cannot be satisfied such as this:
class TypeEq a b c | a b -> c
instance TypeEq a a HTrue
instance [ NoMatchOn a b -- if this can be satisfied ignore this instance
] => TypeEq a b HFalse
class NoMatchOn a b
instance NoMatchOn a a
c) Having a reduction rule such as this:
If the left side of (Or a b) returnns C (consumed) tell ghc to not
evaluate the result of (Consume b element result) which must be done
to see wether there is closer instance match (Is this correct ?)
Of course not everything is implemented yet.. but except speed I
consider it beeing quite usable.
He, thanks for reading till the end. Any feedback and suggestions
are appreciated. If you have any trouble contact me on irc.
I've been using
$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 6.8.2
Sincerly
Marc Weber
(1) If you are interested and either don't have git or are not familiar
using it I can send you a tarball.
(2)
<!DOCTYPE html SYSTEM "/tmp/dtd.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US" xml:lang="en-US">
<head>
<title>Example 6 - XHTML 1.0 Strict as application/xhtml+xml</title>
<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8" />
<link rel="stylesheet" type="text/css" href="style.css" />
</head>
<body>
<h1>Example 6 - XHTML 1.0 Strict as application/xhtml+xml</h1>
<p>
This document is valid XHTML 1.0 Strict served as
<code>application/xhtml+xml</code>.
</p>
<p>
This document references CSS rules contained in an external
stylesheet via <code>link</code>.
</p>
<p>
Note how the CSS rules for the background are applied in Netscape 7.x,
Mozilla, Opera 7 but that Internet Explorer can not display the page at all.
</p>
<p>
<a href="http://validator.w3.org/check/referer"><img
src="http://www.w3.org/Icons/valid-xhtml10"
alt="Valid XHTML 1.0!" height="31" width="88" /></a>
</p>
</body>
</html>
(3a)
(Seq
(Star
(Or
(Elem Script_T)
(Or
(Elem Style_T)
(Or (Elem Meta_T) (Or (Elem Link_T) (Elem Object_T))))))
(Or
(Seq
(Elem Title_T)
(Seq
(Star
(Or
(Elem Script_T)
(Or
(Elem Style_T)
(Or
(Elem Meta_T)
(Or (Elem Link_T) (Elem Object_T))))))
(Query
(Seq
(Elem Base_T)
(Star
(Or
(Elem Script_T)
(Or
(Elem Style_T)
(Or
(Elem Meta_T)
(Or
(Elem Link_T) (Elem Object_T))))))))))
(Seq
(Elem Base_T)
(Seq
(Star
(Or
(Elem Script_T)
(Or
(Elem Style_T)
(Or
(Elem Meta_T)
(Or (Elem Link_T) (Elem Object_T))))))
(Seq
(Elem Title_T)
(Star
(Or
(Elem Script_T)
(Or
(Elem Style_T)
(Or
(Elem Meta_T)
(Or (Elem Link_T) (Elem Object_T)))))))))))
(3b)
(HCons (A Lang_A)
(HCons (A Xml:lang_A)
(HCons (A Dir_A)
(HCons (A Id_A)
(HCons (A Profile_A) HNil)))))
More information about the Haskell-Cafe
mailing list