[Haskell-cafe] ANN: vxml (validating xml lib) - maybe usable now, quadratic compilation time ?

Marc Weber marco-oweber at gmx.de
Sun Aug 31 17:50:54 EDT 2008


Oleg has pointed me in the right direction:
he suggested using a class along the lines of
  class AttrOk elType attr
for attributes (compilation time 35s -> 30s)
and doing something similar with the state
(30s -> 4.5s).
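
For readers who have not seen this trick, here is a minimal sketch of what
such a class could look like. It is not the code from vxml: the element types
A and Img, the attribute types Href and Src, and the helpers Elem, AttrName,
addAttr and attrs are all made up for illustration. The assumed idea is one
method-less instance per (element, attribute) pair that the DTD allows, so
that a disallowed pair fails with a plain "no instance" error.

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}
module Main where

-- Hypothetical element and attribute types (not taken from vxml).
data A = A
data Img = Img
data Href = Href
data Src = Src

-- One instance per (element, attribute) pair permitted by the DTD.
class AttrOk elType attr

instance AttrOk A Href
instance AttrOk Img Src

-- Hypothetical element wrapper carrying its attributes as strings.
data Elem elType = Elem elType [(String, String)]

-- Maps an attribute type to its XML name.
class AttrName attr where
  attrName :: attr -> String

instance AttrName Href where attrName _ = "href"
instance AttrName Src where attrName _ = "src"

-- Adding an attribute only typechecks if an AttrOk instance exists.
addAttr :: (AttrOk elType attr, AttrName attr)
        => attr -> String -> Elem elType -> Elem elType
addAttr a v (Elem e as) = Elem e ((attrName a, v) : as)

attrs :: Elem elType -> [(String, String)]
attrs (Elem _ as) = as

main :: IO ()
main = do
  print (attrs (addAttr Href "http://haskell.org" (Elem A [])))
  -- The next line would be rejected: there is no instance AttrOk A Src.
  -- print (attrs (addAttr Src "x.png" (Elem A [])))
```

Each check is then a single instance lookup, which is presumably cheaper for
the typechecker than walking a type-level list of allowed attributes.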
After doing this I was quite happy:
The 4.5s include reading the DTD and generating about 1500 DecQ
declarations.
There might still be some duplication in the state transformation steps.
However, trying to typecheck the same file with the body repeated 400 times
never finished. Why?

Have a quick glance at the data below, which shows quadratic behaviour.
That's bad. I would like to have linear behaviour.
Any ideas what is causing this? Is there a way to get linear
scalability?
Disabling all the validation stuff (see the cabal flag) no longer
improves performance much, however. The data below proves that right now
validation increases compilation time by a factor of 1.35 to 1.45
(within the range of 5-30 replications of the body).
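
The factor can be recomputed from the two tables further down in this
message. The following is only a small sketch: the four sample rows are
copied from those tables, while the names samples and slowdown are made up
here.

```haskell
module Main where

-- (replication count, ms with validation, ms without validation),
-- copied from the two tables in this message.
samples :: [(Int, Double, Double)]
samples =
  [ (5,  4788.195,  3279.339)
  , (10, 6459.544,  4767.588)
  , (15, 9271.491,  6916.67)
  , (20, 15944.82,  10872.749)
  ]

-- Ratio of compilation time with validation to time without it.
slowdown :: (Int, Double, Double) -> (Int, Double)
slowdown (n, withV, withoutV) = (n, withV / withoutV)

main :: IO ()
main = mapM_ (print . slowdown) samples
```

Running main prints one (count, ratio) pair per sample row.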

If you'd like to play with the library and give some feedback,
I'd be happy. Read the README.
The benchpress dependency is only needed for this benchmark test;
you may want to remove it from the cabal file.
So don't try to compile 4000-line XML files, or be prepared to wait
days.
I should expand the benchmark to also offer results for the xhtml lib.

Sincerely
Marc Weber

============= compilation times with validation  ==============================================

body replication count | compilation time [ms]

1 4146.477
2 4292.153
3 4508.56
4 4654.244
5 4788.195
6 5041.675
7 5347.114
8 6134.596
9 6019.624
10 6459.544
11 7054.434
12 7614.197
13 8489.003
14 8529.611
15 9271.491
16 10058.419
17 12290.142
18 13736.075
19 14863.893
20 15944.82
21 17856.612
22 17977.841
23 17686.297
24 19279.314
25 20960.785
26 22750.754
27 24407.506
28 26342.242
29 28423.79
30 30932.777
31 48478.841
32 45609.897
33 40574.255
34 41220.062
35 43952.546
36 47437.922
37 50584.121
38 53983.848
39 57935.593
=============  =======================================================
To check the quadratic fit, e.g. start gnuplot and enter:

f(x)=(x-b)**2*c+a
fit f(x) 'data' via a, b, c
plot 'data', f(x)

============= compilation times without validation ================================
body replication count | compilation time [ms]

1 3939.887 << Import.hs was recompiled here; that's why the first run took longer than the next
2 3138.127
3 3080.317
4 3179.19
5 3279.339
6 3604.609
7 3736.829
8 4094.618
9 4252.377
10 4767.588
11 5070.188
12 5537.007
13 5840.085
14 6478.276
15 6916.67
16 7568.444
17 8301.047
18 9211.369
19 10025.886
20 10872.749
21 12086.263
22 12975.372
23 14366.282
24 15640.214
=============  =======================================================

