Compiling HXML toolbox under Hugs/Windows

Uwe Schmidt uwe at fh-wedel.de
Fri Jan 16 16:12:38 EST 2004


Graham Klyne wrote:

> I've been trying to compile the HXML toolbox, version 3.01
> (http://www.fh-wedel.de/~si/HXmlToolbox/HXmlToolbox-3.01.tar.gz), using the
> experimental Unicode version of Hugs, and have encountered a few source
> code problems that I think are maybe not specific to Hugs.
>
> (1) hdom/xmltreefilter.hs, incorrect section syntax:
> line 555, changed to:
> del1Attr an = processAttrl ((none `when` isAttr an) $$)
> line 613, changed to:
>      = processAttrl ((modifyValue `when` isAttr an) $$)

We've changed this in our development version.
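
As an illustration of the parenthesised form in (1), here is a minimal sketch of a left section whose operand is itself an infix application. The definitions of Filter, none, when and $$ below are simplified stand-ins written for this example, not the toolbox's own code, and isEven and dropEvens are hypothetical names.

```haskell
-- Simplified stand-ins for the toolbox's filter combinators (assumptions
-- made for this sketch; the real definitions are richer).
type Filter a = a -> [a]

-- Filter that keeps nothing.
none :: Filter a
none = const []

-- Apply f where the predicate filter g yields a result,
-- otherwise pass the value through unchanged.
when :: Filter a -> Filter a -> Filter a
when f g x
    | null (g x) = [x]
    | otherwise  = f x

-- Apply a filter to every element of a list and concatenate the results.
($$) :: Filter a -> [a] -> [a]
f $$ xs = concatMap f xs

-- Hypothetical predicate filter: succeeds on even numbers only.
isEven :: Filter Int
isEven n = [n | even n]

-- Left section of ($$). The inner parentheses make the operand explicit,
-- so the section parses the same way whatever fixities the operators have.
dropEvens :: [Int] -> [Int]
dropEvens = ((none `when` isEven) $$)
```

Wrapping the operand in its own parentheses keeps the section independent of any fixity declarations for `when` and $$.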

> (2) hparser/unicode.hs, questionable range of unicode characters in:
> [[
> -- |
> -- test for a legal multi byte XML char
>
> isMultiByteXmlChar      :: Unicode -> Bool
> isMultiByteXmlChar i
>      = ( i >= '\x00000080' && i <= '\x0000D7FF' )
>        ||
>        ( i >= '\x0000E000' && i <= '\x0000FFFD' )
>        ||
>        ( i >= '\x00010000' && i <= '\x0010FFFF' )
> ]]
> Should that be \x0010FFFF or \x0010FFFD?  (The Hugs Unicode code had
> \x0010FFFD for the upper bound.)

The XML 1.0 standard (http://www.w3.org/TR/REC-xml#charsets) gives
#x10FFFF as the upper bound:

2.2 Characters

[Definition: A parsed entity contains text, a sequence of characters, which 
may represent markup or character data.] [Definition: A character is an 
atomic unit of text as specified by ISO/IEC 10646 [ISO/IEC 10646] (see also 
[ISO/IEC 10646-2000]). Legal characters are tab, carriage return, line feed, 
and the legal characters of Unicode and ISO/IEC 10646. The versions of these 
standards cited in A.1 Normative References were current at the time this 
document was prepared. New characters may be added to these standards by 
amendments or new editions. Consequently, XML processors must accept any 
character in the range specified for Char. The use of "compatibility 
characters", as defined in section 6.8 of [Unicode] (see also D21 in section 
3.6 of [Unicode3]), is discouraged.]

Character Range

[2] Char   ::=   
  #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

/* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
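
A minimal sketch of production [2] as a Haskell predicate may make the bounds easier to check. The name isXmlChar is hypothetical and this is not the toolbox's own function; it works on Char via ord rather than on the toolbox's Unicode type.

```haskell
import Data.Char (ord)

-- | Test for a legal XML 1.0 character, following production [2] Char.
-- Hypothetical helper for illustration only.
isXmlChar :: Char -> Bool
isXmlChar c
    =  i == 0x9 || i == 0xA || i == 0xD
    || (i >= 0x20    && i <= 0xD7FF)     -- excludes the surrogate block
    || (i >= 0xE000  && i <= 0xFFFD)     -- excludes FFFE and FFFF
    || (i >= 0x10000 && i <= 0x10FFFF)   -- upper bound as given in the spec
  where
    i = ord c
```

With this definition, '\x10FFFF' is accepted, while '\xFFFE' and '\xD800' are rejected.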


>
> (3) hparser/xmlinput.hs, line 38:  spurious ','

Removed.

>
> (4) Missing module 'Socket'.
> [[
> Reading file "..\hparser\XmlInput.hs":
> Parsing
> ERROR "..\hparser\XmlInput.hs" - Can't find imported module "Socket"
> ]]
> Is this a GHC/Hugs library difference?  Should this be Network.Socket?  I
> tried using that and it seemed to be accepted.

We've never tried sockets with Hugs.

>
> (5) Ditto for module URI.
>
> (6) I think a problem with the build instructions in README:
> [[
> Just add the modules from the directories "hdom", "hparser", "hvalidator",
> "hxpath", "http" and

Sorry, "popen" is missing from that list.

> "parsec" to the path of your compiler or interpreter, the Makefile contains
> an example.
> It is planned to provide a GHC package of the Haskell XML Toolbox in the
> near future.
> An example ghci project file ".ghci" can be found in the examples
> directory. ]]
> does not mention directory popen.

This does not mean the popen in the GHC library, but the popen source from
Jens Petersen. We use it to call the external program curl, which gives
better HTTP support than the http module.
 
> (7) in POpen.hs:
> Module Posix should be Text.Regex.Posix?
>
> At this point, I get:
> [[
> Reading file "..\popen\POpen.hs":
> ERROR "..\popen\POpen.hs":39 - Undefined type constructor "ProcessID"
> ]]
> and give up chasing down this problem.  I'm wondering if the HXML toolbox
> library has been tested under MS Windows?

We do not develop under MS Windows.
Our internal coding rule for Haskell is:
use GHC without any Glasgow extensions,
and compile our own modules with all warnings on
and no warnings reported. This is a moving target,
because GHC keeps getting better at emitting useful warnings.
This rule does not apply to the http, popen, or parsec modules;
there GHC reports a lot of warnings.
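
One way to enforce such a rule per module, sketched here as an assumption about a current GHC rather than a description of the toolbox's Makefile, is to switch on -Wall together with -Werror in a file-header pragma. The function double is a hypothetical example.

```haskell
{-# OPTIONS_GHC -Wall -Werror #-}

-- | Double a number (hypothetical example function).
-- With -Wall, leaving out the type signature below is reported as a
-- warning, and -Werror turns any warning into a compile error.
double :: Int -> Int
double n = n * 2
```

The same flags can be passed on the command line instead (ghc -Wall -Werror), which avoids touching third-party sources that are not expected to be warning-free.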

HTTP access is a weak point in Haskell. A portable library that supports
stable access, timeouts, proxies, cookies, ..., like e.g. the curl library,
would remove a lot of the problems found here.
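
For illustration, calling the external curl program with a timeout and an optional proxy could look roughly like the sketch below. It is not Jens Petersen's POpen module: it assumes a current GHC with the process package (System.Process) and a curl executable on the PATH, and the names FetchOptions, curlArgs and fetch are hypothetical.

```haskell
import System.Exit (ExitCode (..))
import System.Process (readProcessWithExitCode)

-- | Options for one request; the field names are illustrative only.
data FetchOptions = FetchOptions
    { timeoutSeconds :: Int           -- overall time limit for the transfer
    , proxy          :: Maybe String  -- optional proxy, e.g. "http://host:3128"
    }

-- | Build the curl command line for a URL (pure, so easy to test).
curlArgs :: FetchOptions -> String -> [String]
curlArgs opts url =
    [ "--silent", "--show-error", "--location"
    , "--max-time", show (timeoutSeconds opts) ]
    ++ maybe [] (\p -> ["--proxy", p]) (proxy opts)
    ++ [url]

-- | Run curl and return the response body, or curl's error output if it
-- exits with a failure code. Throws an IOException if curl cannot be started.
fetch :: FetchOptions -> String -> IO (Either String String)
fetch opts url = do
    (code, out, err) <- readProcessWithExitCode "curl" (curlArgs opts url) ""
    return $ case code of
        ExitSuccess   -> Right out
        ExitFailure _ -> Left err
```

Keeping the argument list in a pure function separates the part that can be checked without a network from the part that actually starts the process.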

Thanks for your hints.

  uwe schmidt



