[Haskell] ANNOUNCE: Haskell XML Toolbox Version 4.00

Graham Klyne GK at ninebynine.org
Wed Apr 7 19:48:49 EDT 2004


At 11:37 06/04/04 +0200, Uwe Schmidt wrote:
>Haskell XML Toolbox 4.00
>
>I would like to announce a new version of  the Haskell XML Toolbox for
>processing XML including a validating parser and a new XPath module.

Testing the new kit using Hugs under Windows, with test program 
Examples/HUnit/HUnitExample.hs.  Because I'm starting from a clean 
distribution, I'm hoping that this will show up all of the Windows/Hugs 
incompatibilities that I may have missed reporting previously.

This message is constructed in a stream-of-consciousness style, as I wanted 
to capture every step I took to get the code running.  (I *did* get the 
code running with Hugs under Windows, with just a little butchery ;-)

My first step was to unpack the distribution kit into a directory:
   D:\Cvs\DEV\HaskellUtils\HXmlToolbox-4.00

I then created a batch file to run Hugs from the Examples directory:
[[
rem Use Hugs version compiled with Unicode support

set HUGS=C:\DEV\Hugs98\hugs-20040109.exe
set LIBS=..\hdom;..\hparser;..\hvalidator;..\hxpath;..\http;..\popen

%HUGS% -P"%LIBS%;"

PAUSE
]]

(I should check if the HXML Toolbox requires Haskell support for 'char's 
with codepoints >255.  Later.)

...

[[
Reading file "..\http\HTTP.hs":
Parsing
ERROR "..\http\HTTP.hs":132 - Syntax error in import declaration 
(unexpected symbol "partition")
Network.Socket>
]]

import Data.List ({- isPrefixOf -},partition,elemIndex) -- uwe
changed to:
import Data.List ({- isPrefixOf, -} partition,elemIndex)        -- uwe

...

I experienced a problem with http.hs, because my version of Network.URI has 
a different internal structure.  It's not clear to me why there's a need to 
force a '/' at the front of the path (i.e. if a relative path is supplied 
here, isn't it an error in any case?):
[[
             alt_uri = show $ if null (path u) || head (path u) /= '/'
                         then u { uriPath = '/' : path u }
                         else u
]]
(this is modified to use uriPath, so it works with my Network.URI)

In the longer run, I'm hoping that the HTTP module will become part of the 
Network library module, based on work by Bjorn Bringert.

...

[[
Reading file "..\http\Browser.hs":
Parsing
ERROR "..\http\Browser.hs":70 - Syntax error in import declaration 
(unexpected comma)
HTTP>
]]

import Control.Monad (foldM,filterM, {- liftMf -},when)
changed to:
import Control.Monad (foldM,filterM, {- liftMf, -} when)

...

I also had a problem with functions trimHost and formToRequest in 
Browser.hs and my URI module, similar to that noted above.  Again, I hope 
this problem can be avoided when http is part of the common library.

...

[[
Reading file "..\hparser\ProtocolHandlerHttpCurl.hs":
Parsing
ERROR "..\hparser\ProtocolHandlerHttpCurl.hs" - Can't find imported module 
"Posix"
ProtocolHandlerHttpNative>
]]

This is the old well-known POpen problem.  I've copied my version of POpen 
into the HXMLtoolbox code tree (it compiles, but still doesn't work), and 
changed the import Posix(popen) to import POPen(popen).

...

With the above changes, the code runs but reports 8 failures:
[[
Cases: 110  Tried: 110  Errors: 0  Failures: 8
Counts{cases=110,tried=110,errors=0,failures=8}
]]

Failures are:
[[
fatal error: illegal URI for input: "mini1.xml"
  :
### Failure in: 3:input tests:3
expected: "<x/>"
but got: ""
Cases: 110  Tried: 36  Errors: 0  Failures: 1
fatal error: illegal URI for input: "mini1.xml"
### Failure in: 3:input tests:4
expected: "0"
but got: "3"
Cases: 110  Tried: 37  Errors: 0  Failures: 2
fatal error: illegal URI for input: "mini1.xml"
  :
### Failure in: 3:input tests:5
expected: "4"
but got: "0"
Cases: 110  Tried: 38  Errors: 0  Failures: 3
fatal error: illegal URI for input: "mini2.xml"
### Failure in: 3:input tests:6
expected: "0"
but got: "3"
Cases: 110  Tried: 39  Errors: 0  Failures: 4
fatal error: illegal URI for input: "notThere.xml"
  :
fatal error: illegal URI for input: "mini2.xml"
  :
### Failure in: 4:document transformation tests:0
expected: " ignore unknown important important "
but got: ""
Cases: 110  Tried: 42  Errors: 0  Failures: 5
fatal error: illegal URI for input: "mini2.xml"
  :
### Failure in: 4:document transformation tests:1
expected: "ignoreunknownimportantimportant"
but got: ""
Cases: 110  Tried: 43  Errors: 0  Failures: 6
fatal error: illegal URI for input: "mini2.xml"
  :
### Failure in: 4:document transformation tests:2
expected: "   important important "
but got: ""
Cases: 110  Tried: 44  Errors: 0  Failures: 7
fatal error: illegal URI for input: "mini2.xml"
  :
### Failure in: 4:document transformation tests:3
expected: " ignore  important  "
but got: ""
Cases: 110  Tried: 110  Errors: 0  Failures: 8
Counts{cases=110,tried=110,errors=0,failures=8}

Main>
]]

Thus it looks as if all of the failures are due to an invalid URI for a 
local file.   This is a re-run of an old problem, traced back to:
[[
setDefaultURI   :: XState state ()
setDefaultURI
     = do
       wd <- io getCurrentDirectory
       setSysParam transferDefaultURI ("file://localhost" ++ wd ++ "/")
]]
-- XmlInput.hs

The value returned by getCurrentDirectory on Windows is not valid as part 
of a URI.  I previously fixed this by replacing the above with the following:

[[
{- original:
setDefaultURI   :: XState state ()
setDefaultURI
     = do
       wd <- io getCurrentDirectory
       setSysParam transferDefaultURI ("file://localhost" ++ wd ++ "/")
-}

--  Revised version to allow Windows directory strings.   [[[GK]]]
--
--  If the current directory starts with 'd:', it is assumed to be a Windows
--  directory, and all '\' characters are mapped to '/'.
--
--  In any case, any non-URI or reserved character is escaped.

setDefaultURI   :: XState state ()
setDefaultURI
     = do
       wd <- io getCurrentDirectory
       let wd1 = case wd of
               d:':':_ | driveLetter d -> '/':concatMap win32ToUriChar wd
               otherwise               ->     concatMap escapeNonUriChar wd
       setSysParam transferDefaultURI ("file://localhost" ++ wd1 ++ "/")
       -- [[[I'd prefer to leave 'localhost' as null, but this raises
       --    awkward questions about whether it's OK to remove a
       --    null authority from a URI]]]
       where
         win32ToUriChar '\\' = "/"
         win32ToUriChar c    = escapeNonUriChar c
         escapeNonUriChar c  = escapeChar isUnescapedInURI c   -- from 
Network.URI

driveLetter d       = d `elem` ['A'..'Z']

-- to test:
--   run () $ do { uri <- getDefaultURI ; io (putStrLn uri) }

{-  Excerpt from:
     http://www.ietf.org/internet-drafts/draft-hoffman-rfc1738bis-01.txt

2.7 FILES

The file URL scheme is used to designate files accessible on a
particular host computer. This scheme, unlike most other URL schemes,
does not designate a resource that is universally accessible over the
Internet.

A file URL takes the form:

    file://<host>/<path>

where <host> is the fully qualified domain name of the system on
which the <path> is accessible, and <path> is a hierarchical
directory path of the form <directory>/<directory>/.../<name>.

As a special case, <host> can be the string "localhost" or the empty
string; this is interpreted as "the machine from which the URL is
being interpreted". However, this part of the syntax has been
ignored on many systems. That is, for some systems, the following
are considered equal, while on others they are not:

    file://localhost/path/to/file.txt
    file:///path/to/file.txt

Some systems allow URLs to point to directories. In this case, there
is usually (but not always) a terminating "/" character, such as
in:

    file://usr/local/bin/

On systems running some versions of Microsoft Windows, the local drive
specification is preceded by a "/" character. Thus, for a file called
"example.ini" in the "windows" directory on the "c:" drive, the URL
would be:

    file:///c:/windows/example.ini

For Windows shares, there is an additional "/" prepended to the name.
Thus, the file "example.doc" on the shared directory "department" would
have the URL:

    file:////department/example.doc

The file URL scheme is unusual in that it does not specify an
Internet protocol or access method for such files; as such, its
utility in network protocols between hosts is limited.
-}
]]

I also needed to add an import to XmlInput.hs:
[[
import Network.URI
     ( isUnescapedInURI, escapeChar
     )
]]

...

I also find I must modify ProtocolHandlerFile.hs to strip off the leading 
'/' from the path to work on Windows.  The following change is made to 
function getFileContents:
[[
{- Original:
     source      = path uri
     readErr msg = addFatal msg n
-}
     -- [[[GK]]] strip off leading '/' from Windows drive name
     source      = fileuripath (path uri)
     fileuripath ('/':file@(d:':':more)) | driveLetter d = file
     fileuripath file = file
     driveLetter d = d `elem` ['A'..'Z']
     readErr msg = addFatal msg n
]]

...

Finally, I must copy mini1.xml and mini2.xml into the directory from which 
I'm running Hugs (i.e. HXmlToolbox-4.00/examples), and the test program 
runs with no errors:

[[
Cases: 110  Tried: 110  Errors: 0  Failures: 0
Counts{cases=110,tried=110,errors=0,failures=0}
]]

...

Well, that wasn't too hard, second time around.  Maybe we should abstract 
out the URI<->filename mapping logic into another module that is 
specifically to handle the file: URI scheme.  Dopes it make sense to add 
such a module to the network branch of the library hierarchy, to sit 
alongside the HTTP modules?

#g


------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact



More information about the Libraries mailing list