[Haskell] ANNOUNCE: Haskell XML Toolbox Version 4.00
Graham Klyne
GK at ninebynine.org
Wed Apr 7 19:48:49 EDT 2004
At 11:37 06/04/04 +0200, Uwe Schmidt wrote:
>Haskell XML Toolbox 4.00
>
>I would like to announce a new version of the Haskell XML Toolbox for
>processing XML including a validating parser and a new XPath module.
Testing the new kit using Hugs under Windows, with test program
Examples/HUnit/HUnitExample.hs. Because I'm starting from a clean
distribution, I'm hoping that this will show up all of the Windows/Hugs
incompatibilities that I may have missed reporting previously.
This message is constructed in a stream-of-consciousness style, as I wanted
to capture every step I took to get the code running. (I *did* get the
code running with Hugs under Windows, with just a little butchery ;-)
My first step was to unpack the distribution kit into a directory:
D:\Cvs\DEV\HaskellUtils\HXmlToolbox-4.00
I then created a batch file to run Hugs from the Examples directory:
[[
rem Use Hugs version compiled with Unicode support
set HUGS=C:\DEV\Hugs98\hugs-20040109.exe
set LIBS=..\hdom;..\hparser;..\hvalidator;..\hxpath;..\http;..\popen
%HUGS% -P"%LIBS%;"
PAUSE
]]
(I should check if the HXML Toolbox requires Haskell support for 'char's
with codepoints >255. Later.)
...
[[
Reading file "..\http\HTTP.hs":
Parsing
ERROR "..\http\HTTP.hs":132 - Syntax error in import declaration
(unexpected symbol "partition")
Network.Socket>
]]
import Data.List ({- isPrefixOf -},partition,elemIndex) -- uwe
changed to:
import Data.List ({- isPrefixOf, -} partition,elemIndex) -- uwe
...
I experienced a problem with http.hs, because my version of Network.URI has
a different internal structure. It's not clear to me why there's a need to
force a '/' at the front of the path (i.e. if a relative path is supplied
here, isn't it an error in any case?):
[[
alt_uri = show $ if null (path u) || head (path u) /= '/'
then u { uriPath = '/' : path u }
else u
]]
(this is modified to use uriPath, so it works with my Network.URI)
In the longer run, I'm hoping that the HTTP module will become part of the
Network library module, based on work by Bjorn Bringert.
...
[[
Reading file "..\http\Browser.hs":
Parsing
ERROR "..\http\Browser.hs":70 - Syntax error in import declaration
(unexpected comma)
HTTP>
]]
import Control.Monad (foldM,filterM, {- liftMf -},when)
changed to:
import Control.Monad (foldM,filterM, {- liftMf, -} when)
...
I also had a problem with functions trimHost and formToRequest in
Browser.hs and my URI module, similar to that noted above. Again, I hope
this problem can be avoided when http is part of the common library.
...
[[
Reading file "..\hparser\ProtocolHandlerHttpCurl.hs":
Parsing
ERROR "..\hparser\ProtocolHandlerHttpCurl.hs" - Can't find imported module
"Posix"
ProtocolHandlerHttpNative>
]]
This is the old well-known POpen problem. I've copied my version of POpen
into the HXMLtoolbox code tree (it compiles, but still doesn't work), and
changed the import Posix(popen) to import POPen(popen).
...
With the above changes, the code runs but reports 8 failures:
[[
Cases: 110 Tried: 110 Errors: 0 Failures: 8
Counts{cases=110,tried=110,errors=0,failures=8}
]]
Failures are:
[[
fatal error: illegal URI for input: "mini1.xml"
:
### Failure in: 3:input tests:3
expected: "<x/>"
but got: ""
Cases: 110 Tried: 36 Errors: 0 Failures: 1
fatal error: illegal URI for input: "mini1.xml"
### Failure in: 3:input tests:4
expected: "0"
but got: "3"
Cases: 110 Tried: 37 Errors: 0 Failures: 2
fatal error: illegal URI for input: "mini1.xml"
:
### Failure in: 3:input tests:5
expected: "4"
but got: "0"
Cases: 110 Tried: 38 Errors: 0 Failures: 3
fatal error: illegal URI for input: "mini2.xml"
### Failure in: 3:input tests:6
expected: "0"
but got: "3"
Cases: 110 Tried: 39 Errors: 0 Failures: 4
fatal error: illegal URI for input: "notThere.xml"
:
fatal error: illegal URI for input: "mini2.xml"
:
### Failure in: 4:document transformation tests:0
expected: " ignore unknown important important "
but got: ""
Cases: 110 Tried: 42 Errors: 0 Failures: 5
fatal error: illegal URI for input: "mini2.xml"
:
### Failure in: 4:document transformation tests:1
expected: "ignoreunknownimportantimportant"
but got: ""
Cases: 110 Tried: 43 Errors: 0 Failures: 6
fatal error: illegal URI for input: "mini2.xml"
:
### Failure in: 4:document transformation tests:2
expected: " important important "
but got: ""
Cases: 110 Tried: 44 Errors: 0 Failures: 7
fatal error: illegal URI for input: "mini2.xml"
:
### Failure in: 4:document transformation tests:3
expected: " ignore important "
but got: ""
Cases: 110 Tried: 110 Errors: 0 Failures: 8
Counts{cases=110,tried=110,errors=0,failures=8}
Main>
]]
Thus it looks as if all of the failures are due to an invalid URI for a
local file. This is a re-run of an old problem, traced back to:
[[
setDefaultURI :: XState state ()
setDefaultURI
= do
wd <- io getCurrentDirectory
setSysParam transferDefaultURI ("file://localhost" ++ wd ++ "/")
]]
-- XmlInput.hs
The value returned by getCurrentDirectory on Windows is not valid as part
of a URI. I previously fixed this by replacing the above with the following:
[[
{- original:
setDefaultURI :: XState state ()
setDefaultURI
= do
wd <- io getCurrentDirectory
setSysParam transferDefaultURI ("file://localhost" ++ wd ++ "/")
-}
-- Revised version to allow Windows directory strings. [[[GK]]]
--
-- If the current directory starts with 'd:', it is assumed to be a Windows
-- directory, and all '\' characters are mapped to '/'.
--
-- In any case, any non-URI or reserved character is escaped.
setDefaultURI :: XState state ()
setDefaultURI
= do
wd <- io getCurrentDirectory
let wd1 = case wd of
d:':':_ | driveLetter d -> '/':concatMap win32ToUriChar wd
otherwise -> concatMap escapeNonUriChar wd
setSysParam transferDefaultURI ("file://localhost" ++ wd1 ++ "/")
-- [[[I'd prefer to leave 'localhost' as null, but this raises
-- awkward questions about whether it's OK to remove a
-- null authority from a URI]]]
where
win32ToUriChar '\\' = "/"
win32ToUriChar c = escapeNonUriChar c
escapeNonUriChar c = escapeChar isUnescapedInURI c -- from
Network.URI
driveLetter d = d `elem` ['A'..'Z']
-- to test:
-- run () $ do { uri <- getDefaultURI ; io (putStrLn uri) }
{- Excerpt from:
http://www.ietf.org/internet-drafts/draft-hoffman-rfc1738bis-01.txt
2.7 FILES
The file URL scheme is used to designate files accessible on a
particular host computer. This scheme, unlike most other URL schemes,
does not designate a resource that is universally accessible over the
Internet.
A file URL takes the form:
file://<host>/<path>
where <host> is the fully qualified domain name of the system on
which the <path> is accessible, and <path> is a hierarchical
directory path of the form <directory>/<directory>/.../<name>.
As a special case, <host> can be the string "localhost" or the empty
string; this is interpreted as "the machine from which the URL is
being interpreted". However, this part of the syntax has been
ignored on many systems. That is, for some systems, the following
are considered equal, while on others they are not:
file://localhost/path/to/file.txt
file:///path/to/file.txt
Some systems allow URLs to point to directories. In this case, there
is usually (but not always) a terminating "/" character, such as
in:
file://usr/local/bin/
On systems running some versions of Microsoft Windows, the local drive
specification is preceded by a "/" character. Thus, for a file called
"example.ini" in the "windows" directory on the "c:" drive, the URL
would be:
file:///c:/windows/example.ini
For Windows shares, there is an additional "/" prepended to the name.
Thus, the file "example.doc" on the shared directory "department" would
have the URL:
file:////department/example.doc
The file URL scheme is unusual in that it does not specify an
Internet protocol or access method for such files; as such, its
utility in network protocols between hosts is limited.
-}
]]
I also needed to add an import to XmlInput.hs:
[[
import Network.URI
( isUnescapedInURI, escapeChar
)
]]
...
I also find I must modify ProtocolHandlerFile.hs to strip off the leading
'/' from the path to work on Windows. The following change is made to
function getFileContents:
[[
{- Original:
source = path uri
readErr msg = addFatal msg n
-}
-- [[[GK]]] strip off leading '/' from Windows drive name
source = fileuripath (path uri)
fileuripath ('/':file@(d:':':more)) | driveLetter d = file
fileuripath file = file
driveLetter d = d `elem` ['A'..'Z']
readErr msg = addFatal msg n
]]
...
Finally, I must copy mini1.xml and mini2.xml into the directory from which
I'm running Hugs (i.e. HXmlToolbox-4.00/examples), and the test program
runs with no errors:
[[
Cases: 110 Tried: 110 Errors: 0 Failures: 0
Counts{cases=110,tried=110,errors=0,failures=0}
]]
...
Well, that wasn't too hard, second time around. Maybe we should abstract
out the URI<->filename mapping logic into another module that is
specifically to handle the file: URI scheme. Dopes it make sense to add
such a module to the network branch of the library hierarchy, to sit
alongside the HTTP modules?
#g
------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact
More information about the Libraries
mailing list