XMLParser
Ron de Bruijn
rondebruijn@yahoo.com
Tue, 29 Jul 2003 04:04:10 -0700 (PDT)
Hi there,
I used the PARSEC parser combinator library to create
an following XML-Parser. It's not exactly XML, but
it's a usefull format for saving various data.
The big difference there is, is that I don't have
attributes, but those could easily be added, although
I don't need them.
Below is the complete code (20! lines of code for the
core). I was excited when I saw it also worked like I
had imagined, but then I tested it sometimes with
increasingly big data.
Using my test function below I noticed that there's
some exponentional behaviour. I don't see a reason for
this. Although I use the try combinator, it should at
max double the time, because the parser uses the other
parser when there occurs a parse error in the one.
My definition of xmlParser looks also a bit odd, but
without at least one of the functions
of(subEnTop,topEnSub) it will fail parsing.
I also don't understand why.
Then I question myself: why does my program work with
the downloaded version, and not with the PARSEC
library that ships with GHC 5.0.4.2, I atleast thought
that Parsec hasn't changed for a long time?
Does anyone have an explanation for the above
problems?
Greets Ron
module XMLCreator where
--this module can parse XML
import Parsec
import GHC.Show
import System.Time
--datatype om XML te representeren: XML bestaat altijd
uit een topelement en een aantal nestingen
data XML a = TopElement String [XML a]
| SubElement String a deriving Show
type Level = Int
--not finished Pretty Printer
ppXML::XML a->Level->String
ppXML (TopElement name (xml:xmls)) level = concat(take
level (repeat " "))++openTag name
ppXML _ _ = "hallo"
--ppXML (NestedElement name info) level =
openTag::String->String
openTag s = "<"++s++">"
--run :: Show a => Parser a -> String -> IO ()
run p input= do printTijd
case (parse p "" input) of
Left err -> do{ putStr "parse error
at ";
print err
}
Right x -> do print x
printTijd
--prints time(used above for printing the start time
and the endtime of the algoritm
printTijd = do time<-getClockTime
(putStrLn.show) time
test n = run xmlParser (concat(replicate n str))
--test n = parseTest xmlParser (concat(replicate n
str))
str::String
str =
"<TopicList><TopicItem><Readed>1</Readed><ForumTitle>forumTitle</ForumTitle><TopicTitle>topicTitle</TopicTitle><TopicID>100100</TopicID><TopicStarter>someOne</TopicStarter><LastReplier>someOtherOne</LastReplier><ReplierID>1234</ReplierID><MessageID>messageID</MessageID><SubForumTitle>PW</SubForumTitle><SubForumID>14</SubForumID><Replies>2</Replies><ReplyTime>1050767736</ReplyTime><StartTime>1050762676</StartTime></TopicItem><FileInfo><Version>5</Version><TimeCreated>1059155053</TimeCreated><Application>SomeProgram</Application><About>aboutText</About></FileInfo></TopicList>
";
--determine which sign can not be in the elementname
or in the informationpiece
signs = noneOf ['<','>']
--parses a piece of xml
xmlParser::Parser ([XML String])
xmlParser = choice [try (many1 sub),try topEnSub,try
subEnTop, many1 top]
--xmlParser = choice[try(many1 sub),many1 top]
getName::Parser String
getName = do teken '<'
name<-many1 signs
teken '>'
return name
where teken t = do skipMany space
char t
skipMany space
end::String->Parser String
end name = do skipMany space
string ("</"++name++">")
sub::Parser (XML String)
sub = do name<-getName
info<-many1 signs
end name
return (SubElement name info)
top::Parser (XML String)
top = do name<-getName
moreXml<-xmlParser
end name
return (TopElement name moreXml)
subEnTop::Parser [(XML String)]
subEnTop = do x<-many1 sub
y<-many1 top
return (x++y)
topEnSub::Parser [(XML String)]
topEnSub = do x<-many1 top
y<-many1 sub
return (x++y)
__________________________________
Do you Yahoo!?
Yahoo! SiteBuilder - Free, easy-to-use web site design software
http://sitebuilder.yahoo.com