[Haskell-cafe] Need help with Parsec

Sterling Clover s.clover at gmail.com
Sun Jan 20 12:24:57 EST 2008


Here's a handy simple function I've found very useful. You'll  
obviously also need to import Debug.Trace:

pTrace s = pt <|> return ()
     where pt = try $
                do
                  x <- try $ many1 anyChar
                  trace (s++": " ++x) $ try $ char 'z'
                  fail x

It could perhaps be cleaner, but it does the job for me fine. Just  
insert a line like pTrace "label" anywhere in your parsing functions  
and whenever parsec hits that line you get a nice line of output:  
"label: <rest of string to be parsed>" This tends to help track down  
just where your code goes wrong. Try works like it should in my  
experience, but that doesn't necessarily mean it works how you expect.

Regards,
s

On Jan 20, 2008, at 12:12 PM, Nicu Ionita wrote:

> Hi,
>
> I'm playing since a few hours with Parsec and trying to write a  
> small html
> (fragment) parser, but I'm stuck in a point which I really can't  
> understand.
>
> The problem seem to be either in "parseProperCont" or in  
> "closing" (see code
> below). It looks like closing does not work (but it is a very simple
> function!) or (also hard to believe) function "try" from Parsec has  
> some
> problems.
>
> Anyway I get this answer:
>
> Prelude ParseSHtml> pf parseHtmlFrg "ptest.txt"
> Left "ptest.txt" (line 5, column 2):
> unexpected "/"
> expecting element name
>
> when I'm parsing this file:
>
> <div id="normtext">
> one line with break<br />
> another line <br /><br />
> Mail: <a href="mailto:user at dom.at">user at dom.at</a>
> </div>
>
> with this code (sorry for the longer mail):
>
> import Text.ParserCombinators.Parsec hiding (label)
> import Text.XHtml.Strict
>
> -- Helper function: parse a string up to one of the given chars
> upTo :: [Char] -> Parser [Char]
> upTo ds = many1 (noneOf ds)
>
> parseHtmlFrg :: Parser Html
> parseHtmlFrg = do many space
>                   choice [parseElem, parseText]
>                <?> "html fragment"
>
> parseElem :: Parser Html
> parseElem = do en <- parseElTag
>                many1 space
>                (ats, cnt) <- restElem en
>                return $ compElem en cnt ! ats
>             <?> "html element"
>
> -- Compose a html element from tag name and content
> compElem en cnt = if isNoHtml cnt then itag en else tag en cnt
>
> parseElTag :: Parser String
> parseElTag = do char '<'
>                 en <- elemName
>                 return en
>              <?> "element tag"
>
> elemName :: Parser String
> elemName = many1 lower <?> "element name"
>
> restElem :: String -> Parser ([HtmlAttr], Html)
> restElem nm = do ats <- parseAttList
>                  ht <- (restElNoCont <|> restElCont nm)
>                  return (ats, ht)
>               <?> ("> or /> to close the tag " ++ nm)
>
> -- Rest element with no content
> restElNoCont = do char '/'
>                   char '>'
>                   return noHtml
>                <?> "/>"
>
> -- Rest element with content
> restElCont nm = do char '>'
>                    many space
>                    els <- parseProperCont nm
>                    return $ concatHtml els
>                 <?> "element with content"
>
> -- Parse closing tag or proper content(s)
> parseProperCont :: String -> Parser [Html]
> parseProperCont nm = try (do closing nm
>                              return []
>                           )
>                      <|> (do h <- parseHtmlFrg
>                              hs <- parseProperCont nm
>                              return (h:hs)
>                           )
>                      -- <|> return []
>                      <?> "proper element content"
>
> closing nm = do char '<'
>                 char '/'
>                 nm1 <- elemName
>                 char '>'
>                 if nm1 == nm
>                    then return ()
>                    else fail $ nm ++ ", encountered " ++ nm1
>              <?> ("closing of " ++ nm)
>
> -- Parse a html attribute
> parseAttr :: Parser HtmlAttr
> parseAttr = do at <- many1 lower
>                char '='
>                va <- parseQuote
>                many space
>                return $ strAttr at va
>             <?> "Attribut"
> parseAttList = many1 parseAttr <|> return [] <?> "attribute list"
>
> -- Parse a quoted string
> parseQuote :: Parser String
> parseQuote = do char '"'
>                 cs <- upTo ['"']
>                 char '"'
>                 return cs
>
> -- Parse a text element
> parseText :: Parser Html
> parseText = do s <- upTo "<"
>                return (stringToHtml s)
>             <?> "some text"
>
> -- For tests:
> pf p file = parseFromFile p file
>
> Nicu
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe



More information about the Haskell-Cafe mailing list