[Haskell-cafe] Parsing words with parsec

paolino paolo.veronelli at gmail.com
Thu Mar 29 23:43:34 EDT 2007

I had a hard time trying to parse the words of a text.
I suspect I'm missing some Parsec knowledge.

In the end it seems to work, though I haven't tested it much. This example
contains the main features I was looking for:

*Main> parseTest (parseLine eof) "paolo@gmail sara,mimmo! 9ab a9b ab9 cd\n"
["paolo@gmail","sara","mimmo","cd"]
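Since the code is only lightly tested, it can help to write down the rule the example seems to follow without Parsec, as something to compare the parser against. This is a minimal sketch, assuming the rule is: split the line into maximal runs of letters, digits, '_' and '@', and keep only the runs that contain no digit. The names isNonSeparator, runs and referenceWords are hypothetical and not part of the parser in this post.

```haskell
import Data.Char (isAlpha, isDigit)

-- Assumed rule, inferred from the example: letters, digits, '_' and '@'
-- form runs (the same set as nonSeparator); everything else separates them.
isNonSeparator :: Char -> Bool
isNonSeparator c = isAlpha c || isDigit c || c `elem` "_@"

-- Split a line into maximal runs of non-separator characters.
runs :: String -> [String]
runs s = case dropWhile (not . isNonSeparator) s of
  "" -> []
  s' -> let (r, rest) = span isNonSeparator s' in r : runs rest

-- A word is a run that contains no digit.
referenceWords :: String -> [String]
referenceWords = filter (not . any isDigit) . runs
```

With this reading, referenceWords "paolo@gmail sara,mimmo! 9ab a9b ab9 cd\n" gives ["paolo@gmail","sara","mimmo","cd"].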

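import Control.Monad (liftM2)
import Text.ParserCombinators.Parsec

-- Run body until terminator would succeed, then run terminator itself and
-- combine both results with joiner.
manyTillT :: Parser a -> Parser b -> ([a] -> b -> c) -> Parser c
manyTillT body terminator joiner =
    liftM2 joiner (manyTill body (lookAhead terminator)) terminator

-- Characters allowed in a word: letters, '_' and '@'.
wordChar :: Parser Char
wordChar = letter <|> oneOf "_@" <?> "a valid word character"

-- Characters that do not separate words: word characters and digits.
nonSeparator :: Parser Char
nonSeparator = wordChar <|> digit

-- Last character of a word: a word character not followed by a non-separator.
wordEnd :: Parser Char
wordEnd = do
    x <- wordChar
    notFollowedBy nonSeparator
    return x

-- A run of word characters ending before a separator or end of input.
-- Fails after consuming input if a digit interrupts the run, so callers use try.
word :: Parser String
word = manyTillT wordChar (try wordEnd) (\b t -> b ++ [t]) <?> "a word"

-- A separator followed by a word character; consumes only the separator.
wordStart :: Parser Char
wordStart = do
    (try nonSeparator >> unexpected "non separator") <|> anyChar
    lookAhead wordChar

-- Skip to the next word start and read a word; if that candidate contains
-- a digit, keep searching.
nextWord :: Parser String
nextWord = manyTill anyChar (try wordStart) >> (try word <|> nextWord)

-- The first word has no separator in front of it, so it is tried on its own.
parseLine :: Parser a -> Parser [String]
parseLine end = do
    f <- option [] $ return `fmap` try word
    r <- many $ try nextWord
    manyTill anyChar end
    return (f ++ r)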
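manyTillT is a general combinator: unlike manyTill, it keeps the terminator's result instead of discarding it. The sketch below restates it and applies it to a hypothetical, unrelated input (untilArrow is an illustrative name, not from the post) to show that behaviour in isolation.

```haskell
import Control.Monad (liftM2)
import Text.ParserCombinators.Parsec

-- Restated from the post: run body until terminator would succeed (checked
-- with lookAhead, so a successful check consumes nothing), then run
-- terminator for real and combine both results with joiner.
manyTillT :: Parser a -> Parser b -> ([a] -> b -> c) -> Parser c
manyTillT body terminator joiner =
    liftM2 joiner (manyTill body (lookAhead terminator)) terminator

-- Hypothetical example: the text before a "-->" marker, plus the marker itself.
untilArrow :: Parser (String, String)
untilArrow = manyTillT anyChar (try (string "-->")) (,)
```

Here parseTest untilArrow "abc-->rest" prints ("abc","-->"), whereas manyTill anyChar (try (string "-->")) would return only "abc".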


Any comments on how to simplify this code are welcome.
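One possible simplification, sketched under the assumption that the rule is the one the example suggests (a word is a maximal run of letters, digits, '_' and '@' that contains no digit): consume whole runs and the separators between them, then filter out the runs containing digits, instead of detecting word boundaries with lookAhead and notFollowedBy. The names isNonSeparator, nonSepRun, separators and parseWords are illustrative, and unlike parseLine this version always reads to eof rather than taking an end parser.

```haskell
import Text.ParserCombinators.Parsec
import Data.Char (isAlpha, isDigit)

-- Letters, digits, '_' and '@': the same set as nonSeparator.
isNonSeparator :: Char -> Bool
isNonSeparator c = isAlpha c || isDigit c || c `elem` "_@"

-- One maximal run of non-separator characters.
nonSepRun :: Parser String
nonSepRun = many1 (satisfy isNonSeparator)

-- Skip any number of separator characters (possibly none).
separators :: Parser ()
separators = skipMany (satisfy (not . isNonSeparator))

-- Read every run up to end of input, then keep the runs without digits.
-- nonSepRun only fails before consuming anything, so no try is needed.
parseWords :: Parser [String]
parseWords = do
  separators
  rs <- many (do { r <- nonSepRun; separators; return r })
  eof
  return (filter (not . any isDigit) rs)
```

With this version, parseTest parseWords "paolo@gmail sara,mimmo! 9ab a9b ab9 cd\n" prints ["paolo@gmail","sara","mimmo","cd"].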


More information about the Haskell-Cafe mailing list