[Haskell-cafe] please comment on my parser, can I do this cleaner?

Daniel Fischer daniel.is.fischer at web.de
Tue Jun 9 15:43:26 EDT 2009


Am Dienstag 09 Juni 2009 20:29:09 schrieb Thomas Hartman:
> All I want to do is split on commas, but not the commas inside () or <>
> tags.
>
> I have been wanting to master parsec for a long time and this simple
> exercise looked like a good place to start.
>
> The code below does the right thing. Am I missing any tricks to make
> this simpler/neater?
>
> Thanks, thomas.
>
> thartman at ubuntu:~/perlArena>cat splitEm.
> splitEm.hs   splitEm.hs~  splitEm.pl   splitEm.pl~
> thartman at ubuntu:~/perlArena>cat splitEm.hs
> {-# LANGUAGE ScopedTypeVariables #-}
> import Text.ParserCombinators.Parsec
> import Text.ParserCombinators.Parsec.Char
> import Text.PrettyPrint (vcat, render, text)
> import Data.List.Split hiding (sepBy, chunk)
> import Text.ParserCombinators.Parsec.Token
>
> import Debug.Trace
> import Debug.Trace.Helpers
>
> -- this works, but is there a neater/cleaner way?
> main = ripInputsXs (toEof splitter) "splitter" [ goodS, badS ]
>
> -- I need a way to split on commas, but not the commas inside '<>' or
> '()' characters
> goodS = "<*2>FOO<2,1>,<*3>(SigB<8:0:2>,BAR),<*2>Siga<2:0>,Sigb<8,7,6,5,0>"
> badS = "<*2)FOO<2,1>,<*3>(SigB<8:0:2>,BAR),<*2>Siga<2:0>,Sigb<8,7,6,5,0>"
> -- the first < matches a ), so reject this
>
>
> splitter = do
>   chunks :: [String] <- toEof (many chunk)
>   let pieces = map concat $ splitOn [","] chunks
>   return pieces -- chunks
>   where
>     atom = string ","
>            <|> ( many1 $ noneOf "()<>" )
>     chunk = parenExpr <|>  atom

I think that does not do what you want.

For input "FOO,BAR,BAZ", chunks is ["FOO,BAR,BAZ"], that won't be split; as far as I can 
see, it splits only on commas directly following a parenExpr (or at the beginning of the 
input or directly following another splitting comma).

>     parenExpr :: GenParser Char st [Char]
>     parenExpr = let paren p = betweenInc (char '(' ) (char ')' ) p
>                                 <|> betweenInc (char '<' ) (char '>' )
> p
>                 in paren $ option "" $ do ps <- many1 $ parenExpr <|> atom
>                                           return . concat $ ps
>
> betweenInc o' c' p' = do
>   o <- o'
>   p <- p'
>   c <- c'
>   return $ [o] ++ p ++ [c]
>
> toEof p' = do
>         r <- p'
>         eof
>         return r
>
>
>
>
>
>
> ripInputs prs prsName xs = mapM_ (putStrLn . show . parse prs prsName ) xs
> ripInputsXs prs prsName xs = mapM_ (putStrLn . showXs . parse prs prsName )
> xs where showXs v = case v of
>           Left e -> show e
>           Right xs -> render . vcat . map text $ xs

I can offer (sorry for the names, and I don't know if what that does is really what you 
want):


keepSepBy :: Parser a -> Parser a -> Parser [a]
keepSepBy p sep = (do
    r <- p
    (do s <- sep
        xs <- keepSepBy p sep
        return (r:s:xs)) <|> return [r])
    <|> return []

twain :: Parser a -> Parser a -> Parser [a] -> Parser [a]
twain open close list = do
    o <- open
    l <- list
    c <- close
    return (o:l++[c])

comma :: Parser String
comma = string ","

simpleChar :: Parser Char
simpleChar = noneOf "<>(),"

suite :: Parser String
suite = many1 simpleChar

atom :: Parser String
atom = fmap concat $ many1 (parenExp <|> suite)

parenGroup :: Parser String
parenGroup = fmap concat $ keepSepBy atom comma

parenExp :: Parser String
parenExp = twain (char '<') (char '>') parenGroup
            <|> twain (char '(') (char ')') parenGroup

chunks :: Parser [String]
chunks = sepBy atom comma

splitter = do
    cs <- chunks
    eof
    return cs

goodS = "<*2>FOO<2,1>,<*3>(SigB<8:0:2>,BAR),<*2>Siga<2:0>,Sigb<8,7,6,5,0>"
badS = "<*2)FOO<2,1>,<*3>(SigB<8:0:2>,BAR),<*2>Siga<2:0>,Sigb<8,7,6,5,0>"

goodRes = parse splitter "splitter" goodS
badRes = parse splitter "splitter" badS




More information about the Haskell-Cafe mailing list