[Haskell-cafe] please comment on my parser, can I do this cleaner?
Daniel Fischer
daniel.is.fischer at web.de
Tue Jun 9 15:43:26 EDT 2009
Am Dienstag 09 Juni 2009 20:29:09 schrieb Thomas Hartman:
> All I want to do is split on commas, but not the commas inside () or <>
> tags.
>
> I have been wanting to master parsec for a long time and this simple
> exercise looked like a good place to start.
>
> The code below does the right thing. Am I missing any tricks to make
> this simpler/neater?
>
> Thanks, thomas.
>
> thartman at ubuntu:~/perlArena>cat splitEm.
> splitEm.hs splitEm.hs~ splitEm.pl splitEm.pl~
> thartman at ubuntu:~/perlArena>cat splitEm.hs
> {-# LANGUAGE ScopedTypeVariables #-}
> import Text.ParserCombinators.Parsec
> import Text.ParserCombinators.Parsec.Char
> import Text.PrettyPrint (vcat, render, text)
> import Data.List.Split hiding (sepBy, chunk)
> import Text.ParserCombinators.Parsec.Token
>
> import Debug.Trace
> import Debug.Trace.Helpers
>
> -- this works, but is there a neater/cleaner way?
> main = ripInputsXs (toEof splitter) "splitter" [ goodS, badS ]
>
> -- I need a way to split on commas, but not the commas inside '<>' or
> '()' characters
> goodS = "<*2>FOO<2,1>,<*3>(SigB<8:0:2>,BAR),<*2>Siga<2:0>,Sigb<8,7,6,5,0>"
> badS = "<*2)FOO<2,1>,<*3>(SigB<8:0:2>,BAR),<*2>Siga<2:0>,Sigb<8,7,6,5,0>"
> -- the first < matches a ), so reject this
>
>
> splitter = do
> chunks :: [String] <- toEof (many chunk)
> let pieces = map concat $ splitOn [","] chunks
> return pieces -- chunks
> where
> atom = string ","
> <|> ( many1 $ noneOf "()<>" )
> chunk = parenExpr <|> atom
I think that does not do what you want.
For input "FOO,BAR,BAZ", chunks is ["FOO,BAR,BAZ"], that won't be split; as far as I can
see, it splits only on commas directly following a parenExpr (or at the beginning of the
input or directly following another splitting comma).
> parenExpr :: GenParser Char st [Char]
> parenExpr = let paren p = betweenInc (char '(' ) (char ')' ) p
> <|> betweenInc (char '<' ) (char '>' )
> p
> in paren $ option "" $ do ps <- many1 $ parenExpr <|> atom
> return . concat $ ps
>
> betweenInc o' c' p' = do
> o <- o'
> p <- p'
> c <- c'
> return $ [o] ++ p ++ [c]
>
> toEof p' = do
> r <- p'
> eof
> return r
>
>
>
>
>
>
> ripInputs prs prsName xs = mapM_ (putStrLn . show . parse prs prsName ) xs
> ripInputsXs prs prsName xs = mapM_ (putStrLn . showXs . parse prs prsName )
> xs where showXs v = case v of
> Left e -> show e
> Right xs -> render . vcat . map text $ xs
I can offer (sorry for the names, and I don't know if what that does is really what you
want):
keepSepBy :: Parser a -> Parser a -> Parser [a]
keepSepBy p sep = (do
r <- p
(do s <- sep
xs <- keepSepBy p sep
return (r:s:xs)) <|> return [r])
<|> return []
twain :: Parser a -> Parser a -> Parser [a] -> Parser [a]
twain open close list = do
o <- open
l <- list
c <- close
return (o:l++[c])
comma :: Parser String
comma = string ","
simpleChar :: Parser Char
simpleChar = noneOf "<>(),"
suite :: Parser String
suite = many1 simpleChar
atom :: Parser String
atom = fmap concat $ many1 (parenExp <|> suite)
parenGroup :: Parser String
parenGroup = fmap concat $ keepSepBy atom comma
parenExp :: Parser String
parenExp = twain (char '<') (char '>') parenGroup
<|> twain (char '(') (char ')') parenGroup
chunks :: Parser [String]
chunks = sepBy atom comma
splitter = do
cs <- chunks
eof
return cs
goodS = "<*2>FOO<2,1>,<*3>(SigB<8:0:2>,BAR),<*2>Siga<2:0>,Sigb<8,7,6,5,0>"
badS = "<*2)FOO<2,1>,<*3>(SigB<8:0:2>,BAR),<*2>Siga<2:0>,Sigb<8,7,6,5,0>"
goodRes = parse splitter "splitter" goodS
badRes = parse splitter "splitter" badS
More information about the Haskell-Cafe
mailing list