[Haskell-cafe] please comment on my parser, can I do this
cleaner?
Thomas Hartman
tphyahoo at gmail.com
Tue Jun 9 19:21:31 EDT 2009
Thanks. It seems my original parser also works against FOO,BAR,BAZ if
you only modify
atom = string ","
<|> ( many1 $ noneOf "()<>," ) -- add ,
Indeed, what to call the "thingies" in a parser is a source of some
personal consternation.
What is a token, what is an atom, what is an expr? It all seems to be
somewhat ad hoc.
> 2009/6/9 Daniel Fischer <daniel.is.fischer at web.de>:
>> Am Dienstag 09 Juni 2009 20:29:09 schrieb Thomas Hartman:
>>> All I want to do is split on commas, but not the commas inside () or <>
>>> tags.
>>>
>>> I have been wanting to master parsec for a long time and this simple
>>> exercise looked like a good place to start.
>>>
>>> The code below does the right thing. Am I missing any tricks to make
>>> this simpler/neater?
>>>
>>> Thanks, thomas.
>>>
>>> thartman at ubuntu:~/perlArena>cat splitEm.
>>> splitEm.hs splitEm.hs~ splitEm.pl splitEm.pl~
>>> thartman at ubuntu:~/perlArena>cat splitEm.hs
>>> {-# LANGUAGE ScopedTypeVariables #-}
>>> import Text.ParserCombinators.Parsec
>>> import Text.ParserCombinators.Parsec.Char
>>> import Text.PrettyPrint (vcat, render, text)
>>> import Data.List.Split hiding (sepBy, chunk)
>>> import Text.ParserCombinators.Parsec.Token
>>>
>>> import Debug.Trace
>>> import Debug.Trace.Helpers
>>>
>>> -- this works, but is there a neater/cleaner way?
>>> main = ripInputsXs (toEof splitter) "splitter" [ goodS, badS ]
>>>
>>> -- I need a way to split on commas, but not the commas inside '<>' or
>>> '()' characters
>>> goodS = "<*2>FOO<2,1>,<*3>(SigB<8:0:2>,BAR),<*2>Siga<2:0>,Sigb<8,7,6,5,0>"
>>> badS = "<*2)FOO<2,1>,<*3>(SigB<8:0:2>,BAR),<*2>Siga<2:0>,Sigb<8,7,6,5,0>"
>>> -- the first < matches a ), so reject this
>>>
>>>
>>> splitter = do
>>> chunks :: [String] <- toEof (many chunk)
>>> let pieces = map concat $ splitOn [","] chunks
>>> return pieces -- chunks
>>> where
>>> atom = string ","
>>> <|> ( many1 $ noneOf "()<>" )
>>> chunk = parenExpr <|> atom
>>
>> I think that does not do what you want.
>>
>> For input "FOO,BAR,BAZ", chunks is ["FOO,BAR,BAZ"], that won't be split; as far as I can
>> see, it splits only on commas directly following a parenExpr (or at the beginning of the
>> input or directly following another splitting comma).
>>
>>> parenExpr :: GenParser Char st [Char]
>>> parenExpr = let paren p = betweenInc (char '(' ) (char ')' ) p
>>> <|> betweenInc (char '<' ) (char '>' )
>>> p
>>> in paren $ option "" $ do ps <- many1 $ parenExpr <|> atom
>>> return . concat $ ps
>>>
>>> betweenInc o' c' p' = do
>>> o <- o'
>>> p <- p'
>>> c <- c'
>>> return $ [o] ++ p ++ [c]
>>>
>>> toEof p' = do
>>> r <- p'
>>> eof
>>> return r
>>>
>>>
>>>
>>>
>>>
>>>
>>> ripInputs prs prsName xs = mapM_ (putStrLn . show . parse prs prsName ) xs
>>> ripInputsXs prs prsName xs = mapM_ (putStrLn . showXs . parse prs prsName )
>>> xs where showXs v = case v of
>>> Left e -> show e
>>> Right xs -> render . vcat . map text $ xs
>>
>> I can offer (sorry for the names, and I don't know if what that does is really what you
>> want):
>>
>>
>> keepSepBy :: Parser a -> Parser a -> Parser [a]
>> keepSepBy p sep = (do
>> r <- p
>> (do s <- sep
>> xs <- keepSepBy p sep
>> return (r:s:xs)) <|> return [r])
>> <|> return []
>>
>> twain :: Parser a -> Parser a -> Parser [a] -> Parser [a]
>> twain open close list = do
>> o <- open
>> l <- list
>> c <- close
>> return (o:l++[c])
>>
>> comma :: Parser String
>> comma = string ","
>>
>> simpleChar :: Parser Char
>> simpleChar = noneOf "<>(),"
>>
>> suite :: Parser String
>> suite = many1 simpleChar
>>
>> atom :: Parser String
>> atom = fmap concat $ many1 (parenExp <|> suite)
>>
>> parenGroup :: Parser String
>> parenGroup = fmap concat $ keepSepBy atom comma
>>
>> parenExp :: Parser String
>> parenExp = twain (char '<') (char '>') parenGroup
>> <|> twain (char '(') (char ')') parenGroup
>>
>> chunks :: Parser [String]
>> chunks = sepBy atom comma
>>
>> splitter = do
>> cs <- chunks
>> eof
>> return cs
>>
>> goodS = "<*2>FOO<2,1>,<*3>(SigB<8:0:2>,BAR),<*2>Siga<2:0>,Sigb<8,7,6,5,0>"
>> badS = "<*2)FOO<2,1>,<*3>(SigB<8:0:2>,BAR),<*2>Siga<2:0>,Sigb<8,7,6,5,0>"
>>
>> goodRes = parse splitter "splitter" goodS
>> badRes = parse splitter "splitter" badS
>>
>>
>> _______________________________________________
>> Haskell-Cafe mailing list
>> Haskell-Cafe at haskell.org
>> http://www.haskell.org/mailman/listinfo/haskell-cafe
>>
>
More information about the Haskell-Cafe
mailing list