[Haskell-cafe] parsing exercise

Sebastian Fischer fischer at nii.ac.jp
Sun Jan 23 10:39:39 CET 2011


On Sun, Jan 23, 2011 at 4:31 PM, Chung-chieh Shan
<ccshan at post.harvard.edu> wrote:

> Maybe Text.Show.Pretty.parseValue in the pretty-show package can help?
>

That's what I was looking for, thanks!

On Sun, Jan 23, 2011 at 5:23 PM, Stephen Tetley
<stephen.tetley at gmail.com> wrote:

> I don't think you can do this "simply" as you think you would always
> have to build a parse tree.


Isn't it enough to maintain a stack of open parens, brackets, char- and
string-terminators and escape chars? Below is my attempt at solving the
problem without an expression parser.

> In practice, if you follow the skeleton syntax tree style you might
> find "not caring" about the details of syntax is almost as much work
> as caring about them. I've tried a couple of times to make a skeleton
> parser that does paren nesting and little else, but always given up
> and just used a proper parser as the skeleton parser never seemed
> robust.
>

Indeed, I doubt that the implementation below is robust, and it's too tricky
to be easily maintainable. I include it for reference. Let me know if you
spot an obvious mistake.

Sebastian

splitTLC :: String -> [String]
splitTLC = parse ""

type Stack  = String

parse :: Stack -> String -> [String]
parse _  ""     = []
parse st (c:cs) = next c st $ parse (updStack c st) cs

next :: Char -> Stack -> [String] -> [String]
next c []    xs = if c==',' then [] : xs else c <: xs
next c (_:_) xs = c <: xs

infixr 0 <:

(<:) :: Char -> [String] -> [String]
c <: []     = [[c]]
c <: (x:xs) = (c:x):xs

updStack :: Char -> Stack -> Stack
updStack char stack =
  case (char,stack) of
    -- char is an escaped character
    (_    , '\\':xs) -> xs       -- pop the escape; the next character is not escaped

    -- char is the escape character
    ('\\' ,      xs) -> '\\':xs  -- push it on the stack

    -- char is the string terminator
    ('"'  ,  '"':xs) -> xs       -- closes current string literal
    ('"'  , '\'':xs) -> '\'':xs  -- ignored inside character literal
    ('"'  ,      xs) -> '"':xs   -- opens a new string literal

    -- char is the character terminator
    ('\'' , '\'':xs) -> xs       -- closes current character literal
    ('\'' ,  '"':xs) -> '"':xs   -- ignored inside string literal
    ('\'' ,      xs) -> '\'':xs  -- opens a new character literal

    -- parens and brackets
    (_    ,  '"':xs) -> '"':xs   -- are ignored inside string literals
    (_    , '\'':xs) -> '\'':xs  -- and character literals
    ('('  ,      xs) -> '(':xs   -- opening paren
    (')'  ,  '(':xs) -> xs       -- closing paren
    ('['  ,      xs) -> '[':xs   -- opening bracket
    (']'  ,  '[':xs) -> xs       -- closing bracket

    -- other characters don't modify the stack (ignoring record syntax)
    (_    ,      xs) -> xs
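
For comparison, here is a rough sketch of the same stack idea with the
lexical context made explicit. It is not the code above: the names Mode,
splitTopLevel, go and closes are made up for illustration, a backslash is
only treated as an escape inside a literal, and a trailing comma yields a
trailing empty chunk. Like splitTLC, it ignores record syntax and would be
confused by primes in identifiers.

```haskell
-- Illustrative sketch only; all names here are hypothetical.

-- | Lexical context while scanning the input.
data Mode = Code | InString | InChar
  deriving (Eq, Show)

-- | Split at commas that are outside all parens, brackets and literals.
splitTopLevel :: String -> [String]
splitTopLevel = go Code [] ""
  where
    -- go mode openDelimiters reversedCurrentChunk remainingInput
    go :: Mode -> String -> String -> String -> [String]
    go _ _ acc [] = [reverse acc]
    -- inside a literal, copy a backslash and the escaped char verbatim
    go m ds acc ('\\':c:cs)
      | m /= Code = go m ds (c:'\\':acc) cs
    -- closing and opening quotes switch the mode
    go InString ds acc ('"':cs)  = go Code     ds ('"':acc)  cs
    go InChar   ds acc ('\'':cs) = go Code     ds ('\'':acc) cs
    go Code     ds acc ('"':cs)  = go InString ds ('"':acc)  cs
    go Code     ds acc ('\'':cs) = go InChar   ds ('\'':acc) cs
    -- a comma with no open delimiter ends the current chunk
    go Code [] acc (',':cs) = reverse acc : go Code [] "" cs
    -- push opening delimiters, pop on a matching closing delimiter
    go Code ds acc (c:cs)
      | c `elem` "(["             = go Code (c:ds) (c:acc) cs
      | (d:ds') <- ds, closes d c = go Code ds'    (c:acc) cs
    -- anything else is copied into the current chunk
    go m ds acc (c:cs) = go m ds (c:acc) cs

    closes :: Char -> Char -> Bool
    closes '(' ')' = True
    closes '[' ']' = True
    closes _   _   = False
```

With these definitions, splitTopLevel "a,(b,c),[d,e]" should evaluate to
["a","(b,c)","[d,e]"]. Commas inside string or character literals are left
alone.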