[Haskell-cafe] simple parsec question

Andrey Chudnov achudnov at gmail.com
Mon Mar 4 06:19:29 CET 2013


Immanuel,
I tried but I couldn't figure it out. Here's a gist with my attempts and
results so far: https://gist.github.com/achudnov/f3af65f11d5162c73064
There, 'test' uses my attempt at specifying the parser, 'test2' uses
yours. Note that your attempt wouldn't parse multiple sections -- for
that you need to use 'many section' instead of just 'section' in 'parse'
('parseFromFile' in the original).
I think what's going on is the lookahead is wrong, but I'm not sure how
exactly. I'll give it another go tomorrow if I have time.
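
One possibility worth checking: 'headline' is itself 'manyTill anyChar (...)', so 'try headline' can succeed right at the start of the content by swallowing everything up to the next ":\n". 'content' would then stop immediately with "". Below is a minimal, untested-against-the-gist sketch under the assumption (mine, not stated in your mail) that a headline never spans more than one line and contains no colon before the final one. The name 'sections' and the use of 'noneOf', 'lookAhead' and 'eof' are my own choices here, not something from your code.

```haskell
{-# LANGUAGE FlexibleContexts #-}

import Text.Parsec

data Top = Top String deriving (Show, Eq)
data Content = Content String deriving (Show, Eq)
data Section = Section Top Content deriving (Show, Eq)

-- Assumption: a headline sits on a single line, so it may contain
-- neither ':' nor '\n' before the closing ":\n".
headline :: Stream s m Char => ParsecT s u m Top
headline = Top <$> (many1 (noneOf ":\n") <* char ':' <* newline)

-- Content runs until the next headline or the end of input.
-- 'lookAhead' peeks at the headline without consuming it, so the
-- following 'section' can still parse it; 'try' undoes partial matches.
content :: Stream s m Char => ParsecT s u m Content
content = Content <$> manyTill anyChar stop
  where
    stop = (() <$ lookAhead (try headline)) <|> eof

section :: Stream s m Char => ParsecT s u m Section
section = Section <$> headline <*> content

-- Hypothetical top-level parser: zero or more sections, then end of input.
sections :: Stream s m Char => ParsecT s u m [Section]
sections = many section <* eof
```

With this, something like 'parse sections "" text' should yield one Section per headline. The content strings keep their surrounding blank lines, so they would still need trimming.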

/Andrey

On 03/03/2013 05:16 PM, Immanuel Normann wrote:
> Andrey,
>
> Thanks for your attempt, but it doesn't seem to work. The easy part is
> the headline, but the content causes trouble.
>
> Let me write the code out more explicitly, so you can copy and paste it:
>
> ------------------------------------------
> {-# LANGUAGE FlexibleContexts #-}
>
> module Main where
>
> import Text.Parsec
>
> data Top = Top String deriving (Show)
> data Content = Content String deriving (Show)
> data Section = Section Top Content deriving (Show)
>
> headline :: Stream s m Char => ParsecT s u m Top
> headline = manyTill anyChar (char ':' >> newline) >>= return . Top
>
> content :: Stream s m Char => ParsecT s u m Content
> content = manyTill anyChar (try headline) >>= return . Content
>
> section :: Stream s m Char => ParsecT s u m Section
> section = do {h <- headline; c <- content; return (Section h c)}
> ------------------------------------------
>
>
> Assume the following example text is stored in  "/tmp/test.txt":
> ---------------------------
> top 1:
>
> some text ... bla
>
> top 2:
>
> more text ... bla bla
> ---------------------------
>
> Now I run the section parser in ghci against the above mentioned
> example text stored in "/tmp/test.txt":
>
> *Main> parseFromFile section "/tmp/test.txt"
> Right (Section (Top "top 1") (Content ""))
>
> I don't understand the behaviour of the content parser here. Why does
> it return ""? Or perhaps more generally, I don't understand the
> manyTill combinator (though I read the docs).
>
> Side remark: of course, for this little task it is probably too much
> effort to use parsec. However, my content in fact has an internal
> structure which I would like to parse further, but I deliberately
> abstracted from these internals as they don't affect the problem
> stated above.
>
> Immanuel
>
>
> 2013/3/3 Andrey Chudnov <achudnov at gmail.com>
>
>     Immanuel,
>     Since a heading always starts with a new line (and ends with a
>     colon followed by a carriage return or just a colon?), I think it
>     might be useful to first separate the input into lines and then
>     classify them depending on whether it's a heading or not and
>     reassemble them into the value you need. You don't even need
>     parsec for that.
>
>     However, if you really want to use parsec, you can write something
>     like (warning, not tested):
>     many $ liftM2 Section headline content
>        where headline = anyChar `manyTill` (char ':' >> spaces >> newline)
>              content  = anyChar `manyTill` (try $ newline >> headline)
>
>     /Andrey
>
>
>     On 3/3/2013 10:44 AM, Immanuel Normann wrote:
>
>         I am trying to parse semi-structured text with parsec; the
>         parser should basically identify sections. Each section starts
>         with a headline and has unstructured content - that's all. For
>         instance, consider the following example text (inside the
>         dashed lines):
>
>         ---------------------------
>
>         top 1:
>
>         some text ... bla
>
>         top 2:
>
>         more text ... bla bla
>
>
>         ---------------------------
>
>         This should be parsed into a structure like this:
>
>         [Section (Top 1) (Content "some text ... bla"), Section (Top
>         2) (Content "more text ... bla bla")]
>
>         Say, I have a parser "headline", but the content after a
>         headline could be anything that is different from what
>         "headline" parses.
>         What could a "section" parser that makes use of "headline" look like?
>         My idea would be to use the "manyTill" combinator, but I don't
>         find an easy solution.
>
>
