[Haskell-cafe] simple parsec question
Andrey Chudnov
achudnov at gmail.com
Mon Mar 4 06:19:29 CET 2013
Immanuel,
I tried but I couldn't figure it out. Here's a gist with my attempts and
results so far: https://gist.github.com/achudnov/f3af65f11d5162c73064
There, 'test' uses my attempt at specifying the parser, 'test2' uses
yours. Note that your attempt wouldn't parse multiple sections -- for
that you need to use 'many section' instead of just 'section' in 'parse'
('parseFromFile' in the original).
I think what's going on is the lookahead is wrong, but I'm not sure how
exactly. I'll give it another go tomorrow if I have time.
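
A possible explanation, offered as a sketch and not checked against the gist: 'manyTill p end' tries 'end' before every 'p', and the quoted 'headline' is built from 'anyChar', which also matches newlines. So 'try headline' succeeds right at the start of the content by scanning ahead to the next ":\n" (the end of "top 2:"), and 'content' stops after collecting zero characters. The version below assumes a headline never spans more than one line, uses 'lookAhead' so the next headline is not consumed, and lets the last section run to end of input. The name 'sections' and the use of the 'Parser' alias from Text.Parsec.String are choices made here, not part of the original code.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Top = Top String deriving (Show, Eq)
data Content = Content String deriving (Show, Eq)
data Section = Section Top Content deriving (Show, Eq)

-- A headline is confined to one line: any characters except a newline,
-- terminated by a colon that is immediately followed by a newline.
-- The 'try' lets a colon in the middle of a line be treated as ordinary text.
headline :: Parser Top
headline = Top <$> manyTill (noneOf "\n") (try (char ':' >> newline))

-- Content runs until the next headline or the end of input.
-- 'lookAhead' checks for the headline without consuming it, so the
-- following 'section' can still parse it.
content :: Parser Content
content = Content <$> manyTill anyChar (nextHeadline <|> eof)
  where
    nextHeadline = () <$ lookAhead (try headline)

section :: Parser Section
section = Section <$> headline <*> content

-- Zero or more sections, then end of input.
sections :: Parser [Section]
sections = many section <* eof
```

With the example text, 'parse sections "" input' should return two sections, with the blank lines around each body still included in the 'Content' strings; trimming them is left out of this sketch. Retrying 'headline' at every character is quadratic in the worst case, which seems acceptable for small inputs.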
/Andrey
On 03/03/2013 05:16 PM, Immanuel Normann wrote:
> Andrey,
>
> Thanks for your attempt, but it doesn't seem to work. The easy part is
> the headline, but the content makes trouble.
>
> Let me write the code a bit more explicitly, so you can copy and paste it:
>
> ------------------------------------------
> {-# LANGUAGE FlexibleContexts #-}
>
> module Main where
>
> import Text.Parsec
>
> data Top = Top String deriving (Show)
> data Content = Content String deriving (Show)
> data Section = Section Top Content deriving (Show)
>
> headline :: Stream s m Char => ParsecT s u m Top
> headline = manyTill anyChar (char ':' >> newline) >>= return . Top
>
> content :: Stream s m Char => ParsecT s u m Content
> content = manyTill anyChar (try headline) >>= return . Content
>
> section :: Stream s m Char => ParsecT s u m Section
> section = do {h <- headline; c <- content; return (Section h c)}
> ------------------------------------------
>
>
> Assume the following example text is stored in "/tmp/test.txt":
> ---------------------------
> top 1:
>
> some text ... bla
>
> top 2:
>
> more text ... bla bla
> ---------------------------
>
> Now I run the section parser in ghci against the above mentioned
> example text stored in "/tmp/test.txt":
>
> *Main> parseFromFile section "/tmp/test.txt"
> Right (Section (Top "top 1") (Content ""))
>
> I don't understand the behaviour of the content parser here. Why does
> it return ""? Or perhaps more generally, I don't understand the
> manyTill combinator (though I read the docs).
>
> Side remark: of course, for this little task it is probably too much
> effort to use parsec. However, my content in fact has an internal
> structure which I would like to parse further, but I deliberately
> abstracted from these internals as they don't affect the problem
> stated above.
>
> Immanuel
>
>
> 2013/3/3 Andrey Chudnov <achudnov at gmail.com>
>
> Immanuel,
> Since a heading always starts with a new line (and ends with a
> colon followed by a carriage return or just a colon?), I think it
> might be useful to first separate the input into lines and then
> classify them depending on whether it's a heading or not and
> reassemble them into the value you need. You don't even need
> parsec for that.
>
> However, if you really want to use parsec, you can write something
> like (warning, not tested):
>     many $ liftM2 Section headline content
>       where headline = anyChar `manyTill` (char ':' >> spaces >> newline)
>             content  = anyChar `manyTill` (try $ newline >> headline)
>
> /Andrey
>
>
> On 3/3/2013 10:44 AM, Immanuel Normann wrote:
>
> I am trying to parse a semi-structured text with parsec; the parser
> should basically identify sections. Each section starts with a
> headline and has unstructured content - that's all. For
> instance, consider the following example text (inside the
> dashed lines):
>
> ---------------------------
>
> top 1:
>
> some text ... bla
>
> top 2:
>
> more text ... bla bla
>
>
> ---------------------------
>
> This should be parsed into a structure like this:
>
> [Section (Top 1) (Content "some text ... bla"), Section (Top
> 2) (Content "more text ... bla bla")]
>
> Say I have a parser "headline", but the content after a
> headline could be anything that is different from what
> "headline" parses.
> What could a "section" parser that makes use of "headline" look like?
> My idea would be to use the "manyTill" combinator, but I don't
> find an easy solution.
>
>