[Haskell-cafe] simple parsec question

Immanuel Normann immanuel.normann at googlemail.com
Sun Mar 3 23:16:24 CET 2013


Andrey,

Thanks for your attempt, but it doesn't seem to work. The easy part is the
headline, but the content makes trouble.

Let me write the code a bit more explicit, so you can copy and paste it:

------------------------------------------
{-# LANGUAGE FlexibleContexts #-}

module Main where

import Text.Parsec

data Top = Top String deriving (Show)
data Content = Content String deriving (Show)
data Section = Section Top Content deriving (Show)

headline :: Stream s m Char => ParsecT s u m Top
headline = manyTill anyChar (char ':' >> newline) >>= return . Top

content :: Stream s m Char => ParsecT s u m Content
content = manyTill anyChar (try headline) >>= return . Content

section :: Stream s m Char => ParsecT s u m Section
section = do {h <- headline; c <- content; return (Section h c)}
------------------------------------------


Assume the following example text is stored in  "/tmp/test.txt":
---------------------------
top 1:

some text ... bla

top 2:

more text ... bla bla
---------------------------

Now I run the section parser in ghci against the above mentioned example
text stored in "/tmp/test.txt":

*Main> parseFromFile section "/tmp/test.txt"
Right (Section (Top "top 1") (Content ""))

I don't understand the behaviour of the content parser here. Why does it
return ""? Or perhaps more generally, I don't understand the manyTill
combinator (though I read the docs).

Side remark: of cause for this little task it is probably to much effort to
use parsec. However, my content in fact has an internal structure which I
would like to parse further, but I deliberately abstracted from these
internals as they don't effect my above stated problem.

Immanuel


2013/3/3 Andrey Chudnov <achudnov at gmail.com>

> Immanuel,
> Since a heading always starts with a new line (and ends with a colon
> followed by a carriage return or just a colon?), I think it might be useful
> to first separate the input into lines and then classify them depending on
> whether it's a heading or not and reassemble them into the value you need.
> You don't even need parsec for that.
>
> However, if you really want to use parsec, you can write something like
> (warning, not tested):
> many $ liftM2 Section headline content
>    where headline = anyChar `manyTill` (char ':' >> spaces >> newline)
>                content  = anyChar `manyTill` (try $ newline >> headline)
>
> /Andrey
>
>
> On 3/3/2013 10:44 AM, Immanuel Normann wrote:
>
>> I am trying to parse a semi structured text with parsec that basically
>> should identify sections. Each section starts with a headline and has an
>> unstructured content - that's all. For instance, consider the following
>> example text (inside the dashed lines):
>>
>> ---------------------------
>>
>> top 1:
>>
>> some text ... bla
>>
>> top 2:
>>
>> more text ... bla bla
>>
>>
>> ---------------------------
>>
>> This should be parsed into a structure like this:
>>
>> [Section (Top 1) (Content "some text ... bla"), Section (Top 1) (Content
>> "more text ... bla")]
>>
>> Say, I have a parser "headline", but the content after a headline could
>> be anything that is different from what "headline" parses.
>> How could the "section" parser making use of "headline" look like?
>> My idea would be to use the "manyTill" combinator, but I don"t find an
>> easy solution.
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20130303/46182e4a/attachment-0001.htm>


More information about the Haskell-Cafe mailing list