[Haskell-cafe] simple parsec question
Immanuel Normann
immanuel.normann at googlemail.com
Sun Mar 3 23:16:24 CET 2013
Andrey,
Thanks for your attempt, but it doesn't seem to work. The easy part is the
headline, but the content makes trouble.
Let me write the code a bit more explicit, so you can copy and paste it:
------------------------------------------
{-# LANGUAGE FlexibleContexts #-}
module Main where
import Text.Parsec
data Top = Top String deriving (Show)
data Content = Content String deriving (Show)
data Section = Section Top Content deriving (Show)
headline :: Stream s m Char => ParsecT s u m Top
headline = manyTill anyChar (char ':' >> newline) >>= return . Top
content :: Stream s m Char => ParsecT s u m Content
content = manyTill anyChar (try headline) >>= return . Content
section :: Stream s m Char => ParsecT s u m Section
section = do {h <- headline; c <- content; return (Section h c)}
------------------------------------------
Assume the following example text is stored in "/tmp/test.txt":
---------------------------
top 1:
some text ... bla
top 2:
more text ... bla bla
---------------------------
Now I run the section parser in ghci against the above mentioned example
text stored in "/tmp/test.txt":
*Main> parseFromFile section "/tmp/test.txt"
Right (Section (Top "top 1") (Content ""))
I don't understand the behaviour of the content parser here. Why does it
return ""? Or perhaps more generally, I don't understand the manyTill
combinator (though I read the docs).
Side remark: of cause for this little task it is probably to much effort to
use parsec. However, my content in fact has an internal structure which I
would like to parse further, but I deliberately abstracted from these
internals as they don't effect my above stated problem.
Immanuel
2013/3/3 Andrey Chudnov <achudnov at gmail.com>
> Immanuel,
> Since a heading always starts with a new line (and ends with a colon
> followed by a carriage return or just a colon?), I think it might be useful
> to first separate the input into lines and then classify them depending on
> whether it's a heading or not and reassemble them into the value you need.
> You don't even need parsec for that.
>
> However, if you really want to use parsec, you can write something like
> (warning, not tested):
> many $ liftM2 Section headline content
> where headline = anyChar `manyTill` (char ':' >> spaces >> newline)
> content = anyChar `manyTill` (try $ newline >> headline)
>
> /Andrey
>
>
> On 3/3/2013 10:44 AM, Immanuel Normann wrote:
>
>> I am trying to parse a semi structured text with parsec that basically
>> should identify sections. Each section starts with a headline and has an
>> unstructured content - that's all. For instance, consider the following
>> example text (inside the dashed lines):
>>
>> ---------------------------
>>
>> top 1:
>>
>> some text ... bla
>>
>> top 2:
>>
>> more text ... bla bla
>>
>>
>> ---------------------------
>>
>> This should be parsed into a structure like this:
>>
>> [Section (Top 1) (Content "some text ... bla"), Section (Top 1) (Content
>> "more text ... bla")]
>>
>> Say, I have a parser "headline", but the content after a headline could
>> be anything that is different from what "headline" parses.
>> How could the "section" parser making use of "headline" look like?
>> My idea would be to use the "manyTill" combinator, but I don"t find an
>> easy solution.
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20130303/46182e4a/attachment-0001.htm>
More information about the Haskell-Cafe
mailing list