[Haskell-cafe] simple parsec question
Immanuel Normann
immanuel.normann at googlemail.com
Tue Mar 5 09:50:03 CET 2013
Carlo,
Thanks a lot! This looks very promising (though I have to test it for my
purpose more in depth). As you mention, the key seems to be the optionMaybe
combinator. Thanks for pointing to it.
Immanuel
2013/3/5 Carlo Hamalainen <carlo.hamalainen at gmail.com>
> On Mon, Mar 4, 2013 at 1:44 AM, Immanuel Normann <
> immanuel.normann at googlemail.com> wrote:
>
>> I am trying to parse a semi structured text with parsec that basically
>> should identify sections. Each section starts with a headline and has an
>> unstructured content - that's all.
>>
>
> Here's my attempt: https://gist.github.com/carlohamalainen/5087207
>
> {-# LANGUAGE FlexibleContexts #-}
>
> import Text.Parsec
> import Control.Applicative hiding ((<|>),many)
>
> -- Example input:
>
> {-
> top 1:
>
> some text ... bla
>
> top 2:
>
> more text ... bla bla
>
> -}
>
> data Top = Top String deriving (Show)
> data Content = Content [String] deriving (Show)
> data Section = Section Top Content deriving (Show)
>
> headline = do
> t <- many1 (noneOf ":\n")
> char ':'
> newline
>
> return $ Top t
>
> contentLine = do
> x <- many (noneOf ":\n")
> newline
> return x
>
> content = do
> line <- optionMaybe (try contentLine)
>
> case line of Just x -> do xs <- content
> return (x:xs)
> _ -> return []
>
> section = do
> h <- headline
> c <- Content <$> content
> return $ Section h c
>
> main = do
> x <- readFile "simple.txt"
> print $ parse (many section) "" x
>
>
> Example run using your sample data:
>
> $ runhaskell Simple.hs
> Right [Section (Top "top 1") (Content ["","some text ... bla",""]),Section
> (Top "top 2") (Content ["","more text ... bla bla",""])]
>
> Notes:
>
> * I had to assume that a content line does not contain a ':', because that
> is the only way to distinguish a head-line (correct me if I'm wrong).
>
> * The key was to use optionMaybe along with try; see the definition of
> content.
>
> * I haven't tested this code on very large inputs.
>
> * I slightly changed the definition of Content to have a list of Strings,
> one for each line. I'm sure this could be altered if you wanted to retain
> all whitespace.
>
> * I am still new to Parsec, so don't take this as the definitive answer ;-)
>
> --
> Carlo Hamalainen
> http://carlo-hamalainen.net
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20130305/b4288250/attachment.htm>
More information about the Haskell-Cafe
mailing list