[Haskell-cafe] simple parsec question
Carlo Hamalainen
carlo.hamalainen at gmail.com
Tue Mar 5 02:18:50 CET 2013
On Mon, Mar 4, 2013 at 1:44 AM, Immanuel Normann <
immanuel.normann at googlemail.com> wrote:
> I am trying to parse a semi structured text with parsec that basically
> should identify sections. Each section starts with a headline and has an
> unstructured content - that's all.
>
Here's my attempt: https://gist.github.com/carlohamalainen/5087207
{-# LANGUAGE FlexibleContexts #-}
import Text.Parsec
import Control.Applicative hiding ((<|>),many)
-- Example input:
{-
top 1:
some text ... bla
top 2:
more text ... bla bla
-}
data Top = Top String deriving (Show)
data Content = Content [String] deriving (Show)
data Section = Section Top Content deriving (Show)
headline = do
t <- many1 (noneOf ":\n")
char ':'
newline
return $ Top t
contentLine = do
x <- many (noneOf ":\n")
newline
return x
content = do
line <- optionMaybe (try contentLine)
case line of Just x -> do xs <- content
return (x:xs)
_ -> return []
section = do
h <- headline
c <- Content <$> content
return $ Section h c
main = do
x <- readFile "simple.txt"
print $ parse (many section) "" x
Example run using your sample data:
$ runhaskell Simple.hs
Right [Section (Top "top 1") (Content ["","some text ... bla",""]),Section
(Top "top 2") (Content ["","more text ... bla bla",""])]
Notes:
* I had to assume that a content line does not contain a ':', because that
is the only way to distinguish a head-line (correct me if I'm wrong).
* The key was to use optionMaybe along with try; see the definition of
content.
* I haven't tested this code on very large inputs.
* I slightly changed the definition of Content to have a list of Strings,
one for each line. I'm sure this could be altered if you wanted to retain
all whitespace.
* I am still new to Parsec, so don't take this as the definitive answer ;-)
--
Carlo Hamalainen
http://carlo-hamalainen.net
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20130305/d43358c9/attachment-0001.htm>
More information about the Haskell-Cafe
mailing list