[Haskell-beginners] How to Lex, Parse, and Serialize-to-XML email messages
Costello, Roger L.
costello at mitre.org
Fri Jun 28 18:40:01 CEST 2013
Hi Tim,
> I realize you've already finished with the project ...
Actually, your message comes at an excellent time. I am not finished with the project. I have only finished one of the email headers -- the From Header.
Just today I was wondering how to proceed next:
- Should I extend my parser so that it deals with each of the other email headers? That is, create one monolithic parser for the entire email message? That doesn't seem very modular. I don't think that Happy supports importing other Happy parsers. Ideally I would create a parser for the From Header, a parser for the To Header, a parser for the Subject Header, and so forth. Then I would import each of them to create one unified email parser. If Happy doesn't support importing, I figured it might be better to switch to something that can combine parsers, that is, a parser combinator library such as Parsec. Unfortunately, I don't know anything about Parsec, but I am eager to learn.
- I wonder if I can use Happy to generate individual parsers - a parser for the From Header, a parser for the To Header, a parser for the Subject Header - and then use Parsec to combine them?
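For what it's worth, here is a minimal sketch of how the "one parser per header, then combine them" idea might look in Parsec alone. The Header type, the headerBody helper, and the choice to treat every header body as raw text up to the end of the line are all assumptions made for illustration; real header bodies have more structure than this.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical result type: one constructor per header we understand.
data Header
  = From String
  | To String
  | Subject String
  deriving (Show, Eq)

-- Hypothetical shared helper: match "<name>: " and return the rest of the
-- line. 'try' makes a failed name match backtrack so the next alternative
-- can be attempted.
headerBody :: String -> Parser String
headerBody name = do
  _ <- try (string (name ++ ": "))
  manyTill anyChar ((() <$ newline) <|> eof)

-- One small parser per header.
fromHeader, toHeader, subjectHeader :: Parser Header
fromHeader    = From    <$> headerBody "From"
toHeader      = To      <$> headerBody "To"
subjectHeader = Subject <$> headerBody "Subject"

-- The unified parser is just a combination of the small ones.
headers :: Parser [Header]
headers = many (choice [fromHeader, toHeader, subjectHeader]) <* eof
```

In GHCi, `parse headers "" "From: a b\nSubject: hi\n"` can be used to try it. Each header parser could live in its own module and be imported where `headers` is defined.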
As you can see, Tim, your suggestion to use Parsec falls on receptive ears. I welcome all suggestions.
/Roger
From: beginners-bounces at haskell.org [mailto:beginners-bounces at haskell.org] On Behalf Of Tim Holland
Sent: Friday, June 28, 2013 12:18 PM
To: beginners at haskell.org
Subject: Re: [Haskell-beginners] How to Lex, Parse, and Serialize-to-XML email messages
Hi Roger,
I realize you've already finished with the project, but for the future I think it's a lot easier to use a parser combinator library, with Text.Parsec and Text.Parsec.String, to do a similar thing. For example, if you were parsing XML and wanted to parse a single tag, you would try something like this:
parseTag :: Parser Tag
parseTag = many1 alphaNum <?> "tag"
To get a tagged form, try
parseTagged :: Parser (Tag, [Elem])
parseTagged = do
    char '<'
    name <- parseTag
    char '>'
    content <- many (try parseElem)
    string "</"
    parseTag
    char '>'
    return (name, content)
  <?> "tagged form"
and so on. I haven't tried this out, but a parser similar to yours would go something like this:
--Datatypes
type DisplayName = String
type EmailAddress = String
data Mailbox = Mailbox DisplayName EmailAddress deriving (Show)
parseFromHeader :: Parser [Mailbox]
parseFromHeader = do
    string "From: "
    mailboxes <- many (try parseMailbox)
    return mailboxes
parseMailbox :: Parser Mailbox
parseMailbox = do
    parseComments
    -- Names are optional
    parseComments
    name <- option "" (try parseDisplayName)
    parseComments
    address <- parseEmailAddress
    parseComments
    optional (char ',')
    return (Mailbox name address)
  <?> "an individual's mailbox"
parseEmailAddress :: Parser EmailAddress
parseEmailAddress = do
    optional (char '<')
    handle <- many1 (noneOf "@") -- Or whatever is valid here
    char '@'
    domain <- parseDomain
    optional (char '>')
    return (handle ++ "@" ++ domain)
parseDomain :: Parser String
parseDomain =
    (do char '['
        domainName <- parseDomain
        char ']'
        return domainName)
    <|> parseWebsiteName
And so on. Again, I've tested none of the email header bits, but the XML bit works. It requires some level of comfort with monadic operations, but beyond that I think it's a much simpler way to parse.
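Since the pieces above leave parseDisplayName, parseComments and parseWebsiteName undefined, here is a smaller self-contained sketch that can actually be loaded and run. It is not the same parser: it assumes every address is wrapped in angle brackets, that the display name is simply whatever precedes the '<', and it ignores comments entirely. The names displayName, emailAddress, mailbox and fromHeader are made up for this sketch.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

type DisplayName = String
type EmailAddress = String
data Mailbox = Mailbox DisplayName EmailAddress deriving (Show, Eq)

-- Assumption: the display name is everything before '<', with trailing
-- spaces removed. It may be empty.
displayName :: Parser DisplayName
displayName = trimEnd <$> many (noneOf "<,\n")
  where trimEnd = reverse . dropWhile (== ' ') . reverse

-- Assumption: addresses always appear as <local@domain>.
emailAddress :: Parser EmailAddress
emailAddress = between (char '<') (char '>') $ do
  local  <- many1 (noneOf "@<>")
  _      <- char '@'
  domain <- many1 (alphaNum <|> oneOf ".-")
  return (local ++ "@" ++ domain)

mailbox :: Parser Mailbox
mailbox = do
  spaces
  name <- displayName
  addr <- emailAddress
  return (Mailbox name addr)

-- One or more comma-separated mailboxes after the header name.
fromHeader :: Parser [Mailbox]
fromHeader = string "From:" *> (mailbox `sepBy1` char ',') <* spaces <* eof
```

In GHCi, `parse fromHeader "" "From: John Doe <john@doe.org>"` should give back a single Mailbox holding the name and the address.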
Regards,
Tim Holland
On 28 June 2013 03:00, <beginners-request at haskell.org> wrote:
Send Beginners mailing list submissions to
beginners at haskell.org
To subscribe or unsubscribe via the World Wide Web, visit
http://www.haskell.org/mailman/listinfo/beginners
or, via email, send a message with subject or body 'help' to
beginners-request at haskell.org
You can reach the person managing the list at
beginners-owner at haskell.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Beginners digest..."
Today's Topics:
1. data declaration using other type's names? (Patrick Redmond)
2. Re: data declaration using other type's names? (Brandon Allbery)
3. Re: data declaration using other type's names? (Nikita Danilenko)
4. Re: what to do about excess memory usage (Chaddaï Fouché)
5. Re: what to do about excess memory usage (James Jones)
6. How to Lex, Parse, and Serialize-to-XML email messages
(Costello, Roger L.)
----------------------------------------------------------------------
Message: 1
Date: Thu, 27 Jun 2013 11:24:51 -0400
From: Patrick Redmond <plredmond at gmail.com>
Subject: [Haskell-beginners] data declaration using other type's
names?
To: beginners at haskell.org
Message-ID:
<CAHUea4FfBP8L1kU+tS1-2cVPvAB4h22j35JcNwRC-jGds0=v6g at mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Hey Haskellers,
I noticed that ghci lets me do this:
> data Foo = Int Int | Float
> :t Int
Int :: Int -> Foo
> :t Float
Float :: Foo
> :t Int 4
Int 4 :: Foo
It's confusing to have type constructors that use names of existing
types. It's not intuitive that the name "Int" could refer to two
different things, which brings me to:
> data Bar = Bar Int
> :t Bar
Bar :: Int -> Bar
Yay? I can have a simple type with one constructor named the same as the type.
Why is this allowed? Is it useful somehow?
--Patrick
------------------------------
Message: 2
Date: Thu, 27 Jun 2013 11:37:46 -0400
From: Brandon Allbery <allbery.b at gmail.com>
Subject: Re: [Haskell-beginners] data declaration using other type's
names?
To: The Haskell-Beginners Mailing List - Discussion of primarily
beginner-level topics related to Haskell <beginners at haskell.org>
Message-ID:
<CAKFCL4U-E4B_+cts0vpNX8Ar9wccQDjgzWOYHLXLsLAv+Qn_cg at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
On Thu, Jun 27, 2013 at 11:24 AM, Patrick Redmond <plredmond at gmail.com> wrote:
> I noticed that ghci lets me do this:
>
Not just ghci, but ghc as well.
> Yay? I can have a simple type with one constructor named the same as the
> type.
> Why is this allowed? Is it useful somehow?
>
It's convenient for pretty much the situation you showed, where the type
constructor and data constructor have the same name. A number of people do
advocate that it not be used, though, because it can be confusing for
people. (Not for the compiler; data and type constructors can't be used in
the same places, it never has trouble keeping straight which is which.)
It might be best to consider this as "there is no good reason to *prevent*
it from happening, from a language standpoint".
--
brandon s allbery kf8nh sine nomine associates
allbery.b at gmail.com ballbery at sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net
------------------------------
Message: 3
Date: Thu, 27 Jun 2013 18:02:00 +0200
From: Nikita Danilenko <nda at informatik.uni-kiel.de>
Subject: Re: [Haskell-beginners] data declaration using other type's
names?
To: beginners at haskell.org
Message-ID: <51CC61F8.9020506 at informatik.uni-kiel.de>
Content-Type: text/plain; charset=ISO-8859-1
Hi, Patrick,
the namespaces for types and constructors are considered disjoint, i.e.
you can use a name in both contexts. A simple example of this feature is
your last definition
> data Bar = Bar Int
or even shorter
> data A = A
This is particularly useful for single-constructor types à la
> data MyType a = MyType a
Clearly, using "Int" or "Float" as constructor names may seem odd, but
when dealing with a simple grammar it is quite natural to write
> data Exp = Num Int | Add Exp Exp
although "Num" is a type class in Haskell.
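A small sketch of the point, using the declarations already shown in this thread. The functions eval and unBar are hypothetical additions, there only to show the constructors being used in patterns.

```haskell
-- 'Num' here is a data constructor. Data constructors and type-level names
-- (types and classes) live in separate namespaces, so this does not clash
-- with the Prelude type class 'Num'.
data Exp = Num Int | Add Exp Exp

-- In a pattern, 'Num' and 'Add' can only mean the data constructors.
eval :: Exp -> Int
eval (Num n)   = n
eval (Add a b) = eval a + eval b

-- The same name as type constructor (left of '=') and data constructor
-- (right of '=').
data Bar = Bar Int

-- 'Bar' in the signature is the type; 'Bar' in the pattern is the constructor.
unBar :: Bar -> Int
unBar (Bar n) = n
```

For instance, `eval (Add (Num 1) (Num 2))` evaluates to 3, and the compiler never has to guess which `Num` is meant, because the position (type signature versus expression or pattern) decides it.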
Best regards,
Nikita
On 27/06/13 17:24, Patrick Redmond wrote:
> Hey Haskellers,
>
> I noticed that ghci lets me do this:
>
>> data Foo = Int Int | Float
>> :t Int
> Int :: Int -> Foo
>> :t Float
> Float :: Foo
>> :t Int 4
> Int 4 :: Foo
>
> It's confusing to have type constructors that use names of existing
> types. It's not intuitive that the name "Int" could refer to two
> different things, which brings me to:
>
>> data Bar = Bar Int
>> :t Bar
> Bar :: Int -> Bar
>
> Yay? I can have a simple type with one constructor named the same as the type.
>
> Why is this allowed? Is it useful somehow?
>
> --Patrick
>
------------------------------
Message: 4
Date: Thu, 27 Jun 2013 18:23:25 +0200
From: Chaddaï Fouché <chaddai.fouche at gmail.com>
Subject: Re: [Haskell-beginners] what to do about excess memory usage
To: The Haskell-Beginners Mailing List - Discussion of primarily
beginner-level topics related to Haskell <beginners at haskell.org>
Message-ID:
<CANfjZRbGTvoECTMsriNDAUozbow1fUGt-9FRtG-XwRJ+DamiAw at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
First 2MB isn't a lot of RAM nowadays, do you mean 2GB or is that just
compared to the rest of the program ?
Second, your powersOfTen should probably be :
> powersOfTen = iterate (10*) 1
Or maybe even a Vector (if you can guess the maximum value asked of it) or
a MemoTrie (if you can't) since list indexing is slow as hell.
That could help with memoPair which should definitely be a Vector and not a
list.
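A rough sketch of the suggestion, under some assumptions. To stay within the libraries that ship with GHC it uses Data.Array as a stand-in for Vector (the idea is the same: constant-time indexing instead of walking a list). The bound maxExponent and the names powerTable and powerOfTen are hypothetical.

```haskell
import Data.Array (Array, listArray, (!))

-- The lazy list of powers of ten: 1, 10, 100, ...
powersOfTen :: [Integer]
powersOfTen = iterate (10*) 1

-- Hypothetical upper bound on the exponent the program will ever ask for.
maxExponent :: Int
maxExponent = 30

-- Copy the first (maxExponent + 1) powers into an array. listArray only
-- takes as many elements as the bounds require, so the infinite list is fine.
powerTable :: Array Int Integer
powerTable = listArray (0, maxExponent) powersOfTen

-- Constant-time lookup, in contrast to (powersOfTen !! k), which walks k cells.
powerOfTen :: Int -> Integer
powerOfTen k = powerTable ! k
```

If the maximum exponent cannot be guessed in advance, a fixed-size table like this does not apply, which is where the MemoTrie suggestion comes in.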
Good luck (on the other hand, maybe your program is already "good enough"
and you could just switch to another project)
--
Jedai
------------------------------
Message: 5
Date: Thu, 27 Jun 2013 18:28:27 -0500
From: James Jones <jejones3141 at gmail.com>
Subject: Re: [Haskell-beginners] what to do about excess memory usage
To: The Haskell-Beginners Mailing List - Discussion of primarily
beginner-level topics related to Haskell <beginners at haskell.org>
Message-ID: <51CCCA9B.40807 at gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
On 06/27/2013 11:23 AM, Chaddaï Fouché wrote:
> First 2MB isn't a lot of RAM nowadays, do you mean 2GB or is that just
> compared to the rest of the program ?
It's a lot compared to the rest of the program... not to mention that
I'm a fossil from the days of 8-bit microprocessors, so 2 MB seems like
a lot of RAM to me. :)
> Second, your powersOfTen should probably be :
>
> > powersOfTen = iterate (10*) 1
>
> Or maybe even a Vector (if you can guess the maximum value asked of
> it) or a MemoTrie (if you can't) since list indexing is slow as hell.
> That could help with memoPair which should definitely be a Vector and
> not a list.
Thanks!
>
> Good luck (on the other hand, maybe your program is already "good
> enough" and you could just switch to another project)
> --
> Jedai
>
I do want to find a better way to keep the list of positions for ones
around than a [Int], and I want to save them only as long as I need to,
i.e. until I have both the 2 * k and 2 * k + 1 digit palindromes. Once
that's done, I will move on. Thanks again!
------------------------------
Message: 6
Date: Fri, 28 Jun 2013 09:30:30 +0000
From: "Costello, Roger L." <costello at mitre.org>
Subject: [Haskell-beginners] How to Lex, Parse, and Serialize-to-XML
email messages
To: "beginners at haskell.org" <beginners at haskell.org>
Message-ID:
<B5FEE00B53CF054AA8439027E8FE17751EFA9005 at IMCMBX04.MITRE.ORG>
Content-Type: text/plain; charset="us-ascii"
Hi Folks,
I am working toward being able to input any email message and output an equivalent XML encoding.
I am starting small, with one of the email headers -- the "From Header"
Here is an example of a From Header:
From: John Doe <john at doe.org>
I have successfully transformed it into this XML:
<From>
    <Mailbox>
        <DisplayName>John Doe</DisplayName>
        <Address>john at doe.org</Address>
    </Mailbox>
</From>
I used the lexical analyzer "Alex" [1] to break apart (tokenize) the From Header.
I used the parser "Happy" [2] to process the tokens and generate a parse tree.
Then I used a serializer to walk the parse tree and output XML.
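To illustrate just the last step, here is a minimal sketch of a serializer that walks a parse tree and emits XML. The parse tree type is an assumption: it borrows the Mailbox type from the Parsec sketch elsewhere in this thread, not the tree that Happy actually produces in the Stack Overflow write-up. The helpers escape, element, mailboxToXml and fromToXml are made-up names, and the output is not pretty-printed.

```haskell
-- Assumed parse tree: a From header is a list of mailboxes.
type DisplayName = String
type EmailAddress = String
data Mailbox = Mailbox DisplayName EmailAddress

-- Escape the characters that are special in XML text content.
escape :: String -> String
escape = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc c   = [c]

-- Wrap a body in an opening and closing tag.
element :: String -> String -> String
element tag body = "<" ++ tag ++ ">" ++ body ++ "</" ++ tag ++ ">"

-- Serialize one mailbox node.
mailboxToXml :: Mailbox -> String
mailboxToXml (Mailbox name addr) =
  element "Mailbox"
    (element "DisplayName" (escape name) ++ element "Address" (escape addr))

-- Serialize the whole From header.
fromToXml :: [Mailbox] -> String
fromToXml = element "From" . concatMap mailboxToXml
```

Applied to `[Mailbox "John Doe" "john@doe.org"]`, fromToXml produces the same elements as the XML shown above, on a single line.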
I posted to stackoverflow a complete description of how to lex, parse, and serialize-to-XML email From Headers:
http://stackoverflow.com/questions/17354442/how-to-lex-parse-and-serialize-to-xml-email-messages-using-alex-and-happy
/Roger
[1] The Alex User's Guide may be found at this URL: http://www.haskell.org/alex/doc/html/
[2] The Happy User's Guide may be found at this URL: http://www.haskell.org/happy/
------------------------------
End of Beginners Digest, Vol 60, Issue 38
*****************************************
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/beginners/attachments/20130628/50dc9101/attachment-0001.htm>