[Haskell-beginners] Parsing email headers (alex + happy vs parsec)

Tim Holland th0114nd at gmail.com
Mon Jul 1 05:46:20 CEST 2013


Hi Roger,
On the contrary, I think that a parser combinator is a very modular
solution. Each kind of parseable string can be defined, recursively where
needed, as a parser built from the library's monadic combinators. These
parsers don't care about context, so they can be reused wherever
appropriate. If you send me a test file of headers and the corresponding
outputs, I might see if I can whip something up to try to change your mind.

Regards,
Tim Holland
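Here is a minimal sketch of the modularity argument, assuming the `parsec` package. Each header gets its own small parser, and the combined parser is just `choice` over them. The `Header` type and the one-line grammars are hypothetical simplifications, not the RFC 5322 rules. Folded header lines, comments and quoted display names are ignored.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical, deliberately simplified model of three headers.
data Header
  = From [String]
  | To [String]
  | Subject String
  deriving (Show, Eq)

-- Skip spaces, but not line breaks.
blanks :: Parser ()
blanks = skipMany (char ' ')

-- One address: no spaces, commas or line breaks (a simplification).
address :: Parser String
address = blanks *> many1 (noneOf ", \r\n") <* blanks

-- Each header has its own small, independently testable parser.
fromHeader, toHeader, subjectHeader :: Parser Header
fromHeader    = From <$> (try (string "From:") *> (address `sepBy1` char ','))
toHeader      = To <$> (try (string "To:") *> (address `sepBy1` char ','))
subjectHeader = Subject <$> (try (string "Subject:") *> blanks *> many (noneOf "\r\n"))

-- Combining them is ordinary function application: none of the parsers
-- knows about the others, and a new header is added by extending the list.
header :: Parser Header
header = choice [fromHeader, toHeader, subjectHeader] <?> "header"

-- A header block: one header per line, then end of input.
headers :: Parser [Header]
headers = (header `endBy` newline) <* eof

main :: IO ()
main = print (parse headers "example" sample)
  where
    sample = "From: john@doe.org\nTo: a@b.org, c@d.org\nSubject: Hello there\n"
```

Running `main` should print `Right [From ["john@doe.org"],To ["a@b.org","c@d.org"],Subject "Hello there"]`. An unknown header such as `Cc:` makes the whole parse fail at that line.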

On 28 June 2013 09:40, <beginners-request at haskell.org> wrote:

> Send Beginners mailing list submissions to
>         beginners at haskell.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         http://www.haskell.org/mailman/listinfo/beginners
> or, via email, send a message with subject or body 'help' to
>         beginners-request at haskell.org
>
> You can reach the person managing the list at
>         beginners-owner at haskell.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Beginners digest..."
>
>
> Today's Topics:
>
>    1. Re:  How to Lex, Parse, and Serialize-to-XML email messages
>       (Costello, Roger L.)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 28 Jun 2013 16:40:01 +0000
> From: "Costello, Roger L." <costello at mitre.org>
> Subject: Re: [Haskell-beginners] How to Lex, Parse, and
>         Serialize-to-XML email messages
> To: "The Haskell-Beginners Mailing List - Discussion of primarily
>         beginner-level topics related to Haskell" <beginners at haskell.org>
> Message-ID:
>         <B5FEE00B53CF054AA8439027E8FE17751EFA91FE at IMCMBX04.MITRE.ORG>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi Tim,
>
>
> > I realize you've already finished with the project ...
>
> Actually, your message comes at an excellent time. I am not finished with
> the project. I have only finished one of the email headers -- the From
> Header.
>
> Just today I was wondering how to proceed next:
>
>
> - Should I extend my parser so that it deals with each of the other email
>   headers? That is, create one monolithic parser for the entire email
>   message? That doesn't seem very modular. I don't think that Happy
>   supports importing other Happy parsers. Ideally I would create a parser
>   for the From Header, a parser for the To Header, a parser for the Subject
>   Header, and so forth. Then I would import each of them to create one
>   unified email parser. If Happy doesn't support importing, I figured it
>   might be better to switch to something that can combine parsers - a
>   parser combinator - such as Parsec. Unfortunately, I don't know anything
>   about Parsec, but am eager to learn.
>
> - I wonder if I can use Happy to generate individual parsers - a parser
>   for the From Header, a parser for the To Header, a parser for the
>   Subject Header - and then use Parsec to combine them?
>
> As you can see, Tim, your suggestion to use Parsec falls on receptive ears. I
> welcome all suggestions.
>
> /Roger
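On the second question, one way to mix the two tools is to let Parsec find each header and hand its body to a separately generated parser. The sketch below assumes the Happy side is exposed as a plain function of type `String -> Either String a`. `happyParseFrom` is a hypothetical stand-in written by hand so the example runs. It is not Happy output. `embed` turns the other parser's error message into an ordinary Parsec failure via `fail`.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- HYPOTHETICAL stand-in for a Happy-generated entry point. A real one would
-- come from a .y grammar; this one just splits on commas so the sketch runs.
happyParseFrom :: String -> Either String [String]
happyParseFrom s
  | all (== ' ') s = Left "empty From header"
  | otherwise      = Right (map trim (splitOn ',' s))
  where
    trim = reverse . dropWhile (== ' ') . reverse . dropWhile (== ' ')

splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn c rest

-- Lift any `String -> Either String a` function into a Parsec parser:
-- take the rest of the current line, run the function on it, and turn
-- a Left into a Parsec failure carrying the same message.
embed :: (String -> Either String a) -> Parser a
embed run = do
  line <- many (noneOf "\r\n")
  either fail return (run line)

-- Parsec only recognises the header name; the body is delegated.
fromHeader :: Parser [String]
fromHeader = string "From:" *> embed happyParseFrom

main :: IO ()
main = do
  print (parse fromHeader "example" "From: john@doe.org, jane@doe.org")
  print (parse fromHeader "example" "From:")
```

The first call should succeed with both addresses. The second should fail with the "empty From header" message. The same `embed` wrapper would work for any other externally generated header parser with that function type.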
>
> From: beginners-bounces at haskell.org [mailto:beginners-bounces at haskell.org]
> On Behalf Of Tim Holland
> Sent: Friday, June 28, 2013 12:18 PM
> To: beginners at haskell.org
> Subject: Re: [Haskell-beginners] How to Lex, Parse, and Serialize-to-XML
> email messages
>
> Hi Roger,
>
> I realize you've already finished with the project, but for the future I
> think it's a lot easier to use a parser combinator with Text.Parsec and
> Text.Parsec.String to do a similar thing. For example, if you were parsing
> XML, you could parse a single tag with something like this:
>
> import Text.Parsec
> import Text.Parsec.String (Parser)
>
> type Tag = String
>
> parseTag :: Parser Tag
> parseTag = many1 alphaNum <?> "tag"
>
> To get a tagged form, try
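> parseTagged :: Parser (Tag, [Elem])
> parseTagged = tagged <?> "tagged form"
>   where
>     -- Elem and parseElem (the element contents) are not shown here.
>     tagged = do
>       char '<'
>       name <- parseTag
>       char '>'
>       content <- many (try parseElem)
>       string "</"
>       string name  -- the closing tag must match the opening one
>       char '>'
>       return (name, content)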
>
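The `parseTagged` snippet leaves `Elem` and `parseElem` undefined. Here is a self-contained sketch. It assumes a hypothetical `Elem` that is either character data or a nested tagged form. Attributes, self-closing tags and entities are ignored.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

type Tag = String

-- Hypothetical element type: character data or a nested tagged form.
data Elem = Text String | Tagged Tag [Elem]
  deriving (Show, Eq)

parseTag :: Parser Tag
parseTag = many1 alphaNum <?> "tag"

-- Character data runs up to the next '<'; otherwise expect a tagged form.
parseElem :: Parser Elem
parseElem = parseTagged <|> (Text <$> many1 (noneOf "<"))

parseTagged :: Parser Elem
parseTagged = tagged <?> "tagged form"
  where
    tagged = do
      -- `try` makes a closing "</..." fail without consuming input,
      -- which is what ends `many parseElem` below.
      name <- try (char '<' *> parseTag <* char '>')
      content <- many parseElem
      -- The closing tag must repeat the opening tag's name.
      _ <- string "</" *> string name <* char '>'
      return (Tagged name content)

main :: IO ()
main = do
  print (parse parseTagged "example" "<From><Mailbox>John</Mailbox></From>")
  print (parse parseTagged "example" "<a>x</b>")
```

The first input should give `Right (Tagged "From" [Tagged "Mailbox" [Text "John"]])`. The second is rejected because `</b>` does not close `<a>`.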
> and so on. I haven't tried this out, but a parser similar to yours would
> go something like this:
>
> --Datatypes
> type DisplayName = String
> type EmailAddress = String
> data Mailbox = Mailbox DisplayName EmailAddress deriving (Show)
>
> parseFromHeader :: Parser [Mailbox]
> parseFromHeader = do
>   string "From: "
>   mailboxes <- many (try parseMailbox)
>   return mailboxes
>
> -- parseComments and parseDisplayName are not shown here.
> parseMailbox :: Parser Mailbox
> parseMailbox = mailbox <?> "an individual's mailbox"
>   where
>     mailbox = do
>       parseComments
>       -- Names are optional
>       name <- option "" (try parseDisplayName)
>       parseComments
>       address <- parseEmailAddress
>       parseComments
>       optional (char ',')
>       return (Mailbox name address)
>
> parseEmailAddress :: Parser EmailAddress
> parseEmailAddress = do
>   optional (char '<')
>   handle <- many1 (noneOf "@") -- Or whatever is valid here
>   char '@'
>   domain <- parseDomain
>   optional (char '>')
>   return (handle ++ "@" ++ domain)
>
> -- parseWebsiteName is not shown here.
> parseDomain :: Parser String
> parseDomain =
>       between (char '[') (char ']') parseDomain
>   <|> parseWebsiteName
>
> And so on. Again, I've tested none of the email header bits, but the XML
> bit works. It requires some level of comfort with monadic operations, but
> beyond that I think it's a much simpler way to parse.
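The From-header sketch relies on helpers (`parseComments`, `parseDisplayName`, `parseWebsiteName`) that are never defined. Below is a runnable variant under heavily simplified, hypothetical rules. A display name is any text before `<`. An address is `user@domain` with no spaces. Comments, quoted names and domain literals are not handled. It keeps the same `Mailbox` type.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

type DisplayName = String
type EmailAddress = String
data Mailbox = Mailbox DisplayName EmailAddress deriving (Show, Eq)

-- Skip spaces, but not line breaks.
blanks :: Parser ()
blanks = skipMany (char ' ')

-- Simplified address: user@domain with no spaces, commas or angle brackets.
emailAddress :: Parser EmailAddress
emailAddress = do
  user <- many1 (noneOf "@<>, \r\n")
  _ <- char '@'
  domain <- many1 (noneOf "<>, \r\n")
  return (user ++ "@" ++ domain)

-- Simplified display name: everything before '<', trailing spaces dropped.
displayName :: Parser DisplayName
displayName = dropTrailing <$> many1 (noneOf "<,\r\n")
  where
    dropTrailing = reverse . dropWhile (== ' ') . reverse

-- "Name <user@domain>", "<user@domain>", or a bare "user@domain".
mailbox :: Parser Mailbox
mailbox = blanks *> (try angleAddr <|> bareAddr) <* blanks
  where
    angleAddr = do
      name <- option "" displayName
      addr <- between (char '<') (char '>') emailAddress
      return (Mailbox name addr)
    bareAddr = Mailbox "" <$> emailAddress

fromHeader :: Parser [Mailbox]
fromHeader = string "From:" *> (mailbox `sepBy1` char ',') <* eof

main :: IO ()
main = do
  print (parse fromHeader "example" "From: John Doe <john@doe.org>")
  print (parse fromHeader "example" "From: john@doe.org, Jane <jane@doe.org>")
```

The `try` around `angleAddr` matters. Without it, a bare address would be consumed as a display name, and the parser could not fall back to `bareAddr`.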
>
> Regards,
> Tim Holland
>
>
>
>
>
> On 28 June 2013 03:00, <beginners-request at haskell.org> wrote:
>
> Today's Topics:
>
>    1. data declaration using other type's names? (Patrick Redmond)
>    2. Re: data declaration using other type's names? (Brandon Allbery)
>    3. Re: data declaration using other type's names? (Nikita Danilenko)
>    4. Re: what to do about excess memory usage (Chaddaï Fouché)
>    5. Re: what to do about excess memory usage (James Jones)
>    6. How to Lex, Parse, and Serialize-to-XML email messages
>       (Costello, Roger L.)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 27 Jun 2013 11:24:51 -0400
> From: Patrick Redmond <plredmond at gmail.com>
> Subject: [Haskell-beginners] data declaration using other type's
>         names?
> To: beginners at haskell.org
> Message-ID:
>         <CAHUea4FfBP8L1kU+tS1-2cVPvAB4h22j35JcNwRC-jGds0=
> v6g at mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> Hey Haskellers,
>
> I noticed that ghci lets me do this:
>
> > data Foo = Int Int | Float
> > :t Int
> Int :: Int -> Foo
> > :t Float
> Float :: Foo
> > :t Int 4
> Int 4 :: Foo
>
> It's confusing to have type constructors that use names of existing
> types. It's not intuitive that the name "Int" could refer to two
> different things, which brings me to:
>
> > data Bar = Bar Int
> > :t Bar
> Bar :: Int -> Bar
>
> Yay? I can have a simple type with one constructor named the same as the
> type.
>
> Why is this allowed? Is it useful somehow?
>
> --Patrick
>
>
>
> ------------------------------
>
> Message: 2
> Date: Thu, 27 Jun 2013 11:37:46 -0400
> From: Brandon Allbery <allbery.b at gmail.com>
> Subject: Re: [Haskell-beginners] data declaration using other type's
>         names?
> To: The Haskell-Beginners Mailing List - Discussion of primarily
>         beginner-level topics related to Haskell <beginners at haskell.org>
> Message-ID:
>         <CAKFCL4U-E4B_+cts0vpNX8Ar9wccQDjgzWOYHLXLsLAv+Qn_cg at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> On Thu, Jun 27, 2013 at 11:24 AM, Patrick Redmond <plredmond at gmail.com> wrote:
>
> > I noticed that ghci lets me do this:
> >
>
> Not just ghci, but ghc as well.
>
>
> > Yay? I can have a simple type with one constructor named the same as the
> > type.
> > Why is this allowed? Is it useful somehow?
> >
>
> It's convenient for pretty much the situation you showed, where the type
> constructor and data constructor have the same name. A number of people do
> advocate that it not be used, though, because it can be confusing for
> people. (Not for the compiler; data and type constructors can't be used in
> the same places, it never has trouble keeping straight which is which.)
>
> It might be best to consider this as "there is no good reason to *prevent*
> it from happening, from a language standpoint".
>
> --
> brandon s allbery kf8nh                               sine nomine associates
> allbery.b at gmail.com                                  ballbery at sinenomine.net
> unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: <
> http://www.haskell.org/pipermail/beginners/attachments/20130627/ea0e9cc5/attachment-0001.htm
> >
>
> ------------------------------
>
> Message: 3
> Date: Thu, 27 Jun 2013 18:02:00 +0200
> From: Nikita Danilenko <nda at informatik.uni-kiel.de>
> Subject: Re: [Haskell-beginners] data declaration using other type's
>         names?
> To: beginners at haskell.org
> Message-ID: <51CC61F8.9020506 at informatik.uni-kiel.de>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi, Patrick,
>
> the namespaces for types and constructors are considered disjoint, i.e.
> you can use a name in both contexts. A simple example of this feature is
> your last definition
>
> > data Bar = Bar Int
>
> or even shorter
>
> > data A = A
>
> This is particularly useful for single-constructor types à la
>
> > data MyType a = MyType a
>
> Clearly, using "Int" or "Float" as constructor names may seem odd, but
> when dealing with a simple grammar it is quite natural to write
>
> > data Exp = Num Int | Add Exp Exp
>
> although "Num" is a type class in Haskell.
>
> Best regards,
>
> Nikita
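Here is a small sketch of the namespace point. Types and data constructors live in separate namespaces. So `Bar` below is a type in signatures and a constructor in patterns and expressions. `Int`, `Float` and `Num` can also be reused as constructor names. The data declarations are the ones from this thread. `unBar`, `describe` and `eval` are illustrative helpers.

```haskell
-- Same name for the type and its only constructor.
data Bar = Bar Int deriving (Show, Eq)

-- Constructors may reuse the names of existing types...
data Foo = Int Int | Float deriving (Show, Eq)

-- ...or of a type class, as with Num here.
data Exp = Num Int | Add Exp Exp deriving Show

-- In the signature `Bar` is the type; in the pattern it is the constructor.
unBar :: Bar -> Int
unBar (Bar n) = n

-- Here `Int n` is the constructor; the field it holds has type Int.
describe :: Foo -> String
describe (Int n) = "Int constructor holding " ++ show n
describe Float   = "Float constructor"

eval :: Exp -> Int
eval (Num n)   = n
eval (Add a b) = eval a + eval b

main :: IO ()
main = do
  print (unBar (Bar 3))
  putStrLn (describe (Int 4))
  print (eval (Add (Num 1) (Num 2)))
```

The compiler never confuses the two uses because a type and a data constructor can never appear in the same syntactic position.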
>
> On 27/06/13 17:24, Patrick Redmond wrote:
> > Hey Haskellers,
> >
> > I noticed that ghci lets me do this:
> >
> >> data Foo = Int Int | Float
> >> :t Int
> > Int :: Int -> Foo
> >> :t Float
> > Float :: Foo
> >> :t Int 4
> > Int 4 :: Foo
> >
> > It's confusing to have type constructors that use names of existing
> > types. It's not intuitive that the name "Int" could refer to two
> > different things, which brings me to:
> >
> >> data Bar = Bar Int
> >> :t Bar
> > Bar :: Int -> Bar
> >
> > Yay? I can have a simple type with one constructor named the same as the
> > type.
> >
> > Why is this allowed? Is it useful somehow?
> >
> > --Patrick
> >
> > _______________________________________________
> > Beginners mailing list
> > Beginners at haskell.org
> > http://www.haskell.org/mailman/listinfo/beginners
>
>
>
>
> ------------------------------
>
> Message: 4
> Date: Thu, 27 Jun 2013 18:23:25 +0200
> From: Chaddaï Fouché <chaddai.fouche at gmail.com>
> Subject: Re: [Haskell-beginners] what to do about excess memory usage
> To: The Haskell-Beginners Mailing List - Discussion of primarily
>         beginner-level topics related to Haskell <beginners at haskell.org>
> Message-ID:
>         <CANfjZRbGTvoECTMsriNDAUozbow1fUGt-9FRtG-XwRJ+DamiAw at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> First, 2MB isn't a lot of RAM nowadays; do you mean 2GB, or is that just
> compared to the rest of the program?
> Second, your powersOfTen should probably be:
>
> > powersOfTen = iterate (10*) 1
>
> Or maybe even a Vector (if you can guess the maximum value asked of it) or
> a MemoTrie (if you can't) since list indexing is slow as hell.
> That could help with memoPair which should definitely be a Vector and not a
> list.
>
> Good luck (on the other hand, maybe your program is already "good enough"
> and you could just switch to another project)
> --
> Jedai
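Here is a sketch of the `powersOfTen` suggestion, plus a constant-time lookup table to replace list indexing. It uses `Data.Array` from the `array` package instead of the suggested `Vector` or `MemoTrie`. It also assumes a hypothetical maximum exponent, `maxExponent`. Looking up an index above that bound raises an error.

```haskell
import Data.Array (Array, listArray, (!))

-- The suggested definition: a lazy, infinite list 1, 10, 100, ...
powersOfTen :: [Integer]
powersOfTen = iterate (10 *) 1

-- List indexing walks k cells, so each lookup costs O(k).
slowPow10 :: Int -> Integer
slowPow10 k = powersOfTen !! k

-- Hypothetical upper bound on the exponents the program will ask for.
maxExponent :: Int
maxExponent = 40

-- Immutable array built once from the list; listArray only takes as many
-- elements as the bounds need, so the infinite list is fine here.
pow10Table :: Array Int Integer
pow10Table = listArray (0, maxExponent) powersOfTen

-- O(1) lookup; errors if k is outside (0, maxExponent).
fastPow10 :: Int -> Integer
fastPow10 k = pow10Table ! k

main :: IO ()
main = print (slowPow10 12, fastPow10 12)
```

If no bound can be guessed in advance, the memo-trie route mentioned above avoids fixing the bound up front.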
>
> ------------------------------
>
> Message: 5
> Date: Thu, 27 Jun 2013 18:28:27 -0500
> From: James Jones <jejones3141 at gmail.com>
> Subject: Re: [Haskell-beginners] what to do about excess memory usage
> To: The Haskell-Beginners Mailing List - Discussion of primarily
>         beginner-level topics related to Haskell <beginners at haskell.org>
> Message-ID: <51CCCA9B.40807 at gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> On 06/27/2013 11:23 AM, Chaddaï Fouché wrote:
> > First, 2MB isn't a lot of RAM nowadays; do you mean 2GB, or is that just
> > compared to the rest of the program?
>
> It's a lot compared to the rest of the program... not to mention that
> I'm a fossil from the days of 8-bit microprocessors, so 2 MB seems like
> a lot of RAM to me. :)
>
> > Second, your powersOfTen should probably be:
> >
> > > powersOfTen = iterate (10*) 1
> >
> > Or maybe even a Vector (if you can guess the maximum value asked of
> > it) or a MemoTrie (if you can't) since list indexing is slow as hell.
> > That could help with memoPair which should definitely be a Vector and
> > not a list.
>
> Thanks!
> >
> > Good luck (on the other hand, maybe your program is already "good
> > enough" and you could just switch to another project)
> > --
> > Jedai
> >
> I do want to find a better way to keep the list of positions for ones
> around than a [Int], and I want to save them only as long as I need to,
> i.e. until I have both the 2 * k and 2 * k + 1 digit palindromes. Once
> that's done, I will move on. Thanks again!
>
>
>
> ------------------------------
>
> Message: 6
> Date: Fri, 28 Jun 2013 09:30:30 +0000
> From: "Costello, Roger L." <costello at mitre.org>
> Subject: [Haskell-beginners] How to Lex, Parse, and Serialize-to-XML
>         email messages
> To: "beginners at haskell.org" <beginners at haskell.org>
> Message-ID:
>         <B5FEE00B53CF054AA8439027E8FE17751EFA9005 at IMCMBX04.MITRE.ORG>
> Content-Type: text/plain; charset="us-ascii"
>
> Hi Folks,
>
> I am working toward being able to input any email message and output an
> equivalent XML encoding.
>
> I am starting small, with one of the email headers -- the "From Header".
>
> Here is an example of a From Header:
>
>         From: John Doe <john at doe.org>
>
> I have successfully transformed it into this XML:
>
>         <From>
>             <Mailbox>
>                 <DisplayName>John Doe</DisplayName>
>                 <Address>john at doe.org</Address>
>             </Mailbox>
>         </From>
>
> I used the lexical analyzer "Alex" [1] to break apart (tokenize) the From
> Header.
>
> I used the parser "Happy" [2] to process the tokens and generate a parse
> tree.
>
> Then I used a serializer to walk the parse tree and output XML.
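The serialize step can be sketched without any library. This assumes a hypothetical parse tree (`FromHeader`, `Mailbox`) of the kind a Happy grammar might build. These are not the types from the Stack Overflow post. The sketch escapes `&`, `<` and `>` in text content and reproduces the element structure of the XML shown above.

```haskell
-- Hypothetical parse tree for a From header.
data Mailbox = Mailbox String String   -- display name, address
newtype FromHeader = FromHeader [Mailbox]

-- Escape characters that are special in XML text.
escapeXml :: String -> String
escapeXml = concatMap esc
  where
    esc '&' = "&amp;"
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc c   = [c]

-- An element whose text content sits on a single line.
leaf :: String -> String -> String
leaf name text = "<" ++ name ++ ">" ++ escapeXml text ++ "</" ++ name ++ ">"

-- An element wrapping already-serialized child lines, indented two spaces.
element :: String -> [String] -> [String]
element name children =
  ["<" ++ name ++ ">"] ++ map ("  " ++) children ++ ["</" ++ name ++ ">"]

serializeMailbox :: Mailbox -> [String]
serializeMailbox (Mailbox name addr) =
  element "Mailbox" [leaf "DisplayName" name, leaf "Address" addr]

-- Walk the tree and emit the XML document as text.
serializeFrom :: FromHeader -> String
serializeFrom (FromHeader boxes) =
  unlines (element "From" (concatMap serializeMailbox boxes))

main :: IO ()
main = putStr (serializeFrom (FromHeader [Mailbox "John Doe" "john@doe.org"]))
```

Building the output as a list of lines keeps indentation a one-line concern in `element`. A real serializer would more likely use an XML library.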
>
> I posted to stackoverflow a complete description of how to lex, parse, and
> serialize-to-XML email From Headers:
>
>
> http://stackoverflow.com/questions/17354442/how-to-lex-parse-and-serialize-to-xml-email-messages-using-alex-and-happy
>
> /Roger
>
> [1] The Alex User's Guide may be found at this URL:
> http://www.haskell.org/alex/doc/html/
>
> [2] The Happy User's Guide may be found at this URL:
> http://www.haskell.org/happy/
>
>
>
> ------------------------------
>
>
>
> End of Beginners Digest, Vol 60, Issue 38
> *****************************************
>
>
> ------------------------------
>
>
>
> End of Beginners Digest, Vol 60, Issue 40
> *****************************************
>

