[Haskell-beginners] How to Lex, Parse, and Serialize-to-XML email messages

Tim Holland th0114nd at gmail.com
Fri Jun 28 18:18:30 CEST 2013


Hi Roger,

I realize you've already finished the project, but for the future I
think it's a lot easier to use parser combinators, via Text.Parsec and
Text.Parsec.String, to do a similar thing. For example, if you were parsing
XML and wanted to parse a single tag name, you would try something like this:

parseTag :: Parser Tag
parseTag = many1 alphaNum <?> "tag"

To get a tagged form, try:

parseTagged :: Parser (Tag, [Elem])
parseTagged = do
  char '<'
  name <- parseTag
  char '>'
  content <- many (try parseElem)
  string "</"
  parseTag
  char '>'
  return (name, content)
  <?> "tagged form"
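
A self-contained sketch of the XML fragment above, for anyone who wants to load it into ghci. The message does not define Tag, Elem or parseElem, so the definitions below are assumptions: a tag is a plain String, and an element is either a run of text or a nested tagged form.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Assumed representations; the message above does not define them.
type Tag = String

data Elem
  = Text String      -- character data between tags
  | Node Tag [Elem]  -- a nested tagged form
  deriving (Show, Eq)

parseTag :: Parser Tag
parseTag = many1 alphaNum <?> "tag"

-- An element is either a nested tagged form or a run of plain text.
parseElem :: Parser Elem
parseElem = (uncurry Node <$> try parseTagged)
        <|> (Text <$> many1 (noneOf "<"))

parseTagged :: Parser (Tag, [Elem])
parseTagged = tagged <?> "tagged form"
  where
    tagged = do
      _ <- char '<'
      name <- parseTag
      _ <- char '>'
      content <- many (try parseElem)
      _ <- string "</"
      _ <- parseTag          -- closing name is read but not compared
      _ <- char '>'
      return (name, content)
```

In ghci, `parse parseTagged "" "<a>hi</a>"` gives `Right ("a",[Text "hi"])`. Like the original, this sketch does not check that the closing tag matches the opening one; comparing the two names and calling `fail` on a mismatch would be the natural next step.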

And so on. I haven't tried this out, but a parser similar to yours would
go something like this:

--Datatypes
type DisplayName = String
type EmailAddress = String
data Mailbox = Mailbox DisplayName EmailAddress deriving (Show)

parseFromHeader :: Parser [Mailbox]
parseFromHeader = do
  string "From: "
  mailboxes <- many (try parseMailbox)
  return mailboxes

parseMailbox :: Parser Mailbox
parseMailbox = (do
  parseComments
  -- Names are optional, so fall back to an empty name
  name <- option "" (try parseDisplayName)
  parseComments
  address <- parseEmailAddress
  parseComments
  -- A comma separates mailboxes; the last one has none
  optional (char ',')
  return (Mailbox name address))
  <?> "an individual's mailbox"

parseEmailAddress :: Parser EmailAddress
parseEmailAddress = do
  optional (char '<')
  handle <- many1 (noneOf "@") -- Or whatever is valid here
  char '@'
  domain <- parseDomain
  optional (char '>')
  return (handle ++ "@" ++ domain)

parseDomain :: Parser String
parseDomain =
      -- A domain may be wrapped in square brackets
      between (char '[') (char ']') parseDomain
  <|> parseWebsiteName
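
Putting the header pieces together, a runnable sketch could look like the following. The helpers parseComments, parseDisplayName and parseWebsiteName are not defined in this message, so the versions here are simplifying assumptions: comments are treated as plain whitespace, a display name is everything before the opening '<', and the address is required to be in angle brackets. Real header syntax (RFC 5322) is considerably richer.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

type DisplayName = String
type EmailAddress = String
data Mailbox = Mailbox DisplayName EmailAddress deriving (Show, Eq)

-- Assumption: "comments" are just optional whitespace here.
parseComments :: Parser ()
parseComments = spaces

-- Assumption: a display name is everything up to the opening '<'.
parseDisplayName :: Parser DisplayName
parseDisplayName = trimEnd <$> many1 (noneOf "<,\r\n")
  where trimEnd = reverse . dropWhile (== ' ') . reverse

-- Assumption: a host name is letters, digits, '.' and '-'.
parseWebsiteName :: Parser String
parseWebsiteName = many1 (alphaNum <|> oneOf ".-")

parseDomain :: Parser String
parseDomain =
      between (char '[') (char ']') parseDomain
  <|> parseWebsiteName

-- Assumption: the address is always wrapped in angle brackets.
parseEmailAddress :: Parser EmailAddress
parseEmailAddress = do
  _ <- char '<'
  handle <- many1 (noneOf "@<>")
  _ <- char '@'
  domain <- parseDomain
  _ <- char '>'
  return (handle ++ "@" ++ domain)

parseMailbox :: Parser Mailbox
parseMailbox = mailbox <?> "mailbox"
  where
    mailbox = do
      parseComments
      -- The name is optional; only accept it if an address follows.
      name <- option "" (try (parseDisplayName <* lookAhead (char '<')))
      address <- parseEmailAddress
      parseComments
      optional (char ',')
      return (Mailbox name address)

parseFromHeader :: Parser [Mailbox]
parseFromHeader = string "From: " *> many (try parseMailbox)
```

With these definitions, `parse parseFromHeader "" "From: John Doe <john@doe.org>"` yields `Right [Mailbox "John Doe" "john@doe.org"]`.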

And so on. Again, I've tested none of the email header bits, but the XML bit
works. It requires some level of comfort with monadic operations, but
beyond that I think it's a much simpler way to parse.
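
Since the thread is about getting from a header to XML, here is a minimal sketch of the last step, turning parsed Mailbox values into the element structure from Roger's example. The function names escape, element, mailboxToXml and fromHeaderToXml are hypothetical, and the output is written on one line rather than indented.

```haskell
type DisplayName = String
type EmailAddress = String
data Mailbox = Mailbox DisplayName EmailAddress deriving (Show)

-- Escape the characters that are special in XML character data.
escape :: String -> String
escape = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc c   = [c]

-- Wrap already-rendered content in an element with the given name.
element :: String -> String -> String
element name body = "<" ++ name ++ ">" ++ body ++ "</" ++ name ++ ">"

mailboxToXml :: Mailbox -> String
mailboxToXml (Mailbox name address) =
  element "Mailbox"
    (element "DisplayName" (escape name) ++ element "Address" (escape address))

fromHeaderToXml :: [Mailbox] -> String
fromHeaderToXml = element "From" . concatMap mailboxToXml
```

For example, `fromHeaderToXml [Mailbox "John Doe" "john@doe.org"]` produces `<From><Mailbox><DisplayName>John Doe</DisplayName><Address>john@doe.org</Address></Mailbox></From>`. Building strings by hand is fine for a sketch; a real serializer would more likely use an XML library.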

Regards,
Tim Holland





On 28 June 2013 03:00, <beginners-request at haskell.org> wrote:

> Today's Topics:
>
>    1.  data declaration using other type's names? (Patrick Redmond)
>    2. Re:  data declaration using other type's names? (Brandon Allbery)
>    3. Re:  data declaration using other type's names? (Nikita Danilenko)
>    4. Re:  what to do about excess memory usage (Chaddaï Fouché)
>    5. Re:  what to do about excess memory usage (James Jones)
>    6.  How to Lex, Parse, and Serialize-to-XML email messages
>       (Costello, Roger L.)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 27 Jun 2013 11:24:51 -0400
> From: Patrick Redmond <plredmond at gmail.com>
> Subject: [Haskell-beginners] data declaration using other type's
>         names?
> To: beginners at haskell.org
> Message-ID:
>         <CAHUea4FfBP8L1kU+tS1-2cVPvAB4h22j35JcNwRC-jGds0=
> v6g at mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> Hey Haskellers,
>
> I noticed that ghci lets me do this:
>
> > data Foo = Int Int | Float
> > :t Int
> Int :: Int -> Foo
> > :t Float
> Float :: Foo
> > :t Int 4
> Int 4 :: Foo
>
> It's confusing to have type constructors that use names of existing
> types. It's not intuitive that the name "Int" could refer to two
> different things, which brings me to:
>
> > data Bar = Bar Int
> > :t Bar
> Bar :: Int -> Bar
>
> Yay? I can have a simple type with one constructor named the same as the
> type.
>
> Why is this allowed? Is it useful somehow?
>
> --Patrick
>
>
>
> ------------------------------
>
> Message: 2
> Date: Thu, 27 Jun 2013 11:37:46 -0400
> From: Brandon Allbery <allbery.b at gmail.com>
> Subject: Re: [Haskell-beginners] data declaration using other type's
>         names?
> To: The Haskell-Beginners Mailing List - Discussion of primarily
>         beginner-level topics related to Haskell <beginners at haskell.org>
> Message-ID:
>         <
> CAKFCL4U-E4B_+cts0vpNX8Ar9wccQDjgzWOYHLXLsLAv+Qn_cg at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> On Thu, Jun 27, 2013 at 11:24 AM, Patrick Redmond <plredmond at gmail.com
> >wrote:
>
> > I noticed that ghci lets me do this:
> >
>
> Not just ghci, but ghc as well.
>
>
> > Yay? I can have a simple type with one constructor named the same as the
> > type.
> > Why is this allowed? Is it useful somehow?
> >
>
> It's convenient for pretty much the situation you showed, where the type
> constructor and data constructor have the same name. A number of people do
> advocate that it not be used, though, because it can be confusing for
> people. (Not for the compiler; data and type constructors can't be used in
> the same places, it never has trouble keeping straight which is which.)
>
> It might be best to consider this as "there is no good reason to *prevent*
> it from happening, from a language standpoint".
>
> --
> brandon s allbery kf8nh                               sine nomine
> associates
> allbery.b at gmail.com
> ballbery at sinenomine.net
> unix, openafs, kerberos, infrastructure, xmonad
> http://sinenomine.net
>
> ------------------------------
>
> Message: 3
> Date: Thu, 27 Jun 2013 18:02:00 +0200
> From: Nikita Danilenko <nda at informatik.uni-kiel.de>
> Subject: Re: [Haskell-beginners] data declaration using other type's
>         names?
> To: beginners at haskell.org
> Message-ID: <51CC61F8.9020506 at informatik.uni-kiel.de>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi, Patrick,
>
> the namespaces for types and constructors are considered disjoint, i.e.
> you can use a name in both contexts. A simple example of this feature is
> your last definition
>
> > data Bar = Bar Int
>
> or even shorter
>
> > data A = A
>
> This is particularly useful for single-constructor types à la
>
> > data MyType a = MyType a
>
> Clearly, using "Int" or "Float" as constructor names may seem odd, but
> when dealing with a simple grammar it is quite natural to write
>
> > data Exp = Num Int | Add Exp Exp
>
> although "Num" is a type class in Haskell.
>
> Best regards,
>
> Nikita
>
> ------------------------------
>
> Message: 4
> Date: Thu, 27 Jun 2013 18:23:25 +0200
> From: Chaddaï Fouché <chaddai.fouche at gmail.com>
> Subject: Re: [Haskell-beginners] what to do about excess memory usage
> To: The Haskell-Beginners Mailing List - Discussion of primarily
>         beginner-level topics related to Haskell <beginners at haskell.org>
> Message-ID:
>         <
> CANfjZRbGTvoECTMsriNDAUozbow1fUGt-9FRtG-XwRJ+DamiAw at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> First 2MB isn't a lot of RAM nowadays, do you mean 2GB or is that just
> compared to the rest of the program ?
> Second, your powersOfTen should probably be :
>
> > powersOfTen = iterate (10*) 1
>
> Or maybe even a Vector (if you can guess the maximum value asked of it) or
> a MemoTrie (if you can't) since list indexing is slow as hell.
> That could help with memoPair which should definitely be a Vector and not a
> list.
>
> Good luck (on the other hand, maybe your program is already "good enough"
> and you could just switch to another project)
> --
> Jedai
>
> ------------------------------
>
> Message: 5
> Date: Thu, 27 Jun 2013 18:28:27 -0500
> From: James Jones <jejones3141 at gmail.com>
> Subject: Re: [Haskell-beginners] what to do about excess memory usage
> To: The Haskell-Beginners Mailing List - Discussion of primarily
>         beginner-level topics related to Haskell <beginners at haskell.org>
> Message-ID: <51CCCA9B.40807 at gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> On 06/27/2013 11:23 AM, Chaddaï Fouché wrote:
> > First 2MB isn't a lot of RAM nowadays, do you mean 2GB or is that just
> > compared to the rest of the program ?
>
> It's a lot compared to the rest of the program... not to mention that
> I'm a fossil from the days of 8-bit microprocessors, so 2 MB seems like
> a lot of RAM to me. :)
>
> > Second, your powersOfTen should probably be :
> >
> > > powersOfTen = iterate (10*) 1
> >
> > Or maybe even a Vector (if you can guess the maximum value asked of
> > it) or a MemoTrie (if you can't) since list indexing is slow as hell.
> > That could help with memoPair which should definitely be a Vector and
> > not a list.
>
> Thanks!
> >
> > Good luck (on the other hand, maybe your program is already "good
> > enough" and you could just switch to another project)
> > --
> > Jedai
> >
> I do want to find a better way to keep the list of positions for ones
> around than a [Int], and I want to save them only as long as I need to,
> i.e. until I have both the 2 * k and 2 * k + 1 digit palindromes. Once
> that's done, I will move on. Thanks again!
>
>
>
> ------------------------------
>
> Message: 6
> Date: Fri, 28 Jun 2013 09:30:30 +0000
> From: "Costello, Roger L." <costello at mitre.org>
> Subject: [Haskell-beginners] How to Lex, Parse, and Serialize-to-XML
>         email messages
> To: "beginners at haskell.org" <beginners at haskell.org>
> Message-ID:
>         <B5FEE00B53CF054AA8439027E8FE17751EFA9005 at IMCMBX04.MITRE.ORG>
> Content-Type: text/plain; charset="us-ascii"
>
> Hi Folks,
>
> I am working toward being able to input any email message and output an
> equivalent XML encoding.
>
> I am starting small, with one of the email headers -- the "From Header"
>
> Here is an example of a From Header:
>
>         From: John Doe <john at doe.org>
>
> I have successfully transformed it into this XML:
>
>         <From>
>             <Mailbox>
>                 <DisplayName>John Doe</DisplayName>
>                 <Address>john at doe.org</Address>
>             </Mailbox>
>         </From>
>
> I used the lexical analyzer "Alex" [1] to break apart (tokenize) the From
> Header.
>
> I used the parser "Happy" [2] to process the tokens and generate a parse
> tree.
>
> Then I used a serializer to walk the parse tree and output XML.
>
> I posted to stackoverflow a complete description of how to lex, parse, and
> serialize-to-XML email From Headers:
>
>
> http://stackoverflow.com/questions/17354442/how-to-lex-parse-and-serialize-to-xml-email-messages-using-alex-and-happy
>
> /Roger
>
> [1] The Alex User's Guide may be found at this URL:
> http://www.haskell.org/alex/doc/html/
>
> [2] The Happy User's Guide may be found at this URL:
> http://www.haskell.org/happy/
>
>
>
> End of Beginners Digest, Vol 60, Issue 38
> *****************************************