[Haskell-cafe] types for parsing a tree
Jared Jennings
jjenning at gmail.com
Fri Sep 10 12:53:10 EDT 2010
Dear haskell-cafe:
I'm trying to parse an Open Financial eXchange (OFX) 1.x file. It
details my bank transactions, like debit card purchases. It's
SGML-based and it goes like:
<OFX>[...]
  <STMTRS>[...]
    <STMTTRN>[...]
      <TRNUID>9223ry29r389
      <NAME>THE GROCERY STORE BLABLABLA
      <TRNAMT>234.99
    </STMTTRN>
    <STMTTRN>[...]
      <TRNUID>1237tg832t
      <NAME>SOME DUDE ON PAYPAL 4781487
      <TRNAMT>2174.27
    </STMTTRN>
  </STMTRS>
</OFX>
I've left out a bunch, but as you can see it's tree-shaped, and the
only reason they used SGML rather than misusing XML as a data
serialization language was that XML wasn't popular yet. (OFX 2.x uses
XML, but my bank doesn't use OFX 2.x.)
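To make the tree shape concrete, here is a minimal sketch that classifies each line of the sample as the start of an aggregate, its end, or a leaf with a value. It assumes one tag per line and that the "[...]" markers (which only stand for omitted content) are not in the real file. In the sample, leaf elements such as <TRNAMT> have no end tag, which is what separates them from aggregates such as <STMTTRN>. The Token type and classify function are hypothetical names for this illustration; a real parser would more likely build on a library such as tagsoup.

```haskell
import Data.Char (isSpace)

-- One classified line of the SGML body (hypothetical type for this sketch).
data Token
  = Open String         -- e.g. <STMTTRN>, starts an aggregate
  | Close String        -- e.g. </STMTTRN>, ends an aggregate
  | Leaf String String  -- e.g. <TRNAMT>234.99, a value with no end tag
  deriving (Show, Eq)

-- Classify a single line; assumes at most one tag per line.
classify :: String -> Maybe Token
classify raw =
  case trim raw of
    ('<' : '/' : rest) -> Close <$> tagName rest
    ('<' : rest) ->
      case break (== '>') rest of
        (tag, '>' : value)
          | null value -> Just (Open tag)
          | otherwise  -> Just (Leaf tag value)
        _ -> Nothing
    _ -> Nothing
  where
    -- Strip leading and trailing whitespace.
    trim = dropWhile isSpace . reverse . dropWhile isSpace . reverse
    -- Read a tag name up to the closing '>'.
    tagName s = case break (== '>') s of
      (tag, '>' : _) -> Just tag
      _              -> Nothing

main :: IO ()
main = mapM_ (print . classify) ["<STMTTRN>", "<TRNUID>9223ry29r389", "</STMTTRN>"]
```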
When I imagine how to put this into a data structure, I think:
-- The '...' below is fields like the date and info about the bank
data OFX = OFX { statement :: StatementResponse, ... }
-- The '...' below is fields like the account number
data StatementResponse = StatementResponse { transactions :: [Transaction], ... }
data Transaction = Transaction { id :: String, name :: String,
                                 amount :: Decimal, sic :: Maybe Int, ... }
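A sketch of those records filled in with the two sample transactions, simplified under stated assumptions: the elided '...' fields are left out, `id` is renamed to the hypothetical `trnUid` (any unqualified use of a top-level `id` would be ambiguous with Prelude's `id`), and `Rational` stands in for `Decimal`, which is not in base. With these types the nesting is fixed at compile time, so a function like the hypothetical `totalAmount` can walk from the OFX to its statement to its transactions without any runtime checks.

```haskell
-- Simplified, hypothetical versions of the records above.
data OFX = OFX { statement :: StatementResponse }
  deriving Show

data StatementResponse = StatementResponse { transactions :: [Transaction] }
  deriving Show

data Transaction = Transaction
  { trnUid  :: String    -- renamed from `id` to avoid clashing with Prelude.id
  , trnName :: String
  , amount  :: Rational  -- stand-in for Decimal
  , sic     :: Maybe Int
  } deriving Show

-- The two transactions from the sample file.
sample :: OFX
sample = OFX
  { statement = StatementResponse
      { transactions =
          [ Transaction "9223ry29r389" "THE GROCERY STORE BLABLABLA" (23499 / 100) Nothing
          , Transaction "1237tg832t" "SOME DUDE ON PAYPAL 4781487" (217427 / 100) Nothing
          ]
      }
  }

-- Only OFX -> StatementResponse -> [Transaction] typechecks here.
totalAmount :: OFX -> Rational
totalAmount = sum . map amount . transactions . statement

main :: IO ()
main = print (totalAmount sample)
```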
Then I tried to make a parser to emit those data types and failed. I
come from Python, where there's no problem if a function returns
different types of values depending on its inputs, but that doesn't
fly in Haskell.
I've tried:
data OFXThing = OFX { statement :: OFXThing }
              | StatementResponse { ... transactions :: [OFXThing] }
but that would let me make trees of things that make no sense in OFX,
like a transaction containing a statement.
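To make the problem concrete, here is a cut-down, hypothetical version of that single type, with a made-up `Transaction` constructor added. Both values below typecheck, including the nonsensical one in which a statement's "transactions" are whole OFX documents and an OFX's "statement" is a transaction. The record selectors are also partial: applying `statement` to a `Transaction` compiles but fails at runtime.

```haskell
-- Every OFX element is a constructor of one type, so any field
-- of type OFXThing can hold any element at all.
data OFXThing
  = OFX { statement :: OFXThing }
  | StatementResponse { transactions :: [OFXThing] }
  | Transaction { trnName :: String }
  deriving Show

-- Sensible: an OFX holding a statement holding a transaction.
good :: OFXThing
good = OFX (StatementResponse [Transaction "THE GROCERY STORE BLABLABLA"])

-- Nonsense that the compiler still accepts.
bad :: OFXThing
bad = StatementResponse [OFX (Transaction "SOME DUDE ON PAYPAL 4781487")]

main :: IO ()
main = mapM_ print [good, bad]
```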
I made a
data Tree k v = Branch k [Tree k v] | Leaf k v
type TextTree = Tree String String
and a tagsoup-parsec parser that returns Branches for tags like OFX,
and Leafs for tags like TRNUID. But now I just have a tree of strings.
That holds no useful type information.
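Here is that generic tree with the first sample transaction written out by hand, plus a hypothetical `leafValue` helper (not part of tagsoup-parsec). It illustrates the problem: every value comes back as a `String` (the amount included), and the type happily allows a STMTTRN branch nested inside another STMTTRN.

```haskell
-- The generic tree from above.
data Tree k v = Branch k [Tree k v] | Leaf k v
  deriving Show

type TextTree = Tree String String

-- The first transaction from the sample file.
grocery :: TextTree
grocery = Branch "STMTTRN"
  [ Leaf "TRNUID" "9223ry29r389"
  , Leaf "NAME" "THE GROCERY STORE BLABLABLA"
  , Leaf "TRNAMT" "234.99"
  ]

-- Value of the first direct child leaf with the given tag (hypothetical helper).
leafValue :: String -> TextTree -> Maybe String
leafValue tag (Branch _ children) =
  case [v | Leaf k v <- children, k == tag] of
    (v : _) -> Just v
    []      -> Nothing
leafValue _ (Leaf _ _) = Nothing

main :: IO ()
main = do
  -- The amount is just a String; nothing says it is a number.
  print (leafValue "TRNAMT" grocery)
  -- Nothing stops a transaction from containing a transaction.
  print (Branch "STMTTRN" [grocery])
```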
I want my types to say that OFXes contain statements and statements
contain transactions - just like the OFX DTD says. How can I construct
the types so that they are tight enough to be meaningful and loose
enough that it's possible to write functions that emit them?
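One possible approach, sketched under assumptions rather than offered as the answer: keep the generic TextTree as the parser's output, then convert it into typed records with one function per element type. Each converter has its own result type (`toTransaction` returns a `Transaction`, `toStatement` a `StatementResponse`), so no single function has to return different types, and the nesting rules live in the signatures. The helper names, the simplified nesting (OFX directly containing STMTRS, as in the sample above), and `Double` for amounts are all assumptions for illustration; `Double` is a poor fit for money, and a real program would want an exact decimal type.

```haskell
import Text.Read (readMaybe)

data Tree k v = Branch k [Tree k v] | Leaf k v
  deriving Show

type TextTree = Tree String String

-- Simplified, hypothetical records.
newtype OFX = OFX { statement :: StatementResponse } deriving Show
newtype StatementResponse = StatementResponse { transactions :: [Transaction] } deriving Show
data Transaction = Transaction
  { trnUid  :: String
  , trnName :: String
  , amount  :: Double  -- stand-in; use an exact decimal type for money
  } deriving Show

-- Value of the first direct child leaf with the given tag.
leafValue :: String -> [TextTree] -> Maybe String
leafValue tag children = case [v | Leaf k v <- children, k == tag] of
  (v : _) -> Just v
  []      -> Nothing

-- Children of every direct child branch with the given tag.
branches :: String -> [TextTree] -> [[TextTree]]
branches tag children = [cs | Branch k cs <- children, k == tag]

-- One converter per element type; each signature encodes what may nest where.
toTransaction :: [TextTree] -> Maybe Transaction
toTransaction cs =
  Transaction
    <$> leafValue "TRNUID" cs
    <*> leafValue "NAME" cs
    <*> (leafValue "TRNAMT" cs >>= readMaybe)

toStatement :: [TextTree] -> Maybe StatementResponse
toStatement cs = StatementResponse <$> traverse toTransaction (branches "STMTTRN" cs)

toOFX :: TextTree -> Maybe OFX
toOFX (Branch "OFX" cs) = case branches "STMTRS" cs of
  (stmt : _) -> OFX <$> toStatement stmt
  []         -> Nothing
toOFX _ = Nothing

sampleTree :: TextTree
sampleTree =
  Branch "OFX"
    [ Branch "STMTRS"
        [ Branch "STMTTRN"
            [ Leaf "TRNUID" "9223ry29r389"
            , Leaf "NAME" "THE GROCERY STORE BLABLABLA"
            , Leaf "TRNAMT" "234.99" ]
        , Branch "STMTTRN"
            [ Leaf "TRNUID" "1237tg832t"
            , Leaf "NAME" "SOME DUDE ON PAYPAL 4781487"
            , Leaf "TRNAMT" "2174.27" ]
        ]
    ]

main :: IO ()
main = print (toOFX sampleTree)
```

Because `toStatement` uses `traverse`, one malformed transaction makes the whole statement `Nothing`; switching from `Maybe` to `Either String` would let each converter report which tag was missing.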