[Haskell-cafe] Loading a csv file with ~200 columns into Haskell Record
Neil Mayhew
neil_mayhew at users.sourceforge.net
Mon Oct 2 03:24:26 UTC 2017
On 2017-10-01 07:55 PM, Guru Devanla wrote:
> Having to not have something which I can quickly start off on troubles
> me and makes me wonder if the reason is my lack of understanding or
> just the pain of using static typing.
Something, somewhere has to keep track of the type of each column, and
since the data doesn’t carry that information itself you have to store
it somewhere else. That could be in another data file of some kind,
loaded at runtime, but then you would lose the benefit of static type
checking by the compiler. So it’s better to have it in source code, even
if that code is generated by Template Haskell (TH) or some other process.
I recommend taking a look at the Cassava library. You can do some pretty
neat things with it, including defining your own mapping from rows to
records. In particular, if you need only a small subset of the ~200
columns, you can provide a (de)serializer that looks at only the columns
it needs. The library reads each row into a vector of ByteString fields,
and your deserialization code works with just the elements it needs. You
could even have different record types (and associated deserializers)
for different tasks, all working off the same input row, since the
(de)serialization methods come from typeclasses and each record type can
have its own instance.
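To make the "different record types over the same row" idea concrete,
here is a minimal sketch using Cassava's positional FromRecord class and
the (.!) indexing operator. The record types, column positions and
sample data are made up for illustration; they are not from the original
question.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Csv (FromRecord (..), HasHeader (NoHeader), decode, (.!))
import Data.Text (Text)
import qualified Data.Vector as V

-- Hypothetical record for one task: only columns 0 and 2 are read.
data NameAndAmount = NameAndAmount Text Int
  deriving (Eq, Show)

instance FromRecord NameAndAmount where
  parseRecord v = NameAndAmount <$> v .! 0 <*> v .! 2

-- Hypothetical record for another task: only column 1 is read.
newtype Region = Region Text
  deriving (Eq, Show)

instance FromRecord Region where
  parseRecord v = Region <$> v .! 1

-- Made-up input with four columns; the fourth is never inspected.
sample :: BL.ByteString
sample = "alice,north,10,ignored\nbob,south,20,ignored\n"

-- The same input decoded two ways; the result type selects the instance.
namesAndAmounts :: Either String (V.Vector NameAndAmount)
namesAndAmounts = decode NoHeader sample

regions :: Either String (V.Vector Region)
regions = decode NoHeader sample
```

Each instance touches only the fields it indexes, so the unused columns
never need a Haskell type of their own.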
Cassava’s Parser type is an Applicative, which makes for some very
succinct code, and it can make use of a header record at the start of
the data. Here’s an example:
    {-# LANGUAGE OverloadedStrings #-}

    import qualified Data.ByteString.Char8 as B
    import Data.Csv
    import Data.Fixed (Centi)
    import Data.Text (Text)
    import Data.Time (Day)

    -- FromField/ToField instances for Day and Money are also needed;
    -- they are not shown here.

    data Account = Business | Visa | Personal | Cash | None
        deriving (Eq, Ord, Show, Read, Enum, Bounded)

    instance FromField Account where
        parseField f
            | f == "B"  = pure Business
            | f == "V"  = pure Visa
            | f == "P"  = pure Personal
            | f == "C"  = pure Cash
            | f == "CC" = pure Visa
            | f == ""   = pure None
            | otherwise = fail $ "Invalid account type: \"" ++ B.unpack f ++ "\""

    instance ToField Account where
        toField Business = "B"
        toField Visa     = "V"
        toField Personal = "P"
        toField Cash     = "C"
        toField None     = ""

    type Money = Centi

    data Transaction = Transaction
        { date        :: Day
        , description :: Text
        , category    :: Text
        , account     :: Account
        , debit       :: Maybe Money
        , credit      :: Maybe Money
        , business    :: Money
        , visa        :: Money
        , personal    :: Money
        , cash        :: Money
        } deriving (Eq, Ord, Show, Read)

    -- Fields are looked up by header name, not by position.
    instance FromNamedRecord Transaction where
        parseNamedRecord r = Transaction
            <$> r .: "Date"
            <*> r .: "Description"
            <*> r .: "Category"
            <*> r .: "Account"
            <*> r .: "Debit"
            <*> r .: "Credit"
            <*> r .: "Business"
            <*> r .: "Visa"
            <*> r .: "Personal"
            <*> r .: "Cash"

    instance ToNamedRecord Transaction where
        toNamedRecord r = namedRecord
            [ "Date"        .= date r
            , "Description" .= description r
            , "Category"    .= category r
            , "Account"     .= account r
            , "Debit"       .= debit r
            , "Credit"      .= credit r
            , "Business"    .= business r
            , "Visa"        .= visa r
            , "Personal"    .= personal r
            , "Cash"        .= cash r
            ]
Note that the code doesn’t assume fixed positions for the different
columns, or any particular number of columns in a row, because it looks
each field up by its column header. There could be 1000 columns and the
code wouldn’t care.
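As a small illustration of that header indirection, the sketch below
decodes the same logical data from two differently laid-out inputs with
decodeByName. The Entry type, its field names and the sample CSV text
are hypothetical and deliberately much smaller than Transaction, so that
no extra FromField instances are needed.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Csv (FromNamedRecord (..), decodeByName, (.:))
import Data.Text (Text)
import qualified Data.Vector as V

-- Hypothetical cut-down record, not the Transaction type above.
data Entry = Entry
  { entryDescription :: Text
  , entryDebit       :: Maybe Int   -- an empty field decodes to Nothing
  } deriving (Eq, Show)

instance FromNamedRecord Entry where
  parseNamedRecord r = Entry <$> r .: "Description" <*> r .: "Debit"

-- Two made-up layouts of the same data: B reorders the columns
-- and adds two that the parser never asks for.
layoutA :: BL.ByteString
layoutA = "Description,Debit\nCoffee,3\nRefund,\n"

layoutB :: BL.ByteString
layoutB = "Extra1,Debit,Extra2,Description\nx,3,y,Coffee\nx,,y,Refund\n"

-- Decode using the header row, discarding the returned header.
entries :: BL.ByteString -> Either String [Entry]
entries = fmap (V.toList . snd) . decodeByName
```

Under these assumptions, entries layoutA and entries layoutB give the
same list of Entry values, since only the named columns are consulted.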