[Haskell-cafe] Ideas on a fast and tidy CSV library

Tue Jul 23 16:44:25 CEST 2013

Dear All,

Recently I have been doing a lot of CSV processing. I initially tried to
use the Data.Csv (cassava) library on Hackage, but I found it still too
slow for my needs. In the meantime I have reverted to hacking something
together in C, but I have been left wondering whether a tidy solution
could be implemented in Haskell.

I would like to build a library that satisfies the following:

1) Run a function <<f :: a_1 -> ... -> a_n -> m (Maybe (b_1, ..., b_n))>>,
where <<m>> is some monad, the <<a>>s are the input field types and the
<<b>>s are the output field types.

2) Be able to specify a maximum input record length and a maximum output
record length, so that the string buffers used for reading and writing
lines can be reused instead of allocating a new string for each record.

3) Allocate the memory that holds the parsed input values and the output
values only once.

4) The library's main function should take some kind of data structure
describing the function's argument and result types, the function itself,
and the input and output filenames (which could also be stdin/stdout).
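Requirements 1) and 4) can be sketched with a GADT that lists one parser per input column, indexed by a type-level list of column types, plus a type family that turns that list into the curried function type. This is a minimal sketch, not an existing library API: the names Fields, Fn, applyFields, Job and runJob are hypothetical, fields are split naively on commas (no quoting), <<m>> is fixed to IO in runJob, and it does not yet achieve 2) and 3), since Data.ByteString.Char8.hGetLine still allocates a fresh ByteString per line.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeFamilies, TypeOperators #-}
-- Hypothetical sketch of requirements 1) and 4); not an existing library.
module Main (main) where

import qualified Data.ByteString.Char8 as BC
import Data.Kind (Type)
import System.IO (Handle, hIsEOF, stdin, stdout)

-- | One parser per input column; the index records the column types.
data Fields (as :: [Type]) where
  FNil  :: Fields '[]
  FCons :: (BC.ByteString -> Maybe a) -> Fields as -> Fields (a ': as)

-- | Fn '[a1, ..., an] r  =  a1 -> ... -> an -> r
type family Fn (as :: [Type]) (r :: Type) :: Type where
  Fn '[]       r = r
  Fn (a ': as) r = a -> Fn as r

-- | Parse each field with its column's parser and feed it to the function.
-- Returns Nothing on a parse error or a wrong number of fields.
applyFields :: Fields as -> Fn as r -> [BC.ByteString] -> Maybe r
applyFields FNil         r []       = Just r
applyFields (FCons p ps) f (x : xs) = p x >>= \a -> applyFields ps (f a) xs
applyFields _            _ _        = Nothing

-- | Requirement 4): column types, the function, an encoder for its
-- output tuple, and the input/output handles (files or stdin/stdout).
data Job m where
  Job :: Fields as
      -> Fn as (m (Maybe out))
      -> (out -> [BC.ByteString])
      -> Handle -> Handle
      -> Job m

-- | Process records line by line; unparseable rows and Nothing results
-- produce no output.
runJob :: Job IO -> IO ()
runJob (Job fields f encode hIn hOut) = loop
  where
    loop = do
      eof <- hIsEOF hIn
      if eof
        then pure ()
        else do
          line <- BC.hGetLine hIn
          case applyFields fields f (BC.split ',' line) of
            Nothing  -> pure ()
            Just act -> act >>= maybe (pure ()) (emit . encode)
          loop
    emit cols = BC.hPutStr hOut (BC.snoc (BC.intercalate (BC.pack ",") cols) '\n')

-- | Strict Int parser: the whole field must be a number.
readInt :: BC.ByteString -> Maybe Int
readInt s = case BC.readInt s of
  Just (n, rest) | BC.null rest -> Just n
  _                             -> Nothing

-- Example: "price,quantity" rows in, "price,quantity,total" rows out.
priceQty :: Fields '[Int, Int]
priceQty = FCons readInt (FCons readInt FNil)

total :: Int -> Int -> IO (Maybe (Int, Int, Int))
total p q = pure (if q > 0 then Just (p, q, p * q) else Nothing)

main :: IO ()
main = runJob (Job priceQty total encode3 stdin stdout)
  where encode3 (a, b, c) = map (BC.pack . show) [a, b, c]
```

The type index means a function whose arity or argument types disagree with the column description is rejected at compile time, which is one way to get the "data structure describing the types of the function" without runtime type checks.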

I am not sure yet what the best choice of <<m>> would be. Most
importantly, it should allow efficient (and, if possible, pure) updates to
the state of a number of variables, such as an aggregation over a certain
field of the input. I am not yet familiar with the FFI or how it might be
used here. I would appreciate any suggestions as to where I should look
further.
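One candidate for <<m>>, assuming the per-record function only needs to update an aggregate alongside its output, is the strict State monad from the transformers package. The sketch below keeps the aggregate in a record with strict fields and updates it with modify', so no chain of thunks builds up across records. Acc, step and runAll are hypothetical names, the records are an in-memory list for illustration, and reading from a file would instead use StateT Acc IO (not shown).

```haskell
-- Hypothetical sketch: m = State Acc, giving pure per-record state updates.
module Main (main) where

import Control.Monad.Trans.State.Strict (State, modify', runState)

-- | Running aggregate: number of rows seen and the sum of price * qty.
-- Strict fields plus modify' keep the state evaluated after each record.
data Acc = Acc { rows :: !Int, revenue :: !Int }
  deriving (Show, Eq)

-- | A per-record function in the shape of requirement 1): two inputs,
-- an optional output tuple, and an aggregate update as a side effect.
step :: Int -> Int -> State Acc (Maybe (Int, Int))
step price qty = do
  modify' (\(Acc n s) -> Acc (n + 1) (s + price * qty))
  pure (if qty > 0 then Just (price, price * qty) else Nothing)

-- | Run step over every record, returning all outputs and the final state.
runAll :: [(Int, Int)] -> ([Maybe (Int, Int)], Acc)
runAll recs = runState (traverse (uncurry step) recs) (Acc 0 0)

main :: IO ()
main = print (runAll [(2, 3), (5, 0), (4, 1)])
```

Because step has type Int -> Int -> State Acc (Maybe (Int, Int)), it also fits the Fn-style curried shape from requirement 1), with <<m>> = State Acc.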

Regards,

Justin Paston-Cooper