[Haskell-cafe] Ideas on a fast and tidy CSV library

Johan Tibell johan.tibell at gmail.com
Wed Aug 21 14:14:28 CEST 2013


As I mentioned, you want to use the Streaming (or Incremental) module.
As the program now stands, the call to `decode` causes the full 1.5 GB
of CSV data to be read as a `Vector (Vector Int)` before any encoding
starts.
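
A minimal sketch of the streaming approach, assuming a recent cassava
where `Data.Csv.Streaming.decode` takes a `HasHeader` argument (older
releases took a `Bool` instead). The body of `vectorBinner` here is a
hypothetical stand-in (it just sums the row), and `processRecords` is an
illustrative helper name, not part of the library:

```haskell
module Main (main) where

import qualified Data.ByteString.Lazy as BL
import Data.Csv (HasHeader (NoHeader), Only (..), encode)
import qualified Data.Csv.Streaming as S
import Data.Vector (Vector)
import qualified Data.Vector as V
import System.IO (hPutStrLn, stderr)

-- Hypothetical stand-in for the real vectorBinner: sums one record.
vectorBinner :: Vector Int -> Int
vectorBinner = V.sum

-- Walk the lazily produced Records structure one record at a time,
-- writing each result as soon as it is computed.
processRecords :: S.Records (Vector Int) -> IO ()
processRecords (S.Cons (Right row) rest) = do
  -- Encode a single one-field record and emit it immediately.
  BL.putStr (encode [Only (vectorBinner row)])
  processRecords rest
processRecords (S.Cons (Left err) rest) = do
  -- A record that failed to convert; report it and carry on.
  hPutStrLn stderr ("skipping bad record: " ++ err)
  processRecords rest
processRecords (S.Nil Nothing _) = return ()
processRecords (S.Nil (Just err) _) =
  -- The parser itself failed; the remaining input is ignored here.
  hPutStrLn stderr ("parse failed: " ++ err)

main :: IO ()
main = do
  -- Lazy read of stdin, so input is pulled in chunks on demand.
  input <- BL.getContents
  processRecords (S.decode NoHeader input)
```

Because each record is consumed and written before the next one is
demanded, nothing forces the whole input into memory at once. Whether
this actually runs in constant memory still depends on not holding on
to `input` or the head of the `Records` value elsewhere.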

-- Johan


On Wed, Aug 21, 2013 at 1:09 PM, Justin Paston-Cooper
<paston.cooper at gmail.com> wrote:
> Dear All,
>
> I now have some example code. I have put it on:
> http://pastebin.com/D9MPmyVd
>
> vectorBinner is simply of type Vector Int -> Int. I am inputting a 1.5GB CSV
> on stdin, and would like vectorBinner to run over every single record,
> outputting results as computed, thus running in constant memory. My
> programme instead quickly approaches full memory use. Is there any way to
> work around this?
>
> Justin
>
>
> On 25 July 2013 17:53, Johan Tibell <johan.tibell at gmail.com> wrote:
>>
>> You can use the Incremental or Streaming modules to get more fine
>> grained control over when new parsed records are produced.
>>
>> On Thu, Jul 25, 2013 at 11:02 AM, Justin Paston-Cooper
>> <paston.cooper at gmail.com> wrote:
>> > I hadn't yet tried profiling the programme. I actually deleted it a few
>> > days ago. I'm going to try to get something new running, and I will
>> > report back.
>> > On a slightly less related track: Is there any way to use cassava so
>> > that I can have pure state and also yield CSV lines while my computation
>> > is running instead of everything at the end as would be with the State
>> > monad?
>> >
>> >
>> > On 23 July 2013 22:13, Johan Tibell <johan.tibell at gmail.com> wrote:
>> >>
>> >> On Tue, Jul 23, 2013 at 5:45 PM, Ben Gamari <bgamari.foss at gmail.com>
>> >> wrote:
>> >> > Justin Paston-Cooper <paston.cooper at gmail.com> writes:
>> >> >
>> >> >> Dear All,
>> >> >>
>> >> >> Recently I have been doing a lot of CSV processing. I initially tried
>> >> >> to use the Data.Csv (cassava) library provided on Hackage, but I found
>> >> >> this to still be too slow for my needs. In the meantime I have reverted
>> >> >> to hacking something together in C, but I have been left wondering
>> >> >> whether a tidy solution might be possible to implement in Haskell.
>> >> >>
>> >> > Have you tried profiling your cassava implementation? In my experience
>> >> > I've found it's quite quick. If you have an example of a slow path I'm
>> >> > sure Johan (cc'd) would like to know about it.
>> >>
>> >> I'm always interested in examples of code that is not running fast
>> >> enough. Send me a reproducible example (preferably as a bug on the
>> >> GitHub bug tracker) and I'll take a look.
>> >
>> >
>
>
