[Haskell-cafe] file splitter with enumerator package

Eric Rasmussen ericrasmussen at gmail.com
Fri Jul 22 20:41:12 CEST 2011


Hi everyone,

A friend of mine recently asked if I knew of a utility to split a
large file (4 GB in his case) into arbitrarily sized files on Windows.
Although there are a number of file-splitting utilities, the catch was
that it couldn't break in the middle of a line. When the standard "why
don't you use Linux?" response proved unhelpful, I took this as an
opportunity to write my first program using the enumerator package.
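The "never break a line" requirement can be pinned down as a small pure function. This is a sketch under stated assumptions, not the enumerator-based program from the post: it uses only Data.ByteString.Lazy.Char8 from the bytestring package, and splitKeepingLines is a hypothetical name. Each piece takes at least n bytes and is then extended through the next newline, so only the final piece can lack a trailing '\n'.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BLC
import Data.Int (Int64)

-- Hypothetical helper: split the input into pieces of at least n bytes,
-- extending each piece up to and including the next '\n' so that no line
-- is cut in two. Concatenating the pieces gives back the original input.
splitKeepingLines :: Int64 -> BLC.ByteString -> [BLC.ByteString]
splitKeepingLines n input
  | BLC.null input = []
  | otherwise = piece : splitKeepingLines n remainder
  where
    -- First take the requested number of bytes ...
    (front, rest) = BLC.splitAt n input
    -- ... then the rest of whichever line that cut landed in.
    (lineTail, afterTail) = BLC.break (== '\n') rest
    -- Keep the newline itself (if there is one) with this piece.
    (newline, remainder) = BLC.splitAt 1 afterTail
    piece = BLC.concat [front, lineTail, newline]
```

In GHCi, `splitKeepingLines 5 (BLC.pack "ab\ncdefgh\nij\n")` gives `["ab\ncdefgh\n","ij\n"]`. Because the list is produced lazily, a large input is not held in memory all at once. One design consequence: if a cut happens to land just after a newline, this sketch still extends the piece to the end of the following line.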

If anyone has time, I'd really like to know whether there's a
better way to take the incoming stream and write it directly to a
file. The basic steps I'm taking are:

1) Data.Enumerator.Binary.take -- grabs the user-specified number of
bytes; because it returns a lazy ByteString, I use
Data.ByteString.Lazy.hPut to write the chunk
2) Data.Enumerator.Binary.head -- after using take for the big chunk,
I use head to inspect and write individual bytes, stopping after the
next newline character has been written
3) I close the handle that steps 1 and 2 used to write the data, then
repeat steps 1 and 2 with the next handle, opened from an infinite
lazy list of file paths like part1.csv, part2.csv, and so on
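For comparison, the three steps can be sketched without the enumerator package by reading the source as a lazy ByteString. This is an assumption-laden sketch, not the code from the paste: splitToFiles, splitFile and partNames are hypothetical names, BLC.splitAt stands in for Data.Enumerator.Binary.take, and a single BLC.break replaces the byte-at-a-time head loop of step 2.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BLC
import Data.Int (Int64)
import System.IO (IOMode (WriteMode), withBinaryFile)

-- Hypothetical: the infinite lazy list of output names from step 3.
partNames :: [FilePath]
partNames = ["part" ++ show i ++ ".csv" | i <- [1 :: Int ..]]

-- Hypothetical: write the input across the given paths, at least n bytes
-- per file, never breaking a line. Returns the paths actually written.
splitToFiles :: Int64 -> [FilePath] -> BLC.ByteString -> IO [FilePath]
splitToFiles _ [] _ = pure [] -- out of names (cannot happen with partNames)
splitToFiles n (path : paths) input
  | BLC.null input = pure []
  | otherwise = do
      let -- Step 1: the user-specified number of bytes.
          (front, rest) = BLC.splitAt n input
          -- Step 2: the rest of the current line, up to and including
          -- '\n', found with one break rather than byte by byte.
          (lineTail, afterTail) = BLC.break (== '\n') rest
          (newline, remainder) = BLC.splitAt 1 afterTail
      -- withBinaryFile closes the handle once the writes finish (step 3),
      -- even if one of them throws an exception.
      withBinaryFile path WriteMode $ \h ->
        mapM_ (BLC.hPut h) [front, lineTail, newline]
      -- Step 3 continued: move on to the next name with what is left.
      (path :) <$> splitToFiles n paths remainder

-- Hypothetical entry point: read the source lazily and split it.
splitFile :: Int64 -> FilePath -> IO [FilePath]
splitFile n source = BLC.readFile source >>= splitToFiles n partNames
```

For example, `splitFile 100000000 "big.csv"` would write pieces of roughly 100 MB as part1.csv, part2.csv, and so on, and return the names it used. Opening the outputs in binary mode avoids newline translation on Windows, and since every cut falls just after a '\n', a "\r\n" pair always stays in one piece.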

The full code is pasted here: http://hpaste.org/49366. I'd welcome any
other feedback on how to make it better, but since I'm not planning to
release this as a utility, I wouldn't want anyone to spend extra time
on a full code review.

Thanks!
Eric


