[Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)

John Lato jwlato at gmail.com
Fri Mar 8 09:48:59 CET 2013


I'd like to point out that it's entirely possible to get good performance
out of a Handle. The iteratee package has had both FD- and Handle-based
IO for a while, and I've never observed any serious performance differences
between the two.  Also, if I may be so bold, Michael's supercharged copy
speeds are on par with iteratee's performance using Handles:
http://www.tiresiaspress.us/io-benchmarks.html

So while there's definitely something interesting going on here, I think it
needs a bit more investigation before suggesting that Handles should be
avoided.
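As an illustration of the kind of Handle-based copy loop that can perform well, here is a minimal sketch. It is not the iteratee or conduit code under discussion. `copyViaHandle`, `copyLoop`, and the 32 KiB chunk size are hypothetical choices. Both files are opened in binary mode, so no text decoding or newline translation happens, and strict ByteString chunks are moved with `hGetSome`/`hPut`.

```haskell
import Control.Monad (unless)
import qualified Data.ByteString as B
import System.IO (Handle, IOMode (..), withBinaryFile)

-- Copy src to dst through Handles opened in binary mode
-- (no text decoding or newline translation).
copyViaHandle :: FilePath -> FilePath -> IO ()
copyViaHandle src dst =
  withBinaryFile src ReadMode $ \hIn ->
    withBinaryFile dst WriteMode $ \hOut ->
      copyLoop hIn hOut

-- Move data in strict chunks until end of file.
copyLoop :: Handle -> Handle -> IO ()
copyLoop hIn hOut = do
  -- hGetSome returns whatever is available (up to chunkSize) and
  -- yields an empty ByteString only at end of file.
  chunk <- B.hGetSome hIn chunkSize
  unless (B.null chunk) $ do
    B.hPut hOut chunk
    copyLoop hIn hOut
  where
    chunkSize = 32768 -- hypothetical 32 KiB chunk size

main :: IO ()
main = copyViaHandle "input.dat" "output.dat"
```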

For comparison, on my system I get
$ time cp input.dat output.dat

real 0m0.004s
user 0m0.000s
sys 0m0.000s

so the throughput observed in the faster runs is entirely reasonable.
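To get a number comparable to the `real` line of `time cp` from inside a Haskell program, the copy could be wrapped in a monotonic clock. The `timeIt` helper below is hypothetical and assumes `GHC.Clock.getMonotonicTime` from base 4.11 or later. It measures a single run, so a benchmarking library such as criterion remains the better tool for real comparisons.

```haskell
import qualified Data.ByteString as B
import GHC.Clock (getMonotonicTime)

-- Run an action and return its result with the elapsed wall-clock
-- time in seconds. The result is not forced, so any lazy work it
-- still contains is not included in the measurement.
timeIt :: IO a -> IO (a, Double)
timeIt action = do
  start <- getMonotonicTime
  result <- action
  end <- getMonotonicTime
  return (result, end - start)

main :: IO ()
main = do
  -- Strict readFile/writeFile used purely as an example workload.
  (_, secs) <- timeIt (B.readFile "input.dat" >>= B.writeFile "output.dat")
  putStrLn ("real " ++ show secs ++ "s")
```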

John L.


On Fri, Mar 8, 2013 at 4:36 PM, Gregory Collins <greg at gregorycollins.net> wrote:

> +Simon Marlow
> A couple of comments:
>
>    - maybe we shouldn't back the file by a Handle. io-streams uses a Handle
>    by default out of the box; I had a POSIX file interface for unix (guarded
>    by CPP) for a while but decided to ditch it for simplicity, since I figured
>    a Handle would be "good enough". If your results are correct, given how
>    slow going through a Handle seems to be, I may revisit this.
>    - io-streams turns Handle buffering off in withFileAsOutput, so the
>    difference shouldn't be the result of buffering. Simon: is this an
>    expected result? I presume you did some Handle debugging?
>    - the IO manager should not have any bearing here because file code
>    doesn't actually ever use it (epoll() doesn't work for files)
>    - does the difference persist when the file size gets bigger?
>    - your file descriptor code doesn't handle EINTR properly, although
>    you said you checked that the file is actually being copied?
>    - Copying a 1MB file in 1ms gives a throughput of ~1GB/s. The other
>    methods have a more believable ~70MB/s throughput.
>
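The sanity check in the last bullet is simple arithmetic: 1 MB / 1 ms = 10^6 B / 10^-3 s = 10^9 B/s, about 1 GB/s, while at ~70 MB/s the same 1 MB file takes roughly 14 ms. A tiny sketch of that calculation follows. The helper names are hypothetical, and "MB" is taken as 10^6 bytes.

```haskell
-- Throughput in MB/s (1 MB = 10^6 bytes) for a byte count and a duration in seconds.
throughputMBps :: Double -> Double -> Double
throughputMBps bytes seconds = bytes / seconds / 1e6

-- Time in milliseconds to move a byte count at a given rate in MB/s.
copyTimeMs :: Double -> Double -> Double
copyTimeMs bytes mbps = bytes / (mbps * 1e6) * 1000

main :: IO ()
main = do
  print (throughputMBps 1e6 1e-3) -- 1 MB in 1 ms: about 1000 MB/s, i.e. ~1 GB/s
  print (copyTimeMs 1e6 70)       -- 1 MB at 70 MB/s: about 14.3 ms
```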
> G
>
>
> On Fri, Mar 8, 2013 at 7:30 AM, Michael Snoyman <michael at snoyman.com> wrote:
>
>> Hi all,
>>
>> I'm turning to the community for some help understanding some benchmark
>> results[1]. I was curious to see how the new io-streams would work with
>> conduit, as it looks like a far saner low-level approach than Handles. In
>> fact, the API is so simple that the entire wrapper is just a few lines of
>> code[2].
>>
>> I then added in some basic file copy benchmarks, comparing conduit+Handle
>> (with ResourceT or bracket), conduit+io-streams, straight io-streams, and
>> lazy I/O. All approaches fell into the same ballpark, with conduit+bracket
>> and conduit+io-streams taking a slight lead. (I haven't analyzed that
>> enough to know if it means anything, however.)
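For reference, a "straight io-streams" file copy like the one in the list above can be written with io-streams' `withFileAsInput`, `withFileAsOutput`, and `connect`. This minimal sketch assumes the io-streams package is installed. It is not the benchmark's own code, and `copyViaStreams` is a hypothetical name.

```haskell
import qualified System.IO.Streams as Streams
import System.IO.Streams.File (withFileAsInput, withFileAsOutput)

-- Copy src to dst using io-streams: connect feeds every chunk read from
-- the input stream to the output stream, then propagates end-of-stream.
copyViaStreams :: FilePath -> FilePath -> IO ()
copyViaStreams src dst =
  withFileAsInput src $ \input ->
    withFileAsOutput dst $ \output ->
      Streams.connect input output

main :: IO ()
main = copyViaStreams "input.dat" "output.dat"
```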
>>
>> Then I decided to pull up the NoHandle code I wrote a while ago for
>> conduit. This code was written initially for Windows only, to work around
>> the fact that System.IO.openFile does some file locking. To avoid using
>> Handles, I wrote a simple FFI wrapper exposing open, read, and close system
>> calls, ported it to POSIX, and hid it behind a Cabal flag. Out of
>> curiosity, I decided to expose it and include it in the benchmark.
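Here is a minimal sketch of an FFI wrapper of this kind, reading a file through raw POSIX descriptors instead of a Handle. It is not Michael's System/PosixFile.hsc. All names (`c_open`, `readChunk`, `readFileFd`, and so on) are hypothetical, it assumes a POSIX system and GHC's CApiFFI extension, and it covers only the read side. Interrupted calls are retried on EINTR via `Foreign.C.Error.throwErrnoIfMinus1Retry`, which is the issue raised in Gregory's comments quoted above.

```haskell
{-# LANGUAGE CApiFFI #-}
import Control.Exception (bracket)
import qualified Data.ByteString as B
import Data.ByteString.Internal (createAndTrim)
import Data.Word (Word8)
import Foreign.C.Error (throwErrnoIfMinus1Retry, throwErrnoIfMinus1_)
import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CInt (..), CSize (..))
import Foreign.Ptr (Ptr)
import System.Posix.Types (CSsize (..))

-- capi generates a small C stub, so calling the variadic open() is safe.
-- unsafe calls are cheap but block the calling capability while running.
foreign import capi unsafe "fcntl.h open"
  c_open :: CString -> CInt -> IO CInt

foreign import capi "fcntl.h value O_RDONLY"
  o_RDONLY :: CInt

foreign import capi unsafe "unistd.h read"
  c_read :: CInt -> Ptr Word8 -> CSize -> IO CSsize

foreign import capi unsafe "unistd.h close"
  c_close :: CInt -> IO CInt

-- Open read-only; throwErrnoIfMinus1Retry re-issues the call when it
-- fails with EINTR and throws an IOError for any other error.
openReadOnly :: FilePath -> IO CInt
openReadOnly path =
  withCString path $ \cpath ->
    throwErrnoIfMinus1Retry "openReadOnly" (c_open cpath o_RDONLY)

-- Read up to n bytes into a fresh ByteString; an empty result means EOF.
readChunk :: CInt -> Int -> IO B.ByteString
readChunk fd n =
  createAndTrim n $ \buf -> do
    got <- throwErrnoIfMinus1Retry "readChunk" (c_read fd buf (fromIntegral n))
    return (fromIntegral got)

-- close() is deliberately not retried: per the Linux close(2) man page,
-- retrying after EINTR may close an unrelated, reused descriptor.
closeFd :: CInt -> IO ()
closeFd fd = throwErrnoIfMinus1_ "closeFd" (c_close fd)

-- Read a whole file through a raw descriptor, 32 KiB at a time.
readFileFd :: FilePath -> IO B.ByteString
readFileFd path = bracket (openReadOnly path) closeFd (go [])
  where
    go acc fd = do
      chunk <- readChunk fd 32768
      if B.null chunk
        then return (B.concat (reverse acc))
        else go (chunk : acc) fd

main :: IO ()
main = readFileFd "input.dat" >>= print . B.length
```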
>>
>> The results are extreme. I've confirmed multiple times that the copy
>> algorithm is in fact copying the file, so I don't think the test itself is
>> cheating somehow. But I don't know how to explain the massive gap. I've run
>> this on two different systems. The results you see linked are from my local
>> machine. On an EC2 instance, the gap was a bit smaller, but the NoHandle
>> code was still 75% faster than the others.
>>
>> My initial guess is that I'm not properly tying into the IO manager, but
>> I wanted to see if the community had any thoughts. The relevant pieces of
>> code are [3][4][5].
>>
>> Michael
>>
>> [1] http://static.snoyman.com/streams.html
>> [2] https://github.com/snoyberg/conduit/blob/streams/io-streams-conduit/Data/Conduit/Streams.hs
>> [3] https://github.com/snoyberg/conduit/blob/streams/conduit/System/PosixFile.hsc
>> [4] https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L54
>> [5] https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L167
>>
>> _______________________________________________
>> Haskell-Cafe mailing list
>> Haskell-Cafe at haskell.org
>> http://www.haskell.org/mailman/listinfo/haskell-cafe
>>
>>
>
>
> --
> Gregory Collins <greg at gregorycollins.net>
>