[Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)

Gregory Collins greg at gregorycollins.net
Fri Mar 8 09:36:27 CET 2013


+Simon Marlow
A couple of comments:

   - maybe we shouldn't back the file by a Handle. io-streams does this by
   default out of the box; I had a POSIX file interface for Unix (guarded by
   CPP) for a while but decided to ditch it for simplicity. If your results
   are correct, given how slow going through Handle seems to be, I may revisit
   this; I had figured it would be "good enough".
   - io-streams turns Handle buffering off in withFileAsOutput, so the
   difference shouldn't be the result of buffering. Simon: is this an
   expected result? I presume you did some Handle debugging?
   - the IO manager should not have any bearing here because file code
   never actually uses it (epoll() doesn't work on regular files)
   - does the difference persist when the file size gets bigger?
   - your file descriptor code doesn't handle EINTR properly. You did say
   you checked that the file is actually being copied, though?
   - Copying a 1MB file in 1ms gives a throughput of ~1GB/s. The other
   methods have a more believable ~70MB/s throughput.
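
The arithmetic behind the last point, as a throwaway sketch (the helper names are invented here, and decimal megabytes are assumed): 1MB in 1ms is 1000MB/s, whereas at ~70MB/s the same 1MB copy should take roughly 14ms.

```haskell
-- Hypothetical helpers for sanity-checking benchmark numbers.
-- Units: sizes in megabytes, times in milliseconds (decimal MB assumed).

-- | Throughput in MB/s for a copy of sizeMB that took elapsedMs.
throughputMBps :: Double -> Double -> Double
throughputMBps sizeMB elapsedMs = sizeMB * 1000 / elapsedMs

-- | Expected copy time in ms for sizeMB at a given throughput in MB/s.
expectedMs :: Double -> Double -> Double
expectedMs sizeMB mbPerSec = sizeMB * 1000 / mbPerSec
```

In GHCi, `throughputMBps 1 1` is 1000 and `expectedMs 1 70` is about 14.3, an order of magnitude away from the 1ms measurement.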

G


On Fri, Mar 8, 2013 at 7:30 AM, Michael Snoyman <michael at snoyman.com> wrote:

> Hi all,
>
> I'm turning to the community for some help understanding some benchmark
> results[1]. I was curious to see how the new io-streams would work with
> conduit, as it looks like a far saner low-level approach than Handles. In
> fact, the API is so simple that the entire wrapper is just a few lines of
> code[2].
>
> I then added in some basic file copy benchmarks, comparing conduit+Handle
> (with ResourceT or bracket), conduit+io-streams, straight io-streams, and
> lazy I/O. All approaches fell into the same ballpark, with conduit+bracket
> and conduit+io-streams taking a slight lead. (I haven't analyzed that
> enough to know if it means anything, however.)
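
For comparison, the two non-conduit baselines (a Handle copy loop and lazy I/O) can be sketched with only base and bytestring. This is not the benchmark code behind [1]; the function names and the 32 KiB chunk size are assumptions made for illustration.

```haskell
-- Two baseline copies using only base and bytestring (no conduit, no io-streams).
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L
import System.IO (Handle, IOMode (..), withBinaryFile)

-- Handle-based copy: withBinaryFile closes each Handle even if an
-- exception is thrown, much like a bracket-based conduit source/sink.
copyWithHandles :: FilePath -> FilePath -> IO ()
copyWithHandles src dst =
  withBinaryFile src ReadMode $ \hIn ->
    withBinaryFile dst WriteMode $ \hOut -> copyChunks hIn hOut

copyChunks :: Handle -> Handle -> IO ()
copyChunks hIn hOut = do
  chunk <- B.hGetSome hIn chunkSize
  -- hGetSome only returns an empty ByteString at end of file.
  if B.null chunk
    then return ()
    else B.hPut hOut chunk >> copyChunks hIn hOut
  where
    chunkSize = 32768 -- arbitrary chunk size for this sketch

-- Lazy I/O copy: the input is read on demand while the output is written.
copyLazy :: FilePath -> FilePath -> IO ()
copyLazy src dst = L.readFile src >>= L.writeFile dst
```

Both functions expect `src` and `dst` to be different files; the lazy version in particular would read from a file it has already truncated.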
>
> Then I decided to pull up the NoHandle code I wrote a while ago for
> conduit. This code was written initially for Windows only, to work around
> the fact that System.IO.openFile does some file locking. To avoid using
> Handles, I wrote a simple FFI wrapper exposing open, read, and close system
> calls, ported it to POSIX, and hid it behind a Cabal flag. Out of
> curiosity, I decided to expose it and include it in the benchmark.
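
A Handle-free copy in the spirit described above can be sketched with GHC's FFI. This is not the code from [3]: it is an illustrative version assuming a POSIX system, all names are hypothetical, and it uses the CApiFFI extension so the variadic open() and the O_* constants come from the system headers. It also retries read() and write() on EINTR via throwErrnoIfMinus1Retry from Foreign.C.Error, the case flagged in the comments at the top.

```haskell
{-# LANGUAGE CApiFFI #-}
-- Sketch of a Handle-free file copy: call open/read/write/close directly.
-- Assumes a POSIX system; all names here are illustrative, not conduit's API.
import Control.Exception (bracket)
import Data.Bits ((.|.))
import Data.Word (Word8)
import Foreign.C.Error (throwErrnoIfMinus1Retry, throwErrnoIfMinus1_)
import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CInt (..), CSize (..))
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr, plusPtr)
import System.Posix.Types (CMode (..), CSsize (..))

-- capi (rather than ccall) calls the variadic open() through a generated C
-- wrapper and reads the platform's O_* constants from fcntl.h.
foreign import capi unsafe "fcntl.h open"
  c_open :: CString -> CInt -> CMode -> IO CInt
foreign import capi unsafe "unistd.h read"
  c_read :: CInt -> Ptr Word8 -> CSize -> IO CSsize
foreign import capi unsafe "unistd.h write"
  c_write :: CInt -> Ptr Word8 -> CSize -> IO CSsize
foreign import capi unsafe "unistd.h close"
  c_close :: CInt -> IO CInt
foreign import capi "fcntl.h value O_RDONLY" oRDONLY :: CInt
foreign import capi "fcntl.h value O_WRONLY" oWRONLY :: CInt
foreign import capi "fcntl.h value O_CREAT" oCREAT :: CInt
foreign import capi "fcntl.h value O_TRUNC" oTRUNC :: CInt

-- throwErrnoIfMinus1Retry re-issues the call when it fails with EINTR
-- and throws an IOError for any other failure.
openFd :: FilePath -> CInt -> IO CInt
openFd path flags = withCString path $ \cpath ->
  throwErrnoIfMinus1Retry ("open " ++ path) (c_open cpath flags 0o644)

-- close() is deliberately not retried: on Linux the descriptor is released
-- even when close() reports EINTR.
closeFd :: CInt -> IO ()
closeFd fd = throwErrnoIfMinus1_ "close" (c_close fd)

-- write() may accept fewer bytes than requested, so loop until done.
writeAll :: CInt -> Ptr Word8 -> Int -> IO ()
writeAll fd buf len
  | len <= 0 = return ()
  | otherwise = do
      n <- throwErrnoIfMinus1Retry "write" (c_write fd buf (fromIntegral len))
      let written = fromIntegral n
      writeAll fd (buf `plusPtr` written) (len - written)

-- Read chunks until read() returns 0 (end of file).
copyLoop :: CInt -> CInt -> Ptr Word8 -> Int -> IO ()
copyLoop inFd outFd buf size = do
  n <- throwErrnoIfMinus1Retry "read" (c_read inFd buf (fromIntegral size))
  if n == 0
    then return ()
    else writeAll outFd buf (fromIntegral n) >> copyLoop inFd outFd buf size

copyFileFd :: FilePath -> FilePath -> IO ()
copyFileFd src dst =
  bracket (openFd src oRDONLY) closeFd $ \inFd ->
    bracket (openFd dst (oWRONLY .|. oCREAT .|. oTRUNC)) closeFd $ \outFd ->
      allocaBytes bufSize $ \buf -> copyLoop inFd outFd buf bufSize
  where
    bufSize = 32768 :: Int
```

The calls are marked `unsafe`, which keeps each call cheap but blocks the calling capability for the duration of the syscall; for reads from regular files that is usually brief, but it is a trade-off worth measuring rather than assuming.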
>
> The results are extreme. I've confirmed multiple times that the copy
> algorithm is in fact copying the file, so I don't think the test itself is
> cheating somehow. But I don't know how to explain the massive gap. I've run
> this on two different systems. The results you see linked are from my local
> machine. On an EC2 instance, the gap was a bit smaller, but the NoHandle
> code was still 75% faster than the others.
>
> My initial guess is that I'm not properly tying into the IO manager, but I
> wanted to see if the community had any thoughts. The relevant pieces of
> code are [3][4][5].
>
> Michael
>
> [1] http://static.snoyman.com/streams.html
> [2] https://github.com/snoyberg/conduit/blob/streams/io-streams-conduit/Data/Conduit/Streams.hs
> [3] https://github.com/snoyberg/conduit/blob/streams/conduit/System/PosixFile.hsc
> [4] https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L54
> [5] https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L167


-- 
Gregory Collins <greg at gregorycollins.net>