I/O overhead in opening and writing files

Johan Tibell johan.tibell at gmail.com
Mon Aug 27 22:48:27 CEST 2012


On Mon, Aug 27, 2012 at 1:43 PM, J Baptist <arc38813 at hotmail.com> wrote:
> I'm looking into high-performance I/O, particularly on a tmpfs (in-memory)
> filesystem. This involves creating lots of little files. Unfortunately, it
> seems that Haskell's performance in this area is not comparable to that of
> C. I assume that this is because of the overhead involved in opening and
> closing files. Some cursory profiling confirmed this: most of the runtime of
> the program is taken by openFile, hPutStr, and hClose.
>
> I thought that it might be faster to call the C library functions exposed as
> foreign imports in System.Posix.Internals, and thereby cut out some of
> Haskell's overhead. This indeed improved performance, but the program is
> still nearly twice as slow as the corresponding C program.
>
> I took some benchmarks. I wrote a program to create 500,000 files on a tmpfs
> filesystem, and write an integer into each of them. I did this once in C,
> using open; and twice in Haskell, using openFile and c_open. Here are the
> results:
>
> C program, using open and friends (gcc 4.4.3)
> real    0m4.614s
> user    0m0.380s
> sys     0m4.200s
>
> Haskell, using System.IO.openFile and friends (ghc 7.4.2)
> real    0m14.892s
> user    0m7.700s
> sys     0m6.890s
>
> Haskell, using System.Posix.Internals.c_open and friends (ghc 7.4.2)
> real    0m7.372s
> user    0m2.390s
> sys     0m4.570s
>
> My question is: why is this so slow? Could the culprit be the marshaling
> necessary to pass the parameters to the foreign functions? If I'm calling
> the low-level function c_open anyway, shouldn't performance be closer to C?
> Does anyone have suggestions for how to improve this?
>
> If anyone is interested, I can provide the code I used for these benchmarks.

Please do. You can paste them at http://hpaste.org/

Could you try using the Data.ByteString API? I don't have the code in
front of me so I don't know if the System.Posix API uses Strings. If
it does, that's most likely the issue.
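
A minimal sketch of what the ByteString variant of the benchmark might
look like, assuming the test only needs to write the decimal form of an
integer into each file. The helper name writeIntFile, the directory
/tmp/bs-bench, and the file count are placeholders, not taken from the
original program. Note that Data.ByteString's writeFile still goes
through a Handle, so this would only remove the String side of hPutStr;
whether it closes the gap to C is untested.

```haskell
import qualified Data.ByteString.Char8 as B
import Control.Monad (forM_)
import System.Directory (createDirectoryIfMissing)

-- Hypothetical helper: write the decimal representation of i
-- into the file <dir>/<i>, as raw bytes rather than as a String.
writeIntFile :: FilePath -> Int -> IO ()
writeIntFile dir i = B.writeFile (dir ++ "/" ++ show i) (B.pack (show i))

main :: IO ()
main = do
  let dir = "/tmp/bs-bench"  -- assumed location; point it at the tmpfs mount
      n   = 1000 :: Int      -- placeholder count; the original test used 500,000
  createDirectoryIfMissing True dir
  forM_ [1 .. n] (writeIntFile dir)
```

Timing this with the same "time" invocation as the other three runs
would show how much of the user time comes from String handling and how
much from the Handle layer itself.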

-- Johan



More information about the Glasgow-haskell-users mailing list