I/O overhead in opening and writing files

J Baptist arc38813 at hotmail.com
Tue Aug 28 00:25:20 CEST 2012


Using ByteStrings and the C calls does indeed speed things up a bit, but not much.
real    0m6.053s
user    0m1.480s
sys     0m4.550s

For your interest:
The original version (with Strings and openFile): http://hpaste.org/73803
Faster (with Strings and c_open): http://hpaste.org/73802
Even faster (with ByteStrings and c_open): http://hpaste.org/73801

The problem may be that even with ByteStrings, we are stuck using show, and thus Strings, at some point.
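
One possible way around show, as a minimal sketch only: the Builder API in the bytestring package (Data.ByteString.Builder in current releases) can render an Int directly to decimal bytes. The name renderInt below is made up for illustration, and I have not measured whether this changes the timings above.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Lazy as BL

-- Hypothetical helper: render an Int as decimal ASCII bytes
-- without going through show (and therefore without String).
renderInt :: Int -> B.ByteString
renderInt = BL.toStrict . BB.toLazyByteString . BB.intDec
```

The resulting strict ByteString could then be handed to whatever write call the program already uses. Whether the conversion to a strict ByteString costs more than show saves is an open question here.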
Ideas?

> From: johan.tibell at gmail.com
> Date: Mon, 27 Aug 2012 13:48:27 -0700
> Subject: Re: I/O overhead in opening and writing files
> To: arc38813 at hotmail.com
> CC: glasgow-haskell-users at haskell.org
> 
> On Mon, Aug 27, 2012 at 1:43 PM, J Baptist <arc38813 at hotmail.com> wrote:
> > I'm looking into high-performance I/O, particularly on a tmpfs (in-memory)
> > filesystem. This involves creating lots of little files. Unfortunately, it
> > seems that Haskell's performance in this area is not comparable to that of
> > C. I assume that this is because of the overhead involved in opening and
> > closing files. Some cursory profiling confirmed this: most of the runtime of
> > the program is taken by openFile, hPutStr, and hClose.
> >
> > I thought that it might be faster to call the C library functions exposed as
> > foreign imports in System.Posix.Internals, and thereby cut out some of
> > Haskell's overhead. This indeed improved performance, but the program is
> > still nearly twice as slow as the corresponding C program.
> >
> > I took some benchmarks. I wrote a program to create 500.000 files on a tmpfs
> > filesystem, and write an integer into each of them. I did this in C, using
> > open, and twice in Haskell, using openFile and c_open. Here are the
> > results:
> >
> > C program, using open and friends (gcc 4.4.3)
> > real    0m4.614s
> > user    0m0.380s
> > sys     0m4.200s
> >
> > Haskell, using System.IO.openFile and friends (ghc 7.4.2)
> > real    0m14.892s
> > user    0m7.700s
> > sys     0m6.890s
> >
> > Haskell, using System.Posix.Internals.c_open and friends (ghc 7.4.2)
> > real    0m7.372s
> > user    0m2.390s
> > sys     0m4.570s
> >
> > My question is: why is this so slow? Could the culprit be the marshaling
> > necessary to pass the parameters to the foreign functions? If I'm calling
> > the low-level function c_open anyway, shouldn't performance be closer to C?
> > Does anyone have suggestions for how to improve this?
> >
> > If anyone is interested, I can provide the code I used for these benchmarks.
> 
> Please do. You can paste them at http://hpaste.org/
> 
> Could you try using the Data.ByteString API? I don't have the code in
> front of me so I don't know if the System.Posix API uses Strings. If
> it does, that's most likely the issue.
> 
> -- Johan