I/O overhead in opening and writing files

Don Stewart dons00 at gmail.com
Tue Aug 28 00:30:38 CEST 2012


Why are you using Show?

bytestring-show might be an option.

Remember: for speed, don't convert between String types.

Consider mmap-bytestring too.
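
As a rough illustration of staying out of String for the file contents, here is a minimal sketch that renders an Int directly into the output buffer. It swaps in Data.ByteString.Builder (which assumes a reasonably recent bytestring, newer than the one shipped with ghc 7.4.2) rather than bytestring-show, and the name writeIntFile is made up for the example. The FilePath is still a String, so this only addresses the conversion of the payload, not the cost of opening the file.

```haskell
import qualified Data.ByteString.Builder as B
import System.IO (IOMode (WriteMode), withBinaryFile)

-- | Write the decimal digits of an Int to a file without going through
-- 'show' or 'String'. 'B.intDec' encodes the digits straight into the
-- Builder's buffer, and 'B.hPutBuilder' writes that to the handle.
-- (writeIntFile is a hypothetical helper, not part of any library.)
writeIntFile :: FilePath -> Int -> IO ()
writeIntFile path n =
  withBinaryFile path WriteMode $ \h ->
    B.hPutBuilder h (B.intDec n)
```

Calling writeIntFile "out.txt" 42 from GHCi or from a main action should leave the two characters 42 in out.txt.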

On Monday, August 27, 2012, J Baptist wrote:

> Using ByteStrings and the C calls does indeed speed things up a bit, but
> not much.
>
> real 0m6.053s
> user 0m1.480s
> sys 0m4.550s
>
> For your interest:
> The original version (with Strings and openFile): http://hpaste.org/73803
> Faster (with Strings and c_open): http://hpaste.org/73802
> Even faster (with ByteStrings and c_open): http://hpaste.org/73801
>
> The problem may be that even with ByteStrings, we are stuck using show,
> and thus Strings, at some point.
>
> Ideas?
>
>
> > From: johan.tibell at gmail.com
> > Date: Mon, 27 Aug 2012 13:48:27 -0700
> > Subject: Re: I/O overhead in opening and writing files
> > To: arc38813 at hotmail.com
> > CC: glasgow-haskell-users at haskell.org
> >
> > On Mon, Aug 27, 2012 at 1:43 PM, J Baptist <arc38813 at hotmail.com> wrote:
> > > I'm looking into high-performance I/O, particularly on a tmpfs (in-memory)
> > > filesystem. This involves creating lots of little files. Unfortunately, it
> > > seems that Haskell's performance in this area is not comparable to that of
> > > C. I assume that this is because of the overhead involved in opening and
> > > closing files. Some cursory profiling confirmed this: most of the runtime of
> > > the program is taken by openFile, hPutStr, and hClose.
> > >
> > > I thought that it might be faster to call the C library functions exposed as
> > > foreign imports in System.Posix.Internals, and thereby cut out some of
> > > Haskell's overhead. This indeed improved performance, but the program is
> > > still nearly twice as slow as the corresponding C program.
> > >
> > > I took some benchmarks. I wrote a program to create 500,000 files on a tmpfs
> > > filesystem, and write an integer into each of them. I did this once in C,
> > > using open, and twice in Haskell, using openFile and c_open. Here are the
> > > results:
> > >
> > > C program, using open and friends (gcc 4.4.3)
> > > real 0m4.614s
> > > user 0m0.380s
> > > sys 0m4.200s
> > >
> > > Haskell, using System.IO.openFile and friends (ghc 7.4.2)
> > > real 0m14.892s
> > > user 0m7.700s
> > > sys 0m6.890s
> > >
> > > Haskell, using System.Posix.Internals.c_open and friends (ghc 7.4.2)
> > > real 0m7.372s
> > > user 0m2.390s
> > > sys 0m4.570s
> > >
> > > My question is: why is this so slow? Could the culprit be the marshaling
> > > necessary to pass the parameters to the foreign functions? If I'm calling
> > > the low-level function c_open anyway, shouldn't performance be closer to C?
> > > Does anyone have suggestions for how to improve this?
> > >
> > > If anyone is interested, I can provide the code I used for these benchmarks.
> >
> > Please do. You can paste them at http://hpaste.org/
> >
> > Could you try using the Data.ByteString API? I don't have the code in
> > front of me so I don't know if the System.Posix API uses Strings. If
> > it does, that's most likely the issue.
> >
> > -- Johan
>
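
For readers who cannot reach the pastes, the openFile variant described in the quoted message (create many small files, write one integer into each) might look roughly like the sketch below. This is reconstructed from the description only and is not the code behind the hpaste links; the function name writeIntFiles and the file naming scheme are assumptions.

```haskell
import Control.Monad (forM_)
import System.IO (IOMode (WriteMode), hClose, hPutStr, openFile)

-- | Create files named 1 .. n under the given directory and write each
-- file's own number into it, using the String-based System.IO calls
-- (openFile, hPutStr, hClose) that the profile pointed at.
-- (writeIntFiles is a hypothetical name; error handling is omitted.)
writeIntFiles :: FilePath -> Int -> IO ()
writeIntFiles dir n =
  forM_ [1 .. n] $ \i -> do
    h <- openFile (dir ++ "/" ++ show i) WriteMode
    hPutStr h (show i)
    hClose h
```

Something like writeIntFiles "/dev/shm/bench" 500000 would approximate the benchmark, assuming that directory exists and sits on a tmpfs mount.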