openFile and threads

Matthias Neubauer neubauer@informatik.uni-freiburg.de
13 Jan 2003 17:47:14 +0100


"Simon Marlow" <simonmar@microsoft.com> writes:

> > > You might consider bypassing the Handle interface and going to the
> > > bare metal using the Posix library, which will cut down on the
> > > overhead in openFile.
> > 
> > That's what I was fearing. Is the conversion from Haskell Strings to
> > C strings a performance problem?
> 
> Haskell Strings are a common performance bottleneck; for example when
> serving files in the Haskell web server I avoided the conversion to
> Haskell Strings altogether by reading/writing arrays of bytes (see the
> paper for details).
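
To illustrate the "arrays of bytes" idea, here is a minimal sketch that
copies file contents through a raw byte buffer with hGetBuf/hPutBuf, so
the contents never become a Haskell String. It is not the web server's
code; the names copyBytes and copyFileBytes and the 4096-byte buffer
size are assumptions made for this example.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr)
import System.IO

-- Copy everything from src to dst through one fixed-size byte buffer.
-- Returns the number of bytes copied. (Hypothetical helper.)
copyBytes :: Handle -> Handle -> IO Int
copyBytes src dst = allocaBytes bufSize (loop 0)
  where
    bufSize = 4096  -- assumed buffer size, chosen arbitrarily

    loop :: Int -> Ptr Word8 -> IO Int
    loop total buf = do
      n <- hGetBuf src buf bufSize        -- 0 means end of file
      if n == 0
        then return total
        else do
          hPutBuf dst buf n
          (loop $! total + n) buf

-- Convenience wrapper that opens both files in binary mode.
copyFileBytes :: FilePath -> FilePath -> IO Int
copyFileBytes from to =
  withBinaryFile from ReadMode $ \src ->
    withBinaryFile to WriteMode $ \dst ->
      copyBytes src dst
```

Note that the file names are still Strings here; only the file contents
bypass the String representation.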

I was curious to see whether this is also the case here. So I pasted
the GHC implementation of openFile into Peter's suspicious module, so
that the profiler also covers the GHC-internal openFile code. (I took
'openFile' from
http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/libraries/base/GHC/Handle.hs
and hope this was the right one.) Here are the relevant parts of the
profiler output:

COST CENTRE                    MODULE               %time %alloc

withCString'                   MailStore             39.1   19.7
f1                             MailStore             26.1   40.9
f9                             MailStore             21.7    8.8
getBuffer                      MailStore              4.3    0.1
f6.2                           MailStore              4.3    4.0
f6                             MailStore              4.3    2.3
f6.3                           MailStore              0.0    1.6
allocateBuffer                 MailStore              0.0   19.4

...
COST CENTRE              MODULE                 no. entries %time %alloc  %time %alloc

        f6.1             MailStore               361     0   0.0    0.1    43.5   41.2
         openFile        MailStore               362  1154   0.0    0.1    43.5   41.1
          openFile'      MailStore               365  1154   0.0    0.0    43.5   40.9
           withCString'  MailStore               367     0  39.1   19.7    39.1   19.7
             openFd      MailStore               371  1154   0.0    0.7     4.3   20.9
              mkFileHandle MailStore             372  1154   0.0    0.3     4.3   20.2
               initBufferState MailStore         387  1154   0.0    0.0     0.0    0.0
               newFileHandle MailStore           376  1154   0.0    0.1     0.0    0.3
                handleFinalizer MailStore        377     0   0.0    0.1     0.0    0.2
                 flushWriteBufferOnly MailStore  389  1154   0.0    0.0     0.0    0.0
               getBuffer MailStore               373  1154   4.3    0.1     4.3   19.6
                allocateBuffer MailStore         374  1154   0.0   19.4     0.0   19.5
                 newEmptyBuffer MailStore        375     0   0.0    0.1     0.0    0.1

...

The cost centre "f6.1" is the location of the recurring call to
"openFile". As you can see, almost all of its time is spent in the
function "withCString'", which translates the Haskell strings
representing the file names to their C representation: 39.1% of the
total run time, out of the 43.5% inherited by "f6.1".
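
To make the translation step concrete, here is a rough sketch of the
work such a function has to do: walk the String (a linked list of
Char), copy it byte by byte into a freshly allocated buffer, and
NUL-terminate it. This is not the GHC source; withCStringSketch and
roundTrip are hypothetical names, and the sketch simply truncates each
Char to one byte.

```haskell
import Foreign.C.String (CString, castCharToCChar, peekCString)
import Foreign.Marshal.Array (allocaArray)
import Foreign.Storable (pokeElemOff)

-- Hypothetical stand-in for withCString, for illustration only.
withCStringSketch :: String -> (CString -> IO a) -> IO a
withCStringSketch s action = do
  let n = length s                       -- first traversal of the list
  allocaArray (n + 1) $ \buf -> do
    let fill _ []     = return ()
        fill i (c:cs) = pokeElemOff buf i (castCharToCChar c)
                          >> fill (i + 1) cs
    fill 0 s                             -- second traversal, one poke per Char
    pokeElemOff buf n 0                  -- terminating NUL
    action buf                           -- buffer is released afterwards

-- Marshal a String to C and read it straight back.
roundTrip :: String -> IO String
roundTrip path = withCStringSketch path peekCString
```

Calling roundTrip on an ASCII file name should give back the same
name; the point is only that every call pays for a per-character walk
over the list.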

I knew that Haskell strings are bad, but I really did not expect them
to cause such a huge time penalty ...

Cheers,

Matthias

-- 
Matthias Neubauer                                       |
Universität Freiburg, Institut für Informatik           | tel +49 761 203 8060
Georges-Köhler-Allee 79, 79110 Freiburg i. Br., Germany | fax +49 761 203 8052