ghc 5.02.2 FFI question

Bernard James POPE bjpop@cs.mu.OZ.AU
Tue, 9 Apr 2002 21:35:00 +1000 (EST)


Hi Simon,

(posted to glasgow-haskell-users@haskell.org in case anyone else is
reading this).

> I just tried your example and it seems to run in constant space here
> with 5.02.2.  The code looks fine - this isn't something we really
> envisaged people doing with the RTS API, but there's no real problem
> with it except that of course you don't get the benefits of type
> checking.  I'm sure you have very good reasons for building Haskell
> expressions in C :-)

My motivation is debugging. I was trying to implement a similar interface
to the HugsInternals one. I succeeded except the I noticed what looked to
me like a space leak, so I simplified the program a bit.

> Can you give us any more clues?  What were the symptoms when you ran it?

Okay, I've simplified the program even more. Now I just have a program that
calls a C function to build a Double, and then passes a stable pointer
back to it.

For comparison, I also made it possible to avoid calling C, ie just build
a Double in Haskell and make a stable pointer reference to it in Haskell.

My intuition, which maybe very wrong, is that the two ways of executing
should have very similar memory profiles and time performance. They run
in about the same time. Both processes grow bigger over time (as you will
see below), but the one that calls C grows much much faster. I'm presuming
that I can rely on the unix program 'top' to give me reasonably good
results.

First the code, then the profiles:

The Haskell code.

--------------------------------------------------------------------------------

   module Main where

   import Foreign

   -- returns a stable pointer to a Double (489.0923)
   foreign import "leak" leak :: IO (StablePtr Double)

   -- change to callLeak' to compare behaviour
   main = do repeatIO 500000 callLeak

   -- call into C to make a Double
   callLeak :: IO ()
   callLeak = do doubleSPtr <- leak
                 doubleVal  <- deRefStablePtr doubleSPtr
                 freeStablePtr doubleSPtr
                 print doubleVal


   -- don't call into C
   callLeak' :: IO ()
   callLeak' = do doubleSPtr <- newStablePtr (489.0923::Double)
                  doubleVal  <- deRefStablePtr doubleSPtr
                  freeStablePtr doubleSPtr
                  print doubleVal

--------------------------------------------------------------------------------

The C code:

--------------------------------------------------------------------------------

   #include "/home/bjpop/ghc-5.02.2/ghc/includes/Rts.h"
   #include "/home/bjpop/ghc-5.02.2/ghc/includes/RtsAPI.h"
   #include "LeakC.h"

   StgStablePtr leak (void)
   {
           StgClosure *num;
           num = rts_mkDouble(489.0923);
           return (getStablePtr ((StgPtr)(num)));
   }

--------------------------------------------------------------------------------

Okay, now for the tests:

If I compile the program so that it calls to C via the FFI 
then this is what top gives me over 30 second samples:

SIZE RSS  TIME
--------------
1708 1708 0:00
2748 2748 0:30
3772 3772 1:00
4788 4788 1:30
5828 5828 2:00
6812 6812 2:30

The process is definitely growing in size, and it looks linear.

--------------------------------------------------------------------------------

If I compile the program so that it does not call C
(by modifying the definition of main so that it uses callLeak')
then this is what top gives over 30 second samples:

SIZE RSS  TIME
--------------
1692 1692 0:00
1704 1704 0:30
1720 1720 1:00
1736 1736 1:30
1752 1752 2:00
1764 1764 2:30

The process is still growing, but very slowly. I wouldn't expect
it to grow at all, but I don't know the memory management of
ghc well enough to be sure.

--------------------------------------------------------------------------------

I thought I should look at the output from the garbage collector to see
if anything obvious came up. The verbose mode of '+RTS -S -RTS' spat out
lots of stuff as you would guess. I can post a snippet to you if it is of
interest, but I'll wait to see what you say.

The not-verbose output is as follows:

--------------------------------------------------------------------------------

When calling C:

   2,460,833,948 bytes allocated in the heap
     1,911,280 bytes copied during GC
        26,192 bytes maximum residency (1 sample(s))
   
          7500 collections in generation 0 (  1.79s)
             1 collections in generation 1 (  0.00s)
   
             1 Mb total memory in use
   
     INIT  time    0.01s  (  0.00s elapsed)
     MUT   time  160.84s  (410.71s elapsed)
     GC    time    1.79s  (  3.94s elapsed)
     EXIT  time    0.00s  (  0.00s elapsed)
     Total time  162.64s  (414.65s elapsed)
   
     %GC time       1.1%  (1.0% elapsed)
   
     Alloc rate    15,298,936 bytes per MUT second
   
     Productivity  98.9% of total user, 38.8% of total elapsed

--------------------------------------------------------------------------------

When not calling C:

   2,454,983,980 bytes allocated in the heap
     1,911,300 bytes copied during GC
        26,212 bytes maximum residency (1 sample(s))
   
          7500 collections in generation 0 (  1.45s)
             1 collections in generation 1 (  0.00s)
   
             1 Mb total memory in use
   
     INIT  time    0.00s  (  0.00s elapsed)
     MUT   time  162.93s  (444.38s elapsed)
     GC    time    1.45s  (  3.21s elapsed)
     EXIT  time    0.00s  (  0.00s elapsed)
     Total time  164.38s  (447.59s elapsed)
   
     %GC time       0.9%  (0.7% elapsed)
   
     Alloc rate    15,067,722 bytes per MUT second
   
     Productivity  99.1% of total user, 36.4% of total elapsed

--------------------------------------------------------------------------------

Final remarks.

There is little else I can think of mentioning about the performance of the
program. They both produce the same output, in almost exactly the same
amount of time.

When I was dumping the stable pointer table out I did notice something slightly
odd. I wrote a program that allocated a bunch (say 11) stable pointers, and then
deallocated them. I ran this in a big loop, say 100,000 times. The number of
live stable pointers is at most 11, but the table kept growing in size, up to
about 512 slots even though there were at most 11 pointers in it at any one
time. The first time around the loop the pointers were all allocated successively
in the table.  Next time around the loop they had one or two gaps between
some of them. Next time, the gaps were bigger and so on until the gaps got so big
that the table had to grow. I'm not sure if this is the intended behaviour.

At first I thought to myself this might be causing some memory to leak, but I
couldn't find out where. In any case the table seemed to grow logarithmically,
so that growth stopped around 512 slots. 

I am in the process of re-compiling my ghc just in case I accidentally
mucked up something in the garbage collector. I don't think I did because
I haven't been looking in that code. I'll let you know tomorrow if that
makes any difference. I'll also download the latest version of the compiler 
and try again. If that doesn't change anything, I'll try a different version
of gcc.

Thanks for your help. I really hope this isn't a wild goose chase.

Cheers,
Bernie.