jhc vs ghc and the surprising result involving ghc generated assembly.

skaller skaller at users.sourceforge.net
Tue Nov 1 12:50:07 EST 2005


On Tue, 2005-11-01 at 17:30 +0100, Florian Weimer wrote:

> > use C control constructs rather than gotos.
> 
> With GCC version 4, this will have no effect because the gimplifier
> converts everything to goto-style anyway.

Felix generates C with gotos. The result is FASTER
than native C using gcc 4.0 on x86_64.

http://felix.sourceforge.net/current/speed/en_flx_perf_0005.html

C code:

     4: int Ack(int M, int N) {
     5:   if (M==0) return N +1;
     6:   else if(N==0) return Ack(M-1,1);
     7:   else return Ack(M-1, Ack(M,N-1));
     8: }
     9: 

It is known gcc is correctly optimising tail calls here.

Felix:

     5: fun ack(x:int,y:int):int =>
     6:   if x == 0 then y + 1
     7:   elif y == 0 then ack(x-1, 1)
     8:   else ack(x-1, ack(x, y-1))
     9:   endif
    10: ;


Felix generated C(++) code -- compiled with same options:

int FLX_REGPARM _i1860_f1301_ack( 
int _i1864_v1303_x, int _i1865_v1304_y)
{
  _us2 _i1867_v1799_ack_mv_74;
  _us2 _i1868_v1821_ack_mv_84;
start_1828:;
  _i1867_v1799_ack_mv_74 = _i1864_v1303_x==0 ;
  if(!(_i1867_v1799_ack_mv_74==1))goto _1797;
  return _i1865_v1304_y+1 ;
_1797:;
  _i1868_v1821_ack_mv_84 = _i1865_v1304_y==0 ;
  if(!(_i1868_v1821_ack_mv_84==1))goto _1798;
  _i1865_v1304_y = 1;
  _i1864_v1303_x = _i1864_v1303_x-1 ;
  goto start_1828;
_1798:;
  _i1865_v1304_y = _i1860_f1301_ack(_i1864_v1303_x, _i1865_v1304_y-1 );
  _i1864_v1303_x = _i1864_v1303_x-1 ;
  goto start_1828;
}

[The FLX_REGPARM says __attribute__((regparm(3)) when
gcc is the compiler for i386 .. but it has no effect
on x86_64]

AFAICS gcc 4.x generates much better code if you just use
gotos everywhere instead of C control structures.

I have no real idea why the Felix generated C is faster.
Two guesses:

(a) the two 'mv' variables declared at the top are optimised
away, so the Felix version is only using 3 words of stack.

(b) the "parallel assigment in tail calls optimisation"
is saving one word on the stack (evaluating y before x
saves a temporary across the non-tail recursion).

but I don't really know.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net



More information about the Glasgow-haskell-users mailing list