inside the GHC code generator
Simon Marlow
simonmarhaskell at gmail.com
Tue Feb 28 04:56:00 EST 2006
Bulat Ziganshin wrote:
>>>* C++ calling stack should be used as much as possible
>>>* parameters are passed in C++ way: "factorial(int n, int r)"
>
>
> SM> This is unworkable, I'm afraid. Go and read Simon's paper on the STG:
>
> SM> http://citeseer.ist.psu.edu/peytonjones92implementing.html
>
> SM> there are two main reasons we don't use the C stack: garbage collection
> SM> and tail calls. What do you plan to do about GC?
>
> the minimalistic solution i see - just use two stacks. unboxed values
> are passed using the C conventions, so the C compiler will be able to
> optimize uses of this stack; boxed values are passed using the
> current Sp[] machinery, and therefore GC will not be changed
How do you propose to do thread switching?
Incidentally, GHC had two stacks a long time ago (before 4.00); I rewrote
the runtime and code generator to replace them with a single stack. It was
about six months' work, if I remember correctly. I think you misused the
words "just" and "minimalistic" :-)
I know gcc does (some) tail-call optimisation these days. You don't get
any *guarantee* that a call will be compiled as a tail call, though,
and the conditions under which it happens are quite complex. The gcc
folks treat it as an optimisation, rather than a fundamental part of
the operational semantics, which is what you need for Haskell.
> SM> This is called semi-tagging, it was tried a long time ago. Certainly
> SM> worth trying again, but I very much doubt the gains would be anything
> SM> like you claim, and it increases code size.
>
> we will replace 2 unpredictable jumps with one that will not even be
> executed if the value is in WHNF. at least, i'm very displeased by the
> slowness of lazy value evaluation in GHC. for example, my binary
> serialization package writes 60 mb/s working with unboxed arrays but
> only 20 mb/s working with lists
You need to *try* these things and measure the difference. Quite often
I'm surprised by the actual results I get from an optimisation that
looks worthwhile on paper. Today's processors with 3-level caches
are complicated beasts.
Semi-tagging increases code size, so unless you can make some
significant gains to make up for it, you've lost already.
Let me give you another example: GHC puts info tables next to the entry
code for a closure. On paper this looks like it might be a bad idea: it
pollutes the instruction cache with data that is rarely accessed during
execution. It's hard to arrange (this is one thing the mangler does),
so we'd like to avoid it if possible. However, doing this eliminates an
indirection when entering a closure, reduces the size of info tables by
one word, and reduces code size. These effects together are more
beneficial than the code-cache-pollution effect - last time I measured
it, the difference was ~10%. (Turn off TABLES_NEXT_TO_CODE and try it
yourself.)
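The layout difference can be sketched in C (field and struct names are invented for illustration; GHC's real info tables carry more than this): with the table placed immediately before the entry code, a closure's info pointer doubles as its code pointer, so entering the closure needs no extra load, and the table itself drops the separate entry-code word.

```c
/* Layout A: info table stored separately from the code.  Entering a
 * closure means loading the entry field first - one indirection. */
struct info_table_indirect {
    void (*entry)(void);   /* must be loaded before jumping */
    unsigned layout;       /* GC layout info, etc. */
};

/* Layout B (TABLES_NEXT_TO_CODE): the table sits just before the
 * entry code, so the code starts at (char *)info + sizeof(*info).
 * The entry field disappears - one word smaller, one load fewer. */
struct info_table_adjacent {
    unsigned layout;
};
```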
> SM> There are other schemes, including this one that we particularly like:
> SM> use a spare bit in a pointer to indicate "points to an evaluated value".
> SM> Since there are two spare bits, you can go further and use the 4
> SM> values to mean:
>
> SM> 0: possibly unevaluated
> SM> 1: evaluated, tag 0
> SM> 2: evaluated, tag 1
> SM> 3: evaluated, tag 2
>
> SM> I believe Robert Ennals tried this as part of his spec eval
> SM> implementation, but IIRC he didn't get separate measurements for it (or
> SM> it didn't improve things much). Note that this increases code size too.
>
> to be exact, we have only 2^30-1 valid pointers; all the other values
> are ready to use! :)
True, but I'm assuming you often want to store a pointer in addition to
the tag (e.g. for a cons cell). For enumerations, you're quite right
that the whole word can be used for the tag, as long as it is
distinguished from a pointer.
Cheers,
Simon
More information about the Glasgow-haskell-users
mailing list