[Haskell-cafe] I want to write a compiler

Sun Mar 8 08:30:43 EDT 2009

OK, thank you all for your responses. I'm impressed how fast you were.

2009/3/8 Rick R <rick.richardson at gmail.com>:
> http://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours

I have it. However, Scheme is strict, which is the reason I didn't
complete my reading. Maybe I am wrong.

> Or go for the Real Thing:
> http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-35.html#%_sec_5.5

I just opened the page. I'll see what I can take. Thank you.

2009/3/8 Jeremy Shaw <jeremy at n-heptane.com>:
> I can say from experience that the spineless-tagless-gmachine paper
> you referenced does contain all the information needed to implement an
> STG->C compiler. But, you should expect to spend a lot of time
> alternating between trying to implement the thing and rereading the
> paper.

So I just didn't tried hard enough. Time for a re-read, then!

> Also, you may find that it is easier to do STG->Assembly at first (via
> harpy or something). With assembly it is easier to express exactly
> what you want to happen. With C you have to try to figure out how to
> trick the C compiler into doing your bidding.

Scary. I know only C.

> Also, maybe it is more satisfying to output assembly, because then you
> really feel like you are writing a compiler ;)

Well, maybe later.

> I am sure others will say that LLVM is a better option than C or
> assembly. I know almost nothing about LLVM, so they may be right.

Neither do I. But I definitely try this. One day.

> -- jeremy

2009/3/8 Austin Seipp <mad.one at gmail.com>:
>
> Excerpts from Loup Vaillant's message of Sat Mar 07 17:32:48 -0600 2009:
>> Ideally, I would very much like to compile to C.
>> [...]
>> -> have some kind of FFI with C (I suppose it imply some support for
>> unboxed values).
>
> Unfortunately, after working on LHC, I can verifiably say (like all
> the researchers who wrote the papers which I *didn't* read
> beforehand,) that for a lot of purposes, C is an unsuitable fit for a
> compilers' target language. It works pretty well if you can make the
> source language execution model fit well enough, but if you can't,
> you've a price to pay (both in sanity and/or performance.)

Yep.

> However, since your goal is *not* efficiency, you will be happy to
> know that the issues with compiling to C (like garbage collection and
> exception handling) can be worked around viably.

I hope so.

> For garbage collection, please see.
>
> "Accurate Garbage Collection in an Uncooperative Environment" -
> http://citeseer.ist.psu.edu/538613.html
> [...]
> http://code.haskell.org/ddc/ddc-head/runtime/

Thank you, I'll look into that.

> You can find a lot of info on C-- here; I recommend the paper at the
> bottom to start off:
>
> http://research.microsoft.com/en-us/um/people/simonpj/papers/c--/

I'll look at that. But I am not sure it is the simplest path for me: I know C,
and I don't know C-- —yet.

> http://research.microsoft.com/en-us/um/people/simonpj/papers/slpj-book-1987/

I have it. Maybe I should study it a bit more.

> As for the FFI, I don't really have any papers on 'how to implement an
> FFI', but Matthias Blume's paper might be of interest:
>
> "No-Longer-Foreign: Teaching an ML compiler to speak c "natively""
> http://ttic.uchicago.edu/~blume/papers/nlffi-entcs.pdf
>
> libffi may also be worth investigating:
> http://sourceware.org/libffi/

I didn't know them, thank you.

>> I have chosen the STG machine because I thought i would be easier to
>> get an FFI and case exprs out of it. Maybe I am wrong, and template
>> instantiation is really easier. There is also the BRISK machine, and
>> GRIN. But the paper I read were less readable to me than the three
>> mentioned.
>
> How far into GRIN have you looked?

Not far enough, it seems.

> It is one of the intermediate
> languages for lhc/jhc, and for a low-level intermediate representation
> it works quite well I think; it is a strict, monadic intermediate
> language - being strict, we must still be able to model closures
> ('suspendeded applications' or thunks,) so we model closures/partial
> applications in a sort of 'defunctionalized' way (making grin a sort
> of 'defunctionalized intermediate language' really,) by turning them
> into something along the lines of an algebraic data type in GRIN. A
> good benefit of this is that you actually /don't/ have to write
> eval/apply yourself, because the compiler is supposed to generate it
> *for* you.

Here is my biggest problem with GRIN: it is not STG by far. I formatted
my brain one way, so I find difficult to reformat it another way

> In fact, this is a core part of GRIN - the generation of eval by the
> compiler is critical for many important optimizations to take
> place. It also makes the runtime a bit simpler.

If I find GRIN is overall simpler, I am happy.

> This comes at the price that GRIN is by design a whole-program
> intermediate form; in order to pull off any of these sophisticated
> transformations everything must be in memory at once. But as we have
> seen with LHC/JHC, it can make a *huge* difference (apps that are 10x
> smaller *and* 10x faster than ones created by GHC.)

I intend my final language to be quite simple. That imply a relatively small
standard library. So, maybe having all the program in memory will not be
such a big deal.

> Having dealt with GRIN as I work on LHC (when time now permits...) I
> can say that it's a reasonable strategy and in practice it turns out
> pretty well, but if you're not into optimizing the code, then STG
> might be a better fit.

OK, maybe I will choose STG for now, then.

> I can't really give you that much advice on this area so much as
> you'll just have to choose. But GRIN does have at least two excellent
> references:
>
>  Code Optimisation Techniques for Lazy Functional Languages -
>  U. Boquist, http://www.cs.chalmers.se/~boquist/phd/index.html
>
>  The GRIN Project: A Highly Optimising Back End For Lazy Functional
>  Languages, U. Boquist, http://www.cs.chalmers.se/~boquist/ifl96-abstract.html

Oh, I forgot about the second paper, thank you. I find it a bit clearer than the
thesis. Having the GRIN BNF helps a lot.

> Then again you said you're not interested in optimization, so perhaps
> I'm just preaching to the totally wrong choir... :)

No, you're not. A long (looong) term goal of mine is having crazy optimizations.
I will definitely explore that path.

> Austin

2009/3/8 wren ng thornton <wren at freegeek.org>:
> Loup Vaillant wrote:
>>
>> -> support algebraic data types and case expressions (unless I can get
>> away with encoding them as functions),
>
> Which you always can,
> [snip]
>
> The only trick is that you need to have closures (in order to build the
> Foos) and you need to have first-class functions (in order to pass the
> destructors in). If you're familiar with the STG machine, this is what they
> do under the covers anyways. At the core level, all you need is some
> primitive to force evaluation.

Ah, that is what I was missing. I thought I had to implement seq on top
of case, not the other way around. Maybe that can make Template
instantiation viable

> ~wren

Anyway, thank you all very much. Time for some reading.

Cheers,
Loup