[Haskell-cafe] Learning about haskell compilers

Tue Dec 20 19:58:49 EST 2005

On Tue, Dec 20, 2005 at 10:36:36AM -0600, Creighton Hogg wrote:
> I was wondering where I should get started in learing about 
> how to implement a haskell compiler?

Well, the way I learned was by just making the decision that one way or
another, I was going to write a haskell compiler and then reading and
reading and coding and tossing code and recoding until I succeeded. :)
it is sort of how I learn, choose something I have no idea how to do,
and do it anyway learning what I need to along the way out of neccesity.

in any case, assuming you know something about basic compilers, lexing,
parsing, and code generation and are interested in how you get parsed
haskell source into a form that you can generate code from, I'd highly
recommend the following (in order) (of course, this is just my
experience)

* The haskell report itself: this will give the rules on how to desugar
'complicated' haskell into simple haskell. things like getting rid of list
comprehensions and do notation and turning complicated pattern matching
into simple pattern matching (the most complicated of the desugaring
transformations). I'd recommend desugaring and expanding type synonyms
as soon as possible for your first attempt even though it gives worse
error messages because it _greatly_ simplifies the later steps and you
will likely go through several iterations of the typechecker before you
get one you are happy with. you can always extend your typechecker later
and postpone certain desugarings until you reach a balance of good error
messages you are comfortable with.

* typing haskell in haskell:
http://www.cse.ogi.edu/~mpj/thih/
this will tell you how to go from haskell source to explicitly typed
haskell source, which is now in a form suitable for transforming into an
intermediate language. the explicitly typed, desugared haskell can be
converted directly to core. It may take several readings to fully
understand and looking at the source to hatchet could help considerably
(it did for me)

* then you have a couple choices, basically, You want to learn about ghc
core. there are a lot of papers on ghc optimizations that are great
because they give examples of how core is used which really helps get an
intuitive understanding of why core is a good intermediate language:

recommend are:
Secrets of the GHC inliner.
http://www.research.microsoft.com/~simonpj/Papers/inlining/index.htm
this is by far the most important, the simplifier is the key to any
other optimizations and this paper goes into a lot of the details other
things leave out.

http://haskell.org/ghc/docs/papers/unboxed-values.ps.gz
very good paper. my main regret about jhc development is that I didn't
incorperate unboxed values right from the start thinking of them as an
optimization rather than an essential part of expressing program
properties. I just accumulated hack-after-hack which would have gone
away with unboxed values in the core.

http://research.microsoft.com/copyright/accept.asp?path=/users/simonpj/papers/float.ps.gz&pub=ACM
let floating, not only an important optimization, but will get you
thinking about what it means to preserve lazyness and some of the
trickiness of not breaking performance by accident.

http://research.microsoft.com/copyright/accept.asp?path=/users/simonpj/papers/comp-by-trans.ps.gz&pub=18
great overview, either read this, or all of the above. (or ideally
everything) starting out with this one is probably the best idea because
it will give you an idea what you are in for.

* next you have your simplified optimized core and you want to turn it
into machine code, there are various choices you can make, 

the most well understood is G-machine based implementations like nhc or
the more advanced STG machine used in ghc. the lester book mentioned in
another email is a great introduction and highly recommended even if the
source  there is a lot of practical
info out there on how to implement these and you have pretty much a
straight shot from core to G-machine code either to be compiled directly
or interpreted. 

for jhc I used boquist's GRIN:
http://www.cs.chalmers.se/~boquist/phd/
which is not the simplest way of getting things working right away, but
could be very interesting.

there are various other abstract machines out there, the Lazy Virtual
Machine used by Helium described in Daan Leijen's Phd thesis is quite
interesting, and might make a better first target than G-machine code.

so, I'd recommend, at least _understand_ the G-machine, enough to write
an interpreter for it. (if you already have core, translating and
interpreting G-machine code shouldn't be more than a few hundreds of
lines of code). then read about the LVM, STG, and GRIN, and any other
ones anyone recommends? and decide where you want to go from there.
whichever one inspires you.

Hopefully this helps you, this is just the route I would recommend based
on my experience with jhc, I read a whole lot of papers and books, and
there are a ton more great ones out there but I think this would be the
easiest path to a simple haskell compiler that is based on concepts
extensible to a full-scale non-toy implementation. I left out details
that should be covered in any traditional compiler book and just
mentioned the issues specific to a haskell compiler. I am sure someone
can fill in the holes if needed. 

I wish you good luck, I started jhc a month after hearing about haskell
and boy was it a great crash course in the language having to learn how
to program in it and its details at the same time. The great news is
that Haskell is a _superlative_ language to write compilers for any
language in and is quite fun to boot. writing a compiler you won't find
anywhere that haskell itself is lacking so can concentrate on the task
at hand.

        John 

-- 
John Meacham - ⑆repetae.net⑆john⑈