How to get started with a new backend?

Mon Jan 28 17:35:03 CET 2013

On 28/01/13 01:15, Jason Dagit wrote:
> I would like to explore making a backend for .NET. I've done a lot of
> background reading about previous .NET and JVM attempts for Haskell. It
> seems like several folks have made significant progress in the past and,
> with the exception of UHC, I can't find any code around the internet
> from the previous efforts. I realize that in total it's a huge
> undertaking and codegen is only one of several significant hurdles to
> success.
>
> I would like to get a very, very, very simple translation working inside
> GHC. If all I can compile and run is fibonacci, then I would be quite
> happy. For my first attempt, proof of concept is sufficient.
>
> I found a lot of good documentation on the ghc trac for how the
> compilation phases work and what happens in the different parts of the
> backend. The documentation is excellent, especially compared to other
> compilers I've looked at.
>
> When I started looking at how to write the code, I started to wonder
> about the "least effort" path to getting something (anything?) working.
> Here are some questions:
>    * Haskell.NET seems to be dead. Does anyone know where their code went?
>    * Did lambdavm also disappear? (JVM I know, but close enough to be
> useful)
>    * Would it make sense to copy&modify the -fvia-C backend to generate
> C#? The trac claims that ghc can compile itself to C so that only
> standard gnu C tools are needed to build an unregistered compiler. Could
> I use this trick to translate programs to C#?
>    * What stage in the pipeline should I translate from? Core? STG? Cmm?
>    * Which directories/source files should I look at to get familiar
> with the code gen? I've heard the LLVM codegen is relatively simple.
>    * Any other advice?

Just to put things in perspective a bit, the LLVM backend shares the RTS
with the native backend, and uses exactly the same ABI.  That limits its
scope significantly: it only has to replace the stages between Cmm and
assembly code; everything else works as-is.

You don't have this luxury with .NET (or JVM), because you can't link 
.NET or JVM code to native code directly, and these systems already have 
their own runtimes.  Basically you're replacing not only the code 
generator, but also the runtime, and probably large chunks of the 
libraries.  That's why it's a bigger job.

You can't go from Cmm, because as Simon says it's already too low-level.
You'll want .NET/JVM to manage the stack for you, and you'll want to
have your own compilation scheme for functions and thunks, and so on.
The right place to start is after CorePrep, where thunks are explicit
(this is where the bytecode generator starts, incidentally: you might
want to look at ghci/ByteCodeGen.hs).
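
To make "your own compilation scheme for functions and thunks" a little
more concrete, here is a rough sketch of what hand-translated output
might look like on the JVM. It is only an illustration under
assumptions: the Thunk class, the ThunkSketch class and the evaluations
counter are hypothetical names invented for this example, not anything
GHC or an existing backend provides. The idea is that a lazy let-binding
in the prepared Core becomes an explicit object allocation, and forcing
it is an ordinary method call, so the JVM manages the stack.

```java
import java.util.function.Supplier;

// Hypothetical runtime class: a suspended computation evaluated at most once.
final class Thunk<T> {
    private Supplier<T> code; // unevaluated code; set to null once forced
    private T value;          // cached result after the first force

    Thunk(Supplier<T> code) {
        this.code = code;
    }

    T force() {
        if (code != null) {
            value = code.get();
            code = null; // drop the closure so the GC can reclaim it
        }
        return value;
    }
}

final class ThunkSketch {
    // Counts how often the thunk body runs (for demonstration only).
    static int evaluations = 0;

    // Strict function: plain recursion on the JVM call stack.
    static long fib(long n) {
        return n < 2 ? n : fib(n - 1) + fib(n - 2);
    }

    // Hand translation of: let x = fib 20 in if usesX then x + x else 0
    static long example(boolean usesX) {
        // The let-binding becomes an explicit thunk allocation.
        Thunk<Long> x = new Thunk<>(() -> {
            evaluations++;
            return fib(20);
        });
        // Each use of x forces the thunk; only the first force runs the body.
        return usesX ? x.force() + x.force() : 0;
    }

    public static void main(String[] args) {
        System.out.println(example(true));  // 13530
        System.out.println(example(false)); // 0
        System.out.println(evaluations);    // 1
    }
}
```

A real backend would also need update frames or an equivalent for
blackholing, a representation for partial application, and so on; the
sketch leaves all of that out.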

Cheers,
	Simon