Mentor for a JVM backend for GHC

Rahul Muttineni rahulmutt at gmail.com
Mon May 2 15:26:07 UTC 2016


Hi GHC Developers,

I've started working on a JVM backend for GHC [1] and I'd love to work on
it as my Summer of Haskell project.

Currently, the build system is setup using a mix of Shake (for the RTS
build) and Stack (for the main compiler build) and I ensure that most
commits build successfully. I have ported the core part of the scheduler
and ported over the fundamental types (Capability, StgTSO, Task,
StgClosure, etc.) taking advantage of OOP in the implementation when I
could.

Additionally, I performed a non-trivial refactor of the hs-java package
adding support for inner classes and fields which was very cumbersome to do
in the original package. On the frontend, I have tapped into the STG code
from the GHC 7.10.3 library and setup a CodeGen monad for generating JVM
bytecode. The main task of generating the actual bytecode, porting the more
critical parts of the RTS, and adding support for the threaded RTS remain.

The strategy for compilation is as follows:
- Intercept the STG code in the GHC pipeline
- Convert from STG->JVM bytecode [2] in a similar manner as STG->Cmm
preserving semantics as best as possible [3]
- Port the GHC RTS (normal & threaded) to Java [4]
- Put all the generated class files + RTS into a single jar to be run
directly by the JVM.

My objectives for the project during the summer are:
- To implement the compilation strategy mentioned above
- Implement the Java FFI for foreign imports. [5]
- Implement the most important [6] PrimOps that GHC supports.
- Port the base package replacing the C FFI imports with equivalent Java
FFI imports. [7]

A little bit about myself: I spent a lot of time studying functional
language implementation by reading SPJ's famous book and reading research
papers on related topics last summer as self-study.

I took a break and resumed a couple months ago where I spent a lot of time
plowing through the STG->Cmm code generator as well as the RTS and going
back and forth between them to get a clear understanding of how everything
works.

Moreover, I compiled simple Haskell programs and observed the STG, Cmm, and
assembly output (by decompiling the final executable with objdump) to
understand bits of the code generator where the source code wasn't that
clear.

I also spent a great deal of time studying the JVM internals, reading the
JVM spec, looking for any new features that could facilitate a high
performance implementation [8].

It would be great if someone with an understanding of nuances of the RTS
and code generator could mentor me for this project. It has been a blast so
far learning all the prerequisites and contemplating the design. I'd be
very excited to take this on as a summer project.

Also, given that I have hardly 5 days remaining, does anyone have
suggestions on how I can structure the proposal without getting into too
many details? There are still some parts of the design I haven't figured
out, but I know I could find some solution when I get to it during the
porting process.

Thanks,
Rahul Muttineni

[1] http://github.com/rahulmutt/ghcvm

[2] I intend to organically derive an IR at a later stage to allow for some
optimizations by looking at the final working implementation without an IR
and looking for patterns of repeated sequences of bytecode and assigning
each sequence its own instruction in the IR.

[3] Obviously, the lack of control of memory layouts (besides allocating
off the JVM heap using DirectByteBuffers) and lack of general tail calls
makes it tough to match the semantics of Cmm, but there are many solutions
around it, as can be found in the few papers on translating STG to Java/JVM
bytecode.

[4] This is the GHC RTS without GC and profiling since the JVM has great
support for those already. Also, lots of care must be taken to ensure that
the lock semantics stays in tact during the port.

[5] foreign exports will be dealt at a later stage, but I am taking care of
naming the closures nicely so that in the future you don't have to type
long names like the labels GHC compiles to call a Haskell function in Java.

[6] Basically all the PrimOps that would be required to provide plumbing
for the Prelude functions that can compile beginner-level programs found in
books such as Learn You a Haskell for Great Good.

[7] I know that it's a lot more complicated than just replacing FFI calls.
I'd have to change around a lot of the code in base as well.

[8] I found that the new "invokedynamic" instruction as well as the
MethodHandle API (something like function pointers) that were introduced in
JDK 7 could fit the bill. But as of now, I want to get a baseline
implementation that is compatible with Java 5 so I will not be utilizing
these newer features.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-devs/attachments/20160502/5d1721a2/attachment.html>


More information about the ghc-devs mailing list