Mentor for a JVM backend for GHC
ekmett at gmail.com
Sat May 7 16:19:17 UTC 2016
By the time a program has made it down to Cmm, a lot of assumptions about
layout in memory are baked in -- everything is assumed to be a flat object
made out of 32-bit or 64-bit slots. These assumptions aren't really
suitable for the JVM.
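To make that mismatch concrete, here is a rough sketch (not taken from GHC or from either backend; LayoutSketch, allocPair, readSlot and Pair are hypothetical names invented for illustration). The first half mimics the Cmm-style view, where a closure is a run of word-sized slots reached by base + offset; the second half is what the JVM actually offers, namely typed fields behind opaque references.

```java
class LayoutSketch {
    // Cmm-style view: one flat heap of word-sized slots. A closure is just
    // "base + offset"; pointers and plain integers share the same slots.
    static final long[] heap = new long[16];

    // Hypothetical helper: write a three-word closure starting at `base`.
    static int allocPair(int base, long infoPtr, long fst, long snd) {
        heap[base] = infoPtr;   // slot 0: stand-in for an info pointer
        heap[base + 1] = fst;   // slot 1: first payload word
        heap[base + 2] = snd;   // slot 2: second payload word
        return base;
    }

    // Read a slot by offset, the way Cmm-level code addresses fields.
    static long readSlot(int base, int offset) {
        return heap[base + offset];
    }

    // JVM-style view: typed fields, opaque references, no address arithmetic.
    static final class Pair {
        final Object fst;
        final Object snd;

        Pair(Object fst, Object snd) {
            this.fst = fst;
            this.snd = snd;
        }
    }

    public static void main(String[] args) {
        int closure = allocPair(0, 0xCAFEL, 1L, 2L);
        System.out.println(readSlot(closure, 2));

        Pair p = new Pair(1L, 2L);
        System.out.println(p.snd);
    }
}
```

The flat version can be emulated on the JVM, as the array shows, but only by giving up the JVM's own object model, which is roughly the trade-off under discussion in this thread.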
On Sat, May 7, 2016 at 11:32 AM, Thomas Jakway <tjakway at nyu.edu> wrote:
> This is a strange coincidence. I'm definitely no expert GHC hacker but I
> started (highly preliminary) work on a JVM backend for GHC a few weeks ago.
> It's here: https://github.com/tjakway/ghcjvm/tree/jvm/compiler/jvmGen/Jvm
> (The memory runtime is here: https://github.com/tjakway/lljvm)
> I'm very new to this so pardon my ignorance, but I don't understand what the
> benefit is of intercepting STG code and translating that to bytecode vs.
> translating Cmm to bytecode (or Jasmin assembly, as I'd prefer)? It seems
> like Cmm is designed for backends and the obvious choice. Or have I got
> this really mixed up?
> I hope this isn't out of line considering my overall lack of experience but
> I think I can give some advice:
> - Read the JVM 7 spec cover-to-cover.
> - I highly suggest outputting Jasmin assembly instead of raw bytecode. The
> classfile format is complicated and you will have to essentially rewrite
> Jasmin in Haskell if you don't want to reuse it. Jasmin is also the de
> facto standard assembler and much more thoroughly tested than any homegrown
> solution we might make.
> - Read the LLVM code generator. This project is more like the LLVM backend
> than the native code generator.
> - Don't go for speed. The approach that I've begun is to emulate a C stack
> and memory system the RTS can run on top of.
> This will make getting something working much faster and also solves the
> problem of how to deal with memcpy/memset/memmove on the JVM. This will of
> course be very slow (I think) and is not a permanent solution. Can't do
> everything at once. Any other approach will probably require rewriting the
> entire RTS from the beginning.
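For readers wondering what an emulated memory system might look like, here is a minimal sketch. It is not the lljvm runtime linked above; LinearMemory and its methods are hypothetical names, and a real runtime would also need allocation, bounds policy and a stack. It only shows how memcpy/memset/memmove fall out once memory is a single byte array.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

class LinearMemory {
    private final byte[] bytes;     // the whole emulated address space
    private final ByteBuffer view;  // word-sized access over the same bytes

    LinearMemory(int size) {
        bytes = new byte[size];
        // Little-endian is an arbitrary choice made for this sketch.
        view = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
    }

    // Absolute reads and writes of a 64-bit word at an emulated address.
    long loadWord(int addr) {
        return view.getLong(addr);
    }

    void storeWord(int addr, long value) {
        view.putLong(addr, value);
    }

    // memset: fill len bytes starting at dst with the low byte of value.
    void memset(int dst, int value, int len) {
        Arrays.fill(bytes, dst, dst + len, (byte) value);
    }

    // memmove: System.arraycopy on a single array behaves as if it copied
    // through a temporary buffer, so overlapping ranges are safe.
    void memmove(int dst, int src, int len) {
        System.arraycopy(bytes, src, bytes, dst, len);
    }

    // memcpy has weaker requirements than memmove, so it can reuse it.
    void memcpy(int dst, int src, int len) {
        memmove(dst, src, len);
    }

    public static void main(String[] args) {
        LinearMemory m = new LinearMemory(64);
        m.storeWord(0, 42L);
        m.memcpy(8, 0, 8);
        System.out.println(m.loadWord(8));
    }
}
```

Every load and store goes through an index check and a buffer call, which is one reason an approach like this is expected to be slow.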
> I don't think Frege is especially useful to this project, though I'd love to
> be proven wrong. Frege's compilation model is completely different from
> GHC's: they compile Haskell to Java and then send that to javac. Porting
> GHC to the JVM is really more like writing a Cmm to JVM compiler.
> I've heard of the LambdaVM project but couldn't find the actual code
> anywhere. The site where it was hosted appears to be offline. I'd
> certainly like to look at it if anyone knows where to find it.
> Information on Jasmin:
> Once you've tried manually dealing with constant pools you'll appreciate
> Jonathan Meyer's work!
> I forked davidar's extended version of Jasmin. The differences versus the
> original Jasmin are detailed here. Some nice additions:
> - supports invokedynamic
> - supports .annotation, .inner, .attribute, .deprecated directives
> - better handling of the ldc_w instruction
> - multi-line fields
> - .debug directives
> - signatures for local variables
> - .bytecode directive to specify bytecode version
> - (most importantly, I think) support for the StackMap attribute. If we
> eventually want to use new JVM instructions like invokedynamic, we need
> stack map frames or the JVM will reject our bytecode. JVM 7 has options to
> bypass this (it's a hack), but they're deprecated, and I believe stack map
> frames will not be optional going forward. Alternatively we can stick with
> older bytecode versions indefinitely and not use the new features.
> (Just to be clear, I forked it in case it was deleted. I didn't write those
> features, the credit belongs to him).
> I think the biggest risk is taking too much on at once. Any one of these
> subtasks (writing a bytecode assembler, porting the RTS, etc.) could consume
> the whole summer if you're not careful.
> I'd love to help out with this project!
> Thomas Jakway
> Woops, after scrolling back through the emails it looks like someone sent
> out the LambdaVM source. I'll have to take a look at that.
> On 05/02/2016 11:26 AM, Rahul Muttineni wrote:
> Hi GHC Developers,
> I've started working on a JVM backend for GHC and I'd love to work on it
> as my Summer of Haskell project.
> Currently, the build system is set up using a mix of Shake (for the RTS
> build) and Stack (for the main compiler build) and I ensure that most
> commits build successfully. I have ported the core part of the scheduler and
> ported over the fundamental types (Capability, StgTSO, Task, StgClosure,
> etc.) taking advantage of OOP in the implementation when I could.
> Additionally, I performed a non-trivial refactor of the hs-java package
> adding support for inner classes and fields which was very cumbersome to do
> in the original package. On the frontend, I have tapped into the STG code
> from the GHC 7.10.3 library and set up a CodeGen monad for generating JVM
> bytecode. The main tasks of generating the actual bytecode, porting the more
> critical parts of the RTS, and adding support for the threaded RTS remain.
> The strategy for compilation is as follows:
> - Intercept the STG code in the GHC pipeline
> - Convert from STG->JVM bytecode in a similar manner as STG->Cmm
> preserving semantics as best as possible
> - Port the GHC RTS (normal & threaded) to Java
> - Put all the generated class files + RTS into a single jar to be run
> directly by the JVM.
> My objectives for the project during the summer are:
> - To implement the compilation strategy mentioned above
> - Implement the Java FFI for foreign imports.
> - Implement the most important PrimOps that GHC supports.
> - Port the base package replacing the C FFI imports with equivalent Java FFI
> imports.
> A little bit about myself: I spent a lot of time studying functional
> language implementation by reading SPJ's famous book and reading research
> papers on related topics last summer as self-study.
> I took a break and resumed a couple months ago where I spent a lot of time
> plowing through the STG->Cmm code generator as well as the RTS and going
> back and forth between them to get a clear understanding of how everything
> works.
> Moreover, I compiled simple Haskell programs and observed the STG, Cmm, and
> assembly output (by decompiling the final executable with objdump) to
> understand bits of the code generator where the source code wasn't that
> clear.
> I also spent a great deal of time studying the JVM internals, reading the
> JVM spec, looking for any new features that could facilitate a high
> performance implementation.
> It would be great if someone with an understanding of nuances of the RTS and
> code generator could mentor me for this project. It has been a blast so far
> learning all the prerequisites and contemplating the design. I'd be very
> excited to take this on as a summer project.
> Also, given that I have hardly 5 days remaining, does anyone have
> suggestions on how I can structure the proposal without getting into too
> many details? There are still some parts of the design I haven't figured
> out, but I know I could find some solution when I get to it during the
> porting process.
> Rahul Muttineni
>  http://github.com/rahulmutt/ghcvm
>  I intend to organically derive an IR at a later stage to allow for some
> optimizations by looking at the final working implementation without an IR
> and looking for patterns of repeated sequences of bytecode and assigning
> each sequence its own instruction in the IR.
>  Obviously, the lack of control of memory layouts (besides allocating off
> the JVM heap using DirectByteBuffers) and lack of general tail calls makes
> it tough to match the semantics of Cmm, but there are many solutions around
> it, as can be found in the few papers on translating STG to Java/JVM
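One commonly used workaround for the missing tail calls is a trampoline. The sketch below is only an illustration of that general idea, not necessarily what ghcvm or the papers mentioned do; Trampoline, Step, Done, sumTo and runToCompletion are hypothetical names. The lambda needs Java 8 and is used for brevity; an anonymous class would do the same on older versions.

```java
class Trampoline {
    // A step is either a finished result (Done) or a suspended tail call.
    interface Step {
        Step run();
    }

    static final class Done implements Step {
        final long value;

        Done(long value) {
            this.value = value;
        }

        public Step run() {
            return this;
        }
    }

    // Tail-recursive sum of 1..n. Instead of calling itself, it returns the
    // next call as a Step, so the Java stack never grows with n.
    static Step sumTo(final long n, final long acc) {
        if (n == 0) {
            return new Done(acc);
        }
        return () -> sumTo(n - 1, acc + n);
    }

    // The driver loop: keep bouncing until a Done comes back.
    static long runToCompletion(Step step) {
        while (!(step instanceof Done)) {
            step = step.run();
        }
        return ((Done) step).value;
    }

    public static void main(String[] args) {
        System.out.println(runToCompletion(sumTo(1000000L, 0L)));
    }
}
```

The cost is one allocation per bounce, which is part of why matching Cmm's semantics on the JVM is hard to do cheaply.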
>  This is the GHC RTS without GC and profiling since the JVM has great
> support for those already. Also, lots of care must be taken to ensure that
> the lock semantics stay intact during the port.
>  foreign exports will be dealt with at a later stage, but I am taking care of
> naming the closures nicely so that in the future you don't have to type long
> names like the labels GHC compiles to call a Haskell function in Java.
>  Basically all the PrimOps that would be required to provide plumbing for
> the Prelude functions that can compile beginner-level programs found in
> books such as Learn You a Haskell for Great Good.
>  I know that it's a lot more complicated than just replacing FFI calls.
> I'd have to change around a lot of the code in base as well.
>  I found that the new "invokedynamic" instruction as well as the
> MethodHandle API (something like function pointers) that were introduced in
> JDK 7 could fit the bill. But as of now, I want to get a baseline
> implementation that is compatible with Java 5 so I will not be utilizing
> these newer features.
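For anyone who has not used the MethodHandle API, the "function pointer" comparison can be sketched as follows. This uses only the standard java.lang.invoke classes from JDK 7 onward; HandleSketch and its methods are hypothetical names for illustration and say nothing about how a GHC backend would actually use the API.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class HandleSketch {
    static long add(long a, long b) {
        return a + b;
    }

    // Look up a handle to the static method add(long, long), much like
    // taking the address of a function.
    static MethodHandle addHandle() {
        try {
            return MethodHandles.lookup().findStatic(
                    HandleSketch.class, "add",
                    MethodType.methodType(long.class, long.class, long.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Call through the handle; invokeExact requires the exact static types.
    static long callAdd(long a, long b) {
        try {
            return (long) addHandle().invokeExact(a, b);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // Bind the first argument to 5, giving a handle of type (long)long.
    // This loosely resembles a partial application.
    static long addFive(long x) {
        try {
            MethodHandle bound =
                    MethodHandles.insertArguments(addHandle(), 0, 5L);
            return (long) bound.invokeExact(x);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(callAdd(2L, 3L));
        System.out.println(addFive(10L));
    }
}
```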
> ghc-devs mailing list
> ghc-devs at haskell.org