Mentor for a JVM backend for GHC

Sat May 7 16:19:17 UTC 2016

By the time it has made it down to Cmm there are a lot of assumptions
about layout in memory -- everything is assumed to be a flat object
made out of 32-bit or 64-bit slots. These assumptions aren't really
suitable for the JVM.

-Edward

On Sat, May 7, 2016 at 11:32 AM, Thomas Jakway <tjakway at nyu.edu> wrote:
> This is a strange coincidence.  I'm definitely no expert GHC hacker but I
> started (highly preliminary) work on a JVM backend for GHC a few weeks ago.
> It's here: https://github.com/tjakway/ghcjvm/tree/jvm/compiler/jvmGen/Jvm
> (The memory runtime is here: https://github.com/tjakway/lljvm)
>
> I'm very new to this so pardon my ignorance, but I don't understand what the
> benefit is of intercepting STG code and translating that to bytecode vs.
> translating Cmm to bytecode (or Jasmin assembly, as I'd prefer)?  It seems
> like Cmm is designed for backends and the obvious choice.  Or have I got
> this really mixed up?
>
> I hope this isn't out of line considering my overall lack of experience but
> I think I can give some advice:
>
> read the JVM 7 spec cover-to-cover.
> I highly suggest outputting Jasmin assembly instead of raw bytecode.  The
> classfile format is complicated and you will have to essentially rewrite
> Jasmin in Haskell if you don't want to reuse it.  Jasmin is also the de
> facto standard assembler and much more thoroughly tested than any homegrown
> solution we might make.
> read the LLVM code generator.  This project is more like the LLVM backend
> than the native code generator.
> Don't go for speed.  The approach that I've begun is to emulate a C stack
> and memory system the RTS can run on top of
> (https://github.com/tjakway/lljvm/blob/master/src/main/java/lljvm/runtime/Memory.java).
> This will make getting something working much faster and also solves the
> problem of how to deal with memcpy/memset/memmove on the JVM.  This will of
> course be very slow (I think) and is not a permanent solution.  Can't do
> everything at once.  Any other approach will probably require rewriting the
> entire RTS from the beginning.
> I don't think Frege is especially useful to this project, though I'd love to
> be proven wrong.  Frege's compilation model is completely different from
> GHC's: they compile Haskell to Java and then send that to javac.  Porting
> GHC to the JVM is really more like writing a Cmm to JVM compiler.
>
>
> I've heard of the LambdaVM project but couldn't find the actual code
> anywhere.  The site where it was hosted appears to be offline.  I'd
> certainly like to look at it if anyone knows where to find it.
>
> Information on Jasmin:
> http://web.mit.edu/javadev/packages/jasmin/doc/
> http://web.mit.edu/javadev/packages/jasmin/doc/instructions.html
> http://web.mit.edu/javadev/packages/jasmin/doc/about.html
>
> Once you've tried manually dealing with constant pools you'll appreciate
> Jonathan Meyer's work!
>
> I forked davidar's extended version of Jasmin.  The differences versus the
> original Jasmin are detailed here.  Some nice additions:
>
> supports invokedynamic
> supports .annotation, .inner, .attribute, .deprecated directives
> better handling of the ldc_w instruction
> multi-line fields
> .debug directives
> signatures for local variables
> .bytecode directive to specify bytecode version
> (most importantly, I think): support for the StackMap attribute.  If we
> eventually want to use new JVM instructions like invokedynamic, we need
> stack map frames or the JVM will reject our bytecode.  JVM 7 has options to
> bypass this (but it's a hack), but they're deprecated and I believe not
> optional going forward.  Alternatively we can stick with older bytecode
> versions indefinitely and not use the new features.
>
> (Just to be clear, I forked it in case it was deleted.  I didn't write those
> features, the credit belongs to him).
>
> I think the biggest risk is taking too much on at once.  Any one of these
> subtasks, writing a bytecode assembler, porting the RTS, etc. could consume
> the whole summer if you're not careful.
>
> I'd love to help out with this project!
>
> Sincerely,
> Thomas Jakway
>
> -------
>
> Woops, after scrolling back through the emails it looks like someone sent
> out the LambdaVM source.  I'll have to take a look at that.
>
>
>
> On 05/02/2016 11:26 AM, Rahul Muttineni wrote:
>
> Hi GHC Developers,
>
> I've started working on a JVM backend for GHC [1] and I'd love to work on it
> as my Summer of Haskell project.
>
> Currently, the build system is setup using a mix of Shake (for the RTS
> build) and Stack (for the main compiler build) and I ensure that most
> commits build successfully. I have ported the core part of the scheduler and
> ported over the fundamental types (Capability, StgTSO, Task, StgClosure,
> etc.) taking advantage of OOP in the implementation when I could.
>
> Additionally, I performed a non-trivial refactor of the hs-java package
> adding support for inner classes and fields which was very cumbersome to do
> in the original package. On the frontend, I have tapped into the STG code
> from the GHC 7.10.3 library and setup a CodeGen monad for generating JVM
> bytecode. The main task of generating the actual bytecode, porting the more
> critical parts of the RTS, and adding support for the threaded RTS remain.
>
> The strategy for compilation is as follows:
> - Intercept the STG code in the GHC pipeline
> - Convert from STG->JVM bytecode [2] in a similar manner as STG->Cmm
> preserving semantics as best as possible [3]
> - Port the GHC RTS (normal & threaded) to Java [4]
> - Put all the generated class files + RTS into a single jar to be run
> directly by the JVM.
>
> My objectives for the project during the summer are:
> - To implement the compilation strategy mentioned above
> - Implement the Java FFI for foreign imports. [5]
> - Implement the most important [6] PrimOps that GHC supports.
> - Port the base package replacing the C FFI imports with equivalent Java FFI
> imports. [7]
>
> A little bit about myself: I spent a lot of time studying functional
> language implementation by reading SPJ's famous book and reading research
> papers on related topics last summer as self-study.
>
> I took a break and resumed a couple months ago where I spent a lot of time
> plowing through the STG->Cmm code generator as well as the RTS and going
> back and forth between them to get a clear understanding of how everything
> works.
>
> Moreover, I compiled simple Haskell programs and observed the STG, Cmm, and
> assembly output (by decompiling the final executable with objdump) to
> understand bits of the code generator where the source code wasn't that
> clear.
>
> I also spent a great deal of time studying the JVM internals, reading the
> JVM spec, looking for any new features that could facilitate a high
> performance implementation [8].
>
> It would be great if someone with an understanding of nuances of the RTS and
> code generator could mentor me for this project. It has been a blast so far
> learning all the prerequisites and contemplating the design. I'd be very
> excited to take this on as a summer project.
>
> Also, given that I have hardly 5 days remaining, does anyone have
> suggestions on how I can structure the proposal without getting into too
> many details? There are still some parts of the design I haven't figured
> out, but I know I could find some solution when I get to it during the
> porting process.
>
> Thanks,
> Rahul Muttineni
>
> [1] http://github.com/rahulmutt/ghcvm
>
> [2] I intend to organically derive an IR at a later stage to allow for some
> optimizations by looking at the final working implementation without an IR
> and looking for patterns of repeated sequences of bytecode and assigning
> each sequence its own instruction in the IR.
>
> [3] Obviously, the lack of control of memory layouts (besides allocating off
> the JVM heap using DirectByteBuffers) and lack of general tail calls makes
> it tough to match the semantics of Cmm, but there are many solutions around
> it, as can be found in the few papers on translating STG to Java/JVM
> bytecode.
>
> [4] This is the GHC RTS without GC and profiling since the JVM has great
> support for those already. Also, lots of care must be taken to ensure that
> the lock semantics stays in tact during the port.
>
> [5] foreign exports will be dealt at a later stage, but I am taking care of
> naming the closures nicely so that in the future you don't have to type long
> names like the labels GHC compiles to call a Haskell function in Java.
>
> [6] Basically all the PrimOps that would be required to provide plumbing for
> the Prelude functions that can compile beginner-level programs found in
> books such as Learn You a Haskell for Great Good.
>
> [7] I know that it's a lot more complicated than just replacing FFI calls.
> I'd have to change around a lot of the code in base as well.
>
> [8] I found that the new "invokedynamic" instruction as well as the
> MethodHandle API (something like function pointers) that were introduced in
> JDK 7 could fit the bill. But as of now, I want to get a baseline
> implementation that is compatible with Java 5 so I will not be utilizing
> these newer features.
>
>
>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
>
>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>