Mentor for a JVM backend for GHC

Alois Cochard alois.cochard at gmail.com
Mon May 9 16:48:05 UTC 2016


I think Jasmin is really overrated, we can clearly do better in Haskell.

FWIW, the Kuna JVM assembler do support stack map frames, a lot of effort
was put into getting this right, it would be a shame to redo this works if
someone looks for a pure haskell implementation.
On 7 May 2016 5:32 pm, "Thomas Jakway" <tjakway at nyu.edu> wrote:

> This is a strange coincidence.  I'm definitely no expert GHC hacker but I
> started (highly preliminary) work on a JVM backend for GHC a few weeks
> ago.  It's here:
> https://github.com/tjakway/ghcjvm/tree/jvm/compiler/jvmGen/Jvm
> (The memory runtime is here: https://github.com/tjakway/lljvm)
>
> I'm very new to this so pardon my ignorance, but I don't understand what
> the benefit is of intercepting STG code and translating that to bytecode
> vs. translating Cmm to bytecode (or Jasmin assembly, as I'd prefer)?  It
> seems like Cmm is designed for backends and the obvious choice.  Or have I
> got this really mixed up?
>
> I hope this isn't out of line considering my overall lack of experience
> but I think I can give some advice:
>
>    - read the JVM 7 spec cover-to-cover.
>    - I *highly* suggest outputting Jasmin
>    <http://web.mit.edu/javadev/packages/jasmin/doc/> assembly instead of
>    raw bytecode.  The classfile format is complicated and you will have to
>    essentially rewrite Jasmin in Haskell if you don't want to reuse it.
>    Jasmin is also the de facto standard assembler and much more thoroughly
>    tested than any homegrown solution we might make.
>    - read the LLVM code generator.  This project is more like the LLVM
>    backend than the native code generator.
>    - Don't go for speed.  The approach that I've begun is to emulate a C
>    stack and memory system the RTS can run on top of (
>    https://github.com/tjakway/lljvm/blob/master/src/main/java/lljvm/runtime/Memory.java).
>    This will make getting *something* working much faster and also solves
>    the problem of how to deal with memcpy/memset/memmove on the JVM.  This
>    will of course be *very* slow (I think) and is not a permanent
>    solution.  Can't do everything at once.  Any other approach will probably
>    require rewriting the entire RTS from the beginning.
>    - I don't think Frege is especially useful to this project, though I'd
>    love to be proven wrong.  Frege's compilation model is completely different
>    from GHC's: they compile Haskell to Java and then send that to javac.
>    Porting GHC to the JVM is really more like writing a Cmm to JVM compiler.
>
>
> I've heard of the LambdaVM project but couldn't find the actual code
> anywhere.  The site where it was hosted appears to be offline.  I'd
> certainly like to look at it if anyone knows where to find it.
>
> Information on Jasmin:
> http://web.mit.edu/javadev/packages/jasmin/doc/
> http://web.mit.edu/javadev/packages/jasmin/doc/instructions.html
> http://web.mit.edu/javadev/packages/jasmin/doc/about.html
>
> Once you've tried manually dealing with constant pools you'll appreciate
> Jonathan Meyer's work!
>
> I forked davidar's <https://github.com/davidar> extended version of
> Jasmin.  The differences versus the original Jasmin are detailed here
> <https://github.com/davidar/jasmin/blob/master/html2x/xt.html>.  Some
> nice additions:
>
>    - supports invokedynamic
>    - supports .annotation, .inner, .attribute, .deprecated directives
>    - better handling of the ldc_w instruction
>    - multi-line fields
>    - .debug directives
>    - signatures for local variables
>    - .bytecode directive to specify bytecode version
>    - (most importantly, I think): support for the StackMap attribute.  If
>    we eventually want to use new JVM instructions like invokedynamic, we
>    *need* stack map frames or the JVM will reject our bytecode.  JVM 7
>    has options to bypass this (but it's a hack), but they're deprecated and I
>    believe not optional going forward.  Alternatively we can stick with older
>    bytecode versions indefinitely and not use the new features.
>
> (Just to be clear, I forked it in case it was deleted.  I didn't write
> those features, the credit belongs to him).
>
> I think the biggest risk is taking too much on at once.  Any one of these
> subtasks, writing a bytecode assembler, porting the RTS, etc. could consume
> the whole summer if you're not careful.
>
> I'd love to help out with this project!
>
> Sincerely,
> Thomas Jakway
>
> -------
>
> Woops, after scrolling back through the emails it looks like someone sent
> out the LambdaVM source.  I'll have to take a look at that.
>
>
> On 05/02/2016 11:26 AM, Rahul Muttineni wrote:
>
> Hi GHC Developers,
>
> I've started working on a JVM backend for GHC [1] and I'd love to work on
> it as my Summer of Haskell project.
>
> Currently, the build system is setup using a mix of Shake (for the RTS
> build) and Stack (for the main compiler build) and I ensure that most
> commits build successfully. I have ported the core part of the scheduler
> and ported over the fundamental types (Capability, StgTSO, Task,
> StgClosure, etc.) taking advantage of OOP in the implementation when I
> could.
>
> Additionally, I performed a non-trivial refactor of the hs-java package
> adding support for inner classes and fields which was very cumbersome to do
> in the original package. On the frontend, I have tapped into the STG code
> from the GHC 7.10.3 library and setup a CodeGen monad for generating JVM
> bytecode. The main task of generating the actual bytecode, porting the more
> critical parts of the RTS, and adding support for the threaded RTS remain.
>
> The strategy for compilation is as follows:
> - Intercept the STG code in the GHC pipeline
> - Convert from STG->JVM bytecode [2] in a similar manner as STG->Cmm
> preserving semantics as best as possible [3]
> - Port the GHC RTS (normal & threaded) to Java [4]
> - Put all the generated class files + RTS into a single jar to be run
> directly by the JVM.
>
> My objectives for the project during the summer are:
> - To implement the compilation strategy mentioned above
> - Implement the Java FFI for foreign imports. [5]
> - Implement the most important [6] PrimOps that GHC supports.
> - Port the base package replacing the C FFI imports with equivalent Java
> FFI imports. [7]
>
> A little bit about myself: I spent a lot of time studying functional
> language implementation by reading SPJ's famous book and reading research
> papers on related topics last summer as self-study.
>
> I took a break and resumed a couple months ago where I spent a lot of time
> plowing through the STG->Cmm code generator as well as the RTS and going
> back and forth between them to get a clear understanding of how everything
> works.
>
> Moreover, I compiled simple Haskell programs and observed the STG, Cmm,
> and assembly output (by decompiling the final executable with objdump) to
> understand bits of the code generator where the source code wasn't that
> clear.
>
> I also spent a great deal of time studying the JVM internals, reading the
> JVM spec, looking for any new features that could facilitate a high
> performance implementation [8].
>
> It would be great if someone with an understanding of nuances of the RTS
> and code generator could mentor me for this project. It has been a blast so
> far learning all the prerequisites and contemplating the design. I'd be
> very excited to take this on as a summer project.
>
> Also, given that I have hardly 5 days remaining, does anyone have
> suggestions on how I can structure the proposal without getting into too
> many details? There are still some parts of the design I haven't figured
> out, but I know I could find some solution when I get to it during the
> porting process.
>
> Thanks,
> Rahul Muttineni
>
> [1] http://github.com/rahulmutt/ghcvm
>
> [2] I intend to organically derive an IR at a later stage to allow for
> some optimizations by looking at the final working implementation without
> an IR and looking for patterns of repeated sequences of bytecode and
> assigning each sequence its own instruction in the IR.
>
> [3] Obviously, the lack of control of memory layouts (besides allocating
> off the JVM heap using DirectByteBuffers) and lack of general tail calls
> makes it tough to match the semantics of Cmm, but there are many solutions
> around it, as can be found in the few papers on translating STG to Java/JVM
> bytecode.
>
> [4] This is the GHC RTS without GC and profiling since the JVM has great
> support for those already. Also, lots of care must be taken to ensure that
> the lock semantics stays in tact during the port.
>
> [5] foreign exports will be dealt at a later stage, but I am taking care
> of naming the closures nicely so that in the future you don't have to type
> long names like the labels GHC compiles to call a Haskell function in Java.
>
> [6] Basically all the PrimOps that would be required to provide plumbing
> for the Prelude functions that can compile beginner-level programs found in
> books such as Learn You a Haskell for Great Good.
>
> [7] I know that it's a lot more complicated than just replacing FFI calls.
> I'd have to change around a lot of the code in base as well.
>
> [8] I found that the new "invokedynamic" instruction as well as the
> MethodHandle API (something like function pointers) that were introduced in
> JDK 7 could fit the bill. But as of now, I want to get a baseline
> implementation that is compatible with Java 5 so I will not be utilizing
> these newer features.
>
>
>
> _______________________________________________
> ghc-devs mailing listghc-devs at haskell.orghttp://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
>
>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-devs/attachments/20160509/38002700/attachment-0001.html>


More information about the ghc-devs mailing list