Mentor for a JVM backend for GHC

Alois Cochard alois.cochard at gmail.com
Mon May 9 16:51:53 UTC 2016


Totally agree, Cmm is too late.

My aim in Kuna was to start the transformation from Core, targeting
stack-safe JVM bytecode without using Graal or the like.

Yes, I am quite an optimistic person ;-) but I believe it's the path to
take. I'm not interested in Frege like approach for various reasons,
performance being one of them.
On 7 May 2016 6:19 pm, "Edward Kmett" <ekmett at gmail.com> wrote:

> By the time it has made it down to Cmm there are a lot of assumptions
> about layout in memory -- everything is assumed to be a flat object
> made out of 32-bit or 64-bit slots. These assumptions aren't really
> suitable for the JVM.
>
> -Edward
>
> On Sat, May 7, 2016 at 11:32 AM, Thomas Jakway <tjakway at nyu.edu> wrote:
> > This is a strange coincidence.  I'm definitely no expert GHC hacker but I
> > started (highly preliminary) work on a JVM backend for GHC a few weeks
> ago.
> > It's here:
> https://github.com/tjakway/ghcjvm/tree/jvm/compiler/jvmGen/Jvm
> > (The memory runtime is here: https://github.com/tjakway/lljvm)
> >
> > I'm very new to this so pardon my ignorance, but I don't understand what
> the
> > benefit is of intercepting STG code and translating that to bytecode vs.
> > translating Cmm to bytecode (or Jasmin assembly, as I'd prefer)?  It
> seems
> > like Cmm is designed for backends and the obvious choice.  Or have I got
> > this really mixed up?
> >
> > I hope this isn't out of line considering my overall lack of experience
> but
> > I think I can give some advice:
> >
> > read the JVM 7 spec cover-to-cover.
> > I highly suggest outputting Jasmin assembly instead of raw bytecode.  The
> > classfile format is complicated and you will have to essentially rewrite
> > Jasmin in Haskell if you don't want to reuse it.  Jasmin is also the de
> > facto standard assembler and much more thoroughly tested than any
> homegrown
> > solution we might make.
> > read the LLVM code generator.  This project is more like the LLVM backend
> > than the native code generator.
> > Don't go for speed.  The approach that I've begun is to emulate a C stack
> > and memory system the RTS can run on top of
> > (
> https://github.com/tjakway/lljvm/blob/master/src/main/java/lljvm/runtime/Memory.java
> ).
> > This will make getting something working much faster and also solves the
> > problem of how to deal with memcpy/memset/memmove on the JVM.  This will
> of
> > course be very slow (I think) and is not a permanent solution.  Can't do
> > everything at once.  Any other approach will probably require rewriting
> the
> > entire RTS from the beginning.
> > I don't think Frege is especially useful to this project, though I'd
> love to
> > be proven wrong.  Frege's compilation model is completely different from
> > GHC's: they compile Haskell to Java and then send that to javac.  Porting
> > GHC to the JVM is really more like writing a Cmm to JVM compiler.
> >
> >
> > I've heard of the LambdaVM project but couldn't find the actual code
> > anywhere.  The site where it was hosted appears to be offline.  I'd
> > certainly like to look at it if anyone knows where to find it.
> >
> > Information on Jasmin:
> > http://web.mit.edu/javadev/packages/jasmin/doc/
> > http://web.mit.edu/javadev/packages/jasmin/doc/instructions.html
> > http://web.mit.edu/javadev/packages/jasmin/doc/about.html
> >
> > Once you've tried manually dealing with constant pools you'll appreciate
> > Jonathan Meyer's work!
> >
> > I forked davidar's extended version of Jasmin.  The differences versus
> the
> > original Jasmin are detailed here.  Some nice additions:
> >
> > supports invokedynamic
> > supports .annotation, .inner, .attribute, .deprecated directives
> > better handling of the ldc_w instruction
> > multi-line fields
> > .debug directives
> > signatures for local variables
> > .bytecode directive to specify bytecode version
> > (most importantly, I think): support for the StackMap attribute.  If we
> > eventually want to use new JVM instructions like invokedynamic, we need
> > stack map frames or the JVM will reject our bytecode.  JVM 7 has options
> to
> > bypass this (but it's a hack), but they're deprecated and I believe not
> > optional going forward.  Alternatively we can stick with older bytecode
> > versions indefinitely and not use the new features.
> >
> > (Just to be clear, I forked it in case it was deleted.  I didn't write
> those
> > features, the credit belongs to him).
> >
> > I think the biggest risk is taking too much on at once.  Any one of these
> > subtasks, writing a bytecode assembler, porting the RTS, etc. could
> consume
> > the whole summer if you're not careful.
> >
> > I'd love to help out with this project!
> >
> > Sincerely,
> > Thomas Jakway
> >
> > -------
> >
> > Woops, after scrolling back through the emails it looks like someone sent
> > out the LambdaVM source.  I'll have to take a look at that.
> >
> >
> >
> > On 05/02/2016 11:26 AM, Rahul Muttineni wrote:
> >
> > Hi GHC Developers,
> >
> > I've started working on a JVM backend for GHC [1] and I'd love to work
> on it
> > as my Summer of Haskell project.
> >
> > Currently, the build system is setup using a mix of Shake (for the RTS
> > build) and Stack (for the main compiler build) and I ensure that most
> > commits build successfully. I have ported the core part of the scheduler
> and
> > ported over the fundamental types (Capability, StgTSO, Task, StgClosure,
> > etc.) taking advantage of OOP in the implementation when I could.
> >
> > Additionally, I performed a non-trivial refactor of the hs-java package
> > adding support for inner classes and fields which was very cumbersome to
> do
> > in the original package. On the frontend, I have tapped into the STG code
> > from the GHC 7.10.3 library and setup a CodeGen monad for generating JVM
> > bytecode. The main task of generating the actual bytecode, porting the
> more
> > critical parts of the RTS, and adding support for the threaded RTS
> remain.
> >
> > The strategy for compilation is as follows:
> > - Intercept the STG code in the GHC pipeline
> > - Convert from STG->JVM bytecode [2] in a similar manner as STG->Cmm
> > preserving semantics as best as possible [3]
> > - Port the GHC RTS (normal & threaded) to Java [4]
> > - Put all the generated class files + RTS into a single jar to be run
> > directly by the JVM.
> >
> > My objectives for the project during the summer are:
> > - To implement the compilation strategy mentioned above
> > - Implement the Java FFI for foreign imports. [5]
> > - Implement the most important [6] PrimOps that GHC supports.
> > - Port the base package replacing the C FFI imports with equivalent Java
> FFI
> > imports. [7]
> >
> > A little bit about myself: I spent a lot of time studying functional
> > language implementation by reading SPJ's famous book and reading research
> > papers on related topics last summer as self-study.
> >
> > I took a break and resumed a couple months ago where I spent a lot of
> time
> > plowing through the STG->Cmm code generator as well as the RTS and going
> > back and forth between them to get a clear understanding of how
> everything
> > works.
> >
> > Moreover, I compiled simple Haskell programs and observed the STG, Cmm,
> and
> > assembly output (by decompiling the final executable with objdump) to
> > understand bits of the code generator where the source code wasn't that
> > clear.
> >
> > I also spent a great deal of time studying the JVM internals, reading the
> > JVM spec, looking for any new features that could facilitate a high
> > performance implementation [8].
> >
> > It would be great if someone with an understanding of nuances of the RTS
> and
> > code generator could mentor me for this project. It has been a blast so
> far
> > learning all the prerequisites and contemplating the design. I'd be very
> > excited to take this on as a summer project.
> >
> > Also, given that I have hardly 5 days remaining, does anyone have
> > suggestions on how I can structure the proposal without getting into too
> > many details? There are still some parts of the design I haven't figured
> > out, but I know I could find some solution when I get to it during the
> > porting process.
> >
> > Thanks,
> > Rahul Muttineni
> >
> > [1] http://github.com/rahulmutt/ghcvm
> >
> > [2] I intend to organically derive an IR at a later stage to allow for
> some
> > optimizations by looking at the final working implementation without an
> IR
> > and looking for patterns of repeated sequences of bytecode and assigning
> > each sequence its own instruction in the IR.
> >
> > [3] Obviously, the lack of control of memory layouts (besides allocating
> off
> > the JVM heap using DirectByteBuffers) and lack of general tail calls
> makes
> > it tough to match the semantics of Cmm, but there are many solutions
> around
> > it, as can be found in the few papers on translating STG to Java/JVM
> > bytecode.
> >
> > [4] This is the GHC RTS without GC and profiling since the JVM has great
> > support for those already. Also, lots of care must be taken to ensure
> that
> > the lock semantics stays in tact during the port.
> >
> > [5] foreign exports will be dealt at a later stage, but I am taking care
> of
> > naming the closures nicely so that in the future you don't have to type
> long
> > names like the labels GHC compiles to call a Haskell function in Java.
> >
> > [6] Basically all the PrimOps that would be required to provide plumbing
> for
> > the Prelude functions that can compile beginner-level programs found in
> > books such as Learn You a Haskell for Great Good.
> >
> > [7] I know that it's a lot more complicated than just replacing FFI
> calls.
> > I'd have to change around a lot of the code in base as well.
> >
> > [8] I found that the new "invokedynamic" instruction as well as the
> > MethodHandle API (something like function pointers) that were introduced
> in
> > JDK 7 could fit the bill. But as of now, I want to get a baseline
> > implementation that is compatible with Java 5 so I will not be utilizing
> > these newer features.
> >
> >
> >
> > _______________________________________________
> > ghc-devs mailing list
> > ghc-devs at haskell.org
> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
> >
> >
> >
> > _______________________________________________
> > ghc-devs mailing list
> > ghc-devs at haskell.org
> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
> >
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-devs/attachments/20160509/f0e937fb/attachment.html>


More information about the ghc-devs mailing list