Mentor for a JVM backend for GHC

Mon May 2 16:05:42 UTC 2016

I'd be willing to help out at least as a backup mentor for this.

I'd prefer if we could find someone who is more familiar with the GHC
peculiarities, but I can speak to the JVM and Java bytecode end and
know the general STG execution model pretty well.

-Edward

On Mon, May 2, 2016 at 11:48 PM, Carter Schonwald
<carter.schonwald at gmail.com> wrote:
> Woah this is some prodigious work!
>
> 1) would initially targeting only newer jvm  variants make the work targets
> for the summer easier ?
>
> 2) it's worth talking with ghc folks on #ghc and the gsoc admins at
> #haskell-gsoc on freenode IRC  about gsoc fit and possible mentors.
>
> 3) it sounds like the work so far is worth sharing on the r/Haskell sub
> Reddit.  If you want.  I do recommend IRC for talking with some of the
> active ghc and gsoc folk
>
>
> On Monday, May 2, 2016, Rahul Muttineni <rahulmutt at gmail.com> wrote:
>>
>> Hi GHC Developers,
>>
>> I've started working on a JVM backend for GHC [1] and I'd love to work on
>> it as my Summer of Haskell project.
>>
>> Currently, the build system is setup using a mix of Shake (for the RTS
>> build) and Stack (for the main compiler build) and I ensure that most
>> commits build successfully. I have ported the core part of the scheduler and
>> ported over the fundamental types (Capability, StgTSO, Task, StgClosure,
>> etc.) taking advantage of OOP in the implementation when I could.
>>
>> Additionally, I performed a non-trivial refactor of the hs-java package
>> adding support for inner classes and fields which was very cumbersome to do
>> in the original package. On the frontend, I have tapped into the STG code
>> from the GHC 7.10.3 library and setup a CodeGen monad for generating JVM
>> bytecode. The main task of generating the actual bytecode, porting the more
>> critical parts of the RTS, and adding support for the threaded RTS remain.
>>
>> The strategy for compilation is as follows:
>> - Intercept the STG code in the GHC pipeline
>> - Convert from STG->JVM bytecode [2] in a similar manner as STG->Cmm
>> preserving semantics as best as possible [3]
>> - Port the GHC RTS (normal & threaded) to Java [4]
>> - Put all the generated class files + RTS into a single jar to be run
>> directly by the JVM.
>>
>> My objectives for the project during the summer are:
>> - To implement the compilation strategy mentioned above
>> - Implement the Java FFI for foreign imports. [5]
>> - Implement the most important [6] PrimOps that GHC supports.
>> - Port the base package replacing the C FFI imports with equivalent Java
>> FFI imports. [7]
>>
>> A little bit about myself: I spent a lot of time studying functional
>> language implementation by reading SPJ's famous book and reading research
>> papers on related topics last summer as self-study.
>>
>> I took a break and resumed a couple months ago where I spent a lot of time
>> plowing through the STG->Cmm code generator as well as the RTS and going
>> back and forth between them to get a clear understanding of how everything
>> works.
>>
>> Moreover, I compiled simple Haskell programs and observed the STG, Cmm,
>> and assembly output (by decompiling the final executable with objdump) to
>> understand bits of the code generator where the source code wasn't that
>> clear.
>>
>> I also spent a great deal of time studying the JVM internals, reading the
>> JVM spec, looking for any new features that could facilitate a high
>> performance implementation [8].
>>
>> It would be great if someone with an understanding of nuances of the RTS
>> and code generator could mentor me for this project. It has been a blast so
>> far learning all the prerequisites and contemplating the design. I'd be very
>> excited to take this on as a summer project.
>>
>> Also, given that I have hardly 5 days remaining, does anyone have
>> suggestions on how I can structure the proposal without getting into too
>> many details? There are still some parts of the design I haven't figured
>> out, but I know I could find some solution when I get to it during the
>> porting process.
>>
>> Thanks,
>> Rahul Muttineni
>>
>> [1] http://github.com/rahulmutt/ghcvm
>>
>> [2] I intend to organically derive an IR at a later stage to allow for
>> some optimizations by looking at the final working implementation without an
>> IR and looking for patterns of repeated sequences of bytecode and assigning
>> each sequence its own instruction in the IR.
>>
>> [3] Obviously, the lack of control of memory layouts (besides allocating
>> off the JVM heap using DirectByteBuffers) and lack of general tail calls
>> makes it tough to match the semantics of Cmm, but there are many solutions
>> around it, as can be found in the few papers on translating STG to Java/JVM
>> bytecode.
>>
>> [4] This is the GHC RTS without GC and profiling since the JVM has great
>> support for those already. Also, lots of care must be taken to ensure that
>> the lock semantics stays in tact during the port.
>>
>> [5] foreign exports will be dealt at a later stage, but I am taking care
>> of naming the closures nicely so that in the future you don't have to type
>> long names like the labels GHC compiles to call a Haskell function in Java.
>>
>> [6] Basically all the PrimOps that would be required to provide plumbing
>> for the Prelude functions that can compile beginner-level programs found in
>> books such as Learn You a Haskell for Great Good.
>>
>> [7] I know that it's a lot more complicated than just replacing FFI calls.
>> I'd have to change around a lot of the code in base as well.
>>
>> [8] I found that the new "invokedynamic" instruction as well as the
>> MethodHandle API (something like function pointers) that were introduced in
>> JDK 7 could fit the bill. But as of now, I want to get a baseline
>> implementation that is compatible with Java 5 so I will not be utilizing
>> these newer features.
>>
>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>