Request for assistance from Haskell-oriented startup: GHCi performance

Wed Jan 14 23:16:07 UTC 2015

Konrad

That does sound frustrating.

I think your first port of call should be Manuel Chakravarty, the author of accelerate.  The example you give in your Stack Overflow post <http://stackoverflow.com/questions/27541609/difference-in-performance-of-compiled-accelerate-code-ran-from-ghci-and-shell> can only be some weird systems thing.  After all, you are executing precisely the same code (namely compiled Accelerate code); it’s just that in one case it’s dynamically linked and executed from GHCi and in the other it’s linked and executed by the shell.  I have no clue what could cause that.  I wonder if you are using a GPU and whether that might somehow behave differently.  Could it be the difference between static linking and dynamic linking (which could plausibly account for some startup delay)?  Is it a fixed overhead (e.g. it takes 100ms extra) or does it run a factor of two slower (increase the size of your test case to see)?
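One way to answer the fixed-overhead-versus-factor question is to time the same workload at two input sizes, once from GHCi and once from the compiled executable, and compare the numbers. The following is only a sketch under assumptions: `workload` is a hypothetical stand-in for the real Accelerate computation, the helper names and the 25% tolerance in `classify` are arbitrary choices made for illustration, and `GHC.Clock` requires base 4.11 or later (so it is not available on GHC 7.8.3 as-is).

```haskell
import Control.Exception (evaluate)
import Control.Monad (forM_)
import Data.List (foldl')
import GHC.Clock (getMonotonicTime)

-- Hypothetical stand-in for the real computation; work grows linearly with n.
workload :: Int -> Int
workload n = foldl' (+) 0 [1 .. n]

-- Wall-clock seconds needed to run an action and force its result to WHNF.
timeIt :: IO a -> IO Double
timeIt act = do
  t0 <- getMonotonicTime
  _  <- act >>= evaluate
  t1 <- getMonotonicTime
  return (t1 - t0)

data Verdict = FixedOverhead | ProportionalSlowdown | Inconclusive
  deriving (Eq, Show)

-- Compare (slow, fast) timings taken at a small and at a large input size.
-- A roughly constant difference suggests a fixed overhead; a roughly
-- constant ratio suggests a proportional slowdown.
classify :: (Double, Double) -> (Double, Double) -> Verdict
classify (slowS, fastS) (slowL, fastL)
  | sameDiff && not sameRatio = FixedOverhead
  | sameRatio && not sameDiff = ProportionalSlowdown
  | otherwise                 = Inconclusive
  where
    close a b = abs (a - b) <= 0.25 * max (abs a) (abs b)  -- assumed tolerance
    sameDiff  = close (slowS - fastS) (slowL - fastL)
    sameRatio = close (slowS / fastS) (slowL / fastL)

-- Run this both from GHCi and as a compiled program, then feed the
-- four measurements to classify.
main :: IO ()
main = forM_ [1000000, 10000000] $ \n -> do
  secs <- timeIt (return (workload n))
  putStrLn (show n ++ " elements: " ++ show secs ++ " s")
```

For instance, timings of (0.11 s, 0.01 s) at the small size and (1.10 s, 1.00 s) at the large size would point to a fixed overhead of about 100 ms, whereas (0.02 s, 0.01 s) and (2.0 s, 1.0 s) would point to a factor-of-two slowdown.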

I’d be happy to have a Skype call with you, but I am rather unlikely to know anything helpful because it doesn’t sound like a core Haskell issue at all.   You are executing the very same machine instructions!

The overheads of the GHC API to compile and run the expression “main” are pretty small.

I’m copying ghc-devs in case anyone else has any ideas.

Simon

From: Konrad Gądek [mailto:kgadek at gmail.com]
Sent: 14 January 2015 13:59
To: Simon Peyton Jones
Cc: Piotr Młodawski; kgadek at flowbox.io
Subject: Request for assistance from Haskell-oriented startup: GHCi performance

Dear Mr Jones,

My name is Konrad Gądek and I'm one of the programmers at Flowbox ( http://flowbox.io ), a startup that aims to bring a fresh view on image composition in the movie industry. We proudly use Haskell in nearly all of our development. I believe you may remember our CEO, Wojciech Daniło, from discussions such as this thread: https://phabricator.haskell.org/D69 .

What may interest you is that, to achieve our goals as a company, we have started developing a new programming language, Luna. Long story short, we believe that Luna could be as beneficial for the Haskell community as Elixir is for Erlang.

However, we have found some major performance problems with the code that are as critical for us as they are cryptic. We have had difficulty pinpointing the actual issue, not to mention solving it. We're getting a bit desperate, since nobody so far has been able to help us, and so we would like to ask you for help. We would be really grateful if you could take a look; maybe your fresh ideas could shed some light on the issue. Details are attached below.

Is there any chance we could arrange e.g. a Skype call so we could discuss the matter further?

Thank you in advance!

Background

Currently Luna is trans-compiled to Haskell and then compiled to bytecode by GHC. Furthermore, we use ghci to evaluate expressions (the flow graph) interactively. We use the accelerate library to perform high-performance computations with the help of graphics cards.

The problem

Executing some of the functions from libraries compiled with -O2 (especially from accelerate) in ghci is much slower than calling them from a compiled executable (see http://stackoverflow.com/questions/27541609/difference-in-performance-of-compiled-accelerate-code-ran-from-ghci-and-shell and https://github.com/AccelerateHS/accelerate/issues/227).

Maybe there is some other way to interactively evaluate Haskell code that is more lightweight or more customizable, i.e. one that would not require all the ghc-api features that are probably slowing down the whole process? Is it possible to use just the ghc linker and make function calls simpler and more time-efficient?

Details

We feed ghci with statements and declarations through ghc-api (using runStmt and runDecls respectively). We can also change imports and language extensions before each call. The overall process is as follows:

  *   on init:
     *   set ghcpath to one with our custom installation of ghc with preinstalled graphic libraries
     *   set imports to our libraries
     *   enable/disable appropriate language extensions

  *   for each run:
     *   generate haskell code (including datatype declarations, using lenses and TemplateHaskell) and load it to ghci using runDecls
     *   for each expression:
        *   run statements that use freshly generated code
        *   bind (lazy) results to variables
        *   evaluate values from bound variables, and get it from GhcMonad to runtime of our interpreter (see http://hackage.haskell.org/package/hint-0.4.2.1/docs/Language-Haskell-Interpreter.html#v:interpret)
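
For readers who want to reproduce the general shape of this loop in isolation, here is a minimal sketch using the hint package, which wraps ghc-api. It is an illustration under assumptions, not the Flowbox implementation: it covers only the "set extensions, set imports, interpret an expression" part, it does not load generated declarations with runDecls, and `evalInt` is a hypothetical helper that starts a fresh interpreter session on every call.

```haskell
import Language.Haskell.Interpreter

-- Hypothetical helper: evaluate an Int-typed Haskell expression in a fresh
-- interpreter session and return the value to the host program.
evalInt :: String -> IO (Either InterpreterError Int)
evalInt expr = runInterpreter $ do
  -- on init: enable language extensions and set the imports in scope
  set [languageExtensions := [TemplateHaskell]]
  setImports ["Prelude"]
  -- for each expression: interpret it and bring the value back
  interpret expr (as :: Int)

main :: IO ()
main = do
  r <- evalInt "sum [1 .. 10]"
  case r of
    Left err -> putStrLn ("interpreter error: " ++ show err)
    Right v  -> print v
```

Timing `evalInt` against the same expression in a compiled program would show how much of any slowdown comes from session setup and interpretation rather than from the library code being called.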

This behaviour was observed when using GHC 7.8.3 (with the D69 patch) on Fedora 20 (x86-64), Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz.

Tried so far

  1.  Specializing nearly everything in the accelerate library, and specializing calls to accelerate methods (no speedup).
  2.  Loading precompiled, optimised code into ghci (no speedup).
  3.  Truth be told, we have no idea what to try next.

--
Konrad Gądek
typechecker team-leader in Flowbox