Potential GSoC proposal: Reduce the speed gap between 'ghc -c' and 'ghc --make'

Simon Marlow marlowsd at gmail.com
Wed Apr 25 09:57:41 CEST 2012

On 25/04/2012 03:17, Mikhail Glushenkov wrote:
> Hello Simon,
> Sorry for the delay.
> On Tue, Apr 10, 2012 at 1:03 PM, Simon Marlow <marlowsd at gmail.com> wrote:
>>> Questions:
>>> Would implementing this optimisation be a worthwhile/realistic GSoC
>>> project?
>>> What are other potential ways to bring 'ghc -c' performance up to par
>>> with 'ghc --make'?
>> My guess is that this won't have a significant impact on ghc -c compile
>> times.
>> The advantage of squashing the .hi files for a package together is that they
>> could share a string table, which would save a bit of space and time, but I
>> think the time saved is small compared to the cost of deserialising and
>> typechecking the declarations from the interface, which still has to be
>> done.  In fact it might make things worse, if the string table for the whole
>> base package is larger than the individual tables that would be read from
>> .hi files.  I don't think mmap() will buy very much over the current scheme
>> of just reading the file into a ByteArray.
> Thank you for the answer.
> I'll be working on another project during the summer, but I'm still
> interested in making interface files load faster.
> The idea that I currently like the most is to make it possible to save
> and load objects in the "GHC heap format". That way, deserialisation
> could be done with a simple fread() and a fast pointer fixup pass,
> which would hopefully make running many 'ghc -c' processes as fast as
> a single 'ghc --make'. This trick is commonly employed in the games
> industry to speed up load times [1]. Given that Haskell is a
> garbage-collected language, the implementation will be trickier than
> in C++ and will have to be done at the RTS level.
> Is this a good idea? How hard would it be to implement this optimisation?

I believe OCaml does something like this.

I think the main difficulty is that the data structures in the heap are 
not the same every time, because we allocate unique identifiers 
sequentially as each Name is created.  So to make this work you would 
have to make Names globally unique.  Maybe using a 64-bit hash instead 
of the sequentially-allocated uniques would work, but that would entail 
quite a performance hit on 32-bit platforms (GHC uses IntMap everywhere 
with Unique as the key).
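
As a rough illustration of the problem (a sketch only, not GHC code): the
NameCache type and unique_for function below are hypothetical stand-ins for a
sequential unique supply, and name_hash64 uses FNV-1a purely as an example of
a 64-bit hash. The number handed out by the counter depends on how many names
were created earlier in the same process, while the hash depends only on the
name's bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_NAMES 16

/* Hypothetical name cache: each new name receives the next counter value,
   so the result depends on the order in which names are first seen. */
typedef struct {
    const char *names[MAX_NAMES];
    uint32_t    uniques[MAX_NAMES];
    size_t      count;
    uint32_t    next;
} NameCache;

static uint32_t unique_for(NameCache *c, const char *name) {
    for (size_t i = 0; i < c->count; ++i)
        if (strcmp(c->names[i], name) == 0)
            return c->uniques[i];            /* already known: reuse */
    assert(c->count < MAX_NAMES);            /* sketch: fixed capacity */
    c->names[c->count]   = name;
    c->uniques[c->count] = c->next++;        /* sequential allocation */
    return c->uniques[c->count++];
}

/* 64-bit FNV-1a: the result depends only on the bytes of the name, so it is
   the same in every process (at the cost of 64-bit keys and possible
   collisions, which this sketch ignores). */
static uint64_t name_hash64(const char *name) {
    uint64_t h = 14695981039346656037ULL;    /* FNV offset basis */
    for (const unsigned char *p = (const unsigned char *)name; *p; ++p) {
        h ^= *p;
        h *= 1099511628211ULL;               /* FNV prime */
    }
    return h;
}
```

If one process sees "Data.Maybe.fromJust" before "GHC.Base.map" and another
sees only "GHC.Base.map", unique_for gives "GHC.Base.map" a different number
in each, so a heap image saved by one would not match the other. name_hash64
gives the same key in both.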

On top of this there will be a *lot* of other complications (e.g. 
handling sharing well, mapping info pointers somehow).  Personally I 
think it's at best very ambitious, and at worst not at all practical.
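
For reference, the basic "fread() plus pointer fixup" technique looks roughly
like the sketch below. Everything in it is an assumption made for
illustration: the Node layout, NIL_OFFSET, save_image and load_image are
hypothetical and have nothing to do with the real GHC heap format. Pointers
are stored on disk as byte offsets from the start of the image and turned
back into pointers after loading. Sharing inside the image survives because
two references to the same node carry the same offset. The sketch has no
counterpart to info pointers, which refer to code outside the image.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node: `next` is a byte offset from the start of the image
   while on disk, and a real pointer once the fixup pass has run. */
typedef struct Node {
    int32_t payload;
    union { uint64_t offset; struct Node *ptr; } next;
} Node;

#define NIL_OFFSET UINT64_MAX   /* on-disk encoding of a NULL pointer */

/* Write n nodes, converting pointers into the array to offsets.
   Precondition: every non-NULL `next` points into nodes[0..n-1]. */
static int save_image(FILE *f, const Node *nodes, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        Node out;
        memset(&out, 0, sizeof out);          /* zero padding bytes too */
        out.payload = nodes[i].payload;
        if (nodes[i].next.ptr == NULL) {
            out.next.offset = NIL_OFFSET;
        } else {
            uint64_t off = (uint64_t)((const char *)nodes[i].next.ptr
                                      - (const char *)nodes);
            assert(off < n * sizeof *nodes);
            out.next.offset = off;
        }
        if (fwrite(&out, sizeof out, 1, f) != 1) return -1;
    }
    return 0;
}

/* Read n nodes with a single fread(), then run the fixup pass.
   Returns NULL on I/O failure or on an offset outside the image. */
static Node *load_image(FILE *f, size_t n) {
    size_t image_size = n * sizeof(Node);
    Node *nodes = malloc(image_size);
    if (nodes == NULL) return NULL;
    if (fread(nodes, sizeof *nodes, n, f) != n) { free(nodes); return NULL; }
    for (size_t i = 0; i < n; ++i) {
        uint64_t off = nodes[i].next.offset;
        if (off == NIL_OFFSET) {
            nodes[i].next.ptr = NULL;
        } else if (off < image_size && off % sizeof(Node) == 0) {
            nodes[i].next.ptr = (Node *)((char *)nodes + off);
        } else {
            free(nodes);                      /* corrupt image */
            return NULL;
        }
    }
    return nodes;
}
```

The image is only valid for a reader built with the same struct layout and
endianness as the writer.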


> Another idea (that I like less) is to implement a "build server" mode
> for GHC. That way, instead of a single 'ghc --make' we could run
> several ghc build servers in parallel. However, Evan Laforge's efforts
> in this direction didn't bring the expected speedup. Perhaps it's
> possible to improve on his work.
> [1] http://www.gamasutra.com/view/feature/132376/delicious_data_baking.php?print=1

More information about the Glasgow-haskell-users mailing list