GHC source code improvement ideas

Fri Jan 4 09:06:33 EST 2008

Twan van Laarhoven wrote:
> Hello GHC people,
> 
> I was trying my hand at some GHC hacking, and I couldn't help but notice 
> that much of the code looks (IMHO) horrible. There are some things that 
> look as if they haven't been touched since the Haskell 1.3 days. The 
> most striking is the use of `thenXX` instead of do notation.
> 
> I was thinking of cleaning up some of this. In particular:
> 
>  1a. Use do notation where possible instead of `thenXX`.

Yes.

>  1b. Even better, use Applicative where possible. For example
> 
>         ds_type (HsFunTy ty1 ty2)
>           = dsHsType ty1            `thenM` \ tau_ty1 ->
>             dsHsType ty2            `thenM` \ tau_ty2 ->
>             returnM (mkFunTy tau_ty1 tau_ty2)
> 
>      Could be rewritten as
> 
>         ds_type (HsFunTy ty1 ty2)
>           = mkFunTy <$> dsHsType ty1 <*> dsHsType ty2

No, IMO.  This just adds another abstraction that a potential GHC 
contributor has to learn about.  Including me :-)  It doesn't hurt to write 
  out the code in full sometimes, and as the GHC coding style guidelines say:

   Remember: other people have to be able to quickly understand what you've
   done, and overuse of abstractions just serves to obscure the really
   tricky stuff, and there's no shortage of that in GHC.

http://hackage.haskell.org/trac/ghc/wiki/Commentary/CodingStyle

(I think I might be quoting myself there, in which case it doesn't really 
add much credibility to my argument :).

Also bear in mind that we have to be able to compile GHC with older 
versions of itself (currently back to 6.2.2), which explains why we're 
often conservative with respect to using external libraries.

>  2. Investigate the need for all these mappM functions. To quote the 
> source: "Standard combinators, but specialised for this monad (for 
> efficiency)". Wouldn't a SPECIALIZE pragma work just as well?

It does look like we could do away with some of those, yes.

>  3. In general, use standard functions when they are available. 
> Currently GHC has its own sort function and its own versions of 
> FiniteMap/Data.Map and Text.PrettyPrint. Using standard functions/data 
> structures will a) make GHC smaller and therefore easier to maintain, 
> and b) will make the code easier to understand for someone who knows the 
> standard library.

In general yes.  But bear in mind there's another principle at work here: 
we want to insulate GHC as much as possible from its environment, to reduce 
the chances of spurious failures or performance regressions.

I believe our version of Text.PrettyPrint has diverged from the library 
version - last time I looked it wasn't obvious that we could replace it.

Regarding Data.Map, I'd be interested in trying out AVL trees instead, to 
see if they have better performance, but then we'd have to import the code 
of course.

>  4. A more radical change would be introducing hierarchical modules. 
> This could be just a matter of renaming the directories to start with an 
> upper case character and changing the import declarations. This gives 
> module names like "Typecheck.TcGadt". The tc is redundant here, so 
> perhaps it should be renamed to "Typecheck.Gadt" or "Typecheck.GADT". 
> Perhaps even better would be "GHC.Typecheck.GADT", this way some modules 
> can be exposed as part of the GHC api.

Yes, I've wondered about doing this in the past.  It does mean more typing, 
though: clearly 'import BasicTypes.Id' is more painful than just 'import 
Id' and doesn't add much, so IMO we'd need to put some thought into a 
structure that makes sense.

> How do the GHC developers feel about this? Would such paches be gladly 
> excepted? Or will they be directly forwarded to the bit bucket?

Within the constraints I've outlined above, cleanups are definitely welcome.

Another worthwhile gardening-type activity is to go around elimianting 
warnings.  Many modules have {-# OPTIONS -w #-} at the top, waiting for 
someone to go in and clean up all the warnings.  They do catch real bugs, 
so this is not a waste of time by any means.

In addition to this, things like

  - identifying and removing dead code
  - commoning up duplicate code fragments
  - improving test coverage
  - profiling and optimising hotspots

are all useful tasks.

Cheers,
	Simon