Bootstrapping using a C compiler (was: RE: Can't compile GHC)
simonmar at microsoft.com
Tue May 10 04:43:18 EDT 2005
Thanks for all the comments in this thread. Below, I'm going to describe
some of the technical issues with making a GHC distribution that you can
bootstrap on any platform from a single set of sources.
But first, let me explain our thinking on why we don't consider this a
priority: we aim to provide working binary distributions for 99.9% of
the potential users of GHC. Once you have a working binary of GHC,
building GHC again should be easy (if you run into problems, it's a
*high* priority from our point of view to fix it). For the other 0.1%,
we provide detailed instructions for porting GHC to your platform. It
works, but it's a tricky process. Once you have *one* working GHC, you
can probably build future versions for at least 4 years (we pull the
ladder up after ourselves to avoid #ifdef-hell in the compiler).
It may be that we don't have that 99.9% figure for 6.4, but when you
consider versions back to 5.02 (the oldest version we can build 6.4
with, I believe), it's probably true.
If your platform has a binary distribution, but you want to build GHC
using just a C compiler for ideological reasons, you're slightly
Let me describe some of the technical issues with getting GHC to build
out of the box, with no GHC binary installed, on any platform.
Hopefully this will provide some useful pointers to anyone out there
with a burning desire to improve the GHC OOB experience.
How could we achieve a platform-independent bootstrap?
The only way I can see to do this is to have a set of completely
platform-independent .hc files to bootstrap from. Then the general plan
is:
- bootstrap unregisterised to get a stage1 compiler.
  stage1 might have reduced functionality: no GHCi, perhaps
  no native code gen, no Template Haskell.
- use this to build fully registerised libraries & stage2
To get our platform-independent .hc files, the contents of
must all be compilable to platform-independent .hc code. That means:
- No platform #ifdefs in the Haskell code in these directories
- No hsc2hs code in these directories
We have to eradicate all platform #ifdefs, because they give rise to
platform-varying .hc code. In practice, this will probably mean
translating some code from Haskell to C. hsc2hs is a similar problem -
that gives rise to platform-varying .hs code. Fortunately there is very
little hsc2hs code left in here - it gives rise to other porting
problems so we removed it.
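
As a rough illustration of what "translating some code from Haskell to
C" could look like, here is a minimal sketch. The platform test lives in
a small C helper, so the Haskell code that calls it needs no #ifdef and
compiles to the same .hc everywhere. The function names are made up for
this example and are not part of GHC; the choice of the search-path
separator as the platform-dependent value is also just an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of GHC: the character that separates
 * entries in a search path such as PATH. The #if is evaluated by the C
 * compiler on the target, so callers stay platform-independent. */
char hypo_search_path_separator(void)
{
#if defined(_WIN32)
    return ';';
#else
    return ':';
#endif
}

/* Count the entries in a search path using the platform's separator.
 * An empty string counts as one (empty) entry. */
size_t hypo_count_search_path_entries(const char *path)
{
    assert(path != NULL);
    size_t entries = 1;
    for (const char *p = path; *p != '\0'; p++) {
        if (*p == hypo_search_path_separator()) {
            entries++;
        }
    }
    return entries;
}
```

The Haskell side would reach such a helper through an ordinary foreign
import, which is the same on every platform.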
Our job is slightly easier:
- As long as we still compile via-C, we can omit the native code
  generator in stage1, and build it in stage2 only. That's a big
  source of #ifdefs we don't have to worry about (arguably we
  should just compile in all the variants of the NCG anyway).
However, our job is harder because:
- ghc/compiler/main/Constants.hs contains a bunch of baked-in
  constants such as the platform word-size. These would have
  to be made dynamic - or at least turned into unsafe reads of
  an external C variable.
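
To make the "external C variable" idea concrete, here is a minimal
sketch of the C side. The variable and function names are hypothetical,
and it assumes the word is pointer-sized; the real Constants.hs holds
many more values than this. The point is only that the values are
computed by whichever C compiler builds the bootstrap, rather than being
baked into the .hc files.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical names, not part of GHC. Both values are fixed when this
 * file is compiled on the target platform. Assumption: the machine
 * word has the same size as a pointer. */
const int hypo_word_size_in_bytes = (int) sizeof(void *);
const int hypo_word_size_in_bits  = (int) (sizeof(void *) * CHAR_BIT);

/* Accessor for callers that prefer a function call over reading the
 * variable through its address. */
int hypo_get_word_size_in_bytes(void)
{
    /* Sanity check: the two exported constants must agree. */
    assert(hypo_word_size_in_bits == hypo_word_size_in_bytes * CHAR_BIT);
    return hypo_word_size_in_bytes;
}
```

On the Haskell side, one option would be a foreign import of the
variable's address followed by a read of it, wrapped so that it looks
like a pure constant; that wrapping is the "unsafe read".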
Given that a majority of the #ifdefs are Unix-vs-Windows, we could
consider a partial solution, and settle for having two sets of .hc
files, one for Unix and one for Windows. We could further split the
Unix camp into 32-bit and 64-bit Unix, which makes the job even easier
(the Constants.hs problem partly goes away).
A few very rough figures:
- total platform #ifdefs in the relevant sources: 395
- after omitting the native code gen: 148
- after omitting Windows: 28
- after omitting GHCi: 21
and there are 3 .hsc files in libraries/base, but they aren't necessary
Hmm, it's starting to look plausible. Constants.hs is the biggest
sticking point, and then hacking up the build system to do
Hope this has been of use to someone...