[Haskell-cafe] Compilers: Why do we need a core language?

Sat Nov 24 05:26:25 CET 2012

On 11/20/12 6:54 AM, citb at lavabit.com wrote:
> Hello,
>
> I know nothing about compilers and interpreters. I checked several
> books, but none of them explained why we have to translate a
> high-level language into a small (core) language. Is it impossible
> (very hard) to directly translate high-level language into machine
> code?

It is possible to remove stages in the standard compilation pipeline, 
and doing so can speed up compilation time. For example, Perl doesn't 
build an abstract syntax tree (for now-outdated performance reasons), 
and instead compiles the source language directly into bytecode (which 
is then interpreted by the runtime). This is one of the reasons why Perl 
is (or was?) so much faster than other interpreted languages like Python 
etc. But there are some big problems to beware of:

* Not having a concrete representation for intermediate forms can rule 
out performing obvious optimizations. And I do mean *obvious* 
optimizations; I can talk more about this problem in Perl, if you really 
care.

* Not having a concrete representation for intermediate forms means 
mixing together code from many different stages of the compilation 
process. This sort of spaghetti code is hard to maintain, and even 
harder to explain to new developers.

* Not having a concrete representation for intermediate forms can lead 
to code duplication (in the compiler) because there's no convenient way 
to abstract over certain patterns. And, of course, repeating code is 
just begging for inconsistency bugs due to the maintenance burden of 
keeping all the copies in sync.

All three points are major driving forces in having intermediate forms. 
Joachim Breitner gave some illustrations for why intermediate forms are 
"inevitable". But then, once you have intermediate forms, if you're 
interested in ensuring correctness and having a formal(izable) 
semantics, then it makes sense to try to turn those intermediate forms 
into an actual intermediate language. Intermediate forms are just an 
implementation detail, but intermediate languages can be reasoned about 
in the same ways as other languages. So it's more about shifting 
perspective in order to turn systems problems (implementation details) 
into language problems (semantics of the Core).

Furthermore, if you're a PL person and really are trying to ensure 
correctness of your language (e.g., type safety), you want to try to 
make your proof obligation as small as possible. For convenience to 
programmers, source code is full of constructs which are all more or 
less equivalent. But this is a problem for making proofs because when we 
perform case analysis on an expression we have to deal with all those 
different syntactic forms. Whereas if you first compile everything down 
into a small core language, then the proof has far fewer syntactic forms 
it has to deal with and so the proof is much easier. Bear in mind that 
this isn't just a linear problem. If we have N different syntactic 
forms, then proving something like confluence will require proving 
O(N^2) cases since you're comparing two different terms.

-- 
Live well,
~wren