[Haskell-cafe] Compilers: Why do we need a core language?
wren ng thornton
wren at freegeek.org
Sat Nov 24 05:26:25 CET 2012
On 11/20/12 6:54 AM, citb at lavabit.com wrote:
> Hello,
>
> I know nothing about compilers and interpreters. I checked several
> books, but none of them explained why we have to translate a
> high-level language into a small (core) language. Is it impossible
> (very hard) to directly translate high-level language into machine
> code?
It is possible to remove stages in the standard compilation pipeline,
and doing so can speed up compilation time. For example, Perl doesn't
build an abstract syntax tree (for now-outdated performance reasons),
and instead compiles the source language directly into bytecode (which
is then interpreted by the runtime). This is one of the reasons why Perl
is (or was?) so much faster than other interpreted languages like Python
etc. But there are some big problems to beware of:
* Not having a concrete representation for intermediate forms can rule
out performing obvious optimizations. And I do mean *obvious*
optimizations; I can talk more about this problem in Perl, if you really
care.
* Not having a concrete representation for intermediate forms means
mixing together code from many different stages of the compilation
process. This sort of spaghetti code is hard to maintain, and even
harder to explain to new developers.
* Not having a concrete representation for intermediate forms can lead
to code duplication (in the compiler) because there's no convenient way
to abstract over certain patterns. And, of course, repeating code is
just begging for inconsistency bugs due to the maintenance burden of
keeping all the copies in sync.
All three points are major driving forces in having intermediate forms.
Joachim Breitner gave some illustrations for why intermediate forms are
"inevitable". But then, once you have intermediate forms, if you're
interested in ensuring correctness and having a formal(izable)
semantics, then it makes sense to try to turn those intermediate forms
into an actual intermediate language. Intermediate forms are just an
implementation detail, but intermediate languages can be reasoned about
in the same ways as other languages. So it's more about shifting
perspective in order to turn systems problems (implementation details)
into language problems (semantics of the Core).
Furthermore, if you're a PL person and really are trying to ensure
correctness of your language (e.g., type safety), you want to try to
make your proof obligation as small as possible. For convenience to
programmers, source code is full of constructs which are all more or
less equivalent. But this is a problem for making proofs because when we
perform case analysis on an expression we have to deal with all those
different syntactic forms. Whereas if you first compile everything down
into a small core language, then the proof has far fewer syntactic forms
it has to deal with and so the proof is much easier. Bear in mind that
this isn't just a linear problem. If we have N different syntactic
forms, then proving something like confluence will require proving
O(N^2) cases since you're comparing two different terms.
--
Live well,
~wren
More information about the Haskell-Cafe
mailing list