[jhc] [john@repetae.net: Re: [Haskell-cafe] I want to write a compiler]

Tue Mar 10 00:21:21 EDT 2009

forwarding this to the jhc list as it is jhc relevant

----- Forwarded message from John Meacham <john at repetae.net> -----

To: haskell-cafe at haskell.org
List-Id: The Haskell Cafe <haskell-cafe.haskell.org>
Date: Mon, 9 Mar 2009 17:41:01 -0700
From: John Meacham <john at repetae.net>
Subject: Re: [Haskell-cafe] I want to write a compiler

On Mon, Mar 09, 2009 at 09:25:58AM -0500, Austin Seipp wrote:
> Indeed, I stumbled upon it whilst looking at how unsafeCoerce worked
> (to find out it is super-duper-special and implemented as part of E.)
> I think it's actually pretty clever, and who knows, maybe it could be
> useful as at least a debugging aid. :)

It is actually  not all that special, it is just a primitive like any
other than transates to a nop, or no code. Since the optimizer can't
know what is inside a primitive in general, it leaves it alone and the
only affect is the type coercion implicit in its signature. When
translating to Grin, the unsafeCoerces are simply dropped. Coercions
used to be more complicated because once upon a time I required them to
be used to translate to/from newtype declared synonyms, I found they
inhibited optimizations too much so switched to the current mechanism
whereby newtype expansion happens transparently in the type checker when
needed. As a side effect I worked hard to try to recognize and optimize
around coercions, but now that they occur fairly rarely, it isn't much
of an issue any more.

> This is a very good point I hadn't even thought about! Indeed, since
> GRIN represents thunks in a defunctionalized way - encoded as nodes -
> dealing with boxed/unboxed values becomes more of the compiler's job,
> since the nature of unboxed values etc. becomes more transparent.

The main reason I think it is important is because I feel not including
unboxed values from the get-go was one of the biggest false starts I had
when writing jhc. When I initially started writing it I considered
unboxed values as a feature to be added later. When I started
implementing them, I found that there were a lot of decisions I would
have made differently and a lot of places they would have greatly
simplified my initial design had I had them from the start. Namely, A
whole host of 'compiler magic' could have gone away if I implemented
all primitive types in pure haskell like I implement floats and chars now,

data Float = Float_ Float32_ 
data Char = Char Bits32_

(where Float32_ and Bits32_ are the unboxed raw types)

Since instead I had the compiler implement Int and friends directly, it
suddenly had the responisibity to conjure up all sorts of operations on
them, such as the Num instances, creating the big mess of compiler magic
in PrimitiveOperators.hs. The more I can express in the haskell source
directly the better, it raises odd questions when the compiler has to do
things like implement Num instances directly, like the instances are
built in, but we might not have 'imported' the types needed to declare
them into the current scope, so we end up having to deal with instancse
for types that don't exist yet.

In any case, yeah. if you plan on unboxed types at some point, include
them from the start because they can simplify a lot to offset their cost
of implementation, by trying to add them on later I ended up having to
pay the cost of implementing them, but also had to deal with the cruft
left over from not utilizing them from the beginning.

> Since you bring this up, I figure this decision also had some
> influence on E in lhc/jhc, considering its type system is rich enough
> to distinguish values in whnf/boxed/unboxed etc.?

Yeah, there were a couple things that motivated this. A big one is that
evals are bad in jhc, signifigantly moreso than traditional ghc. The
points-to analysis is the main way to fight it, but I also decided I
wanted to give it the best shot possible by getting rid of as many evals
as I could in higher level optimizations, which necessitated a fairly
rich type system since it actually made sense for jhc to keep strict
track of what was in WHNF vs not even if they were both boxed. in
general ghc didn't care unless a value was either CPRable or unboxable,
but jhc can benefit from such knowledge no matter what the type, even if
it is a partially applied function. Though, it is interesting that GHC
may be moving to a jhc-style semi-tagging mechanism according to this
paper[1] so more optimizations may help both equally in the future. 

The other main motivating factor was that I was always enamoured with
the idea of jhc supporting more than one source language, in particular,
I wanted to explore it having an SML front end, so wanted a core type
system that was equally good at expressing lazy haskell as strict ML,
and more importantly would be able to seamlessly intermingle the two
without difficulty. So, when designing it, I kept that in mind, "is this
equally useful as a ML compilers core?" or a "wacky ML/Haskell hybrid?".
Although the ML front end is more of a thought experiment at this point,
I think it helped me come up with a coherent type system that had
intruiging possibilies for haskell optimizations, as well as some
language implications, such as the ability to declare new 'strict'
algebraic types that may lead to new ideas for language extensions.

        John

[1]
http://research.microsoft.com/en-us/um/people/simonpj/papers/ptr-tag/

-- 
John Meacham - ⑆repetae.net⑆john⑈
-- 
Haskell-Cafe mailing list
Haskell-Cafe at haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

----- End forwarded message -----

-- 
John Meacham - ⑆repetae.net⑆john⑈