[Yhc] Some more changes to core

Sat Aug 4 18:01:38 EDT 2007

Hi All,

In my quest to get Yhc bytecode compiled from Yhc Core I've discovered 
that I will need to make (yet more) changes to Core. In short the 
changes necessary are (from most substantial to least substantial):

    a) changing the way names are encoded
    b) adding to Core a list of symbols imported from other modules
    c) adding yet more things to the CorePrim data type

Some of these may break some people's code, but hopefully not too many 
or too much.

This email is quite long so feel free to skip to the relevant sections 
if you don't want to read it all.

---------------------------------------------
BACKGROUND
---------------------------------------------

Previously I was looking at converting the names generated by Core back 
into nhc98's internal Id data type and then using the nhc98 symbol 
table. I've decided this is a bad idea because it seriously limits what 
possible transformations could be made to core; anything that would 
cause a mismatch with the nhc98 symbol table wouldn't work. This is very 
much against the spirit of converting the backend to generate from Core 
in the first place.

Thus the more ideal solution would be to convert the internal PosLambda 
to a Core form that contained enough information to do the complete 
bytecode generation process. Having done the translation the nhc98 
symbol table could then simply be forgotten about.

Unfortunately, Core at the moment doesn't quite have all the 
information needed.

---------------------------------------------
CHANGE a) CHANGING THE ENCODING OF NAMES
---------------------------------------------

I propose changing the way Core encodes names from Module.Item to 
Module;Item. For example, the fromJust function would appear as

    Data.Maybe;fromJust x = ...

instead of

    Data.Maybe.fromJust x = ...

-- CONSEQUENCES ---------

- Anyone who relies on being able to parse the names will find the name 
parsing code will break.

- Anyone trying to convert the names to valid Haskell identifiers will 
need to change their code.

-- REASON ---------------

The reason this change is necessary is to do with class instances and 
how the interpreter loads symbols. Consider the following class instance:

    module Foo.Bar where

    data Baz = Baz

    instance Eq Baz where
      a == b = True

The Core generated for the '==' function would currently look like:

    Foo.Bar.Prelude.Eq.Foo.Bar.Baz.== a b = True

This encodes:
    - that the instance is defined in the Foo.Bar module
    - that it is an instance of the class Prelude.Eq
    - that the data type being given an instance is Foo.Bar.Baz
    - that the function being defined is '=='

The problem is with the ambiguity in separating these components. 
Suppose some function defined in another module needs to use the == 
function for the Baz datatype. It would do this by asking the 
interpreter to load

     "Foo.Bar.Prelude.Eq.Foo.Bar.Baz.=="

In order to load this the interpreter first needs to work out which 
module file it should load. Unfortunately from this name alone it has no 
way of knowing. This name could be

     (Foo.Bar.Prelude).(Eq.Foo.Bar.Baz.==)

Or

     (Foo).(Bar.Prelude.Eq).(Foo.Bar.Baz.==)

Or even

     (Foo.Bar.Prelude.Eq.Foo.Bar.Baz).(==)

The name simply doesn't contain enough information to decide which part 
is the module name and which part is the item in that module. I thus 
suggest changing the name Core generates to

     Foo.Bar;Prelude.Eq.Foo.Bar.Baz.==

which makes it clear. Semicolon is a good choice of separator because it 
is one of the few characters that cannot appear in a valid Haskell 
identifier.
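As a rough illustration of the difference, the sketch below lists every 
way a dot-separated name could be cut into (module, item), and then 
splits the proposed form at the semicolon. Both functions are 
hypothetical helpers written for this email, not code from Yhc.

```haskell
-- Every way a dot-separated name could be cut into (module, item).
-- This is the guesswork the current encoding forces on the interpreter.
dotSplits :: String -> [(String, String)]
dotSplits name =
  [ (take i name, drop (i + 1) name)
  | (i, c) <- zip [0 ..] name
  , c == '.'
  ]

-- Split a proposed-style name at its first ';' into (module, item).
-- Returns Nothing if the name contains no separator.
splitQualified :: String -> Maybe (String, String)
splitQualified name =
  case break (== ';') name of
    (modName, ';' : item) -> Just (modName, item)
    _                     -> Nothing
```

For "Foo.Bar.Prelude.Eq.Foo.Bar.Baz.==" dotSplits gives seven candidates, 
one per dot, while splitQualified on "Foo.Bar;Prelude.Eq.Foo.Bar.Baz.==" 
gives exactly one answer: module "Foo.Bar" and item 
"Prelude.Eq.Foo.Bar.Baz.==".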

---------------------------------------------
CHANGE b) ADDING AN IMPORT TABLE
---------------------------------------------

I propose changing the Core datatype to include a list of symbols that 
are imported from other modules. So

   data Core = Core {
         ...
         coreImportSymbols :: [CoreImport]
         ...
   }

   data CoreImport = CoreImportData CoreData
                   | CoreImportFunc {
                         coreImportName :: String,
                         coreImportArity :: Int
                     }

-- CONSEQUENCES ---------

- Anyone who does a complete pattern match on Core will find their code 
breaks, as the Core record will have gained an extra field.

-- REASON ---------------  	

The only information Yhc Core currently provides about symbols defined 
in other modules is their name. This is not enough information to 
compile applications of those functions or case expressions over those 
datatypes.

For example, in module Foo you make an application to the function 
'Bar.bar' such as

     Foo.foo x = Bar.bar (x+1)

To compile this application the compiler needs to know the arity of the 
bar function. Depending on the arity it will then either make a partial 
application, a saturated application or a super-saturated application 
(each of which would generate different bytecodes).
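To make the arity point concrete, here is a rough sketch of how a backend 
might classify a call once it can look the arity up in an import table. 
The names AppKind, ImportTable, classifyApp and classifyCall are made up 
for illustration and are not part of Yhc Core; the table is just a list 
of (name, arity) pairs standing in for the CoreImportFunc entries.

```haskell
-- Hypothetical classification of a call site (not a Yhc type).
data AppKind = Partial | Saturated | SuperSaturated
  deriving (Eq, Show)

-- Compare the number of arguments supplied with the callee's arity.
classifyApp :: Int -> Int -> AppKind
classifyApp arity nArgs =
  case compare nArgs arity of
    LT -> Partial
    EQ -> Saturated
    GT -> SuperSaturated

-- Stand-in for the import table: function name paired with its arity.
type ImportTable = [(String, Int)]

-- Nothing means the callee is not in the table, so its arity is unknown
-- and the application cannot be classified.
classifyCall :: ImportTable -> String -> Int -> Maybe AppKind
classifyCall table name nArgs =
  fmap (\arity -> classifyApp arity nArgs) (lookup name table)
```

With a table containing ("Bar;bar", 2), the call 'Bar.bar (x+1)' supplies 
one argument and would come out as Partial; without an entry there is 
nothing to go on, which is the situation Core is in today.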

Similarly when casing on a datatype

    Foo.foo x = case x of
                  Bar.Bar y -> ...

The compiler needs to know what the tag number for Bar.Bar is, and 
whether this case statement is complete or partial (again each has 
different bytecodes).
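A similar sketch for the case side, under the assumption (mine, for 
illustration only) that tags are numbered from 0 in declaration order and 
that the imported datatype is summarised as a list of constructor names. 
Constructors, tagOf and isCompleteCase are hypothetical helpers, not 
existing Yhc functions.

```haskell
-- Hypothetical summary of an imported datatype: constructor names in
-- declaration order, so a constructor's tag is its position in the list.
type Constructors = [String]

-- Tag number of a constructor, or Nothing if the datatype lacks it.
-- Assumes tags start at 0, which may not match the real numbering.
tagOf :: Constructors -> String -> Maybe Int
tagOf ctors name = lookup name (zip ctors [0 ..])

-- A case is complete when every constructor of the datatype has an
-- alternative; otherwise it is partial.
isCompleteCase :: Constructors -> [String] -> Bool
isCompleteCase ctors alts = all (`elem` alts) ctors
```

Both questions need the full constructor list of the datatype, which is 
why the import table carries a CoreData entry and not only a name.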

---------------------------------------------
CHANGE c) ADDING FIELDS TO CorePrim
---------------------------------------------

I propose changing the CorePrim datatype to:

      CorePrim {
         ...
         corePrimExternal :: String, -- the 'C' name of the function
         corePrimConv :: String,     -- the calling convention
         corePrimImport :: Bool,     -- whether this is import/export
         corePrimTypes :: [String]   -- the types of the arguments/return
      }

Three of these changes were suggested earlier. The types would be a 
simple encoding of the arguments and return type, so

    foreign import malloc :: Int -> Ptr a

would have types

    [ "Prelude.Int", "Data.Foreign.Ptr a" ]

-- CONSEQUENCES ---------

- Anyone who does a complete pattern match on CorePrim (it's not 
recommended) will find their code breaks.

- Recommendation: from now on, avoid complete pattern matches on 
CorePrim. Use the field selectors instead, and

      (CorePrim{})

for pattern matches. This will make it easier to accommodate any further 
changes to CorePrim (which may well be necessary).
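For anyone unsure what that looks like in practice, here is a small 
sketch. The Item type is a made-up stand-in that carries only the four 
fields proposed above (the real CorePrim has more, elided as "..." 
earlier), and isPrim and externalName are hypothetical helpers.

```haskell
-- Hypothetical stand-in for the type that contains CorePrim; only the
-- fields proposed in this email are included.
data Item
  = CorePrim
      { corePrimExternal :: String    -- the 'C' name of the function
      , corePrimConv     :: String    -- the calling convention
      , corePrimImport   :: Bool      -- whether this is import/export
      , corePrimTypes    :: [String]  -- the types of the arguments/return
      }
  | OtherItem

-- The empty record pattern matches CorePrim however many fields it has,
-- so this keeps compiling when fields are added later.
isPrim :: Item -> Bool
isPrim CorePrim{} = True
isPrim _          = False

-- Read a single field through its selector instead of by position.
externalName :: Item -> Maybe String
externalName p@CorePrim{} = Just (corePrimExternal p)
externalName _            = Nothing
```

Neither function mentions the number or order of the fields, so adding 
another field to CorePrim would not require touching them.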

-- REASON ---------------  	

The current CorePrim datatype does not contain enough information to 
compile calls to foreign functions. With the above changes it would 
contain everything this bytecode backend needs to compile them.

---------------------------------------------
CONCLUSION
---------------------------------------------

From a detailed look at the code, and a start at implementing the Yhc 
Core to Yhc bytecode compiler, I believe the changes listed above are 
everything that's necessary. I could easily be proven wrong on that one 
though ;-)

Cheers

Tom