[Yhc] Some more changes to core
Tom Shackell
shackell at cs.york.ac.uk
Sat Aug 4 18:01:38 EDT 2007
Hi All,
In my quest to get Yhc bytecode compiled from Yhc Core I've discovered
that I will need to make (yet more) changes to Core. In short the
changes necessary are (from most substantial to least substantial):
a) changing the way names are encoded
b) adding to Core a list of symbols imported from other modules
c) adding yet more things to the CorePrim data type
Some of these may break some people's code, but hopefully not too many
or too much.
This email is quite long so feel free to skip to the relevant sections
if you don't want to read it all.
---------------------------------------------
BACKGROUND
---------------------------------------------
Previously I was looking at converting the names generated by Core back
into nhc98's internal Id data type and then using the nhc98 symbol
table. I've decided this is a bad idea because it seriously limits what
possible transformations could be made to core; anything that would
cause a mismatch with the nhc98 symbol table wouldn't work. This is very
much against the spirit of converting the backend to generate from Core
in the first place.
Thus a better solution would be to convert the internal PosLambda
to a Core form that contained enough information to do the complete
bytecode generation process. Having done the translation the nhc98
symbol table could then simply be forgotten about.
Unfortunately, Core does not yet contain quite all the information
needed.
---------------------------------------------
CHANGE a) CHANGING THE ENCODING OF NAMES
---------------------------------------------
I propose changing the way Core encodes names from Module.Item to
Module;Item. For example, the fromJust function would appear as
Data.Maybe;fromJust x = ...
instead of
Data.Maybe.fromJust x = ....
-- CONSEQUENCES ---------
- Anyone who relies on being able to parse the names will find the name
parsing code will break.
- Anyone trying to convert the names to valid Haskell identifiers will
need to change their code.
-- REASON ---------------
This change is necessary because of class instances and how the
interpreter loads symbols. Consider the following class instance:
module Foo.Bar where
data Baz = Baz
instance Eq Baz where
    a == b = True
The Core generated for the '==' function would currently look like:
Foo.Bar.Prelude.Eq.Foo.Bar.Baz.== a b = True
This encodes:
- that the instance is defined in the Foo.Bar module
- that it is an instance of the class Prelude.Eq
- that the data type being given an instance is Foo.Bar.Baz
- that the function being defined is '=='
The problem is with the ambiguity in separating these components.
Suppose some function defined in another module needs to use the ==
function for the Baz datatype. It would do this by asking the
interpreter to load
"Foo.Bar.Prelude.Eq.Foo.Bar.Baz.=="
In order to load this the interpreter first needs to work out which
module file it should load. Unfortunately from this name alone it has no
way of knowing. This name could be
(Foo.Bar.Prelude).(Eq.Foo.Bar.Baz.==)
Or
(Foo).(Bar.Prelude.Eq).(Foo.Bar.Baz.==)
Or even
(Foo.Bar.Prelude.Eq.Foo.Bar.Baz).(==)
The name simply doesn't contain enough information to decide which part
is the module name and which part is the item in that module. I thus
suggest changing the name Core generates to
Foo.Bar;Prelude.Eq.Foo.Bar.Baz.==
which makes it clear. Semicolon is a good choice of separator because it
is one of the few characters that cannot appear in a valid Haskell
identifier.
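
As a rough illustration of the difference, here is a minimal sketch of how
a loader might split names under each encoding. Both function names
(splitCoreName, candidateSplits) are hypothetical and are not part of Yhc.

```haskell
-- Hypothetical helper, not part of the Yhc Core library.
-- Split a name of the proposed form "Module;Item" at the first
-- semicolon. Returns Nothing if the name contains no semicolon.
splitCoreName :: String -> Maybe (String, String)
splitCoreName name =
  case break (== ';') name of
    (modName, ';' : item) -> Just (modName, item)
    _                     -> Nothing

-- Under the old dotted encoding every dot is a candidate boundary
-- between module and item, so a loader can only enumerate guesses.
candidateSplits :: String -> [(String, String)]
candidateSplits name =
  [ (take i name, drop (i + 1) name)
  | (i, '.') <- zip [0 ..] name
  ]
```

For the instance method above, candidateSplits yields one candidate per
dot in the name, whereas splitCoreName yields at most one answer.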
---------------------------------------------
CHANGE b) ADDING AN IMPORT TABLE
---------------------------------------------
I propose changing the Core datatype to include a list of symbols that
are imported from other modules. So
data Core = Core {
    ...
    coreImportSymbols :: [CoreImport]
    ...
}
data CoreImport = CoreImportData CoreData
                | CoreImportFunc {
                      coreImportName :: String,
                      coreImportArity :: Int
                  }
-- CONSEQUENCES ---------
- Anyone who does a complete pattern match on Core will find their code
breaks as it will have gained an extra field.
-- REASON ---------------
The only information Yhc Core currently provides about symbols defined
in other modules is their name. This is not enough information to
compile applications of those functions or case expressions over those
datatypes.
For example, in module Foo you make an application to the function
'Bar.bar' such as
Foo.foo x = Bar.bar (x+1)
To compile this application the compiler needs to know the arity of the
bar function. Depending on the arity it will then either make a partial
application, a saturated application or a super-saturated application
(each of which would generate different bytecodes).
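
The arity check described above can be sketched roughly as follows. The
AppKind type, the ImportArities table and both functions are hypothetical
illustrations, not the actual Yhc backend; the table simply stands in for
the name and arity that a CoreImportFunc entry would carry.

```haskell
-- Hypothetical classification of a function application.
data AppKind
  = Partial         -- fewer arguments than the arity
  | Saturated       -- exactly as many arguments as the arity
  | SuperSaturated  -- more arguments than the arity
  deriving (Eq, Show)

-- Compare the number of supplied arguments against the known arity.
classifyApp :: Int -> Int -> AppKind
classifyApp arity nArgs =
  case compare nArgs arity of
    LT -> Partial
    EQ -> Saturated
    GT -> SuperSaturated

-- Assumed stand-in for the import table: function name to arity.
type ImportArities = [(String, Int)]

-- Nothing means the name is not in the import table, so the
-- application cannot be classified at all.
classifyCall :: ImportArities -> String -> Int -> Maybe AppKind
classifyCall imports name nArgs =
  fmap (\arity -> classifyApp arity nArgs) (lookup name imports)
```

Without the arity, classifyCall has nothing to compare against, which is
the situation Core is in today for imported functions.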
Similarly when casing on a datatype
Foo.foo x = case x of
Bar.Bar y -> ...
The compiler needs to know what the tag number for Bar.Bar is, and
whether this case statement is complete or partial (again each has
different bytecodes).
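
A minimal sketch of the two questions the compiler has to answer here.
DataInfo and both functions are hypothetical, and the sketch assumes that
tags are numbered from 0 in declaration order, which this email does not
state. It also ignores default alternatives.

```haskell
import Data.List (elemIndex)

-- Hypothetical summary of an imported data type: its name and its
-- constructor names in declaration order.
data DataInfo = DataInfo
  { dataName  :: String
  , dataCtors :: [String]
  }

-- Tag of a constructor, ASSUMING tags count from 0 in declaration order.
ctorTag :: DataInfo -> String -> Maybe Int
ctorTag info ctor = elemIndex ctor (dataCtors info)

-- A case is complete when every constructor of the type is matched.
caseIsComplete :: DataInfo -> [String] -> Bool
caseIsComplete info matched = all (`elem` matched) (dataCtors info)
```

Neither function can be written from the constructor name alone, which is
why the import table carries the full CoreData for imported types.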
---------------------------------------------
CHANGE c) ADDING FIELDS TO CorePrim
---------------------------------------------
I propose changing the CorePrim datatype to:
CorePrim {
...
corePrimExternal :: String, -- the 'C' name of the function
corePrimConv :: String, -- the calling convention
corePrimImport :: Bool, -- whether this is import/export
corePrimTypes :: [String] -- the types of the arguments/return
}
Three of these changes were suggested earlier. The types would be a
simple encoding of the argument and return types, so
foreign import malloc :: Int -> Ptr a
would have types
[ "Prelude.Int", "Data.Foreign.Ptr a" ]
-- CONSEQUENCES ---------
- Anyone who does a complete pattern match on CorePrim (it's not
recommended) will find their code breaks.
- Recommendation: from now on, avoid complete pattern matches on
CorePrim. Use the field selectors to read fields, and match with
(CorePrim{})
instead. This will make it easier to accommodate any further changes to
CorePrim (which may well be necessary).
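
To illustrate the recommended style, here is a small sketch. The CoreFunc
type below is a simplified stand-in containing only the four proposed
fields plus one assumed field on the other constructor; the real type has
more. The primArity function is hypothetical and assumes the last entry
of corePrimTypes is the return type.

```haskell
-- Simplified stand-in for illustration only; not the real definition.
data CoreFunc
  = CoreFunc { coreFuncName :: String }
  | CorePrim
      { corePrimExternal :: String    -- the 'C' name of the function
      , corePrimConv     :: String    -- the calling convention
      , corePrimImport   :: Bool      -- whether this is import/export
      , corePrimTypes    :: [String]  -- argument types, then return type
      }

-- The empty record pattern matches the constructor without naming its
-- fields, so it keeps compiling when fields are added later.
isPrim :: CoreFunc -> Bool
isPrim CorePrim{} = True
isPrim _          = False

-- Read fields through selectors rather than positional patterns.
-- ASSUMES the last element of corePrimTypes is the return type.
primArity :: CoreFunc -> Maybe Int
primArity p@CorePrim{} = Just (length (corePrimTypes p) - 1)
primArity _            = Nothing
```

Code written this way keeps working if CorePrim gains a fifth field,
whereas a positional pattern would need editing at every use.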
-- REASON ---------------
The current CorePrim datatype does not contain enough information to
compile calls to foreign functions. With the above changes it would, at
least from the bytecode backend's point of view.
---------------------------------------------
CONCLUSION
---------------------------------------------
From a detailed look at the code, and a start at implementing the Yhc
Core to Yhc bytecode compiler, I believe the changes listed above are
everything that's necessary. I could easily be proven wrong on that one
though ;-)
Cheers
Tom