[Haskell-cafe] Avoiding name collisions by using value spaces
instead of modules
Brian Hulley
brianh at metamilk.com
Sun Jan 8 08:06:18 EST 2006
Hi -
A main problem I've found with Haskell is that within a module, too many
things are put into the same scope. For example
data Tree a b = Leaf a | Node {elem::b, lhs::Tree a b, rhs::Tree a b}
puts Leaf, Node, elem, lhs, and rhs, into the module's namespace,
as well as Tree. This means that if you want another kind of tree in
the same module you've got to try and think of another word for Leaf,
or else prefix every value constructor with some prefix of the tycon to
try and prevent name collisions.
I propose that the above declaration should introduce a *new module* Tree,
as a sub module of the containing module, and Leaf, Node, elem etc will be
put into this module, and not the module containing the data declaration
itself.
Identifiers in this child module could then either be used qualified or
unqualified as long as they don't conflict with anything in the parent
(containing) module, thus in the parent module, we could have:
a = Tree.Leaf 3
b = Node{elem = 6, lhs = Leaf 2, rhs = Leaf 3}
c = Tree.elem b
c = b^elem
where ^ is just a helpful sugar which binds tighter than function
application and can be used anywhere to allow an "object oriented" way of
thinking where the first arg is put on the left of the function (more on
this later).
Also, we could generalise the GHC GADT style data declarations by allowing
any declarations to appear in the where part, and allow modules to consist
of a single data declaration (ie begin with the keyword data instead of
module) so that the contents of the Data.Set module would instead be
something like:
data Ord a => Set a = .... where
insert :: Set a -> a -> Set a
...
This would allow Set to be imported qualified into other modules where we
could refer to the Set type by the single word Set instead of Set.Set (which
I always think looks very strange indeed) ie
module M where
import qualified Data.Set as Set
f :: a -> Set a -- no longer need to write a -> Set.Set a
In fact, the whole complexity asociated with hiding components of a module
and allowing unqualified imports could be discarded, since the correct
resolution for each identifier would be determined by its typed context,
just as in OOP the resolution is determined by the type of the object.
So to summarise so far, for any value f :: T0 -> T1 -> ... -> Tn, we
construct the tuple (Q0, Q1, ... Qn) where for each i, Qi is obtained from
Ti by ignoring all predicates and quantification and replacing each tyvar by
the symbol '?'.
For example, for
module M where
foo :: forall b. Eq a => Set (Tree a) -> Tree ([a],b) -> [Tree (a,b)]
we ignore the forall and Eq, and replace tyvars and tuple the results to
get:
M.(Set (Tree ?), Tree ( (,) ( [] ?) ?), [] (Tree ( (,) ? ?))).foo
as the fully qualified reference to the entity we've just declared.
If we call the tuple (Set, ...) the "ty-tuple", which specifies a space of
values (ie something that each value is considered to "belong" to - the
analogue of the object in OOP), then we can define a projection from a space
into a containing space as follows:
1) Let Qi == Ui Si0 Si1 ... Sik be any element of a ty-tuple P. Then we can
form a new ty-tuple P' by replacing Qi by Ui.
2) Let Qi be any element of a ty-tuple P of arity n+1. Then we can form
P' == (Q0, ..., Qi-1, Qi+1, ... Qn) of arity n
In both cases, we can use P' to qualify the identifier to get a value
reference as long as this still has enough information for disambiguating
other uses of the identifier in M.
Applying these transformations of the ty-tuple to the example above,
any of the following could be used as a partial qualification of foo in M:
(Set Tree, Tree ([],?), [Tree]).foo
(Set, Tree, []).foo
Tree.foo
foo
Rule 2 allows us to propagate identifiers back up to their ancestor modules
as long as there are no conflicts along the way.
The type inference algorithm could then be modified (I hope) to use the
top-level annotation for a value declaration to determine the entities that
identifiers refer to, by searching in the appropriate value spaces
determined by the types of the args and return value.
label :: Tree a -> Int -> (Tree a, Int)
label (Leaf x) i = (Leaf i, i+1)
The identifier Leaf is resolved in the current namespace augmented by the
namespace for module Tree, thus we don't have to explicitly write Tree.Leaf
even though the declaration of label occurs outside the Tree module itself.
(I suggest that top-level type annotations should be mandatory since without
them one just drowns in a sea of TI confusion when there is a type error. By
making them mandatory the TI algorithm could make real use of them to
resolve identifier bindings as in OOP)
Similarly:
foo :: a -> Set a
foo x = singleton x
There is no need to say Set.singleton x, because we know that the result of
foo has type Set a therefore we search in the current module augmented by
the contents of the Set module (every type is also a module) for a binding
for "singleton" which maps from a to Set a.
Finally (apologies for this long post), returning to the use of ^ to allow
an object oriented way of thinking consider:
insert :: a -> Set a -> Set a
ps = singleton 3
qs = insert 4 ps
rs = ps^insert 4
When resolving "insert" used in the binding for rs, the compiler should see
that we are looking for some function Set Int -> Int -> Set Int, and hence
will be looking in the current module augmented by the Set module. However
the Set module only has a binding for insert with type a -> Set a -> Set a.
So the compiler should generate a new function insert' from insert by moving
the first Set a arg to the front.
Summary:
1) Every value binding belongs to a value space represented by a ty-tuple
which abstracts away from the details of the actual type
2) The space of ty-tuples forms a lattice, and in particular this means
qualification is optional when there are no conflicts (we can use Leaf 3
instead of Tree.Leaf 3), and we can use partial qualification to supply just
the disambiguation info needed.
3) The TI algorithm should use the type of the expression (generated
top-down from the top-level annotation) to search in the correct value
spaces to resolve identifiers
4) Record field selection doesn't need any special scoping rules, and is
just one example of a general object-oriented way of thinking about function
application.
5) We can get all the advantages of automatic namespace management the OOP
programmers take for granted, in functional programming, by using value
spaces as the analogue of objects, and can thereby get rid of complicated
import/export directives and improve code readability with less typing (but
making more use of typing - excuse the pun :-))
I'm in the process of trying to implement a pre-processor for Haskell which
would use the algorithms sketched above (as well as fixing a few other
things I think are wrong with Haskell such as the way the current layout
rule allows one to write code that would break if you replaced an identifier
with one that had a different number of characters in it etc), so any
feedback on these ideas would be welcome.
Thanks for reading so far!
Brian Hulley
PS Everything above is purely intended for resolving the kind of ad-hoc
overloading that is not amenable to treatment by type classes (which
transforms some subset of the ad-hoc-overloaded functions which share the
same shape into a single function with implicit dictionary arguments (in the
usual implementation)).
More information about the Haskell-Cafe
mailing list