[Haskell-cafe] Avoiding name collisions by using value spaces instead of modules

Sun Jan 8 08:06:18 EST 2006

Hi -
A main problem I've found with Haskell is that within a module, too many 
things are put into the same scope. For example

    data Tree a b = Leaf a | Node {elem::b, lhs::Tree a b, rhs::Tree a b}

puts Leaf, Node, elem, lhs, and rhs, into the module's namespace,
as well as Tree. This means that if you want another kind of tree in
the same module you've got to try and think of another word for Leaf,
or else prefix every value constructor with some prefix of the tycon to
try and prevent name collisions.

I propose that the above declaration should introduce a *new module* Tree, 
as a sub module of the containing module, and Leaf, Node, elem etc will be 
put into this module, and not the module containing the data declaration 
itself.

Identifiers in this child module could then either be used qualified or 
unqualified as long as they don't conflict with anything in the parent 
(containing) module, thus in the parent module, we could have:

    a = Tree.Leaf 3
    b = Node{elem = 6, lhs = Leaf 2, rhs = Leaf 3}
    c = Tree.elem b
    c = b^elem

where ^ is just a helpful sugar which binds tighter than function 
application and can be used anywhere to allow an "object oriented" way of 
thinking where the first arg is put on the left of the function (more on 
this later).

Also, we could generalise the GHC GADT style data declarations by allowing 
any declarations to appear in the where part, and allow modules to consist 
of a single data declaration (ie begin with the keyword data instead of 
module) so that the contents of the Data.Set module would instead be 
something like:

    data Ord a => Set a = .... where
        insert :: Set a -> a -> Set a
        ...

This would allow Set to be imported qualified into other modules where we 
could refer to the Set type by the single word Set instead of Set.Set (which 
I always think looks very strange indeed) ie

    module M where
        import qualified Data.Set as Set

        f :: a -> Set a            -- no longer need to write a -> Set.Set a

In fact, the whole complexity asociated with hiding components of a module 
and allowing unqualified imports could be discarded, since the correct 
resolution for each identifier would be determined by its typed context, 
just as in OOP the resolution is determined by the type of the object.

So to summarise so far, for any value f  :: T0 -> T1 -> ... -> Tn, we 
construct the tuple (Q0, Q1, ... Qn) where for each i, Qi is obtained from 
Ti by ignoring all predicates and quantification and replacing each tyvar by 
the symbol '?'.

For example, for

module M where
foo :: forall b. Eq a =>  Set (Tree a) -> Tree ([a],b) -> [Tree (a,b)]

we ignore the forall and Eq, and replace tyvars and tuple the results to 
get:

   M.(Set (Tree ?), Tree ( (,) ( [] ?) ?), [] (Tree ( (,) ? ?))).foo

as the fully qualified reference to the entity we've just declared.

If we call the tuple (Set, ...) the "ty-tuple", which specifies a space of 
values (ie something that each value is considered to "belong" to - the 
analogue of the object in OOP), then we can define a projection from a space 
into a containing space as follows:

1) Let Qi == Ui Si0 Si1 ... Sik be any element of a ty-tuple P. Then we can 
form a new ty-tuple P' by replacing Qi by Ui.

2) Let Qi be any element of a ty-tuple P of arity n+1. Then we can form
P' == (Q0, ..., Qi-1, Qi+1, ... Qn) of arity n

In both cases, we can use P' to qualify the identifier to get a value 
reference as long as this still has enough information for disambiguating 
other uses of the identifier in M.

Applying these transformations of the ty-tuple to the example above,
any of the following could be used as a partial qualification of foo in M:

    (Set Tree, Tree ([],?), [Tree]).foo
    (Set, Tree, []).foo
    Tree.foo
    foo

Rule 2 allows us to propagate identifiers back up to their ancestor modules 
as long as there are no conflicts along the way.

The type inference algorithm could then be modified (I hope) to use the 
top-level annotation for a value declaration to determine the entities that 
identifiers refer to, by searching in the appropriate value spaces 
determined by the types of the args and return value.

    label :: Tree a -> Int -> (Tree a, Int)
    label (Leaf x) i = (Leaf i, i+1)

The identifier Leaf is resolved in the current namespace augmented by the 
namespace for module Tree, thus we don't have to explicitly write Tree.Leaf 
even though the declaration of label occurs outside the Tree module itself.

(I suggest that top-level type annotations should be mandatory since without 
them one just drowns in a sea of TI confusion when there is a type error. By 
making them mandatory the TI algorithm could make real use of them to 
resolve identifier bindings as in OOP)

Similarly:

    foo :: a -> Set a
    foo x = singleton x

There is no need to say Set.singleton x, because we know that the result of 
foo has type Set a therefore we search in the current module augmented by 
the contents of the Set module (every type is also a module) for a binding 
for "singleton" which maps from a to Set a.

Finally (apologies for this long post), returning to the use of ^ to allow 
an object oriented way of thinking consider:

    insert :: a -> Set a -> Set a
    ps = singleton 3
    qs = insert 4 ps
    rs = ps^insert 4

When resolving "insert" used in the binding for rs, the compiler should see 
that we are looking for some function Set Int -> Int -> Set Int, and hence 
will be looking in the current module augmented by the Set module. However 
the Set module only has a binding for insert with type a -> Set a -> Set a. 
So the compiler should generate a new function insert' from insert by moving 
the first Set a arg to the front.

Summary:

1) Every value binding belongs to a value space represented by a ty-tuple 
which abstracts away from the details of the actual type

2) The space of ty-tuples forms a lattice, and in particular this means 
qualification is optional when there are no conflicts (we can use Leaf 3 
instead of Tree.Leaf 3), and we can use partial qualification to supply just 
the disambiguation info needed.

3) The TI algorithm should use the type of the expression (generated 
top-down from the top-level annotation) to search in the correct value 
spaces to resolve identifiers

4) Record field selection doesn't need any special scoping rules, and is 
just one example of a general object-oriented way of thinking about function 
application.

5) We can get all the advantages of automatic namespace management the OOP 
programmers take for granted, in functional programming, by using value 
spaces as the analogue of objects, and can thereby get rid of complicated 
import/export directives and improve code readability with less typing (but 
making more use of typing - excuse the pun :-))

I'm in the process of trying to implement a pre-processor for Haskell which 
would use the algorithms sketched above (as well as fixing a few other 
things I think are wrong with Haskell such as the way the current layout 
rule allows one to write code that would break if you replaced an identifier 
with one that had a different number of characters in it etc), so any 
feedback on these ideas would be welcome.

Thanks for reading so far!

Brian Hulley

PS Everything above is purely intended for resolving the kind of ad-hoc 
overloading that is not amenable to treatment by type classes (which 
transforms some subset of the ad-hoc-overloaded functions which share the 
same shape into a single function with implicit dictionary arguments (in the 
usual implementation)).