[Haskell-cafe] Avoiding name collisions by using value spaces instead of modules

Sun Jan 8 10:47:44 EST 2006

Am Sonntag, 8. Januar 2006 14:06 schrieb Brian Hulley:
> Hi -
> A main problem I've found with Haskell is that within a module, too many
> things are put into the same scope. For example
>
>     data Tree a b = Leaf a | Node {elem::b, lhs::Tree a b, rhs::Tree a b}
>
> puts Leaf, Node, elem, lhs, and rhs, into the module's namespace,
> as well as Tree. This means that if you want another kind of tree in
> the same module you've got to try and think of another word for Leaf,
> or else prefix every value constructor with some prefix of the tycon to
> try and prevent name collisions.
>
> I propose that the above declaration should introduce a *new module* Tree,
> as a sub module of the containing module, and Leaf, Node, elem etc will be
> put into this module, and not the module containing the data declaration
> itself.

What speaks against putting the data declaration in a separate module:

module ThisKindOfTrees where

data Tree a b = ...

and then use qualified imports (with a short alias), if you want to use 
different kinds of trees in one module?
Yes, more files, but, IMHO, much more readable.

>
> Identifiers in this child module could then either be used qualified or
> unqualified as long as they don't conflict with anything in the parent
> (containing) module, thus in the parent module, we could have:
>
>     a = Tree.Leaf 3
>     b = Node{elem = 6, lhs = Leaf 2, rhs = Leaf 3}
>     c = Tree.elem b

You get that by unqualified import.

>     c = b^elem
>
> where ^ is just a helpful sugar which binds tighter than function
> application and can be used anywhere to allow an "object oriented" way of
> thinking where the first arg is put on the left of the function (more on
> this later).

'^' is definitely a bad choice, 'cause it's exponentiation (and has a long and 
venerable history as symbol therefore).
>
> Also, we could generalise the GHC GADT style data declarations by allowing
> any declarations to appear in the where part, and allow modules to consist
> of a single data declaration (ie begin with the keyword data instead of
> module) so that the contents of the Data.Set module would instead be
> something like:
>
>     data Ord a => Set a = .... where
>         insert :: Set a -> a -> Set a
>         ...
>
> This would allow Set to be imported qualified into other modules where we
> could refer to the Set type by the single word Set instead of Set.Set
> (which I always think looks very strange indeed) ie
>
>     module M where
>         import qualified Data.Set as Set
>
>         f :: a -> Set a            -- no longer need to write a -> Set.Set
> a
>

import qualified Data.Set as Set
import Data.Set (Set)

does the trick and I prefer it over your suggestions below.

> In fact, the whole complexity asociated with hiding components of a module
> and allowing unqualified imports could be discarded, since the correct
> resolution for each identifier would be determined by its typed context,
> just as in OOP the resolution is determined by the type of the object.
>
> So to summarise so far, for any value f  :: T0 -> T1 -> ... -> Tn, we
> construct the tuple (Q0, Q1, ... Qn) where for each i, Qi is obtained from
> Ti by ignoring all predicates and quantification and replacing each tyvar
> by the symbol '?'.
>
> For example, for
>
> module M where
> foo :: forall b. Eq a =>  Set (Tree a) -> Tree ([a],b) -> [Tree (a,b)]
>
> we ignore the forall and Eq, and replace tyvars and tuple the results to
> get:
>
>    M.(Set (Tree ?), Tree ( (,) ( [] ?) ?), [] (Tree ( (,) ? ?))).foo
>
> as the fully qualified reference to the entity we've just declared.

Looks absolutely horrible to me. What would we gain?
We could have

M.(Set (Tree ?), Tree ( (,) ( [] ?) ?), [] (Tree ( (,) ? ?))).foo
and 
M.(Int, Set (Tree ?), Tree ( (,) ( [] ?) ?), [] (Tree ( (,) ? ?))).foo

but why? Do we really want it? I certainly don't.
>
> If we call the tuple (Set, ...) the "ty-tuple", which specifies a space of
> values (ie something that each value is considered to "belong" to - the
> analogue of the object in OOP), then we can define a projection from a
> space into a containing space as follows:
>
> 1) Let Qi == Ui Si0 Si1 ... Sik be any element of a ty-tuple P. Then we can
> form a new ty-tuple P' by replacing Qi by Ui.
>
> 2) Let Qi be any element of a ty-tuple P of arity n+1. Then we can form
> P' == (Q0, ..., Qi-1, Qi+1, ... Qn) of arity n
>
> In both cases, we can use P' to qualify the identifier to get a value
> reference as long as this still has enough information for disambiguating
> other uses of the identifier in M.
>
> Applying these transformations of the ty-tuple to the example above,
> any of the following could be used as a partial qualification of foo in M:
>
>     (Set Tree, Tree ([],?), [Tree]).foo
>     (Set, Tree, []).foo
>     Tree.foo
>     foo
>
> Rule 2 allows us to propagate identifiers back up to their ancestor modules
> as long as there are no conflicts along the way.
>
> The type inference algorithm could then be modified (I hope) to use the
> top-level annotation for a value declaration to determine the entities that
> identifiers refer to, by searching in the appropriate value spaces
> determined by the types of the args and return value.
>
>     label :: Tree a -> Int -> (Tree a, Int)
>     label (Leaf x) i = (Leaf i, i+1)
>
> The identifier Leaf is resolved in the current namespace augmented by the
> namespace for module Tree, thus we don't have to explicitly write Tree.Leaf
> even though the declaration of label occurs outside the Tree module itself.
>
> (I suggest that top-level type annotations should be mandatory since
> without them one just drowns in a sea of TI confusion when there is a type
> error. By making them mandatory the TI algorithm could make real use of
> them to resolve identifier bindings as in OOP)
>
> Similarly:
>
>     foo :: a -> Set a
>     foo x = singleton x
>
> There is no need to say Set.singleton x, because we know that the result of
> foo has type Set a therefore we search in the current module augmented by
> the contents of the Set module (every type is also a module) for a binding
> for "singleton" which maps from a to Set a.

import qualified Data.Set as Set
import Data.Set (Set, singleton, ... )

But yes, it's a nice idea to have that done automatically, however, plain

import Data.Set

gives you qualified and unqualified access and you only need to use the 
qualified form in case of ambiguities. I admit that sometimes it's annoying 
to have to write Data.Set.map (or whatever) when you apply it to a Set, 
rather than the type-checker determines it automatically, but if it did, 
you'd havoc because sometimes you've just made an error -- which wouldn't 
then be spotted by the type-checker.
>
> Finally (apologies for this long post), returning to the use of ^ to allow
> an object oriented way of thinking consider:
>
>     insert :: a -> Set a -> Set a
>     ps = singleton 3
>     qs = insert 4 ps
>     rs = ps^insert 4
>
> When resolving "insert" used in the binding for rs, the compiler should see
> that we are looking for some function Set Int -> Int -> Set Int, and hence
> will be looking in the current module augmented by the Set module. However
> the Set module only has a binding for insert with type a -> Set a -> Set a.
> So the compiler should generate a new function insert' from insert by
> moving the first Set a arg to the front.

Automatic permutation of arguments? Has its merits, but goodbye to 
type-safety, I believe.
>
> Summary:
>
> 1) Every value binding belongs to a value space represented by a ty-tuple
> which abstracts away from the details of the actual type
>
> 2) The space of ty-tuples forms a lattice, and in particular this means
> qualification is optional when there are no conflicts (we can use Leaf 3
> instead of Tree.Leaf 3), and we can use partial qualification to supply
> just the disambiguation info needed.
>
> 3) The TI algorithm should use the type of the expression (generated
> top-down from the top-level annotation) to search in the correct value
> spaces to resolve identifiers
>
> 4) Record field selection doesn't need any special scoping rules, and is
> just one example of a general object-oriented way of thinking about
> function application.
>
> 5) We can get all the advantages of automatic namespace management the OOP
> programmers take for granted, in functional programming, by using value
> spaces as the analogue of objects, and can thereby get rid of complicated
> import/export directives and improve code readability with less typing (but
> making more use of typing - excuse the pun :-))
>
> I'm in the process of trying to implement a pre-processor for Haskell which
> would use the algorithms sketched above (as well as fixing a few other
> things I think are wrong with Haskell such as the way the current layout
> rule allows one to write code that would break if you replaced an
> identifier with one that had a different number of characters in it etc),

So you would have to adjust your indentation in that case.
If you consider that a serious problem, what about using braces and 
semicolons? If you do it like

foo bar = do { x <- something bar
             ; more x
             ; yetMore x bar
             }

you get the advantages of both, layout-enhanced readability and 
layout-insensitivity due to explicit braces and semicolons.

> so any feedback on these ideas would be welcome.

You want object-oriented Haskell?
Go ahead (there's something around already), but make it a language of its 
own.

No offence intended, but I like Haskell as it is and haven't really seen the 
merits of OO-overloading yet.
>
> Thanks for reading so far!
>
> Brian Hulley
>
> PS Everything above is purely intended for resolving the kind of ad-hoc
> overloading that is not amenable to treatment by type classes (which
> transforms some subset of the ad-hoc-overloaded functions which share the
> same shape into a single function with implicit dictionary arguments (in
> the usual implementation)).
>

Cheers,
Daniel