[Haskell wikibook] Re: Haskell wikibook - chapter "Theseus and the Zipper"

Sun Jul 8 08:58:44 EDT 2007

Dear Mr. Huet,

> Very nice indeed.
>
> The analogy with Ariadne's thread is a natural one. I use myself the 
> analogy with mountaineering, you are hanging from the top of the 
> datastructure using the zipper as a return rope.
>
> Compliments on a nice site on functional programming. Keep up the good 
> work.

Thank you very much for the encouragement, we are delighted that you 
appreciate our efforts :)

The aim of the Haskell wikibook is to write a free and gentle textbook 
on Haskell and associated techniques for functional programming. Of 
course, to keep up the good work, we are always looking for 
contributors. In fact, the book lives through contributions from its 
readers: it is built with a wiki, utilizing the same platform as 
Wikipedia and hence sharing the same principle that anyone can edit.

> After I learned of Conor Mc Bride's work, I wrote some more about 
> zippers for a de Bruijn's Festschrift, see item 79 in my bibliography 
> [http://yquem.inria.fr/~huet/bib.html].
>
> My main application of zippers so far has been for the design of the 
> Zen toolkit for computational linguistics, see items 78,82,84,85 and 
> site http://sanskrit.inria.fr/huet/ZEN/index.html. This toolkit was 
> reused in Haskell by the Grammatical Framework linguistic platform of 
> Aarne Ranta.

Very interesting. Pondering the sharing functor, I think there is a way 
to make practical the Gödel-numbering you mentioned on page 10 of item 
78.

The starting point is that I personally have a slight distaste for hash 
tables since they are tied to machine numbers and arrays, concepts 
alien to algebraic data types and purely functional programming. I 
prefer tries, i.e. finite representations of functions

   key -> value

 From the point of view of computational complexity, they are optimal: 
lookup and insertion all take O(|size of key in bits|) time.

Tries can be obtained systematically from the structure of the key type 
by the isomorphisms

   (key1 + key2) -> value  ≅ (key1 -> value) × (key2 -> value)
   trie for a sum type     ⇒ pair two tries for the components

   (key1 × key2) -> value  ≅ (key1 -> (key2 -> value))
   trie for a product type ⇒ nest the tries for the components

See also

   Ralf Hinze. Generalizing generalized tries. Journal of Functional
   Programming, 10(4):327-351, July 2000
   http://www.informatik.uni-bonn.de/~ralf/publications/GGTries.ps.gz

 From now on, the function arrow -> shall denote a finite map from keys 
to values, so that every -> type is represented by an algebraic data 
type. In particular, it is possible to construct a generalized trie 
with a key type ([Char] -> Bool) representing a set of words 
(Haskell-like syntax, I don't know ML). Thus, both your hash table with 
subtries of the dictionary as keys and a generalized trie with subtries 
as keys represent a map

   ([Char] -> Bool) -> value ≅ (µX. Bool × (Char -> X)) -> value

Of course, replacing the hash table with the generalized trie 
corresponds to the one-to-one Gödel numbering of the keys as you 
mentioned on page 10 of item 78. In a sense, the subtries as keys are 
(a representation of) their own Gödel numbers. But now, Gödel-number 
equality is the structural equality of the tries which defeats the hole 
purpose of avoiding repeated traversal of the subtries of the 
dictionary. Put differently, even a numeric Gödel-numbering cannot 
decrease the size of the tries or it would not be able to represent 
every possible trie.

The solution is to restrict the Gödel-numbering to the set of tries we 
are interested in, namely to all the subtries of the dictionary. The 
intention is that the Gödel numbers sort of simply enumerate the 
subtries from 1 to n. In other words, the finite map storing the 
subtries becomes a structure storing "local Gödel numbers"

  type Subtrie    = µX. Bool × (Char -> X)
  type Gödel      = Int
  type GödelStore ≅ Subtrie -> Gödel

The key is that the Gödel numbers can now be used to break the 
recursion, yielding

  type GödelStore = (Bool × (Char -> Gödel)) -> Gödel

Put differently, a subtrie is represented by its topmost branches with 
the children being Gödel numbers instead of hole tries. By traversing 
the dictionary bottom-up, we have the Gödel numbers of subtries 
available before assigning a Gödel number to the parent. The parent 
gets a fresh Gödel number when its not in the GödelStore, otherwise we 
use the Gödel number from the store and thus share this subtrie. In 
fact, your current hashing scheme proceeds essentially the same way: a 
hash number is calculated from the already calculated hash numbers of 
the children.

Maybe your current hash-table implementation could already benefit from 
Gödel numbers. The insight is to separate Gödel numbers representing 
hole subtries and hash numbers which may be used to implement the 
"shallow" GödelStore as a hash table. The benefit is that in case of a 
collision, the expensive test for structural equality is replaced with 
a cheap check for Gödel number equality.

All the best,
apfelmus

> Le 30 juin 07 à 10:40, apfelmus at quantentunnel.de a écrit :
>> Dear Mr. Huet,
>>
>> Fascinated by the zipper since reading your publication, the authors 
>> of the Haskell wikibook have written a special introduction to this 
>> data structure:
>>
>>   "Theseus and the Zipper".
>>   http://en.wikibooks.org/wiki/Haskell/Zippers
>>
>> starring Theseus and Ariadne. We hope you will enjoy it :)
>>
>> All the best,
>> apfelmus