[Haskell-cafe] I love purity, but it's killing me.

wren ng thornton wren at freegeek.org
Wed May 27 14:33:36 EDT 2009


Conal Elliott wrote:
> Hi Wren,
> 
> I considered the idea of hashing, but not *perfect* hashing.  I don't know
> how to hash perfectly with something like expressions, which have infinitely
> many values.

An imperfect hash can work. You'll need a memo table with a source of 
unique symbols (e.g. storing the next unused integer) in order to, 
effectively, make the "hash" function perfect[1]. If you have a source 
of unique symbols then you can also use a trie, Data.Map, or similar in 
lieu of a hash map.

In a language with pointers (or stable names), the pointer is often used 
as the "hash" in conjunction with using the memo table as an intern 
table for smart constructors. Thus, client code can never observe that 
structurally equal expressions could have different hashes.

[1]
hash :: HashState m => Expr -> m Hash
hash e = case lookupHS e of
          Just h  -> return h
          Nothing -> do h <- nextH
                        insertHS e h
                        return h

> > Since it's stateful, that means the smart constructors may need to be in an
> > appropriate monad/applicative for passing the memo table around (some hash
> > functions may not need to store the table explicitly).
> 
> Hm -- stateful?  Unless I'm misunderstanding, a stateful &
> monadic/applicative approach would break the simple functional interface I'm
> going for.  Could well be I haven't formed a mental picture that matches
> yours.

Er, it's only stateful for the versions above that use pointers or a 
source of unique symbols (since they need to maintain a memo table). If 
you can come up with a perfect hash function[2], then there's no need to 
create/store the memo table at all, since it can be reconstructed on the 
fly. Since perfect hashing often isn't feasible, the stateful 
approximations to a perfect hash function are generally used. Sorry if I 
was unclear.

If you don't mind unsafePerformIO (or similar hacks) then you can hide 
the state from the type system by using the same table for the whole 
program. Generally for hash cons'ing you want your tables to be as large 
as they can be (to maximize sharing) so this shouldn't be problematic. 
However, for languages with scoping it can be beneficial to use separate 
tables to recognize when expressions need to be recomputed; so the 
global store might want to be something like a stack of memo tables with 
fall-through lookup.

I believe Applicative is powerful enough to capture the sort of state 
passing needed since the client code can't ever make decisions based on 
the state. So with smart constructors (to package up the <*> etc) I'd 
think you should be able to have an EDSL that looks nice, just with a 
more complicated type. Perhaps the issues are with mixing pure Haskell 
functions into the EDSL?

...

The real trick behind hash cons'ing is everywhere substituting the 
"hash" in for the sub-expression, effectively flattening all expressions 
into a single ply. Thus, expression constructors "cons the hashes" 
rather than cons'ing expressions. It's similar in spirit to trie'ing, 
but from the bottom up in the same way that dynamic programming is done.

The reason for wanting to do the hashing in smart constructors, as 
opposed to at the end, is to maximize the benefit of dynamic 
programming. If all client-visible expressions are represented by 
hashes, then any structure sharing in the Haskell layer is sharing the 
hash representation, thus you don't need to traverse the shared 
substructure multiple times. (If you hand construct equal expressions 
without sharing, then you'll have to traverse each expression to prove 
that they're equal, but you can use that proof (the hashes) 
thenceforth). For host languages with destructive updates (like 
Smalltalk's "become"), you can rewrite the subterms as you traverse 
them, so doing it at the end isn't too bad.

If you only expose smart constructors then your Expr type can "recurse" 
as whatever Hash type. If you do the hashing at the end, then you'll 
need to define a catamorphism on Expr.

...

This is probably similar to what you're doing in Pan, Vertigo, and 
Pajama (I haven't read it). The general technique is elegant in its 
simplicity, and it's not uncommon. Though, like most dynamic programming 
tricks, it seems not to be as widely known as I would assume, so I 
thought I'd mention it in case you've missed it.



[2] Into any domain of terms that can quickly answer (==), namely flat 
terms like Integer. Using a bounded type like Int can give better 
performance guarantees, but there's only so many of them.

-- 
Live well,
~wren


More information about the Haskell-Cafe mailing list