[Haskell-cafe] I love purity, but it's killing me.
wren ng thornton
wren at freegeek.org
Wed May 27 14:33:36 EDT 2009
Conal Elliott wrote:
> Hi Wren,
> I considered the idea of hashing, but not *perfect* hashing. I don't know
> how to hash perfectly with something like expressions, which have infinitely
> many values.
An imperfect hash can work. You'll need a memo table with a source of
unique symbols (e.g. storing the next unused integer) in order to,
effectively, make the "hash" function perfect. If you have a source
of unique symbols then you can also use a trie, Data.Map, or similar in
lieu of a hash map.
In a language with pointers (or stable names), the pointer is often used
as the "hash" in conjunction with using the memo table as an intern
table for smart constructors. Thus, client code can never observe that
structurally equal expressions could have different hashes.
hash :: HashState m => Expr -> m Hash
hash e = case lookupHS e of
Just h -> return h
Nothing -> do h <- nextH
insertHS e h
> > Since it's stateful, that means the smart constructors may need to be in an
> > appropriate monad/applicative for passing the memo table around (some hash
> > functions may not need to store the table explicitly).
> Hm -- stateful? Unless I'm misunderstanding, a stateful &
> monadic/applicative approach would break the simple functional interface I'm
> going for. Could well be I haven't formed a mental picture that matches
Er, it's only stateful for the versions above that use pointers or a
source of unique symbols (since they need to maintain a memo table). If
you can come up with a perfect hash function, then there's no need to
create/store the memo table at all, since it can be reconstructed on the
fly. Since perfect hashing often isn't feasible, the stateful
approximations to a perfect hash function are generally used. Sorry if I
If you don't mind unsafePerformIO (or similar hacks) then you can hide
the state from the type system by using the same table for the whole
program. Generally for hash cons'ing you want your tables to be as large
as they can be (to maximize sharing) so this shouldn't be problematic.
However, for languages with scoping it can be beneficial to use separate
tables to recognize when expressions need to be recomputed; so the
global store might want to be something like a stack of memo tables with
I believe Applicative is powerful enough to capture the sort of state
passing needed since the client code can't ever make decisions based on
the state. So with smart constructors (to package up the <*> etc) I'd
think you should be able to have an EDSL that looks nice, just with a
more complicated type. Perhaps the issues are with mixing pure Haskell
functions into the EDSL?
The real trick behind hash cons'ing is everywhere substituting the
"hash" in for the sub-expression, effectively flattening all expressions
into a single ply. Thus, expression constructors "cons the hashes"
rather than cons'ing expressions. It's similar in spirit to trie'ing,
but from the bottom up in the same way that dynamic programming is done.
The reason for wanting to do the hashing in smart constructors, as
opposed to at the end, is to maximize the benefit of dynamic
programming. If all client-visible expressions are represented by
hashes, then any structure sharing in the Haskell layer is sharing the
hash representation, thus you don't need to traverse the shared
substructure multiple times. (If you hand construct equal expressions
without sharing, then you'll have to traverse each expression to prove
that they're equal, but you can use that proof (the hashes)
thenceforth). For host languages with destructive updates (like
Smalltalk's "become"), you can rewrite the subterms as you traverse
them, so doing it at the end isn't too bad.
If you only expose smart constructors then your Expr type can "recurse"
as whatever Hash type. If you do the hashing at the end, then you'll
need to define a catamorphism on Expr.
This is probably similar to what you're doing in Pan, Vertigo, and
Pajama (I haven't read it). The general technique is elegant in its
simplicity, and it's not uncommon. Though, like most dynamic programming
tricks, it seems not to be as widely known as I would assume, so I
thought I'd mention it in case you've missed it.
 Into any domain of terms that can quickly answer (==), namely flat
terms like Integer. Using a bounded type like Int can give better
performance guarantees, but there's only so many of them.
More information about the Haskell-Cafe