[Haskell-cafe] Traversing a graph in STM

Tue Sep 19 09:47:15 EDT 2006

On Sep 18, 2006, at 4:47 AM, Einar Karttunen wrote:

> On 18.09 01:23, Josef Svenningsson wrote:
>> On 9/17/06, Jan-Willem Maessen <jmaessen at alum.mit.edu> wrote:
>>> You can associate a unique name with each traversal, and store a set
>>> of traversals at each node (instead of a mark bit).  But this set
>>> grows without bound unless you follow each traversal with a  
>>> "cleaning
>>> traversal" which removes a traversal tag from the set.  And you'd
>>> need some way to generate the unique names.
>>>
>> Well, if your set implementation used weak pointers there would be no
>> need for a cleaning traversal. The garbage collector will take  
>> care of
>> that. The only slightly tricky thing is to keep a live pointer to the
>> unique traversal name during the entire of the traversal. But I don't
>> think that should be a big problem.
>>

This just amounts to saying "we can use the GC to implement the  
cleanup traversal on our behalf."  I'd be quite surprised if this  
were actually more efficient.  But this is all a bit moot, as Einar  
observes:

> This suffers from the problem that two traversals reading the
> same parts of the graph would have a good chance to make each other
> retry.

Any solution which stores traversal states in the nodes has this  
problem.  Fundamentally you can't  update the state of graph nodes in  
any way using STM and expect to run multiple traversals concurrently  
over the same subgraph.

> I am thinking of going the StableName route. But as this happens
> inside STM Data.HashTable does not help much (without using
> unsafeIOToSTM and dealing with retries).

I'd like to make an STM version of Data.HashTable, but it requires  
implementing some sort of STMArray, or using an array of TVars and  
slowing the hashtable implementation to a crawl.  Without access to  
the internals of STM, implementing some form of STMArray turns out to  
be awfully difficult (I understand the implementation techniques, but  
the ones I understand involve adding frobs to the STM implementation  
itself or degenerating to maps).  This would also address the lack of  
a concurrent hash table (the other alternatives for which are to run  
into a similar set of shortcomings for IOArrays or STArrays, where  
I'd want to have a CAS operation or some sort of compact array of  
MVars).

I'm always a little conflicted about making StableNames for keys into  
a hash table.  Internally GHC creates a giant invisible hash table of  
StableNames, just so we can look things up in it and then use the  
result to insert stuff into our user-visible hashtable.

> If StableNames were in Ord using Set (StableName T)
> would be nice. But in the current implementation one has to resort
> to IntSet Int [StableName T] which is not pretty at all.

I agree.  I wish StableNames were ordered.

-Jan-Willem Maessen

[PS - Does the StableName-internal hash table use the same hash  
function as the old Data.HashTable did?  The comments in  
Data.HashTable suggested it might.  If so, it might be a good idea to  
switch to something like the multiplicative hash function in my  
Data.HashTable code; this gets much of the benefit of switching hash  
table implementations at pretty low cost.  It's not the best hash by  
a long stretch, but it's much better than what was there.]

>
> - Einar Karttunen

-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 2425 bytes
Desc: not available
Url : http://www.haskell.org/pipermail/haskell-cafe/attachments/20060919/f389e607/smime.bin