[Haskell-cafe] ANNOUNCE: DSTM 0.1.1

Wed Aug 4 09:55:41 EDT 2010

Good questions. I am about to write a paper explaining the design of the DSTM library in more detail, which I will link to when it is available. Please bear with me. In the meantime, please find some shorter answers below.

Regards,
Frank

On 04.08.2010 at 10:53, Andrew Coppin wrote:

> Frank Kupke wrote:
>> For usage please look into the documentation file: DSTMManual.pdf.
> 
> 1. Any danger of putting this somewhere I can read it without having to download and manually unpack the Hackage tarball?
Not a problem at all. I just had not thought about it. Here is a link to the PDF:

http://www-ps.informatik.uni-kiel.de/~frk/dstm.pdf

> 
> 2. Since DSTM depends on the Unix package, I presume this won't work on Windows. (?) OOC, what does it use Unix for?
Good point. I use System.Posix for benchmark debugging code and for a SIGPIPE handler. Both can probably go without doing any damage. I will look into it. As you can tell, I have not tested it on Windows...
> 
> 3. It is unclear to me what happens in the event of a communications failure. The documentation says an exception is thrown, but does the running transaction rollback or just retry or...? (The documentation seems to suggest that it might actually *commit* in spite of a communications failure, which sounds wrong.)
Two things happen in parallel. 
+ One is that the library throws an exception to the application, saying basically: Hey, at least one of the TVars you just accessed is broken; you had better check which ones and react to it in your program. This is the abstraction level outside of a transaction (the atomic function). The application now knows that one or more of its services represented by TVars is down and needs attention.
+ The other is the low-level handling inside the transaction, and thus inside the library. The behavior of the transaction depends on *when* the failure is actually detected.
- If the failure comes up *before* the transaction has been successfully validated, the transaction is aborted. A normal invalidation abort would restart the transaction automatically, but in this case throwing the exception terminates it. Abort is often called rollback (when referring to databases), but I prefer not to use that term because up to this point everything has been done safely within the STM monad. Nothing has happened in the IO world, hence nothing needs to be rolled back. (Btw, *retry* is a different kind of transaction restart. It is not done automatically but is forced by the application calling the retry function.)
- If the failure comes up *after* the transaction has been successfully validated, the transaction is committed. At first this might look wrong, but it is not. Here is why: if validation succeeded, all participating TVars on all nodes have agreed that the transaction is ready to commit. If any one of these TVars then fails, the decision to commit is still valid, as nothing else has changed. Note that the library locks all TVars from before validation until after the commit. Furthermore, some TVars might already have finished committing, so it would be inconsistent for the others not to commit.
In both cases, commit or not, the library takes precautions so that broken TVars unexpectedly quitting the transaction protocol cannot cause deadlocks.
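The decision rule above can be summarized as a small pure function. This is a toy model for illustration only: `Phase`, `Outcome`, `onFailure`, and `writesVisible` are hypothetical names, not part of the DSTM API, and the sketch ignores the locking and deadlock-avoidance machinery entirely. Load it in GHCi to experiment.

```haskell
-- Toy model (hypothetical names, not the DSTM API) of how a transaction
-- reacts to a node failure, depending on when the failure is noticed.

-- Phase of the commit protocol in which the failure is detected.
data Phase = BeforeValidation | AfterValidation
  deriving (Show, Eq)

-- What the transaction does locally. In both cases the exception is
-- still reported to the application.
data Outcome
  = AbortAndThrow   -- discard the transaction's log; no automatic restart
  | CommitAndThrow  -- carry out the commit every participant agreed on
  deriving (Show, Eq)

-- Before successful validation nothing has been agreed, so the
-- transaction is simply abandoned. After successful validation all
-- participants have voted to commit and their TVars are still locked,
-- so the commit decision stands.
onFailure :: Phase -> Outcome
onFailure BeforeValidation = AbortAndThrow
onFailure AfterValidation  = CommitAndThrow

-- Whether the transaction's writes become visible to other transactions.
writesVisible :: Outcome -> Bool
writesVisible AbortAndThrow  = False
writesVisible CommitAndThrow = True
```

Keeping the exception separate from the commit decision mirrors the two levels described above: the application always hears about the broken TVar, while the protocol's outcome depends only on whether validation had already succeeded.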

> Also, when is failure detected? Is it only when a transaction tries to access the variable?
Yes. However, all such accesses happen within the library and are fully transparent to the application. Since TCP is a connection-oriented protocol that only simulates a connection, we do not know exactly when a connection actually breaks. We can probe, though: either by sending test messages (pings), where a bounce shows that the connection is obviously broken, or simply by observing when a regular message bounces. Regular messages result from reading, validating, committing, ... TVars within atomic transactions. The library detects the failure (sooner or later) and informs the application by throwing the exception right after detection. From the application's perspective, the failure is only detected when it reads from or writes to the TVar within an atomic transaction. If a TVar is not accessed, its failure might remain undetected.
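The "detected only on access" behaviour can be modelled in plain `IO` using only `base`. This is a simplified sketch, not DSTM code: `RemoteVar`, `BrokenTVar`, `newRemoteVar`, `readRemote`, `breakLink`, and `checked` are hypothetical, an `IORef Bool` stands in for the TCP link, and a failed check plays the role of a bounced message. There is no STM and no transaction here.

```haskell
import Control.Exception (Exception, throwIO, try)
import Data.IORef

-- Hypothetical exception naming the broken remote variable.
newtype BrokenTVar = BrokenTVar String
  deriving Show

instance Exception BrokenTVar

-- Hypothetical stand-in for a TVar hosted on another node: its value
-- plus a flag saying whether the link to that node still works.
data RemoteVar = RemoteVar
  { rvName  :: String
  , rvAlive :: IORef Bool
  , rvValue :: IORef Int
  }

newRemoteVar :: String -> Int -> IO RemoteVar
newRemoteVar name v = RemoteVar name <$> newIORef True <*> newIORef v

-- Every access "sends a message" to the hosting node. The failure is
-- noticed here, when the message bounces, and nowhere else.
readRemote :: RemoteVar -> IO Int
readRemote rv = do
  alive <- readIORef (rvAlive rv)
  if alive
    then readIORef (rvValue rv)
    else throwIO (BrokenTVar (rvName rv))

-- Simulate the link to a node breaking silently.
breakLink :: RemoteVar -> IO ()
breakLink rv = writeIORef (rvAlive rv) False

-- Run an action and report which variable, if any, turned out to be broken.
checked :: IO a -> IO (Either String a)
checked act = do
  r <- try act
  pure $ case r of
    Left (BrokenTVar name) -> Left name
    Right x                -> Right x
```

For example, after `breakLink b`, `checked (readRemote a)` still returns `Right 1` for a variable `a` created with value 1, because nothing has touched `b`. Only an action that actually reads `b` yields `Left "b"`.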
> 
> 4. What network transport does this thing use? TCP? UDP? What port numbers?
DSTM uses TCP communication. It searches dynamically for available ports starting at port 60001; port 60000 is used for the name server. If two nodes each run on a separate machine with its own IP address, chances are that both use port 60001. If they run on the same machine, most likely one will use port 60001 and the other 60002.
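The port search can be sketched as a pure function. Here `isFree` is a hypothetical stand-in for attempting to bind a port, and the `limit` bound is an assumption added so that the search always terminates; the actual library may probe ports differently.

```haskell
-- Port used by the name server.
nameServerPort :: Int
nameServerPort = 60000

-- First port tried for an ordinary node.
firstNodePort :: Int
firstNodePort = 60001

-- Pick the first port at or above 'firstNodePort' accepted by 'isFree'.
-- 'isFree' is a hypothetical stand-in for "binding this port succeeded";
-- 'limit' (an assumption of this sketch) caps how many ports are tried.
pickNodePort :: (Int -> Bool) -> Int -> Maybe Int
pickNodePort isFree limit =
  case filter isFree [firstNodePort .. firstNodePort + limit - 1] of
    (p : _) -> Just p
    []      -> Nothing
```

On a machine with no other node, `pickNodePort (const True) 10` gives `Just 60001`. A second node on the same machine, modelled as `pickNodePort (/= 60001) 10`, gives `Just 60002`, matching the behaviour described above.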
> 
> 5. How does it work? Does it spawn a Haskell thread for each machine connection or something?
Each node spawns a Haskell thread that listens on its designated port and in turn spawns a thread for each accepted TCP connection, i.e. one for each foreign node talking to it. Each such thread implements a communication line between two threads. I have tried several communication line schemes, which I will describe in more detail in the forthcoming paper...
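The thread-per-connection layout can be sketched with `Control.Concurrent` from `base`. To keep the sketch self-contained, a `Chan` of hypothetical `Conn` values stands in for a listening socket's accept loop; `Conn`, `listener`, `countingHandler`, and `simulate` are illustrative names, not DSTM internals, and the communication-line schemes mentioned above are not modelled.

```haskell
import Control.Concurrent (forkIO, killThread)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Monad (forM_, forever, replicateM)

-- Hypothetical stand-in for an accepted TCP connection: the name of the
-- foreign node plus the messages it sends.
data Conn = Conn { connPeer :: String, connMsgs :: [String] }

-- Listener thread: blocks on "accept" (here: reading from a channel) and
-- forks one handler thread per accepted connection, i.e. one thread per
-- foreign node talking to this node.
listener :: Chan Conn -> (Conn -> IO ()) -> IO ()
listener incoming handler = forever $ do
  conn <- readChan incoming
  _ <- forkIO (handler conn)
  pure ()

-- Example handler: reports the peer and how many messages it sent.
countingHandler :: Chan (String, Int) -> Conn -> IO ()
countingHandler out (Conn peer msgs) = writeChan out (peer, length msgs)

-- Start a listener, feed it some connections, collect one result per
-- connection, then stop the listener.
simulate :: [Conn] -> IO [(String, Int)]
simulate conns = do
  incoming <- newChan
  results  <- newChan
  tid <- forkIO (listener incoming (countingHandler results))
  forM_ conns (writeChan incoming)
  rs <- replicateM (length conns) (readChan results)
  killThread tid
  pure rs
```

Because each handler runs in its own thread, results arrive in no particular order; `simulate` returns one entry per connection, for example `("nodeA", 2)` for a peer that sent two messages. Forking per connection keeps a slow or broken peer from blocking the others, at the cost of one lightweight Haskell thread per foreign node.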
> 
> 6. The past tense of "join" is "joined", not "joint". ;-)
Thanks! The pun was not intended ;-)