[Haskell-cafe] heterogeneous environment

Wed May 2 08:36:27 CEST 2012

dear static typing masters --

while working on an STM-like library, i ran into the issue of trying to create a transaction log for reads / writes of heterogeneous reference types.  i don't have a good answer to this problem.  the problem is twofold : first, the general heterogeneous collection problem, and second, connecting a reference to it's log.  there are some solutions i've tried but each has drawbacks :

- store the transaction log inside of the reference itself.  essentially, each reference has an IORef (Map ThreadId a) associated to it.  this is the approach used by [1].  unfortunately this creates a point of concurrency contention at each reference for each read / write, and a lot of repeated computation (the transaction log is spread out in a lot of pieces.) 

- use Data.Unique to identify Refs, and use existential quantification or Data.Dynamic to create a heterogenous Map from uid to log.  for example, to represent a log of compare-and-swaps we might do something like

data Ref a = Ref (IORef a) Unique
data OpaqueCAS = forall a . OpaqueCAS { casRef :: Ref a, oldVal :: a, newVal :: a }
type CASLog = Map Unique OpaqueCAS
logCAS r@(Ref _ uid) o n log = insert uid (OpaqueCAS r o n) log...

but what if the transaction wants to perform a second CAS on a reference we've already CAS'd?  then we should create a combined OpaqueCAS record which expects the first oldVal and swaps in the second newVal.  unfortunately the type checker balks at this, as it can't prove that the type variable 'a from the first record is the same as the 'a in the new record; i suppose there might be some fancy type shenanigans which might solve this...

Data.Dynamic works but puts a Typeable constraint on the Ref type, so requires the user to modify their data types, and requires a run-time type check (and while performance isn't paramount now it will become important to me later.)  also it doesn't feel right...

- tupling and functional representations.  a monadic function that does a read on an reference can be thought of as a pure function with an extra argument.  a monadic function that does a write can be thought of as a pure function with an extra return value.  combining all the reads and writes into a transaction log is a big network / switchboard connecting problem, e.g. creating the extra inputs / outputs to the various functions and then stitching them together.  running the monad then just is connecting up the final composed function to the actual input / output references.  in a sense this amounts to tupling (or currying) the heterogeneous types.  (is this is a kind of "final" representation, in the "finally tagless" sense?)

i looked at this there were two problems : 1 - the representation is very difficult to manipulate, at least the way i was trying (using the arrow operations); the switchboard problem is extremely verbose, and 2 - it is hard to reconcile identity and position in the tuples -- the repeated read / write problem again.  i also experimented with HList but got stuck on similar issues.

it strikes me this is just the problem of keeping an environment for an interpreter of a language with mutable heterogeneous reference types.  this must be a problem that has either been solved or else there is a haskell point of view on it i'm not grasping which avoids the need for this data structure.  maybe there is a way of writing this as an interpreter or using some existing monad, like the ST monad?

best, ben

[1] - Frank Huch and Frank Kupke, A High-Level Implementation of Composable Memory Transactions in Concurrent Haskell