[Haskell-cafe] Efficient mutable arrays in STM
Ben Franksen
ben.franksen at online.de
Tue Oct 25 19:47:39 CEST 2011
I have an application in mind where concurrent access to large arrays (up to
millions of elements) of mostly small elements (Int or Double) is common.
Typical access patterns would be chunk-wise, i.e. reads or writes from index
n up to index m. Imagine stuff like images, scientific data, etc.
All this suggests that Control.Concurrent.STM.TArray, in its current
implementation, is not appropriate. Quoting the docs:
"It is currently implemented as Array ix (TVar e), but it may be replaced by
a more efficient implementation in the future (the interface will remain the
same, however)."
An array of TVars is certainly *much* too inefficient for what I have in
mind w.r.t. both memory and cpu time. In fact I had already decided to use
Data.Vector.Unboxed from the vector package.
I see that Data.Vector.Unboxed.Mutable provides
slice :: Unbox a => Int -> Int -> MVector s a -> MVector s a
Yield a part of the mutable vector without copying it.
which is almost what I need... Can I use this together with unsafeIOToSTM
internally inside a library to provide shared transactional access to an
IOArray? The docs warn that using unsafeIOToSTM is (obviously) "highly
dangerous", but for what I have in mind the listed problems are not an
issue:
* running the IO code multiple times is ok
* aborting is ok, too
* inconsistent views are ok, too
The main question is: does the STM transaction actually "see" that I changed
part of the underlying array, so that the transaction gets re-tried? Or do I
have to implement this manually, and if yes: how?
Has anyone done something like that before? (If you tried this and found it
doesn't work, please tell me, it would save me a lot of work repeating the
effort).
Is someone working on providing a more efficient version of TArray?
Would it help if I said I'd be a happy user of a better TArray? ;-)
If what I sketched above is infeasible (I would not be surprised if it was,
I haven't yet spent much effort trying it out), what other options do I
have? Is there an internal API for the STM stuff, i.e. a C header file or
something which would make it possible to add efficient TArrays?
Or should I use a high-level approach, something like a Data.Sequence.Seq of
medium sized chunks (TVar (IOVector e))?
Any comments are highly appreciated!
Cheers
Ben
More information about the Haskell-Cafe
mailing list