[Haskell-cafe] Freeze/thaw fusion

Ian Duncan ian at iankduncan.com
Mon Aug 3 14:18:38 UTC 2015

On August 3, 2015 at 8:28:02 AM, Jake McArthur (jake.mcarthur at gmail.com) wrote:
Vector also does something like what your are describing. I think the phrase to google for is "array recycling".

On 11:56PM, Fri, Jul 31, 2015 William Yager <will.yager at gmail.com> wrote:
Has anyone done any research into fusing operations that involve thawing some data, mutating it, and then freezing it again?

Data.Vector does something similar; it turns vectors into streams, operates on the streams, and then turns them back into vectors. It can fuse these operations by removing intermediate  However, I've done a bit of preliminary work on a fusion system that works on operations of the form 

    runST $ do 
        x' <- thaw x
        foo x'
        freeze x'

and yields promising results in some cases. This could be useful for data structures that aren't stream-able, but suffer from lots of unnecessary freezing and unfreezing. 

This seems like the sort of thing that someone would have already done, so I wanted to check if anyone knew of any previous work on this.

Haskell-Cafe mailing list
Haskell-Cafe at haskell.org
Haskell-Cafe mailing list 
Haskell-Cafe at haskell.org 

I’d be interested in knowing about the preliminary work you’re doing here. I’ve got a port of Clojure’s persistent vectors (pvector for short) in development right now, and I’m working on figuring out how to batch consecutive pure modifications of a pvector into doing cheaper operations on the mutable version before converting it into the immutable version again. Here are the operations scenarios that I’m especially interested in optimizing this way:

{- each of these requires cloning a 32-element array.
   optimizing consecutive snocs is important because
   fromList is defined as (foldl' snoc empty)
update1 :: Vector a -> Int -> a -> Vector a
snoc :: Vector a -> a -> Vector a
unsnoc :: Vector a -> Maybe (a, Vector a)

{- requires cloning <= N/32 arrays, where N is length of update batch -}
update :: Vector a -> Vector (Int, a) -> Vector a

{- would be nice to preallocate a fully-sized pvector for consecutive concats.
   for example (\vs -> foldr concat empty vs)
concat :: Vector a -> Vector a -> Vector a
I only just started working on freeze/thaw fusion myself, so I haven’t gotten far at all. One trick that I’ve got in place in the transient structure to make this cheaper though is how I track modifications of the persistent vector. The immutable version is represented like so:

data Node a
  = Leaf   !(Array a) {- Array here from primitive package -}
  | Branch !(Array (Node a))
  | EmptyNode

data Vector a = Vector
  { vCount     :: !Int
  , vShift     :: !Int
  , vRoot      :: !(Node a)
  , vTail      :: !(Array a)
  , vTailCount :: !Int

And then here’s the transient version:

data TransientNode s a
  = UneditedLeaf   !(Array a)
  | UneditedBranch !(Array (Node a))
  | EditedLeaf     !(MutableArray s a)
  | EditedBranch   !(MutableArray s (TransientNode s a))
  | EmptyTransientNode

data TransientVector s a = TransientVector
  { tvCount      :: !Int
  , tvShift      :: !Int
  , tvRoot       :: !(TransientNode s a)
  , tvTail       :: !(MutableArray s a)
  , tvTailEdited :: !Bool
  , tvTailCount  :: !Int
When the persistent version is converted to the transient version, the data structure tracks which children have been modified over the course of operations on the transient structure. So, when the mutable version is converted back into the persistent version, any of the underlying arrays which haven’t been touched are still shared with any of the other live pvectors that are referencing them. That’s not specifically relevant to fusion, but it does make it quite cheap to go roundtrip from persistent->transient->persistent again.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/haskell-cafe/attachments/20150803/5b16c0f6/attachment.html>

More information about the Haskell-Cafe mailing list