[Haskell-cafe] Freeze/thaw fusion

Mon Aug 3 14:22:49 UTC 2015

On Mon, Aug 3, 2015 at 10:18 AM, Ian Duncan <ian at iankduncan.com> wrote:
> On August 3, 2015 at 8:28:02 AM, Jake McArthur (jake.mcarthur at gmail.com)
> wrote:
>
> Vector also does something like what your are describing. I think the phrase
> to google for is "array recycling".
>
>
> On 11:56PM, Fri, Jul 31, 2015 William Yager <will.yager at gmail.com> wrote:
>>
>> Has anyone done any research into fusing operations that involve thawing
>> some data, mutating it, and then freezing it again?
>>
>> Data.Vector does something similar; it turns vectors into streams,
>> operates on the streams, and then turns them back into vectors. It can fuse
>> these operations by removing intermediate  However, I've done a bit of
>> preliminary work on a fusion system that works on operations of the form
>>
>>     runST $ do
>>         x' <- thaw x
>>         foo x'
>>         freeze x'
>>
>> and yields promising results in some cases. This could be useful for data
>> structures that aren't stream-able, but suffer from lots of unnecessary
>> freezing and unfreezing.
>>
>> This seems like the sort of thing that someone would have already done, so
>> I wanted to check if anyone knew of any previous work on this.
>>
>> Cheers,
>> Will
>> _______________________________________________
>> Haskell-Cafe mailing list
>> Haskell-Cafe at haskell.org
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
>

Sorry, I know I already sent this once, but it looks like the
formatting got a little weird, so for anyone who couldn't read it
before, I'll try this again:

I’d be interested in knowing about the preliminary work you’re doing
here. I’ve got a port of Clojure’s persistent vectors (pvector for
short) in development right now, and I’m working on figuring out how
to batch consecutive pure modifications of a pvector into doing
cheaper operations on the mutable version before converting it into
the immutable version again.

Here are the operations scenarios that I’m especially interested in
optimizing this way:

{- each of these requires cloning a 32-element array.
   optimizing consecutive snocs is important because
   fromList is defined as (foldl' snoc empty)
-}
update1 :: Vector a -> Int -> a -> Vector a
snoc :: Vector a -> a -> Vector a
unsnoc :: Vector a -> Maybe (a, Vector a)

{- requires cloning <= N/32 arrays, where N is length of update batch -}
update :: Vector a -> Vector (Int, a) -> Vector a

{- would be nice to preallocate a fully-sized pvector for consecutive concats.
   for example (\vs -> foldr concat empty vs)
-}
concat :: Vector a -> Vector a -> Vector a

I only just started working on freeze/thaw fusion myself, so I haven’t
gotten far at all. One trick that I’ve got in place in the transient
structure to make this cheaper though is how I track modifications of
the persistent vector. The immutable version is represented like so:

data Node a
  = Leaf   !(Array a) {- Array here from primitive package -}
  | Branch !(Array (Node a))
  | EmptyNode

data Vector a = Vector
  { vCount     :: !Int
  , vShift     :: !Int
  , vRoot      :: !(Node a)
  , vTail      :: !(Array a)
  , vTailCount :: !Int
  }

And then here’s the transient version:

data TransientNode s a
  = UneditedLeaf   !(Array a)
  | UneditedBranch !(Array (Node a))
  | EditedLeaf     !(MutableArray s a)
  | EditedBranch   !(MutableArray s (TransientNode s a))
  | EmptyTransientNode

data TransientVector s a = TransientVector
  { tvCount      :: !Int
  , tvShift      :: !Int
  , tvRoot       :: !(TransientNode s a)
  , tvTail       :: !(MutableArray s a)
  , tvTailEdited :: !Bool
  , tvTailCount  :: !Int
  }

When the persistent version is converted to the transient version, the
data structure tracks which children have been modified over the
course of operations on the transient structure. So, when the mutable
version is converted back into the persistent version, any of the
underlying arrays which haven’t been touched are still shared with any
of the other live pvectors that are referencing them.