Compact Normal Form and ST

Mon Jun 19 17:58:47 UTC 2017

Actually, this is more interesting than I thought!  Let me explain.

Let's take the compact API and make a hypothetical ST style
interface.  Here's one possibility:

    data Compact s a
    compact :: a -> ST s (Compact s a)
    getCompact :: Compact a -> a
    runST :: (forall s. ST s a) -> a

Let's consider the following code:

    let x1 = runST (fmap getCompact (compact big_value))
        x2 = runST (fmap getCompact (compact big_value))

GHC may CSE x1 and x2 into a single expression, which means that
what was previously two compact expressions turns into a single
one.

"But Edward", you may say, "That's fine, I only cared about the compact
region being distinct within a single ST block. If you really wanted
x1 and x2 to be distinct, you should have written it this way:"

    let (x1, x2) = runST $ do x1 <- fmap getCompact (compact big_value)
                              x2 <- fmap getCompact (compact big_value)
                              return (x1, x2)

which is indeed true!  So I suppose it depends on what your expectations
are.

Edward

Excerpts from Andrew Martin's message of 2017-06-19 09:06:39 -0400:
> I too am struggling to find a scenario in which this causes something to
> go wrong. Ending up with only one block instead of two copies of it
> seems like it could only be a problem, as Edward says, "if you were
> planning to use these blocks as separate allocation buffers for
> subsequent modifications". But, mutable byte arrays allocated by
> newByteArray# cannot escape runST, so I don't see how this could happen.
> Well, I guess if you returned a frozen array, and then you thawed it
> for subsequent modification outside of runST, that would be problematic,
> but CSE already makes that unsafe, even without involving compact
> regions. It's good to know that, for my purposes, I'm in the clear, but
> I would also like to hear from Edward clarifying the original statement.
> 
> -Andrew Martin
> 
> On Mon, Jun 19, 2017 at 4:41 AM, David Feuer <david.feuer at gmail.com> wrote:
> 
> > Can you explain how things could go wrong in ST, perhaps with an example?
> > It's hard to see the potential problem.
> >
> > On Jun 18, 2017 11:49 PM, "Edward Z. Yang" <ezyang at mit.edu> wrote:
> >
> > There are two senses of referential transparency here which should be
> > considered.  First is whether or not you will get the same value results
> > if you use the compact functionality in ST.  Here, the answer is yes.
> > Compact normal form has very trivial semantics in this domain, and
> > it would have been OK even to make compact normal forms be pure
> > functions.
> >
> > Second is whether or not the performance characteristics are preserved.
> > Here, the situation is different.  Most notably, pure expressions and
> > invocations of the same runST block may be commoned up (via an
> > optimization pass like CSE.)  In that case, what was previously two
> > separate compact blocks may be commoned up into a single one.  This
> > could be disaster if you were planning to use these blocks as separate
> > allocation buffers for subsequent modifications.
> >
> > This motivated specializing compact to IO.  It won't segfault if you
> > put it in ST, but the performance characteristics might change.
> >
> > Edward
> >
> > Excerpts from Andrew Martin's message of 2017-06-18 10:24:09 -0400:
> > > In the primops file where the compact normal form functions are
> > > documented (https://github.com/ghc/ghc/blob/adcd1c62b6d372f100ccf1d5d7c
> > d94f79aaac194/compiler/prelude/primops.txt.pp#L2487),
> > > I noticed that all of the functions have type signatures that constrain
> > > them to only being used in IO. For example:
> > >
> > >     compactAdd# :: Compact# -> a -> State# RealWorld -> (# State#
> > RealWorld, a #)
> > >
> > > I would like to know if generalizing these to allow them to work in ST
> > would
> > > be sound. That is, changing the type signature to:
> > >
> > >     compactAdd# :: Compact# -> a -> State# s -> (# State# s, a #)
> > >
> > > I'm not requesting that this change actually be made. I only want to
> > > know if using unsafeCoerce to create the second function for my own
> > > project would actually be sound.
> > >
> > > For those interested in knowing why I want this, it's because there are
> > > situation where I'm interested in building up a giant structure in a
> > > compact region, but in a way that doesn't actually require IO. I think
> > > it's a pity to have to use a type signature with IO and then call
> > > unsafePerformIO at the end instead of using the more constrained ST,
> > > and runST, which makes it clear that I'm not doing anything observable
> > > form the outside.
> > >
> > > As a minor bonus, the ST version of the Compact data type shouldn't need
> > > the lock that the IO version does, since concurrent calls to compactAdd
> > > are not possible.
> > >
> > > -Andrew Martin
> > >
> > _______________________________________________
> > Libraries mailing list
> > Libraries at haskell.org
> > http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
> >
> >
> >
>