[Haskell-cafe] ST Vector / STRef -- Performances and allocation

Sun Jun 19 10:37:24 UTC 2016

Thank you for this answer.

On Sun, Jun 19, 2016 at 1:08 AM, Amos Robinson <amos.robinson at gmail.com> wrote:
> Hi,
> You might want to add inline pragmas to "inc" and so on.
> STRef is boxed, so it will allocate. The 'URef' type here might be useful:
> http://hackage.haskell.org/package/mutable-containers-0.3.2.1/docs/Data-Mutable.html
> It is just a wrapper around unboxed mutable vector, as Carter suggests.

I added INLINE pragma on all my functions and it does not change
anything unfortunately.

I tried the package mutable-containers and the Api is really great,
but I'm surprised by the results :

- PRef (Unboxed bytearray of size 1, the closest thing to an unboxed
stack allocation) : 86ms
- URef (Unboxed vector of size 1) : 130ms
- SRef (Storable vector of size 1) : 43ms (This is an improvement !)
- BRef (Boxed ref) : 137ms
- STRef : 54ms (this was my baseline)

This is based on the creation of the Ref using a "mutable" function such as :

mutable v = asF <$> newRef v

(with asF == asPRef, asURef, ...)

However, if I'm providing a type signature for mutable, I got totally
different results :

mutable :: forall (f :: * -> *) a.
                 (Prim a, PrimMonad f) =>
                 a -> f (MyRef (PrimState f) a)
mutable v = asF <$> newRef v

(with type MyRef = PRef, URef, ...)

- PRef: 39ms (huge improvement, and winner as expected)
- URef: 45ms
- SRef: 43ms
- BRef: 84ms
- STRef: 54ms

I don't understand the speed difference which is only impacted by the
availability of a signature for mutable.

Now much of the allocation due to STRef have disappears. There is
still some which annoys me.

Most of the allocations comes from that line :

https://github.com/guibou/inversionsCount/blob/syntax/inversions.hs#L35

Which is nothing more that a boolean expression between different ST
action which are reading from unboxed Vector or PRef. So I was
expecting without allocation. Perhaps the fact that it uses "high
level" combinators such as "<$>", "<*>" and "=<<" is responsible ?

> The sad story is that in my experience, if you really want decent
> performance you will have to dump the Core and inspect it by hand. You'd
> then add inline and bang patterns based on the Core. I do not recommend it
> as a relaxing weekend exercise. I usually use an invocation something like:
>> ghc -ddump-prep -dppr-case-as-let -dsuppress-all -fforce-recomp
>> inversions.hs > inversions.hscore

Thank you ;) For now I rather prefer reading intel x86 assembler than
GHC core, but today is a rainy day, so I'll use it to start my
understanding of the GHC core ;)

-- 
G.