[Haskell-cafe] Question on optimization

Viktor Dukhovni ietf-dane at dukhovni.org
Thu Dec 17 06:48:34 UTC 2020


On Thu, Dec 17, 2020 at 12:15:19AM -0500, Viktor Dukhovni wrote:

> I don't know whether indirection via "foldr = go" with `go`
> recursive and `foldr` "INLINABLE" and ditto for `traverse` would
> create further opportunities for optimisation when the instances
> are in a separate module.  Someone else might clear that up, or
> you could benchmark and see.  IIRC, recursive functions don't
> directly inline without such tweaks, but the tweaks might not
> help much.

Naive experiments seem to suggest that inlining via a `go` indirection
that captures `f` is somewhat helpful, but the real win is using a right
fold, vs.  your original flatten implementation.  Here's a comparison
with a somewhat branchy deep structure.

Flatten via Foldable:
---------------------

       2,281,024 bytes allocated in the heap
           3,312 bytes copied during GC
          44,408 bytes maximum residency (1 sample(s))
          25,224 bytes maximum slop
              17 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0         0 colls,     0 par    0.000s   0.000s     0.0000s    0.0000s
  Gen  1         1 colls,     0 par    0.000s   0.000s     0.0005s    0.0005s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    0.002s  (  0.002s elapsed)
  GC      time    0.000s  (  0.000s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    0.003s  (  0.003s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    1,234,320,346 bytes per MUT second

  Productivity  68.0% of total user, 68.0% of total elapsed

The posted flatten (1000x more allocations! 200x runtime):
--------------------

   2,651,027,056 bytes allocated in the heap
      11,386,968 bytes copied during GC
         164,424 bytes maximum residency (2 sample(s))
          29,320 bytes maximum slop
              27 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       104 colls,     0 par    0.015s   0.015s     0.0001s    0.0004s
  Gen  1         2 colls,     0 par    0.001s   0.001s     0.0007s    0.0010s

  INIT    time    0.001s  (  0.001s elapsed)
  MUT     time    0.715s  (  0.715s elapsed)
  GC      time    0.016s  (  0.016s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    0.732s  (  0.732s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    3,705,952,327 bytes per MUT second

  Productivity  97.7% of total user, 97.7% of total elapsed

-- 
    Viktor.

{-# LANGUAGE DeriveFunctor #-}

module Nest
    ( Nest(..)
    ) where

data Nest a = Nil
            | Cons a (Nest a)
            | Nest (Nest a) (Nest a)
    deriving (Show, Functor)

instance Foldable Nest where
    {-# INLINE foldr #-}
    foldr f = go
      where
        go z Nil = z
        go z (Cons x n) = f x (go z n)
        go z (Nest x y) = go (go z y) x

instance Traversable Nest where
    {-# INLINE traverse #-}
    traverse f = go
      where
        go Nil = pure Nil
        go (Cons x n) = Cons <$> f x <*> go n
        go (Nest x y) = Nest <$> go x <*> go y


More information about the Haskell-Cafe mailing list