[Haskell-cafe] Performance best practices

Thu Aug 1 10:54:57 UTC 2019

To make the comparison more fair you should use the seqList function
instead of nf for evaluation the list functions. Then the list version
is actually faster (*sl is with my seqList function):

benchmarking splits1
time                 4.799 ms   (4.751 ms .. 4.857 ms)
                     0.999 R²   (0.998 R² .. 0.999 R²)
mean                 4.854 ms   (4.821 ms .. 4.893 ms)
std dev              111.0 μs   (90.31 μs .. 136.2 μs)

benchmarking splits2
time                 12.18 ms   (11.93 ms .. 12.53 ms)
                     0.995 R²   (0.990 R² .. 0.999 R²)
mean                 12.82 ms   (12.61 ms .. 13.04 ms)
std dev              568.8 μs   (514.4 μs .. 636.1 μs)
variance introduced by outliers: 18% (moderately inflated)

benchmarking splits3
time                 4.118 ms   (4.067 ms .. 4.164 ms)
                     0.999 R²   (0.998 R² .. 0.999 R²)
mean                 4.186 ms   (4.154 ms .. 4.229 ms)
std dev              114.2 μs   (89.12 μs .. 144.2 μs)
variance introduced by outliers: 11% (moderately inflated)

benchmarking splits1sl
time                 24.91 μs   (24.78 μs .. 25.04 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 24.78 μs   (24.67 μs .. 24.89 μs)
std dev              378.3 ns   (317.1 ns .. 466.5 ns)
variance introduced by outliers: 11% (moderately inflated)

benchmarking splits2sl
time                 8.903 ms   (8.835 ms .. 8.962 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 8.877 ms   (8.834 ms .. 8.924 ms)
std dev              124.0 μs   (104.2 μs .. 147.1 μs)

benchmarking splits3sl
time                 11.54 μs   (11.49 μs .. 11.58 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 11.59 μs   (11.54 μs .. 11.64 μs)
std dev              156.7 ns   (127.7 ns .. 207.3 ns)

benchmarking splits4 (boxed)
time                 1.903 ms   (1.894 ms .. 1.910 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.908 ms   (1.901 ms .. 1.915 ms)
std dev              25.61 μs   (20.81 μs .. 31.73 μs)

benchmarking splits4 (unboxed)
time                 33.63 μs   (33.51 μs .. 33.74 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 33.60 μs   (33.46 μs .. 33.76 μs)
std dev              493.7 ns   (386.8 ns .. 622.1 ns)
variance introduced by outliers: 10% (moderately inflated)

On 01-08-2019 12:48, William Yager wrote:
> In high performance Haskell, you will often find yourself using sequential
> structures besides lists. For example, an unboxed vector implementation is
> over 100x faster than any of the proposed list implementations.
> 
> Code: https://gist.github.com/wyager/45946f9f1531351468e4366b7ba168fa
> 
> Benchmark result:
> https://gist.github.com/wyager/96e7876a4b170d83dca971dd152e475e
> 
> GHC is very powerful, and can often do a surprisingly good job of
> optimizing away list allocations and such. However, in sharing-heavy
> applications like this (and if random indexing is helpful), Vectors can be
> much more efficient.
> 
> On Thu, Aug 1, 2019 at 5:25 PM Jaro Reinders <jaro.reinders at gmail.com>
> wrote:
> 
>> I just realized seqTails and seqInits are unnecessary, seqList is enough.
>>
>> On 01-08-2019 11:04, Jaro Reinders wrote:
>>> Replying to myself, you can actually write an evaluation function that
>>> forces all values in the result of tails and inits in linear time:
>>>
>>>     -- force the result of tails in linear time
>>>     seqTails (x:xs) = x `deepseq` seqList xs
>>>     seqTails [] = ()
>>>
>>>     -- force all the values in a list to whnf in linear time
>>>     -- https://wiki.haskell.org/Weak_head_normal_form
>>>     seqList (x:xs) = x `seq` seqList xs
>>>     seqList [] = ()
>>>
>>>     -- force the result of inits in linear time
>>>     seqInits xs = last xs `deepseq` seqList xs
>>>
>>> Try it in ghci with :sprint
>>> (
>> https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/ghci.html#ghci-cmd-:sprint
>> ):
>>>
>>>     > let x = tails [1..3::Int]
>>>     > :sprint x
>>>     x = _
>>>     > seqTails x
>>>     ()
>>>     > :sprint x
>>>     x = [[1,2,3],[2,3],[3],[]]
>>>
>>>     > let y = inits [1..3::Int]
>>>     > :sprint y
>>>     y = _
>>>     > seqInits y
>>>     ()
>>>     > :sprint y
>>>     y = [[],[1],[1,2],[1,2,3]]
>>>
>>> Using criterion you can see that it is actually linear time:
>>>
>>>     main = defaultMain
>>>       [ bgroup "inits"
>>>         [ bench "1000" $ whnf (seqInits . inits) [1..1000 :: Int]
>>>         , bench "10000" $ whnf (seqInits . inits) [1..10000 :: Int]
>>>         , bench "100000" $ whnf (seqInits . inits) [1..100000 :: Int]
>>>         , bench "1000000" $ whnf (seqInits . inits) [1..1000000 :: Int]
>>>         ]
>>>       , bgroup "tails"
>>>         [ bench "1000" $ whnf (seqTails . tails) [1..1000 :: Int]
>>>         , bench "10000" $ whnf (seqTails . tails) [1..10000 :: Int]
>>>         , bench "100000" $ whnf (seqTails . tails) [1..100000 :: Int]
>>>         , bench "1000000" $ whnf (seqTails . tails) [1..1000000 :: Int]
>>>         ]
>>>       ]
>>>
>>>
>>>     benchmarking inits/1000
>>>     time                 204.2 μs   (203.2 μs .. 205.4 μs)
>>>                          1.000 R²   (1.000 R² .. 1.000 R²)
>>>     mean                 203.4 μs   (202.8 μs .. 204.1 μs)
>>>     std dev              2.163 μs   (1.755 μs .. 2.664 μs)
>>>
>>>     benchmarking inits/10000
>>>     time                 3.127 ms   (3.107 ms .. 3.148 ms)
>>>                          1.000 R²   (0.999 R² .. 1.000 R²)
>>>     mean                 3.105 ms   (3.088 ms .. 3.118 ms)
>>>     std dev              45.73 μs   (32.97 μs .. 69.14 μs)
>>>
>>>     benchmarking inits/100000
>>>     time                 41.05 ms   (39.11 ms .. 42.87 ms)
>>>                          0.993 R²   (0.988 R² .. 0.998 R²)
>>>     mean                 41.52 ms   (40.62 ms .. 42.46 ms)
>>>     std dev              1.912 ms   (1.330 ms .. 2.930 ms)
>>>     variance introduced by outliers: 12% (moderately inflated)
>>>
>>>     benchmarking inits/1000000
>>>     time                 423.0 ms   (318.2 ms .. 535.5 ms)
>>>                          0.991 R²   (0.969 R² .. 1.000 R²)
>>>     mean                 459.1 ms   (428.8 ms .. 505.2 ms)
>>>     std dev              44.05 ms   (10.06 ms .. 58.49 ms)
>>>     variance introduced by outliers: 22% (moderately inflated)
>>>
>>>
>>>
>>>     benchmarking tails/1000
>>>     time                 8.811 μs   (8.768 μs .. 8.873 μs)
>>>                          1.000 R²   (0.999 R² .. 1.000 R²)
>>>     mean                 8.874 μs   (8.819 μs .. 8.963 μs)
>>>     std dev              225.7 ns   (168.4 ns .. 325.2 ns)
>>>     variance introduced by outliers: 28% (moderately inflated)
>>>
>>>     benchmarking tails/10000
>>>     time                 87.21 μs   (86.85 μs .. 87.79 μs)
>>>                          1.000 R²   (0.999 R² .. 1.000 R²)
>>>     mean                 87.42 μs   (87.01 μs .. 87.88 μs)
>>>     std dev              1.481 μs   (1.132 μs .. 1.953 μs)
>>>     variance introduced by outliers: 11% (moderately inflated)
>>>
>>>     benchmarking tails/100000
>>>     time                 886.9 μs   (882.9 μs .. 890.9 μs)
>>>                          1.000 R²   (1.000 R² .. 1.000 R²)
>>>     mean                 881.5 μs   (878.1 μs .. 885.7 μs)
>>>     std dev              12.40 μs   (9.598 μs .. 18.97 μs)
>>>
>>>     benchmarking tails/1000000
>>>     time                 9.796 ms   (9.757 ms .. 9.840 ms)
>>>                          1.000 R²   (1.000 R² .. 1.000 R²)
>>>     mean                 9.817 ms   (9.791 ms .. 9.845 ms)
>>>     std dev              78.47 μs   (60.63 μs .. 99.31 μs)
>>>
>>>
>>> On 01-08-2019 10:25, Jaro Reinders wrote:
>>>> If you fully evaluate the list produced by tails, then you're still
>>>> spending O(n^2) time, because that is just the size of the produced
>>>> list. But constructing the list and the memory taken by the list is
>>>> O(n), because most of the lists are shared
>>>> (https://wiki.haskell.org/Sharing).
>>>>
>>>> On 01-08-2019 04:45, Todd Wilson wrote:
>>>>> It seems that, asymptotically, tails is O(n) while inits is O(n^2) in
>>>>> both time and space (when fully evaluated)
>> _______________________________________________
>> Haskell-Cafe mailing list
>> To (un)subscribe, modify options or view archives go to:
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
>> Only members subscribed via the mailman list are allowed to post.
>