[Haskell-cafe] Performance best practices
Jaro Reinders
jaro.reinders at gmail.com
Thu Aug 1 10:54:57 UTC 2019
To make the comparison more fair you should use the seqList function
instead of nf for evaluation the list functions. Then the list version
is actually faster (*sl is with my seqList function):
benchmarking splits1
time 4.799 ms (4.751 ms .. 4.857 ms)
0.999 R² (0.998 R² .. 0.999 R²)
mean 4.854 ms (4.821 ms .. 4.893 ms)
std dev 111.0 μs (90.31 μs .. 136.2 μs)
benchmarking splits2
time 12.18 ms (11.93 ms .. 12.53 ms)
0.995 R² (0.990 R² .. 0.999 R²)
mean 12.82 ms (12.61 ms .. 13.04 ms)
std dev 568.8 μs (514.4 μs .. 636.1 μs)
variance introduced by outliers: 18% (moderately inflated)
benchmarking splits3
time 4.118 ms (4.067 ms .. 4.164 ms)
0.999 R² (0.998 R² .. 0.999 R²)
mean 4.186 ms (4.154 ms .. 4.229 ms)
std dev 114.2 μs (89.12 μs .. 144.2 μs)
variance introduced by outliers: 11% (moderately inflated)
benchmarking splits1sl
time 24.91 μs (24.78 μs .. 25.04 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 24.78 μs (24.67 μs .. 24.89 μs)
std dev 378.3 ns (317.1 ns .. 466.5 ns)
variance introduced by outliers: 11% (moderately inflated)
benchmarking splits2sl
time 8.903 ms (8.835 ms .. 8.962 ms)
1.000 R² (0.999 R² .. 1.000 R²)
mean 8.877 ms (8.834 ms .. 8.924 ms)
std dev 124.0 μs (104.2 μs .. 147.1 μs)
benchmarking splits3sl
time 11.54 μs (11.49 μs .. 11.58 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 11.59 μs (11.54 μs .. 11.64 μs)
std dev 156.7 ns (127.7 ns .. 207.3 ns)
benchmarking splits4 (boxed)
time 1.903 ms (1.894 ms .. 1.910 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 1.908 ms (1.901 ms .. 1.915 ms)
std dev 25.61 μs (20.81 μs .. 31.73 μs)
benchmarking splits4 (unboxed)
time 33.63 μs (33.51 μs .. 33.74 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 33.60 μs (33.46 μs .. 33.76 μs)
std dev 493.7 ns (386.8 ns .. 622.1 ns)
variance introduced by outliers: 10% (moderately inflated)
On 01-08-2019 12:48, William Yager wrote:
> In high performance Haskell, you will often find yourself using sequential
> structures besides lists. For example, an unboxed vector implementation is
> over 100x faster than any of the proposed list implementations.
>
> Code: https://gist.github.com/wyager/45946f9f1531351468e4366b7ba168fa
>
> Benchmark result:
> https://gist.github.com/wyager/96e7876a4b170d83dca971dd152e475e
>
> GHC is very powerful, and can often do a surprisingly good job of
> optimizing away list allocations and such. However, in sharing-heavy
> applications like this (and if random indexing is helpful), Vectors can be
> much more efficient.
>
> On Thu, Aug 1, 2019 at 5:25 PM Jaro Reinders <jaro.reinders at gmail.com>
> wrote:
>
>> I just realized seqTails and seqInits are unnecessary, seqList is enough.
>>
>> On 01-08-2019 11:04, Jaro Reinders wrote:
>>> Replying to myself, you can actually write an evaluation function that
>>> forces all values in the result of tails and inits in linear time:
>>>
>>> -- force the result of tails in linear time
>>> seqTails (x:xs) = x `deepseq` seqList xs
>>> seqTails [] = ()
>>>
>>> -- force all the values in a list to whnf in linear time
>>> -- https://wiki.haskell.org/Weak_head_normal_form
>>> seqList (x:xs) = x `seq` seqList xs
>>> seqList [] = ()
>>>
>>> -- force the result of inits in linear time
>>> seqInits xs = last xs `deepseq` seqList xs
>>>
>>> Try it in ghci with :sprint
>>> (
>> https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/ghci.html#ghci-cmd-:sprint
>> ):
>>>
>>> > let x = tails [1..3::Int]
>>> > :sprint x
>>> x = _
>>> > seqTails x
>>> ()
>>> > :sprint x
>>> x = [[1,2,3],[2,3],[3],[]]
>>>
>>> > let y = inits [1..3::Int]
>>> > :sprint y
>>> y = _
>>> > seqInits y
>>> ()
>>> > :sprint y
>>> y = [[],[1],[1,2],[1,2,3]]
>>>
>>> Using criterion you can see that it is actually linear time:
>>>
>>> main = defaultMain
>>> [ bgroup "inits"
>>> [ bench "1000" $ whnf (seqInits . inits) [1..1000 :: Int]
>>> , bench "10000" $ whnf (seqInits . inits) [1..10000 :: Int]
>>> , bench "100000" $ whnf (seqInits . inits) [1..100000 :: Int]
>>> , bench "1000000" $ whnf (seqInits . inits) [1..1000000 :: Int]
>>> ]
>>> , bgroup "tails"
>>> [ bench "1000" $ whnf (seqTails . tails) [1..1000 :: Int]
>>> , bench "10000" $ whnf (seqTails . tails) [1..10000 :: Int]
>>> , bench "100000" $ whnf (seqTails . tails) [1..100000 :: Int]
>>> , bench "1000000" $ whnf (seqTails . tails) [1..1000000 :: Int]
>>> ]
>>> ]
>>>
>>>
>>> benchmarking inits/1000
>>> time 204.2 μs (203.2 μs .. 205.4 μs)
>>> 1.000 R² (1.000 R² .. 1.000 R²)
>>> mean 203.4 μs (202.8 μs .. 204.1 μs)
>>> std dev 2.163 μs (1.755 μs .. 2.664 μs)
>>>
>>> benchmarking inits/10000
>>> time 3.127 ms (3.107 ms .. 3.148 ms)
>>> 1.000 R² (0.999 R² .. 1.000 R²)
>>> mean 3.105 ms (3.088 ms .. 3.118 ms)
>>> std dev 45.73 μs (32.97 μs .. 69.14 μs)
>>>
>>> benchmarking inits/100000
>>> time 41.05 ms (39.11 ms .. 42.87 ms)
>>> 0.993 R² (0.988 R² .. 0.998 R²)
>>> mean 41.52 ms (40.62 ms .. 42.46 ms)
>>> std dev 1.912 ms (1.330 ms .. 2.930 ms)
>>> variance introduced by outliers: 12% (moderately inflated)
>>>
>>> benchmarking inits/1000000
>>> time 423.0 ms (318.2 ms .. 535.5 ms)
>>> 0.991 R² (0.969 R² .. 1.000 R²)
>>> mean 459.1 ms (428.8 ms .. 505.2 ms)
>>> std dev 44.05 ms (10.06 ms .. 58.49 ms)
>>> variance introduced by outliers: 22% (moderately inflated)
>>>
>>>
>>>
>>> benchmarking tails/1000
>>> time 8.811 μs (8.768 μs .. 8.873 μs)
>>> 1.000 R² (0.999 R² .. 1.000 R²)
>>> mean 8.874 μs (8.819 μs .. 8.963 μs)
>>> std dev 225.7 ns (168.4 ns .. 325.2 ns)
>>> variance introduced by outliers: 28% (moderately inflated)
>>>
>>> benchmarking tails/10000
>>> time 87.21 μs (86.85 μs .. 87.79 μs)
>>> 1.000 R² (0.999 R² .. 1.000 R²)
>>> mean 87.42 μs (87.01 μs .. 87.88 μs)
>>> std dev 1.481 μs (1.132 μs .. 1.953 μs)
>>> variance introduced by outliers: 11% (moderately inflated)
>>>
>>> benchmarking tails/100000
>>> time 886.9 μs (882.9 μs .. 890.9 μs)
>>> 1.000 R² (1.000 R² .. 1.000 R²)
>>> mean 881.5 μs (878.1 μs .. 885.7 μs)
>>> std dev 12.40 μs (9.598 μs .. 18.97 μs)
>>>
>>> benchmarking tails/1000000
>>> time 9.796 ms (9.757 ms .. 9.840 ms)
>>> 1.000 R² (1.000 R² .. 1.000 R²)
>>> mean 9.817 ms (9.791 ms .. 9.845 ms)
>>> std dev 78.47 μs (60.63 μs .. 99.31 μs)
>>>
>>>
>>> On 01-08-2019 10:25, Jaro Reinders wrote:
>>>> If you fully evaluate the list produced by tails, then you're still
>>>> spending O(n^2) time, because that is just the size of the produced
>>>> list. But constructing the list and the memory taken by the list is
>>>> O(n), because most of the lists are shared
>>>> (https://wiki.haskell.org/Sharing).
>>>>
>>>> On 01-08-2019 04:45, Todd Wilson wrote:
>>>>> It seems that, asymptotically, tails is O(n) while inits is O(n^2) in
>>>>> both time and space (when fully evaluated)
>> _______________________________________________
>> Haskell-Cafe mailing list
>> To (un)subscribe, modify options or view archives go to:
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
>> Only members subscribed via the mailman list are allowed to post.
>
More information about the Haskell-Cafe
mailing list