[Haskell-cafe] Performance best practices

Thu Aug 1 09:04:19 UTC 2019

Replying to myself, you can actually write an evaluation function that
forces all values in the result of tails and inits in linear time:

    -- force the result of tails in linear time
    seqTails (x:xs) = x `deepseq` seqList xs
    seqTails [] = ()

    -- force all the values in a list to whnf in linear time
    -- https://wiki.haskell.org/Weak_head_normal_form
    seqList (x:xs) = x `seq` seqList xs
    seqList [] = ()

    -- force the result of inits in linear time
    seqInits xs = last xs `deepseq` seqList xs

Try it in ghci with :sprint
(https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/ghci.html#ghci-cmd-:sprint):

    > let x = tails [1..3::Int]
    > :sprint x
    x = _
    > seqTails x
    ()
    > :sprint x
    x = [[1,2,3],[2,3],[3],[]]

    > let y = inits [1..3::Int]
    > :sprint y
    y = _
    > seqInits y
    ()
    > :sprint y
    y = [[],[1],[1,2],[1,2,3]]

Using criterion you can see that it is actually linear time:

    main = defaultMain
      [ bgroup "inits"
        [ bench "1000" $ whnf (seqInits . inits) [1..1000 :: Int]
        , bench "10000" $ whnf (seqInits . inits) [1..10000 :: Int]
        , bench "100000" $ whnf (seqInits . inits) [1..100000 :: Int]
        , bench "1000000" $ whnf (seqInits . inits) [1..1000000 :: Int]
        ]
      , bgroup "tails"
        [ bench "1000" $ whnf (seqTails . tails) [1..1000 :: Int]
        , bench "10000" $ whnf (seqTails . tails) [1..10000 :: Int]
        , bench "100000" $ whnf (seqTails . tails) [1..100000 :: Int]
        , bench "1000000" $ whnf (seqTails . tails) [1..1000000 :: Int]
        ]
      ]

    benchmarking inits/1000
    time                 204.2 μs   (203.2 μs .. 205.4 μs)
                         1.000 R²   (1.000 R² .. 1.000 R²)
    mean                 203.4 μs   (202.8 μs .. 204.1 μs)
    std dev              2.163 μs   (1.755 μs .. 2.664 μs)

    benchmarking inits/10000
    time                 3.127 ms   (3.107 ms .. 3.148 ms)
                         1.000 R²   (0.999 R² .. 1.000 R²)
    mean                 3.105 ms   (3.088 ms .. 3.118 ms)
    std dev              45.73 μs   (32.97 μs .. 69.14 μs)

    benchmarking inits/100000
    time                 41.05 ms   (39.11 ms .. 42.87 ms)
                         0.993 R²   (0.988 R² .. 0.998 R²)
    mean                 41.52 ms   (40.62 ms .. 42.46 ms)
    std dev              1.912 ms   (1.330 ms .. 2.930 ms)
    variance introduced by outliers: 12% (moderately inflated)

    benchmarking inits/1000000
    time                 423.0 ms   (318.2 ms .. 535.5 ms)
                         0.991 R²   (0.969 R² .. 1.000 R²)
    mean                 459.1 ms   (428.8 ms .. 505.2 ms)
    std dev              44.05 ms   (10.06 ms .. 58.49 ms)
    variance introduced by outliers: 22% (moderately inflated)

    benchmarking tails/1000
    time                 8.811 μs   (8.768 μs .. 8.873 μs)
                         1.000 R²   (0.999 R² .. 1.000 R²)
    mean                 8.874 μs   (8.819 μs .. 8.963 μs)
    std dev              225.7 ns   (168.4 ns .. 325.2 ns)
    variance introduced by outliers: 28% (moderately inflated)

    benchmarking tails/10000
    time                 87.21 μs   (86.85 μs .. 87.79 μs)
                         1.000 R²   (0.999 R² .. 1.000 R²)
    mean                 87.42 μs   (87.01 μs .. 87.88 μs)
    std dev              1.481 μs   (1.132 μs .. 1.953 μs)
    variance introduced by outliers: 11% (moderately inflated)

    benchmarking tails/100000
    time                 886.9 μs   (882.9 μs .. 890.9 μs)
                         1.000 R²   (1.000 R² .. 1.000 R²)
    mean                 881.5 μs   (878.1 μs .. 885.7 μs)
    std dev              12.40 μs   (9.598 μs .. 18.97 μs)

    benchmarking tails/1000000
    time                 9.796 ms   (9.757 ms .. 9.840 ms)
                         1.000 R²   (1.000 R² .. 1.000 R²)
    mean                 9.817 ms   (9.791 ms .. 9.845 ms)
    std dev              78.47 μs   (60.63 μs .. 99.31 μs)

On 01-08-2019 10:25, Jaro Reinders wrote:
> If you fully evaluate the list produced by tails, then you're still
> spending O(n^2) time, because that is just the size of the produced
> list. But constructing the list and the memory taken by the list is
> O(n), because most of the lists are shared
> (https://wiki.haskell.org/Sharing).
> 
> On 01-08-2019 04:45, Todd Wilson wrote:
>> It seems that, asymptotically, tails is O(n) while inits is O(n^2) in
>> both time and space (when fully evaluated)