[Haskell-cafe] Cost: (:) vs head

Sat Sep 11 10:16:09 EDT 2010

On Saturday 11 September 2010 14:46:48, Bas van Dijk wrote:
> On Sat, Sep 11, 2010 at 5:13 AM, michael rice <nowgate at yahoo.com> wrote:
> > Which of these would be more costly for a long list?
> >
> > f :: [Int] -> [Int]
> > f [x] = [x]
> > f (x:xs) = x + (head xs) : f xs
> >
> > f :: [Int] -> [Int]
> > f [x] = [x]
> > f (x:y:xs) = x + y : f (y:xs)
>
> So to summarize from fastest to slowest:
>
> f1: mean: 80.04258 ms
> f4: mean: 80.80089 ms
> f3: mean: 82.66586 ms
> f2: mean: 83.13315 ms
>
>
> To find out why f1 is the fastest you can look at the core using
> ghc-core[2]:
>

f4_go :: Int -> [Int] -> [Int]

>
> f3_$sf3 :: [Int] -> Int -> [Int]

>
> f2_$sf2 :: [Int] -> Int -> [Int]

>
> I don't immediately see the reason for the time difference between f4,
> f3 and f2. The inner loops all seem equivalent.

I don't pretend to understand at the processor level why f4's loop is 
faster that f2's and f3's, but I've observed on several occasions that 
parameter order plays a big rôle for performance. Apart from the rule to 
order them in increasing order of variation, on my box, getting Int# 
parameters first (in the core) is better (in particular, getting them 
before Double# parameters, doesn't seem to make so much difference for 
lists or boxed types in general), so that might explain the difference.

And, just for kicks:

f5 :: [Int] -> [Int]
f5 [] = []
f5 xs@(_:ys) = zipWith (+) xs (ys ++ [0])

and for speed:

f6 :: [Int] -> [Int]
f6 [] = []
f6 (x:xs) = go x xs
  where
    go y [] = [y]
    go y (z:zs) = let s = y+z in s `seq` (s : go z zs)

f7 :: [Int] -> [Int]
f7 [] = []
f7 (x:xs) = go xs x
  where
    go [] y = [y]
    go (z:zs) y = let s = y+z in s `seq` (s : go zs z)

f7: mean: 32.06289 ms
f6: mean: 32.70934 ms
f1: mean: 39.27808 ms
f4: mean: 40.30768 ms
f2: mean: 41.05561 ms
f3: mean: 41.49728 ms
f5: mean: 59.87034 ms

Well, actually it isn't so clear cut between f2, f3 and f4, I've even had 
benchmark runs where f4 was slower than f2 and f3, the order of f2 and f3 
changes too. Also sometimes f6 is faster than f7, but usually f7 is a 
little faster than f6, there's a largish gap to f1, a smaller gap to f4, 
closely followed by f2 and f3, f5 trailing by a long distance.

So, let's look at the core for f6 and f7 (inner loops only):

Rec {
TestFuns.f6_$sgo :: [GHC.Types.Int]
                    -> GHC.Prim.Int#
                    -> [GHC.Types.Int]
GblId
[Arity 2
 NoCafRefs
 Str: DmdType SL]
TestFuns.f6_$sgo =
  \ (sc_sok :: [GHC.Types.Int]) (sc1_sol :: GHC.Prim.Int#) ->
    case sc_sok of _ {
      [] ->
        GHC.Types.:
          @ GHC.Types.Int
          (GHC.Types.I# sc1_sol)
          (GHC.Types.[] @ GHC.Types.Int);
      : z_ae4 zs_ae5 ->
        case z_ae4 of _ { GHC.Types.I# y_amw ->
        GHC.Types.:
          @ GHC.Types.Int
          (GHC.Types.I# (GHC.Prim.+# sc1_sol y_amw))
          (TestFuns.f6_$sgo zs_ae5 y_amw)
        }
    }
end Rec }

Yay, the Int parameter got unboxed, so one visit less to the heap per 
round. But GHC switched the parameter order, so let's try to fix that by 
switching the order ourselves:

Rec {
TestFuns.f7_$sgo :: GHC.Prim.Int#
                    -> [GHC.Types.Int]
                    -> [GHC.Types.Int]
GblId
[Arity 2
 NoCafRefs
 Str: DmdType LS]
TestFuns.f7_$sgo =
  \ (sc_sot :: GHC.Prim.Int#) (sc1_sou :: [GHC.Types.Int]) ->
    case sc1_sou of _ {
      [] ->
        GHC.Types.:
          @ GHC.Types.Int
          (GHC.Types.I# sc_sot)
          (GHC.Types.[] @ GHC.Types.Int);
      : z_aeb zs_aec ->
        case z_aeb of _ { GHC.Types.I# y_amw ->
        GHC.Types.:
          @ GHC.Types.Int
          (GHC.Types.I# (GHC.Prim.+# sc_sot y_amw))
          (TestFuns.f7_$sgo y_amw zs_aec)
        }
    }
end Rec }

Exactly what I wanted :)

>
> Regards,
>
> Bas