[Haskell-cafe] Re: Another optimization question

Sun May 18 01:45:49 EDT 2008

Daniel Fischer <daniel.is.fischer <at> web.de> writes:
> 
> Am Samstag, 17. Mai 2008 22:48 schrieb anton muhin:
> >
> > Why not -O3?
> 
> As far as I know - and Brandon S.Allbery said so, too - -O3 isn't better than 
> -O2.
> >
> > > Using a real list of primes,
> >
> > What's the size of the real list?
> 
> arbitrary, however, since it's [Int], it will actually be at most 105097565 
> primes long, but since only numbers up to 5000 are checked, only 670 primes 
> will ever be considered - When I check numbers 1 to 50000 (5133 primes, so 
> 5134 primes evaluated), with -O0 / -O2, it's
> Jeroen 1 : 14.5 s / 2.4 s
> Jeroen 2 : 52.5 s / 49.7 s
> (== x) . head . dropWhile (< x) : 8.2 s /4.1 s
> go primes : 5.5 s / 2.5 s
> 
> So Jeroen 1 is the best to be optimised :)
> 
> > >> but another
> > >> implementation of isPrime:
> > >>
> > >> isPrime x = (== x) $ head $ dropWhile (< x) primes
> > >
> > > With -O2, this is about 20% slower than the Jeroen's first version,
> > > without optimisations 50% faster.
> > > Strange.
> >
> > Well, head has its overhead as well.  Cf. two variants:
> >
> > firstNotLess :: Int -> [Int] -> Int
> > firstNotLess s (x:xs) = if x < s then firstNotLess s xs else x
> >
> > dropLess :: Int -> [Int] -> [Int]
> > dropLess s l@(x:xs) = if x < s then dropLess s xs else l
> >
> > isPrime :: Int -> Bool
> > isPrime x = x == (firstNotLess x primes)
> >
> > isPrime' :: Int -> Bool
> > isPrime' x = x == (head $ dropLess x primes)
> >
> > On my box firstNotLess gives numbers pretty close (if not better) than
> 
> for primes up to 50000, that's
> 6.8 s / 2.1 s with -O0 / -O2 respectively on mine
> 
> > Jeroen's first variant, while head $ dropLess notably worse.
> 
> 5.8 s / 2.4 s here.
> >
> > > isPrime :: Int -> Bool
> > > isPrime x = go primes
> > >      where
> > >        go (p:ps) = case compare x p of
> > >                        LT -> False
> > >                        EQ -> True
> > >                        GT -> go ps
> > >
> > > does best (on my box), with and without optimisations (very very slightly
> > > with -O2) for a list of real primes, but not for [1 .. ].
> >
> > And what happens for [1..]?
> 
> With -O2, Jeroen 1 was best (1.62), nex firstNotLess (1.63), head . dropLess 
> (1.74), then in close succesion Jeroen 2 (1.92), go primes (1.96) and head . 
> dropWhile (1.99),
> with -O0, it's
> head . dropWhile (1.7 s, YES, that actually is faster with -O0 than with 
> -O2!), head . dropLess (2.0), Jeroen 2 and firstNotLess (2.1 s), go primes 
> (2.3 s), Jeroen 1 (3.2 s).
> 
> Weirder and weirder.
> 
> >
> > > However, more than can be squished out of fiddling with these versions
> > > could be gained from a better algorithm.
> >
> > Definitely.
> >
> > yours,
> > anton.
> 

Thanks for the responses so far!

I only tested with -O2 and my primes implementation
is the Sieve of Eratosthenes and has signature

primes :: Integral a => [a]

What's also quite strange is that experiment2 is about
10 times time slower than experiment1 when 
using primes (with the Eratosthenes formula) instead of [1..].

I redid the experiments with -prof and the output was quite revealing:

experiment1 (fastest):
total time  =        2.64 secs   (132 ticks @ 20 ms)
total alloc =     323,356 bytes  (excludes profiling overheads)

                        individual    inherited
COST CENTRE   entries  %time %alloc   %time %alloc

MAIN                 0   0.0    0.0   100.0  100.0
 main                1   0.0    0.5     0.0    0.5
 CAF                 4   0.0    0.0   100.0   99.0
  primes             1   9.8   61.9     9.8   61.9
  main               0   0.0   37.1    90.2   37.1
   isPrime        5000  90.2    0.0    90.2    0.0
 CAF                 4   0.0    0.4     0.0    0.4

experiment2 (slowest):
total time  =        6.12 secs   (306 ticks @ 20 ms)
total alloc = 350,473,356 bytes  (excludes profiling overheads)

                      individual    inherited
COST CENTRE entries  %time %alloc   %time %alloc

MAIN              0   0.0    0.0   100.0  100.0
 main             1   0.0    0.0     0.0    0.0
 CAF              4   0.0    0.0   100.0  100.0
  primes          1   0.0    0.1     0.0    0.1
  main            0   0.0    0.0   100.0   99.9
   isPrime     5000 100.0   99.9   100.0   99.9
 CAF              4   0.0    0.0     0.0    0.0

Would this be only because isPrime of experiment 2 builds
this temporary list (takeWhile) all the time?

Jeroen Baekelandt