[Haskell-cafe] Re: Importing Data.Char speeds up ghc around 70%

Sat Dec 22 18:52:11 EST 2007

Am Samstag, 22. Dezember 2007 22:57 schrieb Joost Behrends:
> Daniel Fischer <daniel.is.fischer <at> web.de> writes:
> > Of course, one minute after I sent my previous mail, I receive this one
> > :( However, one point, it might be faster to factor out all factors p in
> > found and only then compute the intsqrt, like
> >
> > found x = x{dividend = xstop, bound = intsqrt xstop, result = result x ++
> > replicate k p}
> > 	  where
> > 	    p = divisor x
> > 	    (xstop,k) = go (dividend x) 0
> > 	    go n m
> >
> > 		| r == 0 	= go q (m+1)
> > 		| otherwise = (n,m)
> >
> > 		  where
> > 		    (q,r) = n `divMod` p
>
> True - but be aware, that this will slightly slow down the computation for
> not multiple factors.

Why? You have to do one more division than the power of the prime dividing 
anyway. Or are you thinking about 'replicate 1 p' for simple factors? That 
will indeed be a bit slower than [p], but I dare say factorise a number with 
at least two prime squares or one prime cube dividing it, and you gain.

> And - as you recently noted - the really expensive
> part are all the tried factors, which do not divide the queried number.

Yes, for anything with sufficiently many small primes not dividing but one or 
more large, anything you do with the prime factors is peanuts.

Okay, make it (xstop,k) = go (dividend x `quot` p) 1, and replace divMod with 
quotRem. then you still have one superfluous division, but that's your 
making, not mine:

d2 x | dividend x `mod` divisor x > 0 =
     | otherwise = 

make it 

d2 x | r == 0	= x{divisor = divisor x + 2}
     | otherwise = x{dividend = xstop, bound = intsqrt xstop, result = result 
x ++ replicate k p}
	where
	  p = divisor x
	  (q,r) = dividend x `quotRem` p
	  (xstop,k) = go q 1
	  go n m
	    | r' == 0 = go q' (m+1)
	    | otherwise = (n,m)
	      where
		(q',r') = n `quotRem` p

and you have no superfluous division, exactly one intsqrt per prime dividing 
your number.
>
> All this is just a first approach to the problem. When i talk of "naively
> programmed", then i want to say, that number theorists might have much
> better numerical orders marching through all primes plus some more odd
> numbers. I didn't search for that on the net.

A simple fast method, although not guaranteed to succeed is Pollard's 
rho-method, preceded by trial division by small primes (up to 1,2,10 million 
or so, depending on what you have) and a good pseudo-primality test (google 
for Rabin-Miller, if you're interested).

I haven't but skimmed your code below, but I think the idea is something like

factor :: Integer -> [Integer]
factor 0 = error "0 has no decomposition"
factor 1 = []
factor n
    | n < 0     = (-1):factor (-n)
    | n < 4     = [n]
    | otherwise = go n srn tests
      where
        srn = isqrt n
        go m sr pps@(p:ps)
            | r == 0    = p:go q (isqrt q) pps
            | p > sr   = if m == 1 then [] else [m]
            | otherwise = go m sr ps
              where
                (q,r) = m `quotRem` p
        tests = 2:3:5:7:11:scanl (+) 13 (tail $ cycle dlist)

isqrt :: Integer -> Integer
isqrt n
    | n < 10^60 = floor (sqrt $ fromInteger n)
    | otherwise = 10^20*isqrt (n `quot` 10^40)

dlist :: [Integer]
dlist = zipWith (-) (tail wheel) wheel

smallPrimes :: [Integer]
smallPrimes = [2,3,5,7,11]

wheelMod :: Integer
wheelMod = product smallPrimes

wheel :: [Integer]
wheel = [r | r <- [1,3 .. wheelMod+1]
            , all ((/= 0) . (mod r)) (tail smallPrimes)]

Now, the construction of the wheel and the list of differences, dlist, is 
relatively time-consuming, but can be sped up by using a good sieving method 
(my choice would be using an array (STUArray s Int Bool)). Of course, when 
you go much beyond 13 with smallPrimes, you'll end up with a rather large 
dlist in your memory, but smallPrimes = [2,3,5,7,11,13,17] should still be 
okay, no indexing means it's probably faster than Seq.

Cheers,
Daniel

>
> The last version was some kind of resign from tries like this:
>
> firstPrimes = [3,5,7,11,13,17]
> start = last firstPrimes
> pac = product firstPrimes
> slen = length lsumds
>
> lsumds = drop 1 (fst$getSummands (singleton start, start)) where
>     getSummands :: (Seq Int, Int) -> (Seq Int, Int)
>     getSummands r |snd r < bnd    = getSummands ((fst r)|>k, snd r + k)
>
> 		  |otherwise      = r
>
>         where
>             bnd = 2*pac + start
>             k = getNext (snd r)
>             getNext n |and [(n+2)`mod`x>0 | x<-firstPrimes] = 2
>
>                       |otherwise                            = 2 + getNext
>                       | (n+2)
>
> smallmod :: Int -> Int -> Int
> smallmod n m | n<m = n | otherwise = 0
>
> divstep :: (DivIter,Int) -> (DivIter, Int)
> divstep (x,n) | and [(fromInteger $ divisor x)<start, ximod>0] =
>                           (x {divisor = divisor x + 2}, n)
>
>               | (fromInteger$divisor x) < start =
>
>                                       (x {dividend = xidiv,
>                                           bound = intsqrt(xidiv),
>                                           result = result x ++ [divisor
> x]}, n)
>
>               | ximod>0 =
>
> (x {divisor = divisor x + toInteger (index lsumds n)}, smallmod (n+1) slen)
>
>               | otherwise = (x {dividend = xidiv,
>
>                                 bound    = intsqrt(xidiv),
>                                 result   = result x ++ [divisor x]}, n)
>     where
>         (xidiv, ximod) = divMod (dividend x) (divisor x)
>
> divisions :: (DivIter, Int) -> (DivIter, Int)
> divisions (y,n) | divisor y <= bound y = divisions (divstep (y,n))
>
>                 | otherwise            = (y,0)
>
> Here the additions to divisor are taken from the sequence lsmnds (List of
> SuMaNDS) - the type Seq from Data.Sequence is faster with the function
> index than Data.List with !!. getSummands is a kind of reduced sieve of
>
> Eratosthenes. The main improvement is the longest line:
> |ximod>0 = (x {divisor = divisor x + toInteger (index lsumds n)},
>
>                smallmod (n+1) slen)
>
> I even considered converting lsmnds to ByteString and storing them - the
> build of lsmnds for firstPrimes = [3,5,7,11,13,17,19,23,29] (which already
> has some MB footprint) takes several minutes.
>
> But we have to track the number of iteration we are in. And that eats up
> much more than the reduction of divisions for "failing" factors. The code
> works (called slightly modificated by primefactors), but needs 5.41 minutes
> for 2^61+1 :((. Also expensive might be the lookup in lsumds - the code
> gets even slower with longer lists for firstPrimes.
>
> divisions (d6$d2$d6$d4$d2$d4$d2$d4 y) is derived from
>
> lsmnds [3,5] = [4,2,4,2,4,6,2,6].
>
> For me the whole matter is closed for now - the 1.34 minutes are no bad
> result. Amd anyway the code might represent a not too bad lower bound for
> efficiency of decomposing algorithms.
>
> Auf Wiedersehen, Joost
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe