Finding primes using a primes map with Haskell and Hugs98

Elke Kasimir elke.kasimir@catmint.de
Sun, 17 Dec 2000 19:56:46 +0100 (CET)


This message is in MIME format
--_=XFMail.1.3.p0.Linux:001217195636:327=_
Content-Type: text/plain; charset=iso-8859-1

Your algorithm seems to be based on the following idea: 
calculate the non-primes and derive the primes from 
them by calculating the set difference of the natural numbers 
and the non-primes.

A naive implementation of this idea can be found as 

primes' 

in the attachached file. The function uses no multiplication 
or division and though performs 6 times worse than the sieve
in calculating the first 30000 primes.

The complexity for finding the next i'th prime with this naive
implementation is about O(i). In comparison to this, the sieve 
provides a good optimization because only those natural numbers 
are tested against the i'th prime which have run through all other
sieves.

Nevertheless, your algorithm is promising when the non-primes are 
merged efficiently enough into a single sorted list which can be easily
subtracted from the natural numbers.

I think the deployment of an array is basically a way to efficiently merge the 
multiples of the primaries into a sorted list (where even duplicates 
are removed), thus hoping to reduce the number of  the operations better 
than the optimization that is provided by the sieve.

However, to use arrays this way, you probably need destructive array updates, 
because the array must be incrementally updated when new primes are
found. I think that standard haskell arrays don't do the job very well. 

An implementation of the "merging" idea in Haskell is

primes''

in the attached file. It is 15% faster then the  sieve in calculating the 30000 
first primes.

The algorithm is realized as two mutually recursive functions 
noprimes and primes'', the latter  calculating the set difference between 
the non-primes  and the natural numbers, the former merging
the all multiples of all primes into a sorted list. It should be possible 
to substantially optimize the merging operation.

primes'''

is an efficient variant of primes'. Instead of a list it uses a binary tree 
for the management of the lists of multiples of the already found
primes, and thus requires some additional programming effort.

The complexity is reduced from O(i) to something
like O(Log(i)).

Compared with the sieve, primes''' needs only half 
the time to calculate the first 30000 primes.

(Tests with ghc 4.08, 64m heap)

Best,
Elke.


On 15-Dec-00 Shlomi Fish wrote:
> 
> Hi!
> 
> As some of you may know, a Haskell program that prints all the primes can be 
> as short as the following:
> 
> primes = sieve [2.. ] where
>          sieve (p:x) = p : sieve [ n | n <- x, n `mod` p > 0 ]
> 
> Now, this program roughly corresponds to the following perl program:
> 
>###### SNIP SNIP #####
>#!/usr/bin/perl
> 
> use strict;
> 
> my (@primes, $a, $p);
> @primes = (2);
> MAIN_LOOP: 
> for($a = 3; $a < 1000; $a++)
> {
>     foreach $p (@primes)
>     {
>         if ($a % $p == 0)
>         {
>             next MAIN_LOOP;
>         }
>     }
>     push @primes, $a;
> }
> print join(", ", @primes);
>####### SNIP SNIP #####
> 
> The program can be more optimized for both speed and code size, but I wanted
> to make it as verbose as possible.
> 
> The algorithm keeps a list of the primes, and for each new number checks if
> it
> is divisable by any of them and if not it adds it to the list.
> 
> There is a different algorithm which keeps a boolean map which tells whether
> the number at that position is prime or not. At start it is initialized to
> all
> trues. The algorithm iterates over all the numbers from 2 to the square root
> of the desired bound, and if it encounters a prime number it marks all the
> numbers p*p, p*p+p, p*p+2*p, p*p+3*p, etc. as not prime. It is generally
> considered a better algorithm than the previous one, because it uses less
> costier operations (multiplications and additions instead of modulos.)
> 
> The perl program that implements that algorithm is this:
> 
>#### SNIP SNIP #####
>#!/usr/bin/perl
> 
> use strict;
> 
> sub primes
> {
>     my $how_much = shift;
> 
>     my (@array, $bound, $a, $b, @primes);
> 
>     @array = (1) x $how_much;
> 
>     $bound = int(sqrt($how_much))+1;
> 
>     for($a=2;$a<=$bound;$a++)
>     {
>         if ($array[$a])
>         {
>             for($b=$a*$a;$b<$how_much;$b+=$a)
>             {
>                 $array[$b] = 0;
>             }
>             push @primes, $a;
>         }
>     }
>     for(;$a<$how_much;$a++)
>     {
>         if ($array[$a])
>         {
>             push @primes, $a;
>         }
>     }
> 
>     return @primes;
> }
> 
> print join(", ", primes(1000));
>##### SNIP SNIP ######
> 
> Now, I tried writing an equivalent Haskell program and the best I could do
> was
> the following:
> 
> ---- SNIP SNIP -----
> module Primes where
> 
> import Prelude
> import Array
> 
> how_much :: Int
> how_much = 1000 
> 
> initial_primes_map :: Array Int Bool 
> initial_primes_map = array (1, how_much) [ (i,True) | i <- [1 .. how_much] ]
> 
> mybound :: Int
> mybound = ceiling(sqrt(fromInteger(toInteger(how_much))))
> 
> next_primes_map :: Int -> Array Int Bool -> Array Int Bool
> next_primes_map a primes_map = 
>     if (a == mybound) 
>     then primes_map 
>     else next_primes_map (a+1) (
>         if primes_map!a
>         then primes_map // [ (i*a, False) | i <- [a .. (prime_bound a)] ]
>         else primes_map
>         )
>     
> prime_bound :: Int -> Int
> prime_bound a =
> (floor(fromInteger(toInteger(how_much))/fromInteger(toInteger(a))))
> 
> get_primes_map :: Array Int Bool
> get_primes_map = (next_primes_map 2 initial_primes_map)
> 
> list_primes :: Array Int Bool -> Int -> [Int]
> list_primes primes_map n = 
>     if (n > how_much) 
>     then [] 
>     else 
>     (
>         if primes_map!n 
>         then n:(list_primes primes_map (n+1)) 
>         else list_primes primes_map (n+1)
>     )
> 
> show_primes = show (list_primes get_primes_map 2)
> ---- SNIP SNIP -----
> 
> 
> The problem is that when running it on hugs98 on a Windows98 computer with
> 64MB of RAM, I cannot seem to scale beyond 30,000 or so, as my boundary. When
> entering how_much as 50,000 I get the following message:
> 
> ERROR: Garbage collection fails to reclaim sufficient space
> 
> In perl I can scale beyond 100,000, and if I modify the code to use a bit
> vector (using vec) to much more. So my question is what am I or hugs are
> doing
> wrong and how I can write better code that implements this specific
> algorithm.
> 
>>From what I saw I used tail recursion, (and hugs98 has proper tail recursion,
> right?), and there's only one primes_map present at each iteration (and thus,
> at all), so it shouldn't be too problematic. Does it have to do with the way
> hugs98 implements and Int to Bool array?
> 
> Regards,
> 
>     Shlomi Fish
> 
> ----------------------------------------------------------------------
> Shlomi Fish        shlomif@vipe.technion.ac.il 
> Home Page:         http://t2.technion.ac.il/~shlomif/
> Home E-mail:       shlomif@techie.com
> 
> The prefix "God Said" has the extraordinary logical property of 
> converting any statement that follows it into a true one.
> 
> 
> _______________________________________________
> Haskell mailing list
> Haskell@haskell.org
> http://www.haskell.org/mailman/listinfo/haskell

---
Elke Kasimir
Skalitzer Str. 79
10997 Berlin (Germany)
fon:  +49 (030) 612 852 16
mail: elke.kasimir@catmint.de>  
see: <http://www.catmint.de/elke>

for pgp public key see:
<http://www.catmint.de/elke/pgp_signature.html>

--_=XFMail.1.3.p0.Linux:001217195636:327=_
Content-Disposition: attachment; filename="Primes.hs"
Content-Transfer-Encoding: base64
Content-Description: Primes.hs
Content-Type: application/octet-stream; name=Primes.hs; SizeOnDisk=3056

bW9kdWxlIFByaW1lcwp3aGVyZQoKaW1wb3J0IExpc3QKCi0tIDEuIHZlcnNpb24sIHNpZXZlCgpw
cmltZXMgCiAgICA9IHNpZXZlIFsyLi5dIAogICAgICAgd2hlcmUgc2lldmUgKHg6eHMpID0geCA6
IHNpZXZlIFsgbiB8IG4gPC0geHMgLCBuIGBtb2RgIHggPiAwIF0gCgoKLS0gMi4gdmVyc2lvbjog
a2VlcCBhbiAidXB0by1kYXRlIiBsaXN0IG9mIHRoZSBub24tcHJpbWVzIAotLSAgICAgICAgICAg
ICAoYSBmaW5pdGUgbGlzdCBvZiBpbmlmaW5pdGUgbGlzdHMpCi0tICAgICAgICAgICAgIGFuZCBj
YWxjdWxhdGUgdGhlIHByaW1lcyBmcm9tIHRoZW0uCgpwcmltZXMnCiAgICA9IG1rUHJpbWVzIFtd
IFsyLi5dIAogICAgICB3aGVyZQogICAgICAgbWtQcmltZXMgbm9uX3ByaW1lcyAoeDp4cykgCgkg
ICB8IG51bGwgd2l0aFggPSB4IDogbWtQcmltZXMgKG11bHQgeCA6IG5vbl9wcmltZXMpICAgICAg
ICB4cwoJICAgfCBvdGhlcndpc2UgID0gICAgIG1rUHJpbWVzIChtYXAgdGFpbCB3aXRoWCArKyB3
aXRob3V0WCkgeHMKCSAgIHdoZXJlCgkgICAod2l0aFgsd2l0aG91dFgpID0gcGFydGl0aW9uICgo
PT14KS4gaGVhZCkgbm9uX3ByaW1lcwoJICAgbXVsdCB4ICAgICAgICAgICA9IGl0ZXJhdGUgKCt4
KSAoeCt4KQoKCi0tIDMuIHZlcnNpb246IHByaW1lcyBhbmQgbm9uLXByaW1lcyBhcmUgbXV0dWFs
bHkgcmVjdXJzaXZlLgoKcHJpbWVzJycKICAgID0gMiA6IGRpZmYgWzMuLl0gbm9uX3ByaW1lcwoK
bm9uX3ByaW1lcyAKICAgID0gbWVyZ2UgKG1hcCBtdWx0IHByaW1lcycnKSAKICAgICAgd2hlcmUg
CiAgICAgIG11bHQgeCAgID0gaXRlcmF0ZSAoK3gpICh4K3gpICAgICAgCgptZXJnZSAoKHg6eHMp
OnJlc3QpCiAgICA9IHggOiBtZXJnZSAocmVhcnJhbmdlICh4czpyZXN0KSkKCnJlYXJyYW5nZSBs
QCh4bEAoeDp4cyk6KHk6eXMpOnJlc3QpIAogICAgfCB4IDw9IHkgICAgID0gbAogICAgfCBvdGhl
cndpc2UgID0gKHk6eGwpIDogcmVhcnJhbmdlICh5czpyZXN0KSAKCi0tIHNldCBkaWZmZXJlbmNl
IGZvciBvcmRlcmVkIGxpc3RzIC0gcmVzdWx0IGlzIGFsc28gb3JkZXJlZDoKZGlmZiA6OiBPcmQg
YSA9PiBbYV0gLT4gW2FdIC0+IFthXQpkaWZmIHhsQCh4OnhzKSB5bEAoeTp5cykgCiAgICB8IHgg
PCAgeSA9IHggOiBkaWZmIHhzIHlsCiAgICB8IHggPT0geSA9ICAgICBkaWZmIHhzIHlsCiAgICB8
IHggPiAgeSA9ICAgICBkaWZmIHhsIHlzCgoKLS0gNC4gdmVyc2lvbiwgbGlrZSAyLiwgYnV0IHVz
ZXMgYSB0cmVlIHRvIG1hbmFnZSBub24tcHJpbXNlOgoKcHJpbWVzJycnCiAgICA9IG1rUHJpbWVz
IEwgWzIuLl0gCiAgICAgIHdoZXJlCiAgICAgICBta1ByaW1lcyBub25fcHJpbWVzICh4OnhzKSAK
CSAgIHwgbnVsbCB3aXRoWCA9IHggOiBta1ByaW1lcyAodGluc2VydCAobXVsdCB4KSBub25fcHJp
bWVzKSAgICAgICAgICAgICB4cwoJICAgfCBvdGhlcndpc2UgID0gICAgIG1rUHJpbWVzIChmb2xk
ciB0aW5zZXJ0IHdpdGhvdXRYIChtYXAgdGFpbCB3aXRoWCkpIHhzCgkgICB3aGVyZQoJICAgKHdp
dGhYLHdpdGhvdXRYKSA9IHRwYXJ0aXRpb24gW3hdIG5vbl9wcmltZXMgCgkgICBtdWx0IHggICAg
ICAgICAgID0gaXRlcmF0ZSAoK3gpICh4K3gpCgotLSBhIGJpbmFyeSB0cmVlOgoKZGF0YSBUcmVl
ID0gTiBbSW50ZWdlcl0gVHJlZSBUcmVlIHwgTCBkZXJpdmluZyBTaG93CgotLSBydWxlcyBmb3Ig
cGxhY2luZyBpbnRlZ2VyIGxpc3RzOgoKbGVmdG9mLCByaWdodG9mIDo6IFtJbnRlZ2VyXSAtPiBU
cmVlIC0+IEJvb2wKCmxlZnRvZiAgKHg6eHMpIChOICh5OnlzKSBfIF8pID0geCA8PSB5CnJpZ2h0
b2YgKHg6eHMpIChOICh5OnlzKSBfIF8pID0geCA+IHkKCi0tIHJ1bGUgZm9yIG1hdGNoaW5nIGlu
dGVnZXIgbGlzdHM6CgptYXRjaGVzIDo6IFtJbnRlZ2VyXSAtPiBUcmVlIC0+IEJvb2wKbWF0Y2hl
cyAoeDp4cykgKE4gKHk6eXMpIF8gXykgPSB4ID09IHkKCi0tIGluc2VydGlvbjoKCnRpbnNlcnQg
OjogW0ludGVnZXJdIC0+IFRyZWUgLT4gVHJlZQp0aW5zZXJ0IHhsICAgTCA9IE4geGwgTCBMCnRp
bnNlcnQgeGwgdEAoTiB5bCB0MSB0MikgCiAgICB8IHhsIGBsZWZ0b2ZgICB0ID0gTiB5bCAodGlu
c2VydCB4bCB0MSkgdDIKICAgIHwgeGwgYHJpZ2h0b2ZgIHQgPSBOIHlsIHQxICh0aW5zZXJ0IHhs
IHQyKQoKLS0gZXh0cmFjdGlvbiAmIHJlbW92YWwgaW4gb25lIHN0ZXA6Cgp0cGFydGl0aW9uIDo6
IFtJbnRlZ2VyXSAtPiBUcmVlIC0+IChbW0ludGVnZXJdXSxUcmVlKQp0cGFydGl0aW9uIHhsIEwg
PSAoW10sTCkKdHBhcnRpdGlvbiB4bCB0QChOIHlsIHQxIHQyKSAKICAgIHwgeGwgYG1hdGNoZXNg
IHQgID0gIGxldCAoYSxiKSA9IHRwYXJ0aXRpb24nIHhsIHQxIGluICh5bDphLCByZW1vdmUgYiB0
MikKICAgIHwgeGwgYGxlZnRvZmAgIHQgID0gIGxldCAoYSxiKSA9IHRwYXJ0aXRpb24geGwgdDEg
aW4gKGEsIE4geWwgYiB0MikKICAgIHwgeGwgYHJpZ2h0b2ZgIHQgID0gIGxldCAoYSxiKSA9IHRw
YXJ0aXRpb24geGwgdDIgaW4gKGEsIE4geWwgdDEgYikKCnRwYXJ0aXRpb24nIHhsIEwgPSAoW10s
TCkgICAgICAtLSBjaGVjayBmb3IgbW9yZSBtYXRjaGVzCnRwYXJ0aXRpb24nIHhsIHRAKE4geWwg
dDEgdDIpIAogICAgfCB4bCBgbWF0Y2hlc2AgdCAgPSAgbGV0IChhLGIpID0gdHBhcnRpdGlvbicg
eGwgdDEgaW4gKHlsOmEsIHJlbW92ZSBiIHQyKQogICAgfCBvdGhlcndpc2UgICAgICAgPSAoW10s
dCkKCnJlbW92ZSBMICB0MiAgPSB0MgpyZW1vdmUgdDEgdDIgPSBsZXQgKGEsYikgPSByaWdodG1v
c3QgdDEgaW4gTiBhIGIgdDIKCnJpZ2h0bW9zdCAoTiB5bCB0MSAgTCkgPSAoeWwsdDEpCnJpZ2h0
bW9zdCAoTiB5bCB0MSB0MikgPSBsZXQgKGEsYik9cmlnaHRtb3N0IHQyIGluIChhLCBOIHlsIHQx
IGIpCiAgCgotLSB0ZXN0IGNvcnJlY3RuZXNzCgpwZGlmZiA9IFsgKGEsYixjLGQpIHwgCgkgKGEs
YixjLGQpPC16aXA0IHByaW1lcyBwcmltZXMnIHByaW1lcycnIHByaW1lcycnJywgCgkgYSAvPSBi
IHx8IGIgLz0gYyB8fCBjIC89IGQgCgkgXQoKCgoKCgoKCgo=

--_=XFMail.1.3.p0.Linux:001217195636:327=_--
End of MIME message