[Haskell-cafe] ANN: combinatorics

Sun Jan 29 23:30:34 CET 2012

On 1/29/12 5:48 AM, Roman Cheplyaka wrote:
> * wren ng thornton<wren at freegeek.org>  [2012-01-28 23:06:08-0500]
>> * Math.Combinatorics.Primes: provides the prime numbers via
>> Runciman's lazy wheel sieve algorithm. Provided here since efficient
>> algorithms for combinatorial functions often require prime numbers.
>> The current version memoizes the primes as an infinite list CAF,
>> which could lead to memory leaks in long-running programs with
>> irregular access to large primes. I'm looking into a GHC patch to
>> allow resetting individual CAFs from within compiled programs so that
>> you can explicitly decide when to un-memoize the primes. (In GHCi
>> when you reload a module all the CAFs are reset. However, there's no
>> way to access this feature from within compiled programs as yet.)
>
> Why not to make it more pure? That is, return a lazy list of Ints (but
> not a CAF), which user can throw away by the usual GC means?
>
> The functions from the other modules that use primes would have to be
> put in a Reader monad. That would make it a little bit more awkward to
> use, but personally I'd prefer that over memory leaks.

I'd also prefer a more pure solution, but I don't think that the Reader 
monad is the right tool here. I played around with that approach, but it 
requires extremely invasive changes to client code, obfuscating what 
should be simple mathematical formulae. And, it's too fragile, exposing 
the implementation in a way that breaks client code should I change a 
non-prime-using algorithm to a prime-using one, or vice versa. The 
fragility could be partially avoided by providing both prime-using and 
non-prime-using algorithms, but then that forces users to decide which 
one they want--- and if their only concern is performance, then they'd 
have to go through the code-breaking refactoring anyways, just to 
determine which is faster for their application.

One alternative I'm exploring is, rather than (only) making the primes 
not a CAF, instead make the prime-using functions non-CAFs as well. That 
is, provide a makePrimeUsingFunctions function which returns a 
record/tuple of all the functions, sharing a stream of primes. This way, 
after allocating the functions, they can be used purely just as in the 
current model; and when the client wants the primes to be GCed, they can 
drop their references to the allocated functions which use those primes 
(allocating new functions later, if necessary).

I've used that pattern before for similar resource issues, and it worked 
nicely. But how well it works depends a lot on the structure of the 
program needing those resources. In particular it can lead to needing to 
use the Reader monad (or similar) in order to pass around the functions, 
even though the functions themselves can be used purely. Unfortunately I 
don't have any large combinatorial programs on hand to assess whether 
this would be problematic in practice or not.

-- 
Live well,
~wren