[Haskell-cafe] Space usage and CSE in Haskell
westondan at imageworks.com
Tue Jul 24 19:54:57 EDT 2007
I think I might not have been lazy enough to get proper memoization.
This might be needed:
firstNprimes :: Nat -> [Integer]
firstNprimes Zero = 
firstNprimes ( Succ $ Zero) =
let p = firstNprimes Zero in 2 : p
firstNprimes (Succ . Succ $ Zero) =
let p = firstNprimes (Succ $ Zero) in 3 : p
Dan Weston wrote:
> > But this simple modification allows us to use only O(sqrt(n)) space at
> > the point we print the nth prime:
> I wouldn't call your modification simple. It appears that you are trying
> to put smarts into the garbage collector and memoization logic, the
> first step towards a priority queue of memoized results.
> Suppose you had
> data Nat = Zero | Succ Nat
> firstNprimes :: Nat -> [Integer]
> firstNprimes Zero = 
> firstNprimes ( Succ $ Zero) = 2 : firstNprimes Zero
> firstNprimes (Succ . Succ $ Zero) = 3 : firstNprimes (Succ $ Zero)
> The resulting sublists should be shared, so that each memoized partial
> evaluation is just a head and a pointer, with space O(2*n).
> Suppose further you could tell the garbage collector to discard the
> highest Nat firstNprimes sublists first, forcing a recomputation
> whenever needed again.
> Then, assuming you use only the one (outer) primes function, your primes
> function (which needs all the firstNprimes) has the lowest priority and
> gets recalculated on memory exhaustion, but only back to the highest
> known prime, which will eventually (and forever thereafter) be the
> highest firstNprimes that fits in memory.
> The code uses the most memory it can for efficiency, then continues on
> maximally efficiently from there on the fly.
> This is the sort of control you are getting on the cheap with your
> non-trivial use of two primes functions. It is the kind of logic that
> might be difficult to automate.
> Dan Weston
> Melissa O'Neill wrote:
>> When advocating functional languages like Haskell, one of the claims
>> I've tended to make is that referential transparency allows the
>> language to be much more aggressive about things like common
>> subexpression elimination (CSE) than traditional imperative languages
>> (which need to worry about preserving proper side-effect sequencing).
>> But a recent example has left me thinking that maybe I've gone too far
>> in my claims.
>> First, lets consider a simple consumer program, such as:
>>> printEveryNth c l n = do print (c', x)
>>> printEveryNth c' xs n
>>> where c' = c+n
>>> x:xs = drop (n-1) l
>> Note that we can pass this function an infinite list, such as [1..],
>> and it won't retain the whole list as it prints out every nth element
>> of the list.
>> Now let's consider two possible infinite lists we might pass to our
>> consumer function. We'll use a list of primes (inspired by the recent
>> discussion of primes, but you can ignore the exact function being
>> computed). Here's the first version:
>>> primes = 2 : [x | x <- [3,5..], all (\p -> x `mod` p > 0)
>>> (factorsToTry x)]
>>> factorsToTry x = takeWhile (\p -> p*p <= x) primes
>> As you might expect, at the point where we print the nth prime from
>> our infinite list, we will be retaining a list that requires O(n) space.
>> But this simple modification allows us to use only O(sqrt(n)) space at
>> the point we print the nth prime:
>>> primes =
>>> 2 : [x | x <- [3,5..], all (\p -> x `mod` p > 0) (factorsToTry x)]
>>> slowerPrimes =
>>> 2 : [x | x <- [3,5..], all (\p -> x `mod` p > 0)
>>> (factorsToTry x)]
>>> factorsToTry x = takeWhile (\p -> p*p <= x) slowerPrimes
>> Notice the gigantic common subexpression -- both primes and
>> slowerPrimes define exactly the same list, but at the point where
>> we're examining the nth element of primes, we'll only have advanced to
>> the sqrt(n)th element of slowerPrimes.
>> Clearly, "simplifying" the second version of primes into the first by
>> performing CSE actually makes the code much *worse*. This
>> "CSE-makes-it-worse" property strikes me as "interesting".
>> So, is it "interesting"...? Has anyone worked on characterizing CSE
>> space leaks (and avoiding CSE in those cases)? FWIW, it looks like
>> others have run into the same problem, since bug #947 in GHC (from
>> October 2006) seems to be along similar lines.
>> P.S. These issues do make massive difference in practice. There is a
>> huge difference between taking O(n) and O(sqrt(n)) space -- the
>> difference between a couple of megabytes for the heap and tens or
>> hundreds of megabytes.
>> Haskell-Cafe mailing list
>> Haskell-Cafe at haskell.org
More information about the Haskell-Cafe