[Haskell-beginners] When, if ever, does Haskell "calculate once"?

Thu May 6 15:37:20 EDT 2010

On Thu, May 06, 2010 at 11:35:15AM -0700, Travis Erdman wrote:
> Two questions here, I think they might be related, perhaps even the same, but I am not sure, so I will ask both:
> 
> Q1:  Re the function f below, I like the first implementation as it's "cleaner", but is the second implementation necessary for performance purposes?
> 
> -- g = some CPU intensive function
> 
> -- alternate 1
> f a b = c + (g b)
>     where
>         c = dosomethingelse a (g b)
>         
> -- alternate 2        
> f a b = c + saveit
>     where
>         saveit = g b    
>         c = dosomethingelse a saveit

You need alternative 2.  In general, GHC (and, I imagine, other
Haskell compilers) do not do much common subexpression elimination,
because in some cases it can be a *pessimization*.  The classic example is

  length [1..1000000] + length [1..1000000]

vs

  let a = [1..1000000] in length a + length a

The first will execute in constant space, since each list will be
lazily generated as needed by the length function and then the garbage
collector will come along behind length and get rid of the nodes that
have already been processed.  However, in the second expression, the
garbage collector cannot get rid of the nodes that are already
processed by the first call to length, since the second call to length
still needs the list.  So the entire list [1..1000000] will end up
being in memory at once.

So, if you want to be sure that something is only computed once, you
must introduce the necessary sharing yourself by giving it a name.

> 
> 
> Q2:  Imagine that I might be creating a custom data structure.  I need to access both the data in that structure, but also "features" (don't know proper comp sci term) of the data structure.  For instance, consider a Haskell list.  The stuff in the list is the data, whereas the length of the list is what I am calling a "feature" here.  Some of my features might be quite a bit more computationally intensive than "length", and may be referenced multiple times in many different places in my code.  For performance purposes, should I consider modifying my data structure to embed these "features" as part of the data to ensure that it is only calculated once?  Or can I rely on Haskell to do that for me?  For instance, do I need to create the equivalent of:
> 
> data MyList a = MyList {mylist::[a], mylength::Int}

There's no magic going on here, if you call a function to compute some
complicated feature of a data structure multiple places in your code,
it will be computed multiple times, just like in any other language.
Caching the features you need as in the above example is a good idea
if the data structures won't change often, and you really do need the
features many times.

-Brent