[Haskell-beginners] When, if ever, does Haskell "calculate once"?

Thu May 6 15:30:57 EDT 2010

On Thursday 06 May 2010 20:35:15, Travis Erdman wrote:
> Two questions here, I think they might be related, perhaps even the
> same, but I am not sure, so I will ask both:
>
> Q1:  Re the function f below, I like the first implementation as it's
> "cleaner", but is the second implementation necessary for performance
> purposes?
>
> -- g = some CPU intensive function
>
> -- alternate 1
> f a b = c + (g b)
>     where
>         c = dosomethingelse a (g b)
>

A Haskell implementation *might* calculate (g b) only once, but very 
probably it will calculate it twice.

> -- alternate 2       
> f a b = c + saveit
>     where
>         saveit = g b   
>         c = dosomethingelse a saveit

A Haskell implementation is allowed to calculate saveit twice, but a decent 
one won't.

If you want to share the result of a computation, give a name to it.

One reason why sharing the common subexpression (g b) in the first snippet 
is not done is that it often is a terrible idea.
If, for example, g b is a large list, sharing hogs a lot of memory. If 
dosomethingelse and (+) consume that list in a suitable manner, only a 
small portion of it need be in memory at any given time.

Of course, if (g b) takes only very little memory (certainly if it is an 
Int, Word, ..., Integer [almost certainly]), automatic sharing would be a 
good idea.
However, looking for common subexpressions to share depending on their type 
would be even more complicated than looking for all common subexpressions, 
so probably nobody implemented it.

>
>
> Q2:  Imagine that I might be creating a custom data structure.  I need
> to access both the data in that structure, but also "features" (don't
> know proper comp sci term) of the data structure.  For instance,
> consider a Haskell list.  The stuff in the list is the data, whereas the
> length of the list is what I am calling a "feature" here.  Some of my
> features might be quite a bit more computationally intensive than
> "length", and may be referenced multiple times in many different places
> in my code.  For performance purposes, should I consider modifying my
> data structure to embed these "features" as part of the data to ensure
> that it is only calculated once?

Yes, unless these features eat too much memory (then recalculating might be 
cheaper).

If you need the features only for the big total, a wrapper analogous to 
MyList below is a good choice

data FeatureThing = FT { feature1, feature2 :: Int, feature3 :: Bool, ..., 
thing :: Thing }

, otherwise incorporate the features directly in Thing (like the size is 
incorporated in Set/Map).

> Or can I rely on Haskell to do that for me?

No, you can't rely on that.

> For instance, do I need to create the equivalent of:
>
> data MyList a = MyList {mylist::[a], mylength::Int}
>
>
>
> Thanks again for all your generous advice.