[Haskell-beginners] [Haskell-Beginners] Laziness & referential transparency

Daniel Fischer daniel.is.fischer at googlemail.com
Sat Jan 29 16:18:31 CET 2011


On Saturday 29 January 2011 15:16:41, Tim Baumgartner wrote:
> Hi everybody,
>
> after reading a post by Don Stewart on
> http://cgi.cse.unsw.edu.au/~dons/blog/2008/05/16#fast, I was not really
> satisfied with the necessity to rewrite
>
> mean xs = sum xs / fromIntegral (length xs)
>
> to an explicitly recursive function.

We'd all love a compiler clever enough to transform simple high-level 
specifications into efficient code. But that's very hard and while we're 
waiting, we have to go lower-level sometimes.

> I investigated a bit further and
> got really confused after the following GHCi session:
>
> Prelude> length [1..10^8] + length [1..10^8]
> 200000000
> Prelude> let xs = [1..10^8]
> Prelude> length xs + length xs
> <interactive>: out of memory
>
> Can someone explain this

Yes.

Prelude> length [1 .. 10^8] + length [1 .. 10^8]

creates two lists, each is generated lazily and garbage collected as it is 
consumed by length, so the entire calculation runs in constant space, just 
not particularly fast.

Since you haven't turned off the monomorphism restriction in ghci 
(otherwise the computation would run in constant space),

Prelude> let xs = [1 .. 10^8]

defines a list xs :: [Integer] (the definition yields the constraints Num a 
and Enum a, per the monomorphism restriction, the type variable a defaults 
to Integer), which will be stored for later reuse.

Prelude> length xs + length xs

then starts calculating the first length. That means xs will be completely 
evaluated in the course of that (if there is sufficient memory). Since the 
list is bound to a name in scope, it doesn't get garbage collected, so with 
enough memory, you'll have a 100 million long list of Integers in your 
memory (I always forget how many words per cons cell are required, but 
that's something like 400 or 500 million words, 1.6 or 2 / 3.2 or 4 GB, 
depending on whether you're on a 32-bit or a 64-bit system, iirc).
In your case, that exhausts your RAM, otherwise calculating the second 
length would be faster since the list need not be created a second time.

With the monomorphism restriction turned off,

Prelude> let xs = [1 .. 10^8]
(0.04 secs, 9183720 bytes)
Prelude> length xs + length xs
200000000
(8.02 secs, 8033302172 bytes)
Prelude> :t xs
xs :: (Enum a, Num a) => [a]

xs remains a polymorphic value, so doesn't get shared across computations 
(since each computation could use it at a different type).


But.
Trying such stuff at the ghci prompt doesn't always tell you much about how 
compiled code behaves.
For example:

module Length where

val :: Int
val = length [1 .. 10^8] + length [1 .. 10^8]

compiled with -O, gives the core (other stuff snipped)

Length.val :: GHC.Types.Int
[GblId,
 Str=DmdType,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
         ConLike=False, Cheap=False, Expandable=False,
         Guidance=IF_ARGS [] 6 2}]
Length.val =
  case GHC.List.$wlen @ GHC.Integer.Type.Integer Length.val1 0
  of ww_ayU { __DEFAULT ->
  GHC.Types.I# (GHC.Prim.+# ww_ayU ww_ayU)
  }

meaning that the value "length [1 .. 10^8]" is reused, thus giving only one 
list traversal (and the list can be garbage-collected as it is consumed).

> and give a simple, reasonable rule by which I
> can predict such things in the future?

No. Just a few rules of thumb.
If an expression gets a name (by a top level definition or a let/where 
binding), it is reused (unless polymorphism prohibits that, that's one of 
the reasons for the MR, without the MR, stuff expected to be shared will be 
recomputed).
GHC does little common subexpression elimination, so if an expression 
contains two identical subexpressions, they may not be shared. Sharing is 
more likely if the subexpressions are simple and 'close together' (for very 
fuzzy values of simple and close).
Unfortunately, one thing that GHC spots well is enumFromTo 
(sugared: [a .. b]), meaning that long lists are often shared if they 
shouldn't be for memory reasons.

> Isn't it a pitfall?

Yes, it is.




More information about the Beginners mailing list