[Haskell-beginners] [Haskell-Beginners] Laziness & referential transparency

Sat Jan 29 16:36:09 CET 2011

Daniel,

thanks for your explanations! For some time, I thaught laziness might be
some kind of holy grail. But now I realize that it takes both a lot of time
to understand properly and that it has more limitations than I expected.
Another example for this: I wrote a function

recursiveContents :: FilePath -> IO [FilePath]

and later on I thaught it was possible to run some actions on every file
from the list, in a lazy fashion. Then I had to find out that all files had
to be found before I could process the first one. By now, I think I
understand why, but it stops me from being too enthusiastic about laziness
:-(

Regards
Tim

2011/1/29 Daniel Fischer <daniel.is.fischer at googlemail.com>

> On Saturday 29 January 2011 15:16:41, Tim Baumgartner wrote:
> > Hi everybody,
> >
> > after reading a post by Don Stewart on
> > http://cgi.cse.unsw.edu.au/~dons/blog/2008/05/16#fast<http://cgi.cse.unsw.edu.au/%7Edons/blog/2008/05/16#fast>,
> I was not really
> > satisfied with the necessity to rewrite
> >
> > mean xs = sum xs / fromIntegral (length xs)
> >
> > to an explicitly recursive function.
>
> We'd all love a compiler clever enough to transform simple high-level
> specifications into efficient code. But that's very hard and while we're
> waiting, we have to go lower-level sometimes.
>
> > I investigated a bit further and
> > got really confused after the following GHCi session:
> >
> > Prelude> length [1..10^8] + length [1..10^8]
> > 200000000
> > Prelude> let xs = [1..10^8]
> > Prelude> length xs + length xs
> > <interactive>: out of memory
> >
> > Can someone explain this
>
> Yes.
>
> Prelude> length [1 .. 10^8] + length [1 .. 10^8]
>
> creates two lists, each is generated lazily and garbage collected as it is
> consumed by length, so the entire calculation runs in constant space, just
> not particularly fast.
>
> Since you haven't turned off the monomorphism restriction in ghci
> (otherwise the computation would run in constant space),
>
> Prelude> let xs = [1 .. 10^8]
>
> defines a list xs :: [Integer] (the definition yields the constraints Num a
> and Enum a, per the monomorphism restriction, the type variable a defaults
> to Integer), which will be stored for later reuse.
>
> Prelude> length xs + length xs
>
> then starts calculating the first length. That means xs will be completely
> evaluated in the course of that (if there is sufficient memory). Since the
> list is bound to a name in scope, it doesn't get garbage collected, so with
> enough memory, you'll have a 100 million long list of Integers in your
> memory (I always forget how many words per cons cell are required, but
> that's something like 400 or 500 million words, 1.6 or 2 / 3.2 or 4 GB,
> depending on whether you're on a 32-bit or a 64-bit system, iirc).
> In your case, that exhausts your RAM, otherwise calculating the second
> length would be faster since the list need not be created a second time.
>
> With the monomorphism restriction turned off,
>
> Prelude> let xs = [1 .. 10^8]
> (0.04 secs, 9183720 bytes)
> Prelude> length xs + length xs
> 200000000
> (8.02 secs, 8033302172 bytes)
> Prelude> :t xs
> xs :: (Enum a, Num a) => [a]
>
> xs remains a polymorphic value, so doesn't get shared across computations
> (since each computation could use it at a different type).
>
>
> But.
> Trying such stuff at the ghci prompt doesn't always tell you much about how
> compiled code behaves.
> For example:
>
> module Length where
>
> val :: Int
> val = length [1 .. 10^8] + length [1 .. 10^8]
>
> compiled with -O, gives the core (other stuff snipped)
>
> Length.val :: GHC.Types.Int
> [GblId,
>  Str=DmdType,
>  Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
>         ConLike=False, Cheap=False, Expandable=False,
>         Guidance=IF_ARGS [] 6 2}]
> Length.val =
>  case GHC.List.$wlen @ GHC.Integer.Type.Integer Length.val1 0
>  of ww_ayU { __DEFAULT ->
>  GHC.Types.I# (GHC.Prim.+# ww_ayU ww_ayU)
>  }
>
> meaning that the value "length [1 .. 10^8]" is reused, thus giving only one
> list traversal (and the list can be garbage-collected as it is consumed).
>
> > and give a simple, reasonable rule by which I
> > can predict such things in the future?
>
> No. Just a few rules of thumb.
> If an expression gets a name (by a top level definition or a let/where
> binding), it is reused (unless polymorphism prohibits that, that's one of
> the reasons for the MR, without the MR, stuff expected to be shared will be
> recomputed).
> GHC does little common subexpression elimination, so if an expression
> contains two identical subexpressions, they may not be shared. Sharing is
> more likely if the subexpressions are simple and 'close together' (for very
> fuzzy values of simple and close).
> Unfortunately, one thing that GHC spots well is enumFromTo
> (sugared: [a .. b]), meaning that long lists are often shared if they
> shouldn't be for memory reasons.
>
> > Isn't it a pitfall?
>
> Yes, it is.
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/beginners/attachments/20110129/b3a53c6e/attachment.htm>