[Haskell-cafe] Some random newbie questions

Sun Jan 9 14:02:25 EST 2005

>> I'm constantly surprised hearing from so many people about their space
>> problems. I cannot remember having space problems with my programs. I
>> don't know what everybody else is doing wrong :-)
>
> At least two common cases.
>
> Extracting compact data structures from large files.  The contents of
> the large file is read as a linked list (ugh) of pointers (double ugh)
> to 32-bit Chars (triple ugh) -- twelve times the size of the file, if
> my calculations are correct.  The contents can't be GC'ed before the
> extracted data is fully evaluated.  (Now if the file was an mmap'ed
> array, it wouldn't be so bad, perhaps in the next generation IO that
> people are discussing this will be easier?)
>
> Naive use of foldl.  I tend to think the default foldl should be
> strict (ie. replaced by foldl') -- are there important cases where it
> needs to be lazy?

Indeed, extracting a compact data structure from a large data structure
can easily cost much space because of lazy evaluation. "foldl" is probably
used mostly for that purpose. Me not having space problems is probably
related to the kind of programs I write. Most of my programs are program
transformations that take an abstract syntax tree as input and produce a
different abstract syntax tree (again a large structure). Trying to be
lazy then means trying to produce as much output as possible with
processing as little output as possible. More formally that means if there
is some partial input for a function such that for all completions of this
partial input to fully defined inputs all outputs of the function have a
common prefix, then the function should already yield this prefix as
output for the partial input.

In the example that I mentioned in my previous posting I did actually
originally compute size information for a large data structure, so did
extract something compact from something large. However, I then realised
that I only did some very basic arithmetic with the size information
before generating another large data structure of this computed size. So
then I decided to not to compute integers at all but do the arithmetic
directly on the algebraic data type. Gone was the space problem, without
using seq.

You might also want to look at Colin Runciman's paper "What about the
Natural Numbers?" in the Journal of Functional Programming in which he
argues for a type of lazy natural numbers, with the same semantics as data
Nat = Zero | Succ Nat. It fits much better for computing the size of a
lazy data structure.

I don't claim that all space problems can easily be dealt with.

Olaf