[Haskell-cafe] Confused about Cyclic struture
Bulat Ziganshin
bulatz at HotPOP.com
Sat Jul 9 05:12:39 EDT 2005
Hello Dinh,
Friday, July 08, 2005, 9:12:22 PM, you wrote:
DTTA> Another question, it's said in the book that using cyclic structure (like
DTTA> ones = 1:ones) , the list would be represented by a fixed amount of memory.
DTTA> Does it mean [1,1,1......] only occupy one cell of memory ?
DTTA> How about in " take 100 [1,1,...] " ?
in order to understand how Haskell datastructures uses memory, you
must remember that Haskell does LAZY evaluation. it means that
any data represented as CLOSURE - pointer to function which
upon call will compute this data, or VALUE - computed data itself. LAZY
avaluation means that all data computed only when needed - in the time
of USING them instead of time of ASSIGNMENT as in all other STRICT
languages (Pascal, C++, ML and so on). when the data used FIRST TIME,
function stored in its closure is evaluated and computed result
replaces pointer to function
if computed result is not a plain value (Int/Double/Char), but a
complex structure (list, tree and so on) it will be not computed
entirely, but only the highest level of this structure. all the
references to substructures will be represented by the closures -
again. so one function call (CLOSURE) representing all the
datastructure after evaluation will be replaced by cell which
represents highest level of structure and contains references to
closures representing next level and these closures will remain
unevaluated until the time they are actually used
when you use (consume) data in list, you can do it only sequentially.
each evaluated list cell contains two links - to value in this list element and
to tail of list (it is the list again). when you consume next list
element, closure that represents it is evaluated and replaced by cell
containing two abovementioned links. when you define "ones=1:ones" this
variable will contain unevaluated closure, which would call function
"(:)" with two arguments - "1" and "ones". when you start consuming
"ones", this closure will be evaluated and replaced by its result - cell
containing links to "1" and to closure representing "ones", i.e. this
cell itself! so, subsequent consuming of next list elements will goes
thorough this cyclic structure without any additional evaluations
"take 100 ones" itself will occupy just one memory cell (of 3
elementary values - link to function and to its 2 arguments) - because
it, like anything else, represented as CLOSURE. when you goes to
consume this value, you will do it sequentially. let's see at take
definition:
take 0 xs = []
take n (x:xs) = x : take (n-1) xs
when you consume first element of "take 100 ones", this function call
will be replaced by "1:take 99 ones", i.e. cell which contain
reference to "1" and to closure "take 99 ones". as you see, we can't
create cycle here just because "take 99 ones" is not the same as "take
100 ones" :-)
more thorough, when we compute "take 100 somelist", matching
"somelist" to "x:xs" will DECOMPOSE first element of list. evaluating
"x : take (n-1) xs" will COMPOSE new cell which contains links to "x"
and unevaluated closure "take (n-1) xs". so, recursive evaluation of
"take" will build new list - element-by-element, but this list will
contain links to the same elements (represented as closures) as
original list. even if you define
take 0 xs = xs
take n (x:xs) = x : take (n-1) xs
so that "take 100 xs" will return just the same list, Haskell on
evaluation of "take" will rebuild first 100 elements of list and only
after this will share the list remainder (of couse, i don't count for
possible smart compiler's optimization)
give attention to that while "take" will rebuild structure of list, it
will use links to list elements as is. this list can contain complex
elements - for example, another lists, but links to this sublists
(again represented with evaluated values or unevaluated closures) will be
just copied to new list. for example, "take 3 [ones, [1..1000000], map
(2*) ones, ones]" will only create 3 new cells which will share
pointers to sublists with original list. it is possible because in
Haskell data value never changed - it can be only represented as
unevaluated or evaluted, so we can share subelements of datastructure
without any risk to encounter a side behaviour
next, "take 100 ones" will require memory for all 100 elements only
when it is fully evaluated. but not all data necessarily goes to this
state. for example, this data may be completely ignored in
computation, or just a few elements will be evaluated as in "take 3
[1..100]". moreover, evaluated elements of this list may be
immediately consumed without creating actual cells, as in "sum
[1..100]". on the other side, Haskell compilers try to not re-evaluate
datastructures and when your usage of variable don't goes into a
"producer-consumer" pattern, evaluated closures are replaced with
their values, so on the next use of same variable it will be already
evaluated (to degree that was needed for previous usage). run the
following program and see at delays between numbers printed:
main = do let n = 12345678
let xs = [1, sum [1..n], 1, sum [1..n-1]]
print$ sum (take 1 xs)
print$ sum (take 2 xs)
print$ sum (take 3 xs)
print$ sum (take 4 xs)
you can also see how many memmory this program require to run - under
"GHC -O2" it allocated 2.7 gb in heap but all this memory was
immediately reused so maximum memory usage will be only 5 kb
so i can say that possibility to create cyclic datastructures in
Haskell is basically the same as in C, Pascal or any other language
having poiners/references/whatever. the real difference is what you do
this not by ASSIGNMENTS but by declaration of structure itself without any
problems with referring to structures that will be declared later:
let a = 1:b
b = 2:c
c = 3:a
powers_of_2 = 1 : map (2*) powers_of_2
fibonachi = 1 : 1 : zipWith (+) fibonachi (tail fibonachi)
sharing of parts of datastructures are also possible (and easy) in imperative
languages. but SAFE sharing is possible only in Haskell
just because datastructures never change their values, so we never
need to copy datastructure just to give the same value to another
variable. as a result, it is widely used in function definitions and
even in compiler optimisations
hope this will help :)
--
Best regards,
Bulat mailto:bulatz at HotPOP.com
More information about the Haskell-Cafe
mailing list