[Haskell-cafe] Re: Difference lists and ShowS

apfelmus apfelmus at quantentunnel.de
Wed Jan 9 16:18:17 EST 2008


Albert Y. C. Lai wrote:
> apfelmus wrote:
>> I don't know a formalism for easy reasoning about time in a lazy 
>> language. Anyone any pointers? Note that the problem is already 
>> present for difference lists in strict languages.
> 
> http://homepages.inf.ed.ac.uk/wadler/topics/strictness-analysis.html
> 
> especially "strictness analysis aids time analysis".

Ah, of course, thanks. Together with

   D. Sands. Complexity Analysis for a Lazy Higher-Order Language.
   http://citeseer.ist.psu.edu/291589.html

for the higher-order case, a satisfactory analysis can be put together.


The formalism is basically as follows: for a function f, let f^T denote 
the time needed to execute it to weak normal form, given that its 
arguments are already in weak normal form. Weak normal form = full
normal form for algebraic values and lambda-abstraction for functions = 
what you'd expect in a strict language. Plain values = nullary 
functions. For instance

   (++)^T []     ys = 1 + ys^T = 1   -- ys is already in weak normal form
   (++)^T (x:xs) ys = 1 + (x:(xs ++ ys))^T
                    = 1 + (++)^T xs ys + xs^T + ys^T  -- (:) is free
                    = 1 + (++)^T xs ys
   ==>
   (++)^T xs ys = O(length xs)

Substituting a function application by the function body is counted as 
one time step; that's where the  1 +  comes from.
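
As an illustration (my sketch, not part of the analysis above), the 
counting rule can be made executable by instrumenting (++) to return the 
number of unfoldings alongside its result; the name  appendT  is made up:

   -- Instrumented append: the second component counts one step per
   -- unfolding of the function body, mirroring the recurrence for (++)^T.
   appendT :: [a] -> [a] -> ([a], Int)
   appendT []     ys = (ys, 1)
   appendT (x:xs) ys = let (zs, t) = appendT xs ys
                       in (x : zs, 1 + t)

   -- appendT [1,2,3] [4]  ==>  ([1,2,3,4], 4), i.e. O(length xs) steps.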


For difference lists, we have

   (.)^T f g = O(1)

since it immediately returns the lambda-abstraction  \x -> f (g x) . Now, 
we missed something important about difference lists, namely the function

   toList f = f []

that turns a difference list into an ordinary list, and this function is 
O(n). In contrast, the counterpart for ordinary lists, i.e. the identity 
function, is only O(1). Why is it O(n)? Well, (.) itself may be O(1), but 
it constructs a function that needs lots of time to run. In particular

   (f . g)^T [] = ((\x->f (g x))[])^T
                = 1 + (f (g []))^T
                = 1 + f^T (g []) + (g [])^T
                = 1 + f^T (g []) + g^T []

So, to analyze higher-order functions, we simply have to keep track of 
the "size" of the returned functions (more precisely, Sands uses 
"cost-closures"). The above reduces to

   (f . g)^T [] = 1 + f^T [] + g^T []
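
In code, the difference list representation looks roughly like this (a 
sketch; the names  DList ,  singleton  and  append  are only for 
illustration):

   type DList a = [a] -> [a]

   singleton :: a -> DList a
   singleton x = (x:)      -- O(1)

   append :: DList a -> DList a -> DList a
   append = (.)            -- O(1): merely builds a closure

   toList :: DList a -> [a]
   toList f = f []         -- here the deferred compositions are paid for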

Since our difference lists don't care what they are prepended to,

   f^T xs = f^T []

Cheating a bit with the notation, we can write

   toList^T (f . g) = 1 + toList^T f + toList^T g

This means that a difference list built out of  n  elements by  m 
applications of (.) will take O(n + m) time. This is the same as O(m) 
because m >= n here: our lists are concatenations of singletons. That's 
not O(n) as anticipated, but it's all right: a concatenation of  m  empty 
lists is empty but clearly takes O(m) time, so the number of 
concatenations matters.
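
For comparison (my example, not from the analysis above): building the 
same list left-nested with plain (++) costs O(m^2) by the rule 
(++)^T xs ys = O(length xs), so the O(n + m) of difference lists is 
still a clear win. Using the DList sketch from above:

   -- Plain lists, left-nested: each (++) re-traverses its left operand,
   -- total cost O(1 + 2 + ... + m) = O(m^2).
   buildPlain :: Int -> [Int]
   buildPlain m = foldl (++) [] (map (\x -> [x]) [1..m])

   -- Difference lists, left-nested: m compositions at O(1) each, plus a
   -- single O(n + m) pass when toList forces the result.
   buildDiff :: Int -> [Int]
   buildDiff m = toList (foldl append id (map singleton [1..m]))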


Since difference lists offer such a good concatenation, why not replace 
ordinary lists entirely? Well, the problem is that we have another 
function that should run fast, namely  head . For ordinary lists,

   head^T xs = O(1)

but for difference lists, we have

   (head . toList)^T f = O(m) which is >= O(n)

in the worst case, lazy evaluation notwithstanding. How to analyze lazy 
evaluation? Wadler's approach is to add an extra argument to every 
expression that says how much of the expression is to be evaluated. This 
extra information can be encoded via projections. But I think it's 
sufficient here to let (head expr)^T symbolize the time to reduce  expr 
to weak head normal form. For example,

   (head . toList)^T (f . g) = 1 + (head . toList)^T f

assuming that  f  is nonempty. But due to the  1 + , any left-nested 
composition like

   head . toList $ (((a . b) . c) . d) . e

still needs O(m) time. So, difference lists are no "eierlegende 
Wollmilchsau" (egg-laying wool-milk-sow, i.e. an all-in-one solution) either.



Regards,
apfelmus


