laziness in `length'

Daniel Fischer daniel.is.fischer at web.de
Mon Jun 14 10:51:22 EDT 2010


On Monday 14 June 2010 16:25:06, Serge D. Mechveliani wrote:
> Dear people and GHC team,
>
> I have a naive question about the compiler and library of  ghc-6.12.3.
> Consider the program
>
>   import Data.List (genericLength)
>   main = putStr $ shows (genericLength [1 .. n]) "\n"
>          where
>          n = 10^7   -- also tried 10^6, 10^8, ...
>
> (1) When it is compiled under  -O,  it runs in a small constant space
>     in  n  and in a time approximately proportional to  n.
> (2) When it is compiled without -O,  at run time it needs stack space
>     proportional to  n,  and it takes an enormously long time
>     for  n >= 10^7.
> (3) In the interpreter mode  ghci,   `genericLength [1 .. n]'
>     takes as much resource as (2).
>
> Are the points (2) and (3) natural for a Haskell implementation?
>
> Regardless of whether  lng  (i.e. genericLength) is inlined or not, its
> lazy evaluation is probably like this:
>  lng [1 .. n] =
>  lng (1 : (list 2 n)) =  1 + (lng $ list 2 n) =
>  1 + (lng (2: (list 3 n))) = 1 + 1 + (lng $ list 3 n) =
>  2 + (lng (3: (list 4 n)))   -- because this "+" is of Integer
>  = 2 + 1 + (lng $ list 4 n) =
>  3 + (lng $ list 4 n)
>  ...
> And this takes a small constant space.

Unfortunately, it would be

lng [1 .. n]
~> 1 + (lng [2 .. n])
~> 1 + (1 + (lng [3 .. n]))
~> 1 + (1 + (1 + (lng [4 .. n])))
~> ...

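and that builds a thunk of size O(n). Since (+) on Integer needs both of 
its arguments, none of the additions can be performed before the end of the 
list is reached, and evaluating the nested sum then uses stack proportional 
to n.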
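To make that nesting visible, here is a small sketch using a hypothetical number type `Expr` (not part of any library) whose (+) records the addition instead of performing it. Feeding it to genericLength shows the shape of the expression that, with Integer, sits in memory as unevaluated thunks until the whole list has been walked.

```haskell
import Data.List (genericLength)

-- | Hypothetical symbolic number type: (+) builds a tree instead of adding,
-- so we can inspect the expression genericLength produces.
data Expr = Lit Integer | Add Expr Expr
  deriving (Eq, Show)

instance Num Expr where
  fromInteger = Lit
  (+)         = Add
  -- genericLength only uses fromInteger and (+); the rest are left out.
  _ * _       = error "Expr: (*) not supported"
  abs         = error "Expr: abs not supported"
  signum      = error "Expr: signum not supported"
  negate      = error "Expr: negate not supported"

-- | Render the tree, parenthesising nested right operands.
render :: Expr -> String
render (Lit n)   = show n
render (Add a b) = render a ++ " + " ++ wrap b
  where
    wrap (Lit n) = show n
    wrap e       = "(" ++ render e ++ ")"

main :: IO ()
main = putStrLn (render (genericLength "abc" :: Expr))
-- prints: 1 + (1 + (1 + 0))
```

The right-nested shape is exactly the 1 + (1 + (1 + ...)) chain above; only at the innermost 0 does the first addition have everything it needs.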

The thing is, genericLength is written so that for lazy number types, the 
construction of the result can begin before the entire list has been 
traversed. This means, however, that for strict number types, like Int or 
Integer, it is woefully inefficient.
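As a sketch of the lazy case, assume a hypothetical Peano-style type `Nat` (defined here for illustration only) whose (+) inspects only its left operand. Then genericLength yields its first successor before looking at the rest of the list, so a bounded question like "is the length at least 3?" terminates even on an infinite list.

```haskell
import Data.List (genericLength)

-- | Hypothetical lazy naturals: a number is a chain of successors.
data Nat = Z | S Nat
  deriving (Eq, Show)

instance Num Nat where
  fromInteger k
    | k <= 0    = Z
    | otherwise = S (fromInteger (k - 1))
  -- Pattern-match on the left operand only, so the right operand
  -- (the length of the rest of the list) is not demanded yet.
  Z   + m = m
  S n + m = S (n + m)
  Z   * _ = Z
  S n * m = m + n * m
  abs        = id
  signum Z   = Z
  signum _   = S Z
  negate _   = Z  -- assumption: no negatives, so truncate to zero

-- | True when the number is at least k; forces at most k successors.
atLeast :: Int -> Nat -> Bool
atLeast k _ | k <= 0 = True
atLeast _ Z          = False
atLeast k (S n)      = atLeast (k - 1) n

main :: IO ()
main = do
  print (atLeast 3 (genericLength (repeat ()) :: Nat))  -- True
  print (atLeast 3 (genericLength "ab" :: Nat))         -- False
```

With Integer the same definition cannot produce any part of the answer early, which is where the O(n) thunk comes from.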

In the code above, the result type of genericLength (and the type of the list 
elements) is defaulted to Integer.
When you compile with optimisations, a rewrite-rule fires:

-- | The 'genericLength' function is an overloaded version of 'length'.  In
-- particular, instead of returning an 'Int', it returns any type which is
-- an instance of 'Num'.  It is, however, less efficient than 'length'.
genericLength           :: (Num i) => [b] -> i
genericLength []        =  0
genericLength (_:l)     =  1 + genericLength l

{-# RULES
  "genericLengthInt"     genericLength = (strictGenericLength :: [a] -> Int);
  "genericLengthInteger" genericLength = (strictGenericLength :: [a] -> Integer);
 #-}

strictGenericLength     :: (Num i) => [b] -> i
strictGenericLength l   =  gl l 0
              where
                gl [] a     = a
                gl (_:xs) a = let a' = a + 1 in a' `seq` gl xs a'

which gives a reasonably efficient constant-space calculation.

Without optimisations and in ghci, the rule doesn't fire and you get the 
generic code, which is slow and takes O(n) space.
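If you need the fast behaviour without -O or in ghci, one option (a sketch, not something the library does for you) is to count with a strict left fold yourself. `strictCount` below is a hypothetical name; like strictGenericLength it forces the accumulator at every step, here via Data.List.foldl', so no chain of (+) thunks builds up.

```haskell
import Data.List (foldl')

-- | Hypothetical helper: count the elements of a list, forcing the
-- accumulator at each step so it runs in constant space even at -O0.
strictCount :: Num i => [a] -> i
strictCount = foldl' (\acc _ -> acc + 1) 0

main :: IO ()
main = print (strictCount [1 .. 10 ^ 7 :: Integer] :: Integer)
-- prints: 10000000
```

foldl' evaluates the accumulator to weak head normal form, which for Int or Integer means the number is fully computed, so this matches what the rewrite rule gives you under -O.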

> Thank you in advance for your explanation,
>
> -----------------
> Serge Mechveliani
> mechvel at botik.ru



More information about the Glasgow-haskell-users mailing list