[Haskell-cafe] Re: Newbie Haskell optimization question

Tue Nov 21 10:54:17 EST 2006

> generate'' :: LSystemRules -> LSystem -> Int -> LSystem
> generate'' rules axiom steps =
>    concatMap (iterate f axiom !!) (ind !! steps)
>    where
>      ind = [0] : [g x | x <- ind]
>          where
>            g [] = []
>            g [x] = [x, x + 1]
>            g xs = xs ++ g (drop (length xs `div` 2) xs)
>      f = concatMap $ \elem -> Data.Map.findWithDefault (\x -> [x])
> (fst elem) rules elem

I absolutely don't know what (ind) (hereby called the "index list")
means, but it can be improved (slightly, see note at the end).

Currently, you define

    ind = iterate g [0]
    (this is exactly the same as [0] : [g x | x <- ind])

and the first iterations of g yield

  [0]
  [0, 1]
  [0, 1, 1,2]
  [0, 1, 1,2, 1,2,2,3]
  [0, 1, 1,2, 1,2,2,3, 1,2,2,3,2,3,3,4]
  [0, 1, 1,2, 1,2,2,3, 1,2,2,3,2,3,3,4, 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5]
   ...

Clearly,

    g xs = xs ++ map (+1) xs

but your recursion scheme is more clever and exploits the fact that g is
iterated and most of the additions (+1) already have been calculated.

Yet it is unnecessary to generate all intermediate lists: by starting
the iteration at [1], you can generate all "differences":

    iterate g [1] ==
        [[1], [1,2], [1,2,2,3], [1,2,2,3,2,3,3,4], ...

and get the equivalent to (ind !! steps):

    ind steps = 0 : (concat . take steps . iterate g $ [1])

Concerning (drop (length xs `div` 2)), you can carry (n = length xs)
around so that you don't have to recalculate the length everytime.

Alternatively, you can even make the recursion structure explicit and
avoid the rescan involved with (drop) entirely

    data List a = Zero | One a | List a :++: List a

    g  x@(One k)      = x  :++: One (k + 1)
    g xs@(_ :++: xs') = xs :++: g xs'

    flatten' Zero         = id
    flatten' (One x)      = (x :)
    flatten' (xs :++: ys) = (xs ++) . (ys ++)

    flatten = flip flatten' []

    ind steps =
       0 : (flatten . foldr1 (:++:) . take steps . iterate g $ One 1)

Going even further, you can fuse the lindenmaier iteration into this
(starting the iteration of g at 0 again):

    generate rules axiom steps = flatten .
            flip (!!) (steps + 1) . iterate g $ Zero
        where
        g     Zero        = One axiom
        g  x@(One lsys)   = x  :++: One (f lsys)
        g xs@(_ :++: xs') = xs :++: g xs'

        f = ...

Look, the Ints are gone! (k + 1) only meant (f k)! The data structures
are lightweight and now perfectly fit the structure of the calculation.
This is the best you can do, for all languages: In C, the most native
data structure is (int) and pointer arithmetic is the way to win. In
LISP, lists are favored above all else. And in Haskell, all algebraic
data types are native and together with laziness, they pave the road to
speed.

As a last note, all these things are only able to improve a constant
factor (measured relative to the exponential length of the outcoming
index list) and only benchmarks can show which ones really improve
things. Constant factors are quite sensitive on how the code looks like.
Of course, "ghc -O2" is the best constant factor improver known :)

Regards,
apfelmus

PS: You can even improve upon the logarithmic factor that comes from
repeatedly using Data.Map by decorating each LSystemElement with its rule:

    data LSystemElement' = LSE LSystemElement LSystem'
    type LSystem'        = [LSystemElement']

    a l@[x,y] = LSE ('A',l) $
        if y <= 2
            then [a [x + 2, y + 2]]
            else [b [2], a [x - 1, y - 1]]
    b l@[x]   = LSE ('B',l) $
        if x <= 2
            then [c []]
            else [b [x - 1]]
    c []      = LSE ('C',[]) [c]

    axiom = [a [2,2]]

    generate = ...
        where
        ...
        f = concatMap (\(LSE _ s) -> s)

This decoration can be derived from LSystemRules if you want to keep
your traditional rules.