[Haskell-beginners] Re: Memoization

Mon Dec 1 05:02:08 EST 2008

abdullah abdul Khadir wrote:
> Hi,
>       I am a student and we had an assignment in Haskell. The question, was
> given a string of the form  "1-2+3*5-7+4-6+3" i.e., any sequence of integers
> as well as some operators between them we had to find a maximum possible
> value for the expression as well as the expression itself . So for maxval
> "1-2+3*5-7+4-6+3" it is (76,"(1-((2+3)*((5-(7+4))-(6+3))))"). The function
> we had to write was maxval :: String -> (Int,String). For further details on
> the question, have a look at our sir's web page
> here<http://www.cmi.ac.in/%7Emadhavan/courses/programming08/assignment6.txt>.
>
> I solved the question, we had to use memoization, and submitted the
> solution. It is given below. Now the problem is I am just wondering if it
> can be solved in a better manner. Translation : Is there some way in Haskell
> to do it in a more simpler way and as well as to reduce the number of lines
> of the program.

Yes, your solution can be made more beautiful. Let me show how.

First of all, we should separate *parsing* the input into a list of
numbers and operations from *computing* the result.

    maxval :: String -> (Int, String)
    maxval = compute . parse

In other words,  parse  extracts the numbers and arithmetic operations
from the input. One example implementation is

    type Op = Char

    parse :: String -> ([Int], [Op])
    parse [] = ([],[])
    parse s  = let (n,s2) = parseInt s in
        case s2 of
            []    -> ([n],[])
            op:s3 -> let (ns,ops) = parse s3 in (n:ns,op:ops)

    parseInt :: String -> (Int, String)
    parseInt s = (n, s')
        where
        (digits, s') = span isDigit s
        n            = foldl (\x c -> 10*x + fromDigit c) 0 digits
        fromDigit c  = ord c - ord '0'

but it's more complicated then necessary. We should at least use the
reads  functions from the  Prelude . And in any case, *parser
combinators* are the best way to parse something. But for now, the
straightforward way above shall suffice.

Second, we can considerably clarify things by defining new types. Our
first abstraction is the *expression*

    type Expr   = (Int,String)

    value :: Expr -> Int
    value = fst

which consists of a value and its textual representation. For instance,

  (20, "((6*3)+2)")
  (30, "(6*(3+2))")

are expressions. We can combine two expression by applying one of our
arithmetic operations both to the value and the textual representation

    applyExpr :: Op -> Expr -> Expr -> Expr
    applyExpr op (x,ex) (y,ey) =
        (f op x y, "(" ++ ex ++ [op] ++ ey ++ ")")
        where
        f '+' = (+)
        f '-' = (-)
        f '*' = (*)

Our main algorithm will choose maximal and minimal values from a set of
possible expressions. Therefore, we introduce the following type

    data MinMax = M Expr Expr deriving (Show)

which represents a range of values by recording expressions of minimal
and maximal value.

    maxexpr :: MinMax -> Expr
    maxexpr (M _ e) = e

We can merge two such ranges ("union") by choosing the lower first and
the higher second part:

    merge :: MinMax -> MinMax -> MinMax
    merge (M x y) (M x2 y2) = M emin emax
        where
        emin = if value x < value x2 then x else x2
        emax = if value y > value y2 then y else y2

    merges :: [MinMax] -> MinMax
    merges = foldr1 merge

    fromExpr :: Expr -> MinMax
    fromExpr e = M e e

Now, we also want to apply arithmetic expressions to these ranges. For
'+','-' and '*', the following function does the right thing:

    applyMinMax :: Op -> MinMax -> MinMax -> MinMax
    applyMinMax op (M x y) (M x2 y2) =
        merges [fromExpr (applyExpr op z z2) | z<-[x,y], z2<-[x2,y2]]

With these preliminaries, we can now express the algorithm. The main
ingredient is a function  (f :: Int -> Int -> MinMax)  that calculates
the range of possible values for expression that only utilize numbers
between the positions  i  and  j  in the list. And as common in dynamic
programming, we employ a memo table to store the intermediate results.

    compute :: ([Int], [Op]) -> Expr
    compute (xs,ops) = maxexpr (f 1 n)
        where
        n = length xs
        f = memoize n f'
        f' i j
            | i == j    = let x = xs !! (i-1)
                              e = (x,show x)
                          in fromExpr e
            | otherwise = merges [applyMinMax (ops !! (k-1))
                                  (f i k) (f (k+1) j)
                                 | k <- [i..(j-1)]]

Where exactly is the memo table? It's hidden in

    memoize :: Int -> (Int -> Int -> a) -> (Int -> Int -> a)
    memoize n f = \i j -> table ! (i,j)
        where
        table = array ((1,1),(n,n))
                 [((i,j), f i j) | i<-[1..n], j<-[1..n]]

which takes a function of two arguments from 1 to n and tabulates its
values in an array. (You need to  import Data.Array  for the arrays.) In
other words,  f  tabulates the results of  f'  which in turn uses the
tabulated values returned by  f  to compute its results. Thanks to lazy
evaluation, this "tabulate the result before it's available" works.

To summarize, the key points of the new solution are

  * Parse input.
  * Abstractions.
  * Memoization is a simple higher order function.

But there is more. Namely, there are many different ways to implement
the memoization. I used an array, you were asked to use a linked list.
The former is O(1) the latter O(n). There is a way to do it with plain
trees but still O(1), see also section 3 of

  Richard Bird and Ralf Hinze.
  "Trouble Shared is Trouble Halved"
  http://www.informatik.uni-bonn.de/~ralf/publications/HW2003.pdf

And there is even more! Namely, we knew that the problem is an instance
of dynamic programming, we knew the algorithm before implementing it.
But how to find the algorithm in the first place? Well, the usual answer
is "by thinking hard". However, there are very systemic ways to derive
dynamic programming algorithms from just the problem specification! In a
sense, much of the work of R. Bird centers this topic. The book "Algebra
of Programming"

http://web.comlab.ox.ac.uk/oucl/research/pdt/ap/pubs.html#Bird-deMoor96:Algebra

is one of the cornerstones.

The systematic derivation of dynamic programming algorithms has been
rediscovered in a more direct but less general fashion

   http://bibiserv.techfak.uni-bielefeld.de/adp/

Regards,
H. Apfelmus