[Haskell-cafe] Re: Longest increasing subsequence

Fri Apr 11 15:33:33 EDT 2008

My late night suggestions were nearly correct.  I have actually written the code 
now.  Once keeping track of indices, and a second time without them:

> {-# LANGUAGE BangPatterns #-}
> -- By Chris Kuklewicz, copyright 2008, BSD3 license
> -- Longest increasing subsequence
> -- (see http://en.wikipedia.org/wiki/Longest_increasing_subsequence)
> import Data.List (foldl')
> import Data.Map (Map)
> import qualified Data.Map as M (empty,null,insert,findMin,findMax
>                                ,splitLookup,deleteMin,delete)
> 
> type DList a = [a] -> [a]
> 
> lnds :: Ord a => [a] -> [a]
> lnds = lnds_decode . lnds_fold
> 
> lnds_fold :: Ord a => [a] -> Map a (DList a)
> lnds_fold = foldl' process M.empty where
>   -- The Map keys, in sorted order, are the input values which
>   --   terminate the longest increasing chains of length 1,2,3,…
>   process mu x =
>     case M.splitLookup x mu of
>       (_,Just {},_) -> mu -- ignore x when it is already an end of a chain
> 
>       (map1,Nothing,map2) | M.null map2 ->
>         -- insert new maximum element x
>         if M.null mu
>           then M.insert x (x:) mu -- x is very first element
>           else let !xs = snd (M.findMax mu)
>                in M.insert x (xs . (x:)) mu
> 
>                           | M.null map1 ->
>         -- replace minimum element with smaller x
>         M.insert x (x:) (M.deleteMin mu)
> 
>                           | otherwise ->
>         -- replace previous element oldX with slightly smaller x
>         let !xs = snd (M.findMax map1)
>             !oldX = fst (M.findMin map2) -- slightly bigger key
>             !withoutOldX = M.delete oldX mu
>         in M.insert x (xs . (x:)) withoutOldX
> 
> lnds_decode :: Ord a => Map a (DList a) -> [a]
> lnds_decode mu | M.null mu = []
>                | otherwise = snd (M.findMax mu) []
> 
> tests =  [ ['b'..'m'] == (lnds $ ['m'..'s'] ++ ['b'..'g'] ++ ['a'..'c'] ++ ['h'..'k'] ++ ['h'..'m'] ++ ['d','c'..'a'])
>          , "" == lnds ""
>          , "a" == lnds "a"
>          , "a" == lnds "ba"
>          , "ab" == lnds "ab"
>          ]

Comparing to wikipedia:
   The X[M[1]],X[M[2]],… sequence is strictly increasing.  These are the ends of 
the current increasing chains of length 1,2,… and they are the keys to the Map 
in my code.

   The values of the map are the subsequences themselves, in DList form. 
Instead of pointing to the index of the previous element I just lookup '!xs' and 
append '(x:)' to that.

Complexity:
   The strictness annotations ensure that the garbage collector can destroy any 
unreachable DList entries.  The space usage is thus O(N) and may be O(1) for 
certain inputs (such as the best case of never-increasing input list).  A 
strictly increasing input list is the worst case for space usage.

The naive time complexity of 'process' for the i'th input value is O(log i). 
This can be double checked by looking at the time complexity of everything I 
import from Data.Map.

Peak performance could be had by
   (1) adding the first element before the foldl' to avoid checking for this 
case in process
   (2a) accessing the internal map structure to optimize the 
splitLookup->delete->insert case into a single operation
   (2b) Using something like a zipper to access the to-be-deleted-and-replaced 
element of the map
The (2a) and (2b) work because we know the changed key will go into the same 
'slot' of the map as the old one.

-- 
Chris