[Haskell-cafe] How to improve the running time of my algorithm

William Yager will.yager at gmail.com
Sun Jan 14 21:47:38 UTC 2018

```Hello Dominik,

I'm not sure what exactly your algorithm is, but one thing that stands out
to me is the use of (Text,Text) index pairs instead of something more
efficient.

Here is an algorithm that I wrote which (without any optimization tricks)
seems to be fast and correct: https://pastebin.com/rDxUFbt3

Dynamic Programming in Haskell is a pleasure due to laziness, especially if
the choice variables are dense in the integers.

In this case, I have a Vector (Vector Bool), where the outer vector is
indexed by position into the lowercase string and the inner vector is
indexed by position into the uppercase string. vec ! i ! j is True iff
there is a solution to the problem for the first i characters in the
lowercase string and the first j characters in the uppercase string.
Because Vectors are lazy (unless you use one of the packed varieties) you
can assign each vector element the value corresponding to its solution -
before you even know what the solution is!

To get the final solution, you simply look at the vector element
corresponding to using the entirety of both strings. This will force
evaluation of the last cell, which will force the evaluation of some other
cells, which will force the evaluation of some other cells, etc. etc. If a
cell is ever accessed more than once, it still only gets computed one time,
so we have memoization.

This form is a little weird and took me a while to get the first time I saw
it, but I was delighted when I fully understood it.

If your choice variables are not dense in the integers, you can do the same
approach using a memo-trie, although there is a constant factor performance
loss compared to vectors.

--Will

On Sun, Jan 14, 2018 at 3:14 PM, Dominik Bollmann <dominikbollmann at gmail.com
> wrote:

>
>
> While playing with dynamic programming problems, I've been trying to
> solve the "Abbreviation" problem found on hackerrank.com at
> https://www.hackerrank.com/challenges/abbr/problem.
>
> Briefly, this problem asks to decide whether a source string s can be
> abbreviated into a target string t by capitalizing some of the
> characters in s and deleting its afterwards remaining lowercase
> characters. For example, the string s = "aBbdD" can be abbreviated as
> target t = "BBD", but target t' = "XYZZ" is not an abbreviation for
> source s' = "xyz".
>
> My solution to this problem is the following memoization-based
> function `isAbbreviation`:
>
>
> ```
> import Data.Char
> import Data.Map.Strict (Map)
> import qualified Data.Map.Strict as Map
> import Data.Maybe
> import Data.String
> import Data.Text (Text)
> import qualified Data.Text as Text
> import System.IO
>
> type Store = Map (Text, Text) Bool
>
> isAbbrMemo :: Text -> Text -> State Store Bool
> isAbbrMemo s t
>   | Text.null t = extend s t \$ return (Text.all isLower s)
>   | Text.null s = extend s t \$ return False
>   | otherwise   =
>       let (a, as) = fromJust \$ Text.uncons s
>           (b, bs) = fromJust \$ Text.uncons t
>       in extend s t \$ matches a as b bs
>   where
>     matches a as b bs
>       | isLower a && toUpper a /= b = isAbbrMemo as (b `Text.cons` bs)
>       | isLower a && toUpper a == b = (||) <\$> isAbbrMemo as bs
>                                            <*> isAbbrMemo as (b
> `Text.cons` bs)
>       | isUpper a && a /= b         = return False
>       | isUpper a && a == b         = isAbbrMemo as bs
>
> extend :: Text -> Text -> State Store Bool -> State Store Bool
> extend s t m = do
>   st <- get
>   case Map.lookup (s,t) st of
>     Just v  -> return v
>     Nothing -> do
>       v <- m
>       modify \$ Map.insert (s,t) v
>       return v
>
> isAbbreviation :: Text -> Text -> Bool
> isAbbreviation s t = evalState (isAbbrMemo s t) Map.empty
>
> main :: IO ()
> main = do
>   let answers = map yesNo \$ map (uncurry isAbbreviation) queries
>
> yesNo :: Bool -> String
> yesNo True  = "YES"
> yesNo False = "NO"
>
> readQueries :: IsString a => Handle -> IO [(a, a)]
>   numQueries <- read <\$> hGetLine h :: IO Int
>   forM [1..numQueries] \$ \_qid -> do
>     s <- hGetLine h
>     t <- hGetLine h
>     return (fromString s, fromString t)
> ```
>
> However, running `isAbbreviation` on Hackerrank's input #13 still
> takes around 38 seconds on my machine and is therefore too slow to be an
> accepted solution. The input of question is attached as a text file.
>
> My question is therefore: Where could I further improve the running time
> of the function `isAbbreviation`? Is there any low-hanging fruit to
> improve upon? Or is my dynamic-programming based approach somehow
> flawed in general? (in which I should rather rethink the problem?)
>
> Any observations, remarks, and improvements on the above code snippet
> are greatly appreciated :-)
>
> Thanks, Dominik.
>
>
> _______________________________________________