[Haskell-beginners] Simplifying code

Tue Feb 9 16:49:18 EST 2010

Am Dienstag 09 Februar 2010 21:11:55 schrieb edgar klerks:
> Hi All,
>
> I wrote a program which permutates a list words with some rules. For
> example if I feed it a list like:
>
> banana
> waterloo
> vraag
>
> It gives back the list:
>
> banana
> b at nana
> ban at na
> b at n@na
> banan@
> b at nan@
> ban at n@
> b at n@n@
> waterloo
> wa+erloo
> water|oo
> waterl0o
> etc
>
> However I have the feeling I am doing things to complicated. I am still
> a beginner. Would someone like to help me simplify somethings.

Sure. If you don't mind that the mutations come in a different order, one 
thing that works wonders is "sequence",

sequence :: Monad m => [m a] -> m [a]

In particular, for m = [], sequence :: [[a]] -> [[a]]. Then, knowing what 
sequence does, we can write

import Control.Monad (sequence)

generateAll :: String -> [String]
generateAll word = sequence (map f word)
      where
        f c = case lookup c leat of
                Just r  -> [c,r]
                Nothing -> [c]

For each letter in the word, we generate the list of all possible 
substitutions (map f),

"woot" ~> [['w'],['o','0'],['o','0'],['t','+']]

and then sequence them, choosing each combination of substitutions once.

A little more efficient than sequence is

generateAll :: String -> [String]
generateAll word = allCombs (map f word)
      where
        f c = case lookup c leat of
                Just r  -> [c,r]
                Nothing -> [c]

allCombs :: [[a]] -> [[a]]
allCombs (l:ls) = [h:t | t <- allCombs ls, h <- l]
allCombs [] = [[]]

-- sequence (l:ls) = [h:t | h <- l, t <- sequence ls]
-- with the generators in reverse order, cf.
-- http://www.haskell.org/pipermail/haskell-cafe/2009-December/070149.html

> If you think this is inappropriate please state also.

How could it be? This list is for helping people understand Haskell better, 
exactly what you're after.

> I am not offended then. I understand you are offering your
> spare time to help me.

If one thinks one's time isn't worth it, one can just ignore the post.

>
> The first thing I don't get is this. I recognize some things could be
> rewritten with a bind operator (because of the concat $ fmap), but I am
> puzzled how:
>
> mutateWords :: [Char] -> [[Char]]
> mutateWords word = nub.concat $ fmap snd <$> fmap unzip <$> ( foldr(\x z
> ->
>
>                         let char = snd x
>                             nm = number word
>                             lst = fst x
>                         in (insertAt char nm <$> lst) : z
>                 ) [[]] $ mw word )
>
>
>
>
> Here is the full code:
>
>
> import Data.List
> import System
> import System.IO
> import Control.Applicative
>
>
> ---CONFIG section
>
> leat = ['s' ==> '$', 't' ==> '+', 'l' ==> '|', 'o' ==> '0','e' ==> '3',
> 'a' ==> '@', 'v' ==> '^']
>
> leata = fst.unzip $ leat

leata = map fst leat
leatb = map snd leat

> leatb = snd.unzip $ leat
>
> -- Perl like assoc lists
> infixl 1 ==>
> a ==> b = (a, b)
>
>
> -- Flipped fmap sometimes nicer
> infixl 4 <$$>
>
> xs <$$> f = f <$> xs
>
>
> -- first I need to find  the positions of the mutatable charachters.

No, you don't need to do that, it's in general more efficient to not care 
about positions when dealing with lists.

> findPositions :: [Char] -> [[Int]]
> findPositions xs = take (length index) $ index <*> [xs]
>         where index = elemIndices <$> leata

[f1, ..., fm] <*> [x1, ..., xn]

produces a list of length m*n, so

length (index <*> [xs]) == length index * length [xs] == length index

~> remove "take (length index) $"

>
> -- And generate all subsequences
> findSubSeq :: [Char] -> [[[Int]]]
> findSubSeq  = fmap subsequences <$> findPositions
>
>
> -- Only change elements which needs to be changed
> insertAt :: Char -> [(Int, Char)] -> [Int] -> [(Int,Char)]
> insertAt c xs ps = xs <$$> (\x ->
>                 if (fst x) `elem` ps
>                         then (fst x , c)
>                         else x
>                 )
> -- Couples character to mutable positions
> mw word = (findSubSeq word) `zip` leatb
>
> number = zip [0..]
>
> mutateWords :: [Char] -> [[Char]]
> mutateWords word = nub.concat $ fmap snd <$> fmap unzip <$> ( foldr(\x z
> ->
>
>                         let char = snd x
>                             nm = number word
>                             lst = fst x
>                         in (insertAt char nm <$> lst) : z
>                 ) [[]] $ mw word )

Okay, I give up, that's too complicated :)
One general remark.
When you have an Ord instance, "nub" is an extremely bad idea (unless your 
lists are really short), as it's quadratic in the length of the list.

map head . group . sort

or

import Data.Set

toList . fromList

are much better [O(l * log l) where l = length xs]

>
> generateAll :: [Char] -> [[Char]]
> generateAll word = g lea $ mutateWords word
>     where   g 0 words = words
>                 g n words = g (n - 1) (nub  $  words >>= mutateWords )
>                 lea = length leata
> main = do
>         filename <- getArgs
>         wordlist <- readFile $ filename !! 0
>         let a = (words wordlist) >>= generateAll
>         mapM_ putStrLn  a