Another beginner's memory consumption problem...

Bayley, Alistair Alistair_Bayley at ldn.invesco.com
Wed Oct 1 17:09:11 EDT 2003


I'm trying the add-a-gram challenge from here:
http://www.itasoftware.com/careers/programmers-archive.php

... and I'm also experiencing runaway memory consumption. If I load the
supplied list of words (a 1.6M file) and search for shorter strings, things
are OK. Memory consumption increases dramatically as I search for longer and
longer trails. The function searchWord starts with a particular map and
searches the maps below it (i.e. those containing shorter words), so when I
search for longer trails, a larger number of maps must be examined. When I
search for short trails, only a few (2-3 say) of the maps are examined. I
think this implies some kind of lazy evaluation problem, but I'm not sure
where to go from here.

Was it a bad idea to use a Map of Maps as a data structure? Initially I had
a list of Maps, but this wouldn't even load the full file (so I suppose lazy
evaluation has allowed me to get a bit further...). Maps seemed like the
right data structures to use for this problem. Is there some way I could
force evaluation of the maps as the file is loaded? (Or is blaming the maps
a red herring?)


Program below:

> module Main where

> import Data.FiniteMap
> import Data.List
> import System.IO


Data structure is a Map of Maps. The top-level Map holds Maps indexed by Int
(for words of size n).
Each of these Maps holds words of that size, where a key is the sorted list
of chars for a word,
and the value is the list of words whose sorted characters are that key
i.e. all words constructed from the same set of characters.

> type WordMaps = FiniteMap Int WordMap
> type WordMap = FiniteMap String [String]
> type MyWord = String


ail + s = 
sail + n = 
nails + e = 
aliens + t = 
salient + r = 
entrails + c = 
clarinets + e = 
interlaces + d = 
CREDENTIALS (length 11)


mar + c = 
cram + h = 
march + s = 
charms + o = 
chromas + n = 
monarchs + i = 
harmonics + a = 
maraschino + n = 
ANACHRONISM (length 11) 

-----------------------------------------------------------


> addWordToMap :: String -> WordMap -> WordMap
> addWordToMap word m =
>   case lookupFM m key of
>     Just wlist -> addToFM m key (word:wlist)
>     Nothing -> addToFM m key [word]
>     where key = sort word


> addWord :: String -> WordMaps -> WordMaps
> addWord word maps = addToFM maps key newmap
>   where key = length word
>         newmap = addWordToMap word oldmap
>         oldmap = case (lookupFM maps key) of
>             Just m -> m
>             Nothing -> emptyFM


> populateMaps :: WordMaps -> IO WordMaps
> populateMaps maps = do
>   eof <- isEOF
>   if eof then return maps else do
>     word <- getLine
>     if (length word) > 0  -- test for blank lines
>       then populateMaps (addWord word maps)
>       else populateMaps maps


-----------------------------------------------------------


> getMap :: WordMaps -> Int -> WordMap
> getMap maps key = case (lookupFM maps key) of
>   Nothing -> emptyFM
>   Just m -> m

> showWordList :: [String] -> String
> showWordList = concat . intersperse ", "

> printTrail :: WordMaps -> [String] -> String
> printTrail _ [] = ""
> printTrail maps (w:ws) = (showWordList elts) ++ "\n" ++ (printTrail maps
ws)
>   where m = getMap maps (length w)
>         elts = lookupWithDefaultFM m [""] w


> removeEachChar :: String -> [String]
> removeEachChar word = removeEachChar' word (length word) []

> removeEachChar' :: String -> Int -> [String] -> [String]
> removeEachChar' []   _ list = list
> removeEachChar' _    0 list = list
> removeEachChar' word n list = removeEachChar' word (n-1) (newword:list)
>   where newword = (take (n-1) word) ++ (drop n word)


> unfussyHead :: [[String]] -> [String]
> unfussyHead [] = []
> unfussyHead l = head l


Given a word (and a trail so far) this will return either an empty trail,
or a trail to the first 3-char word that completes the search.
A trail is just the list of keys, shortest first.

> searchWord :: WordMaps -> [String] -> String -> [String]
> searchWord maps trail word
>   | length word <= 3 = word:trail
>   | otherwise =
>       case lookupFM m word of
>         Nothing -> []
>         Just _ -> unfussyHead $ filter ([] /=) $ map cont (removeEachChar
word)
>         where cont = searchWord maps (word:trail)
>               m = getMap maps (length word)


For each word (key) in a level, call searchWord.
The first to return a non-empty list is the winner.
If the list for this level is empty, then recurse with the next level down.

> searchLevel :: WordMaps -> Int -> [String]
> searchLevel _    3 = []
> searchLevel maps level =
>   case (lookupFM maps level) of
>     Nothing -> []
>     Just m -> let result = filter ([] /= ) $ map (searchWord maps [])
(keysFM m) in
>       if result == []
>         then searchLevel maps (level-1)
>         else head result


Kick off search at top-level.

> start :: WordMaps -> [String]
> start maps = searchLevel maps (last (keysFM maps))


Look for a specific trail. For testing/debugging.

> findw :: WordMaps -> [String]
> findw maps = searchWord maps [] (sort "entrails")

 findw maps = searchWord maps [] (sort "nails")
 findw maps = searchWord maps [] (sort "salient")
 findw maps = searchWord maps [] (sort "entrails")
 findw maps = searchWord maps [] (sort "clarinets")


> readWords :: IO String
> readWords = do
>   wordMaps <- populateMaps emptyFM
>   return (printTrail wordMaps (findw wordMaps))
>   -- This is how we'd normally start:
>   --return (printTrail wordMaps (start wordMaps))


> main :: IO ()
> main = do
>    s <- readWords
>    putStrLn s
>    return ()


*****************************************************************
The information in this email and in any attachments is 
confidential and intended solely for the attention and use 
of the named addressee(s). This information may be 
subject to legal professional or other privilege or may 
otherwise be protected by work product immunity or other 
legal rules.  It must not be disclosed to any person without 
our authority.

If you are not the intended recipient, or a person 
responsible for delivering it to the intended recipient, you 
are not authorised to and must not disclose, copy, 
distribute, or retain this message or any part of it.
*****************************************************************



More information about the Haskell-Cafe mailing list