[Haskell-cafe] Associative array updating
cgibbard at gmail.com
Mon Jan 3 03:06:28 EST 2005
I found the following implementation using finite maps to work rather
well. (See http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data.FiniteMap.html)
isLetter c = isAlphaNum c || c=='\''
normalize = map toLower . filter isLetter
nwords = filter (not . null) . map normalize . words
wordCount w = addListToFM_C (+) emptyFM (zip (nwords w) [1,1 ..])
main = do
s <- getContents
mapM_ print $ fmToList $ wordCount s
On a file with 264724 words and 17391 distinct words, this had the
It was compiled with -O using ghc 6.2.2. I had to increse the stack
size slightly with a commandline option.
Most of the time was actually spent rendering the output. I didn't
bother to wait for the original code to finish, as it was reporting
only a few distinct words per second.
Hope this helps
On Sat, 01 Jan 2005 13:28:41 +0000, Dominic Fox
<dominic.fox1 at ntlworld.com> wrote:
> Another newbie question.
> The following is a naive attempt at a word count function, that takes a
> string and returns a list of word/count pairs.
> import Char
> isLetter c = isAlphaNum c || c=='\''
> normalize = map toLower . filter isLetter
> isWord  = False
> isWord s = True
> nwords = filter isWord . map normalize . words
> update w  = [(w, 1)]
> update w ((x, n):xs) =
> if x==w
> then (x, n+1) : xs
> else (x, n) : (update w xs)
> wordCount = foldr update  . nwords
> It works fine, but it's obviously very inefficient. I could make it less
> inefficient by implementing the word/count dictionary as a hash table or
> binary tree, but that's what standard libraries are for. So - what is
> there in the standard libraries that I could use in this scenario? I've
> looked at the Array module, but don't think it's quite suitable (we
> don't know the array bounds ahead of time, and the indices are strings).
> Data.HashTable looks better, but involves IO as it mutates the
> dictionary entries in place. Is there an alternative that could be used
> with a fold as in the code above?
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
More information about the Haskell-Cafe