[Haskell-cafe] Associative array updating

Cale Gibbard cgibbard at gmail.com
Mon Jan 3 03:06:28 EST 2005


Hello,

I found the following implementation using finite maps to work rather
well. (See http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data.FiniteMap.html)
-------------------
import Char
import Data.FiniteMap

isLetter c = isAlphaNum c || c=='\''

normalize = map toLower . filter isLetter

nwords = filter (not . null) . map normalize . words

wordCount w = addListToFM_C (+) emptyFM (zip (nwords w) [1,1 ..])

main = do
    s <- getContents
    mapM_ print $ fmToList $ wordCount s
-----------------

On a file with 264724 words and 17391 distinct words, this had the
following times:
real    0m38.208s
user    0m4.302s
sys     0m0.214s

It was compiled with -O using ghc 6.2.2. I had to increse the stack
size slightly with a commandline option.

Most of the time was actually spent rendering the output. I didn't
bother to wait for the original code to finish, as it was reporting
only a few distinct words per second.

Hope this helps
 - Cale
 
On Sat, 01 Jan 2005 13:28:41 +0000, Dominic Fox
<dominic.fox1 at ntlworld.com> wrote:
> Another newbie question.
> 
> The following is a naive attempt at a word count function, that takes a
> string and returns a list of word/count pairs.
> 
>      import Char
> 
>     isLetter c = isAlphaNum c || c=='\''
> 
>     normalize = map toLower . filter isLetter
> 
>     isWord [] = False
>     isWord s = True
> 
>     nwords = filter isWord . map normalize . words
> 
>     update w [] = [(w, 1)]
>     update w ((x, n):xs) =
>         if x==w
>             then (x, n+1) : xs
>             else (x, n) : (update w xs)
> 
>     wordCount = foldr update [] . nwords
> 
> It works fine, but it's obviously very inefficient. I could make it less
> inefficient by implementing the word/count dictionary as a hash table or
> binary tree, but that's what standard libraries are for. So - what is
> there in the standard libraries that I could use in this scenario? I've
> looked at the Array module, but don't think it's quite suitable (we
> don't know the array bounds ahead of time, and the indices are strings).
> Data.HashTable looks better, but involves IO as it mutates the
> dictionary entries in place. Is there an alternative that could be used
> with a fold as in the code above?
> 
> Dominic
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>


More information about the Haskell-Cafe mailing list