[Haskell-cafe] Associative array updating
Cale Gibbard
cgibbard at gmail.com
Mon Jan 3 03:06:28 EST 2005
Hello,
I found the following implementation using finite maps to work rather
well. (See http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data.FiniteMap.html)
-------------------
import Char
import Data.FiniteMap
isLetter c = isAlphaNum c || c=='\''
normalize = map toLower . filter isLetter
nwords = filter (not . null) . map normalize . words
wordCount w = addListToFM_C (+) emptyFM (zip (nwords w) [1,1 ..])
main = do
s <- getContents
mapM_ print $ fmToList $ wordCount s
-----------------
On a file with 264724 words and 17391 distinct words, this had the
following times:
real 0m38.208s
user 0m4.302s
sys 0m0.214s
It was compiled with -O using ghc 6.2.2. I had to increse the stack
size slightly with a commandline option.
Most of the time was actually spent rendering the output. I didn't
bother to wait for the original code to finish, as it was reporting
only a few distinct words per second.
Hope this helps
- Cale
On Sat, 01 Jan 2005 13:28:41 +0000, Dominic Fox
<dominic.fox1 at ntlworld.com> wrote:
> Another newbie question.
>
> The following is a naive attempt at a word count function, that takes a
> string and returns a list of word/count pairs.
>
> import Char
>
> isLetter c = isAlphaNum c || c=='\''
>
> normalize = map toLower . filter isLetter
>
> isWord [] = False
> isWord s = True
>
> nwords = filter isWord . map normalize . words
>
> update w [] = [(w, 1)]
> update w ((x, n):xs) =
> if x==w
> then (x, n+1) : xs
> else (x, n) : (update w xs)
>
> wordCount = foldr update [] . nwords
>
> It works fine, but it's obviously very inefficient. I could make it less
> inefficient by implementing the word/count dictionary as a hash table or
> binary tree, but that's what standard libraries are for. So - what is
> there in the standard libraries that I could use in this scenario? I've
> looked at the Array module, but don't think it's quite suitable (we
> don't know the array bounds ahead of time, and the indices are strings).
> Data.HashTable looks better, but involves IO as it mutates the
> dictionary entries in place. Is there an alternative that could be used
> with a fold as in the code above?
>
> Dominic
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
More information about the Haskell-Cafe
mailing list