Help with a Shootout program
Alson Kemp
Alson.Kemp at sloan.mit.edu
Thu Feb 24 07:40:49 EST 2005
All,
In order to teach myself Haskell, I've been tinkering with some of
the Shootout (http://shootout.alioth.debian.org/great/) programs.
Substantially improved the Mandelbrot program. Then started to work on the
Spellcheck program, since Haskell seemed to do quite poorly at it. However,
my revision appears to be slower than the original. I was hoping that I
could get some help from y'all.
Original program:
import Data.Set
main = do
d <- readFile "Usr.Dict.Words"
let misspelled x = not $ x `elementOf` (mkSet (lines d))
interact $ unlines . filter misspelled . Lines
The original program is nice and short, but uses Set to hold the 80,000 word
dictionary. The Input file is 80,000 words, too, so I figured that
searching the Set dictionary would be an O(n^2) task (80,000 spellchecks x
linear search on 80,000 word dictionary Set). All of the other languages
used hash tables for this task (O(n) (80,000 spellchecks x hash search on
dictionary), so I revised the program to do the same.
----------------------------------------
Revised program:
import Data.HashTable
import IO
import GHC.Int
buildHash :: (String -> Int32) -> String -> IO (HashTable String Bool)
buildHash hash_fn list = do
table <- new (==) hash_fn
sequence_ [ insert table k True | k <- lines list ]
return table
spellcheck :: HashTable String Bool -> String -> String -> IO String
spellcheck ds "" accum = return accum
spellcheck ds xs accum =
let (x, xs') = break (== '\n') xs
in do
r <- Data.HashTable.lookup ds x
case r of
Just b -> spellcheck ds (tail xs')
accum
_ -> spellcheck ds (tail xs')
(x ++ accum)
main = do
ds <- readFile "Usr.Dict.Words" >>= buildHash hashString
hGetContents stdin >>= (\xs -> spellcheck ds xs "") >>= print
----------------------------------------
The Revised program is 33% faster than the Original program using
the original dataset. However, if I boost the Input to 1.4M words while
keeping the dictionary the same, then the two programs are equal in speed.
This suggests that loading the dictionary and building the Hash/Set is
faster in the Revised program, but that _lookups_ are _slower_ in the
Revised program. Is Hashtable.lookup really slower than Set.elementOf?
I'm missing something and I don't know what it is. Frustrating.
- Alson
More information about the Glasgow-haskell-users
mailing list