[Haskell-cafe] Re: Optimizing spelling correction program
wren ng thornton
wren at freegeek.org
Wed Jun 24 20:49:33 EDT 2009
Kamil Dworakowski wrote:
> On Jun 22, 10:03 am, Eugene Kirpichov <ekirpic... at gmail.com> wrote:
>> Hey, you're using String I/O!
>>
>> nWORDS <- fmap (train . map B.pack . words) (readFile "big.txt")
>>
>> This should be
>>
>> WORDS <- fmap (train . B.words) (B.readFile "big.txt")
>>
>> By the way, which exact file do you use as a misspellings file? The
>> corpus linked to at Norvig's page has many.
>> And do you have a driver program that I could run and obtain your timings?
>
> Yep, Don pointed that out and I have changed the program accordingly.
> It didn't make any difference though. The time spent on building the
> dictionary is a small portion of the overall run time.
>
> Please see the repository contents for the current version of the
> program:
> http://patch-tag.com/r/spellcorrect/snapshot/current/content/pretty
>
> The eval-bytestring.hs there is the program I used for timing. Inside
> of it you will find the name of the misspellings file needed.
>
> Thanks all for the suggestions. I'll try them when I get home tonight.
Another suggestion, is that you should try to make sure that the lists
constructed by getCommonSpellingMistakesWithCorrections get optimized
away. As written I'm not sure there will be sufficient inlining to
ensure that. If you want to be explicit about removing them, something
like the following should help:
> module Main where
> import Prelude hiding (words)
> import SpellingCorrection
> import qualified Data.ByteString.Char8 as B
> import Data.Char
> import Data.IORef
> import Control.Monad (forM_)
>
> main = do
> corrector <- getCorrector
> misspell_corpus <- B.readFile "FAWTHROP1DAT.643"
> n <- newIORef (0::Int)
> wrong <- newIORef (0::Int)
> forM_ (B.lines misspell_corpus) $ \line -> do
> modifyIORef' n (1+)
> let [ms,c] = map (B.map toLower) . B.words $ line
> if corrector ms /= c
> then modifyIORef' wrong (1+)
> else return ()
> accuracy <- do
> n' <- readIORef n
> w' <- readIORef wrong
> return $! 100 * fromIntegral w' / fromIntegral n'
> putStrLn $ "accuracy" ++ show (100 - accuracy) ++ "%"
>
> modifyIORef' :: IORef a -> (a -> a) -> IO ()
> modifyIORef' r f = readIORef r >>= (writeIORef r $!) . f
> {-# INLINE modifyIORef' #-}
--
Live well,
~wren
More information about the Haskell-Cafe
mailing list