[Haskell-cafe] Re: Optimizing spelling correction program

wren ng thornton wren at freegeek.org
Wed Jun 24 20:49:33 EDT 2009


Kamil Dworakowski wrote:
> On Jun 22, 10:03 am, Eugene Kirpichov <ekirpic... at gmail.com> wrote:
>> Hey, you're using String I/O!
>>
>> nWORDS <- fmap (train . map B.pack . words) (readFile "big.txt")
>>
>> This should be
>>
>> WORDS <- fmap (train . B.words) (B.readFile "big.txt")
>>
>> By the way, which exact file do you use as a misspellings file? The
>> corpus linked to at Norvig's page has many.
>> And do you have a driver program that I could run and obtain your timings?
> 
> Yep, Don pointed that out and I have changed the program accordingly.
> It didn't make any difference though. The time spent on building the
> dictionary is a small portion of the overall run time.
> 
> Please see the repository contents for the current version of the
> program:
> http://patch-tag.com/r/spellcorrect/snapshot/current/content/pretty
> 
> The eval-bytestring.hs there is the program I used for timing. Inside
> of it you will find the name of the misspellings file needed.
> 
> Thanks all for the suggestions. I'll try them when I get home tonight.


Another suggestion, is that you should try to make sure that the lists 
constructed by getCommonSpellingMistakesWithCorrections get optimized 
away. As written I'm not sure there will be sufficient inlining to 
ensure that. If you want to be explicit about removing them, something 
like the following should help:

 > module Main where
 > import Prelude hiding (words)
 > import SpellingCorrection
 > import qualified Data.ByteString.Char8 as B
 > import Data.Char
 > import Data.IORef
 > import Control.Monad (forM_)
 >
 > main = do
 >     corrector       <- getCorrector
 >     misspell_corpus <- B.readFile "FAWTHROP1DAT.643"
 >     n               <- newIORef (0::Int)
 >     wrong           <- newIORef (0::Int)
 >     forM_ (B.lines misspell_corpus) $ \line -> do
 >         modifyIORef' n (1+)
 >         let [ms,c] = map (B.map toLower) . B.words $ line
 >         if corrector ms /= c
 >             then modifyIORef' wrong (1+)
 >             else return ()
 >     accuracy <- do
 >         n' <- readIORef n
 >         w' <- readIORef wrong
 >         return $! 100 * fromIntegral w' / fromIntegral n'
 >     putStrLn $ "accuracy" ++ show (100 - accuracy) ++ "%"
 >
 > modifyIORef'    :: IORef a -> (a -> a) -> IO ()
 > modifyIORef' r f = readIORef r >>= (writeIORef r $!) . f
 > {-# INLINE modifyIORef' #-}

-- 
Live well,
~wren


More information about the Haskell-Cafe mailing list