Data.HashTable bug

Simon Marlow simonmar at microsoft.com
Fri Mar 19 12:15:37 EST 2004


Yup, that's it.  It does indeed go nice and fast with GHC 5.04.3, but
slows down with 6.2.  This is because 6.2 is using the new
Data.PackedString library, which is unoptimised.  (the code also needs a
fix to hashPS for 6.2: indexPS is 0-based rather than 1-based).
 
Cheers,
	Simon

> Does the code below look familiar?
> 
> 	/Josef
> 
> \begin{code}
> -- compile with: ghc -O -package lang
> -- run with:     ./a.out +RTS -H10m -K4m <Input.txt
> 
> import MArray
> import PackedString
> import IOExts
> import IO
> import Char
> import Monad
> 
> arr_size = 20000
> 
> main = do
>   h <- openFile "Usr.Dict.Words" ReadMode
>   sz <- hFileSize h
>   ps <- hGetPS h (fromIntegral sz)
> 
>   tbl <- newArray (0,arr_size) []
>   mapM (addToHashTable tbl) (linesPS ps)
> 
>   h <- openFile "Input.txt" ReadMode
>   sz <- hFileSize h
>   ps <- hGetPS h (fromIntegral sz)
> 
>   let test s = do b <- elemHashTable s tbl
> 		  when (not b) (putStrLn (unpackPS s))
>   mapM test (linesPS ps)
> 
> type HashTable = IOArray Int [PackedString]
> 
> -- Looks bad, but GHC does a great job of optimising it:
> hashPS :: PackedString -> Int
> hashPS ps = foldr f 0 (map (ord.indexPS ps) [1..lengthPS ps])
>   where f n m = n + m * 128 `mod` 1048583
> 
> addToHashTable :: HashTable -> PackedString -> IO ()
> addToHashTable tbl s = do
>   let h = hashPS s
>       index = h `mod` arr_size
>   r <- readArray tbl index
>   if (s `elem` r) then return () else do
>   writeArray tbl index (s : r)
> 
> elemHashTable :: PackedString -> HashTable -> IO Bool
> elemHashTable s tbl = do
>   let h = hashPS s
>       index = h `mod` arr_size
>   r <- readArray tbl index
>   return (s `elem` r)
> \end{code}
> _______________________________________________
> Glasgow-haskell-users mailing list
> Glasgow-haskell-users at haskell.org
> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
> 


More information about the Glasgow-haskell-users mailing list