[Haskell-cafe] Digrams

Dominic Steinitz dominic.steinitz at blueyonder.co.uk
Sat Feb 11 09:16:14 EST 2006


On Saturday 11 Feb 2006 1:09 pm, Jon Fairbairn wrote:
> On 2006-02-11 at 12:25GMT Dominic Steinitz wrote:
> > I've quickly put this together to measure frequencies of pairs of letters
> > (e.g. 1st and 2nd) in words. It works fine on small test data sets, but
> > I have a feeling that it will perform poorly as it spends a lot of time
> > updating a 26*26 array. Before I throw a dictionary at it, does anyone
> > have any suggestions?
>
> I think this is the sort of thing for which accumArray was
> invented.
Jon,

Much better. Thanks.

Dominic.
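The two approaches in this thread can be compared on an in-memory word list. The earlier version that updated the array word by word is not included in this message, so `countNaive` below is only a guess at that style: an immutable array rebuilt with `//` for every word, which copies all 26*26 entries each time. `countAccum` builds the whole table in a single `accumArray` call, as Jon suggested. The names `Table`, `bnds`, `firstTwo`, `countNaive` and `countAccum` are hypothetical, the names in `main` are made-up sample input, and every word is assumed to start with two ASCII letters.

```haskell
import Data.Array
import Data.Char (toUpper)
import Data.List (foldl')

type Table = Array (Char, Char) Int

-- Bounds covering every pair of upper-case letters: 26 * 26 cells.
bnds :: ((Char, Char), (Char, Char))
bnds = (('A', 'A'), ('Z', 'Z'))

-- First two letters of a word, upper-cased.
-- Assumes the word starts with at least two ASCII letters.
firstTwo :: String -> (Char, Char)
firstTwo (a : b : _) = (toUpper a, toUpper b)
firstTwo w           = error ("word too short: " ++ show w)

-- Hypothetical "one update per word" style: each (//) copies the
-- whole 676-element array, so the work grows with words * 676.
countNaive :: [String] -> Table
countNaive = foldl' bump (listArray bnds (repeat 0))
  where
    bump arr w = let k = firstTwo w in arr // [(k, arr ! k + 1)]

-- accumArray: one pass over the (index, 1) pairs builds the table.
countAccum :: [String] -> Table
countAccum ws = accumArray (+) 0 bnds [(firstTwo w, 1) | w <- ws]

main :: IO ()
main = do
  let ws = ["Emily", "Emma", "Madison", "Abigail", "Olivia", "Emma"]
  print (countNaive ws == countAccum ws)  -- True: same table either way
  print (countAccum ws ! ('E', 'M'))      -- 3: Emily, Emma, Emma
```

Both functions return the same array; the difference is that `accumArray` allocates the result once instead of once per word.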

-------------- next part --------------
import System.IO
import Data.Char
import Data.Array
import Data.List

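-- Read one word per line from girls2005.txt and print all 676 letter
-- pairs (1st and 2nd letters) with their counts, most frequent first.
main :: IO ()
main =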
   do h <- openFile "girls2005.txt" ReadMode
      c <- hGetContents h
      let xs = map putStrLn . 
               map show . 
               reverse . 
               sort . 
               map Cell . 
               assocs $ f 1 2 (lines c)
      sequence_ xs
      putStrLn "Finished"

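-- A letter pair together with its count.
newtype Cell = Cell ((Char,Char),Int)
   deriving Eq

-- Order by count first, breaking ties on the letter pair, so that the
-- ordering agrees with the derived Eq instance.
instance Ord Cell where
   compare (Cell (p,i)) (Cell (q,j)) = compare (i,p) (j,q)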

instance Show Cell where
   show (Cell ((i,j),f)) = i : ',' : j : ',' : show f

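-- The m-th and n-th characters (1-based) of a word, upper-cased.
-- Returns Nothing when the word is too short or either character is
-- not an ASCII letter, since such a pair would lie outside the
-- ('A','A')..('Z','Z') bounds and make accumArray fail.
hit :: Int -> Int -> String -> Maybe (Char,Char)
hit m n l
   | length l >= max m n, letter a, letter b = Just (toUpper a, toUpper b)
   | otherwise                               = Nothing
   where a = l !! (m-1)
         b = l !! (n-1)
         letter x = isAscii x && isLetter x

-- accumArray builds the whole 26*26 table of counts in one pass,
-- skipping the words for which hit returns Nothing.
f :: Int -> Int -> [String] -> Array (Char,Char) Int
f m n c =
   accumArray (+) 0 (('A','A'),('Z','Z')) [(p,1) | Just p <- map (hit m n) c]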
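The attached program counts one fixed pair of positions per word (`f 1 2` takes the 1st and 2nd letters). If the aim is instead to count every adjacent pair of letters within each word, the same single `accumArray` call still works; only the list of indices changes. The sketch below illustrates that assumption and is not part of the original program: `adjacentPairs` and `allDigrams` are hypothetical names, pairs that touch a non-letter are skipped, and the result is sorted with `sortBy (comparing (Down . snd))` instead of through the `Cell` newtype.

```haskell
import Data.Array (accumArray, assocs)
import Data.Char (isAscii, isLetter, toUpper)
import Data.List (sortBy)
import Data.Ord (Down (..), comparing)

-- Every adjacent pair of ASCII letters in a word, upper-cased.
-- Pairs involving a non-letter are dropped, so "Mary-Jane" yields
-- MA, AR, RY, JA, AN, NE (nothing across the hyphen).
adjacentPairs :: String -> [(Char, Char)]
adjacentPairs w =
  [(toUpper a, toUpper b) | (a, b) <- zip w (drop 1 w), ok a, ok b]
  where
    ok c = isAscii c && isLetter c

-- Hypothetical helper: count all adjacent digrams in a word list and
-- return only the non-zero counts, most frequent first.
allDigrams :: [String] -> [((Char, Char), Int)]
allDigrams ws =
  sortBy (comparing (Down . snd))
    . filter ((> 0) . snd)
    . assocs
    $ accumArray (+) 0 (('A', 'A'), ('Z', 'Z'))
        [(p, 1) | w <- ws, p <- adjacentPairs w]

main :: IO ()
main = mapM_ print (allDigrams ["Emma", "Emily", "Anna"])
```

Taking the words as a list rather than reading `girls2005.txt` also makes it easy to check the counts on a few names, for example in GHCi, before running it over a whole dictionary. Because `sortBy` is stable, pairs with equal counts stay in alphabetical order.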

