[Haskell-cafe] Digrams
Dominic Steinitz
dominic.steinitz at blueyonder.co.uk
Sat Feb 11 09:16:14 EST 2006
On Saturday 11 Feb 2006 1:09 pm, Jon Fairbairn wrote:
> On 2006-02-11 at 12:25GMT Dominic Steinitz wrote:
> > I've quickly put this together to measure frequencies of pairs of letters
> > (e.g. 1st and 2nd) in words. It works fine on small test data sets but
> > I have a feeling that it will perform poorly as it spends a lot of time
> > updating a 26*26 array. Before I throw a dictionary at it, does anyone
> > have any suggestions?
>
> I think this is the sort of thing for which accumArray was
> invented.
Jon, Much better. Thanks, Dominic.
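
For anyone reading the archive who has not met accumArray: it builds an array from a list of (index, value) pairs and combines values that land on the same index with the given function, so a frequency table can be built without updating the array one element at a time. A minimal sketch, not taken from the attachment; the name firstLetterCounts and the one-dimensional 'A'..'Z' table are assumptions made for illustration:

```haskell
import Data.Array (Array, accumArray, (!))
import Data.Char (toUpper, isAsciiUpper)

-- Count how often each letter 'A'..'Z' starts a word.
-- firstLetterCounts is an illustrative name, not part of the original post.
firstLetterCounts :: [String] -> Array Char Int
firstLetterCounts ws =
  accumArray (+) 0 ('A', 'Z')
    -- The (x:_) pattern silently skips empty words; the isAsciiUpper
    -- test skips initials that would fall outside the array bounds.
    [ (c, 1) | (x:_) <- ws, let c = toUpper x, isAsciiUpper c ]
```

In GHCi, something like firstLetterCounts ["emily", "Ella", "olivia"] ! 'E' looks up the count for words starting with E. The 26*26 digram table works the same way, only with a pair of characters as the index.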
-------------- next part --------------
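import System.IO
import Data.Char
import Data.Array
import Data.List

-- Read one word per line and print every letter pair with its count,
-- most frequent first.
main =
   do h <- openFile "girls2005.txt" ReadMode
      c <- hGetContents h
      let xs = map putStrLn .
               map show .
               reverse .
               sort .
               map Cell .
               assocs $ f 1 2 (lines c)
      sequence_ xs
      putStrLn "Finished"

-- A table entry: ((letter, letter), count).
newtype Cell = Cell ((Char,Char),Int)
   deriving Eq

-- Order cells by count only, so that sort ranks by frequency.
instance Ord Cell where
   Cell (_,i) <= Cell (_,j) = i <= j

instance Show Cell where
   show (Cell ((i,j),f)) = i : ',' : j : ',' : show f

-- Upper-cased letters at (1-based) positions m and n of the word l.
-- Note: (!!) fails on words shorter than the requested positions.
hit m n l =
   (toUpper (l!!(m-1)), toUpper (l!!(n-1)))

-- Build the 26*26 frequency table in a single pass with accumArray.
f m n c =
   accumArray (+) 0 (('A','A'),('Z','Z')) [(hit m n l,1) | l <- c]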
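
One thing to watch before throwing a real dictionary at the code above: hit uses (!!), which fails on words shorter than the requested positions, and an index outside ('A','A')..('Z','Z') (a hyphen, apostrophe or accented letter, say) is outside the array bounds. A hedged sketch of a more defensive variant follows; safeHit and digramCounts are hypothetical names, and silently skipping unusable words is an assumption about what is wanted, not something stated in the thread:

```haskell
import Data.Array
import Data.Char (toUpper, isAsciiUpper)
import Data.Maybe (mapMaybe)

-- Upper-cased letters at 1-based positions m and n of a word.
-- Returns Nothing if the word is too short, a position is below 1,
-- or either character falls outside 'A'..'Z'.
-- safeHit is a hypothetical replacement for hit.
safeHit :: Int -> Int -> String -> Maybe (Char, Char)
safeHit m n w = do
  a <- letterAt m
  b <- letterAt n
  pure (a, b)
  where
    letterAt k
      | k < 1 = Nothing
      | otherwise = case drop (k - 1) w of
          (c:_) | isAsciiUpper (toUpper c) -> Just (toUpper c)
          _ -> Nothing

-- Same accumArray table as f, but built only from words that
-- yield a valid index; everything else is dropped.
digramCounts :: Int -> Int -> [String] -> Array (Char, Char) Int
digramCounts m n ws =
  accumArray (+) 0 (('A', 'A'), ('Z', 'Z'))
    [ (ix, 1) | ix <- mapMaybe (safeHit m n) ws ]
```

digramCounts 1 2 (lines c) could then stand in for f 1 2 (lines c) in main. Whether dropping such words is acceptable depends on the data; counting them separately would be another option.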