[Haskell-cafe] Digrams
Dominic Steinitz
dominic.steinitz at blueyonder.co.uk
Sat Feb 11 07:25:11 EST 2006
I've quickly put this together to measure the frequencies of pairs of letters
(e.g. the 1st and 2nd letters) in words. It works fine on small test data sets,
but I have a feeling that it will perform poorly, as it updates a 26*26 array
(with //) once for every word. Before I throw a dictionary at it, does anyone
have any suggestions?
Thanks, Dominic.
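One commonly suggested alternative, sketched here rather than taken from the post: since (//) on an immutable Data.Array builds a fresh copy of all 676 cells for every word, the whole table can instead be built in a single pass with accumArray. The helper names digramAt and digramCounts are hypothetical, and the sketch assumes one word per line and counts only ASCII letters, skipping words that are too short instead of failing on (!!).

```haskell
import Data.Array (Array, accumArray, (!))
import Data.Char (isAsciiUpper, toUpper)

-- Hypothetical helper: the digram formed by the n-th and m-th characters
-- (1-based) of a word, upper-cased, or Nothing if the word is too short
-- or either character is not an ASCII letter.
digramAt :: Int -> Int -> String -> Maybe (Char, Char)
digramAt n m w =
  case (drop (n - 1) up, drop (m - 1) up) of
    (i : _, j : _) | isAsciiUpper i && isAsciiUpper j -> Just (i, j)
    _ -> Nothing
  where
    up = map toUpper w

-- Build the whole 26*26 table at once: accumArray combines every
-- (digram, 1) pair with (+) into one array, so no per-word copies are made.
digramCounts :: Int -> Int -> [String] -> Array (Char, Char) Int
digramCounts n m ws =
  accumArray (+) 0 (('A', 'A'), ('Z', 'Z'))
    [(d, 1) | Just d <- map (digramAt n m) ws]

main :: IO ()
main = do
  -- Toy input; "x" is too short and is simply skipped.
  let counts = digramCounts 1 2 ["Anna", "Andrea", "Bo", "x", "Amy"]
  print (counts ! ('A', 'N')) -- 2
  print (counts ! ('B', 'O')) -- 1
```

In the original program the counting step would then be roughly digramCounts 1 2 (lines c), with the rest of main unchanged.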
-------------- next part --------------
import System.IO
import Data.Char
import Data.Array
import Data.List

main :: IO ()
main =
   do h <- openFile "girls2005.txt" ReadMode
      c <- hGetContents h
      -- Count the digrams formed by the 1st and 2nd letter of each line.
      let freqs1 = g 1 2 (lines c) digramArr
          -- One print action per digram, most frequent first.
          xs = map putStrLn .
               map show .
               reverse .
               sort .
               map Cell .
               assocs $ freqs1
      sequence_ xs
      putStrLn "Finished"

-- A digram with its count, ordered by the count alone.
newtype Cell = Cell ((Char,Char),Int)
   deriving Eq

instance Ord Cell where
   Cell (_,i) <= Cell (_,j) = i <= j

-- Printed as "letter,letter,count".
instance Show Cell where
   show (Cell ((i,j),f)) = i : ',' : j : ',' : show f

letters = ['A'..'Z']

-- A 26*26 table of counts indexed by letter pairs, all initially zero.
digramElems = [((i,j),0) | i <- letters, j <- letters]
digramArr = array (('A','A'),('Z','Z')) digramElems
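
-- Increment the count of the digram formed by the n-th and m-th
-- characters of s. Each call copies the whole array, since (//) builds
-- a new one. Assumes s has at least max n m characters and that both are
-- ASCII letters; otherwise (!!) or (!) raises an error.
f n m s a =
   a // [((i,j),x+1)]
   where i = toUpper (s!!(n-1))
         j = toUpper (s!!(m-1))
         x = a!(i,j)

-- Apply f to every word in turn (a left fold over the list).
g n m [] a = a
g n m (s:ss) a = g n m ss (f n m s a)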
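The newtype Cell exists only so that the digrams can be sorted by count. A possible simplification, again only a sketch with hypothetical names (render, byFrequency), is to sort the assocs directly with sortBy and Down from Data.Ord, keeping the same "letter,letter,count" output format.

```haskell
import Data.Array (Array, assocs, listArray)
import Data.List (sortBy)
import Data.Ord (Down (..), comparing)

-- Render one digram in the same "letter,letter,count" format as Cell's Show.
render :: ((Char, Char), Int) -> String
render ((i, j), c) = i : ',' : j : ',' : show c

-- Most frequent digrams first; no wrapper type or custom Ord instance needed.
byFrequency :: Array (Char, Char) Int -> [((Char, Char), Int)]
byFrequency = sortBy (comparing (Down . snd)) . assocs

main :: IO ()
main = do
  -- Hypothetical 2*2 table standing in for the real 26*26 counts.
  let table = listArray (('A', 'A'), ('B', 'B')) [1, 5, 0, 3]
  mapM_ (putStrLn . render) (byFrequency table)
```

Because sortBy is stable, digrams with equal counts stay in the array's index order (AA, AB, ...).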