[Haskell-cafe] Nice way to calculate character frequency in a string

Lemmih lemmih at gmail.com
Tue Oct 25 06:20:33 EDT 2005


On 10/25/05, Charles SDudu <iwin_1 at hotmail.com> wrote:
> Hello, I need to calculate the frequency of each character in a String. And
> if I can do this really well in C, I dont find a nice (and fast) answer in
> haskell. I tried several functions, listed below, and even the fastest do a
> lot of unnecessary things :
>
> calc :: String -> [ (Char, Int) ]
>
> -- 3.0s normally (without profiling)
> -- time 10-12% alloc 59% (info from profiling)
> -- so it's the fastest when I profile but not when I compile normally
> -- mutable array may be better but it's to complicated for me
>
> calc =  filter (\p -> snd p > 0) . assocs .
>                 foldl (\k c -> unsafeReplace k [(fromEnum c, (unsafeAt k (fromEnum c))+1)]
> ) k
>                 where k = array (toEnum 0, toEnum 255) [(toEnum i, 0) | i <- [0 .. 255]]
> :: UArray Char Int
>
>
> -- 2.1s normally
> -- time 15-19% alloc 40% (info from profiling)
> -- so for true, it's the best but the sort and group probably do unnecessary
> things
> calc s = map (\l -> (head l, length l)) $ group $ sort s
>
> -- 3.4s normally
> -- time 58% alloc 0% (info from profiling)
> -- this one dont do unnecessary things but has to read the file again for
> each character
> -- calc s = map (\c -> (c, foldl (\a b -> if b==c then a+1 else a) 0 s)) $
> nub s
>
> -- 22s normally
> -- time 85% alloc 92% (info from profiling)
> -- this one read the file only one time but is really slow
> calc = foldl (addfreq) []
>         where addfreq f c =     let
>                                                 xs1 = takeWhile (\f -> fst f /= c) f
>                                                 xs2 = dropWhile (\f -> fst f /= c) f
>                                                 xs = if null xs2 then [(c,1)] else ((fst . head) xs2, (snd . head) xs2
> + 1) : tail xs2
>                                                 in  xs1 ++ xs
>
> -- I have a lot of even slower version but I wont include them
> -- each compilation was done with GHC 6.4.1 with the -O flag and with -O
> -prof -auto-all for profiling

Try this:

> import Data.Array.ST
> import Data.Array.Base
> import Control.Monad
> import Control.Monad.ST
> import Data.Word
> import Data.Char
>
> main = do c <- getContents
>           print (frequency c)
>
> frequency str = runST (frequency' str)
>
> frequency' :: String -> ST s [(Char,Int)]
> frequency' str = do arr <- newArray ('\0','\255') 0 :: ST s (STUArray s Char Int)
>                     flip mapM_ str  $ \c -> do r <- unsafeRead arr (ord c)
>                                                unsafeWrite arr (ord c) (r+1)
>                     liftM (filter (\(c,n) -> n>0)) (getAssocs arr)


--
Friendly,
  Lemmih


More information about the Haskell-Cafe mailing list