[Haskell-cafe] Nice way to calculate character frequency in
simonmar at microsoft.com
Tue Oct 25 08:46:05 EDT 2005
On 25 October 2005 11:59, Jon Fairbairn wrote:
> On 2005-10-25 at 12:20+0200 Lemmih wrote:
>> On 10/25/05, Charles SDudu <iwin_1 at hotmail.com> wrote:
>>> Hello, I need to calculate the frequency of each
>>> character in a String. And if I can do this really well
>>> in C, I dont find a nice (and fast) answer in haskell. I
>>> tried several functions, listed below, and even the
>>> fastest do a lot of unnecessary things :
>>> calc :: String -> [ (Char, Int) ]
>>> calc = filter (\p -> snd p > 0) . assocs .
>>> foldl (\k c -> unsafeReplace k [(fromEnum c,
>>> (unsafeAt k (fromEnum c))+1)] ) k where k = array
>>> (toEnum 0, toEnum 255) [(toEnum i, 0) | i <- [0 .. 255]]
>>>>> UArray Char Int
> [snip even more disagreable code]
> Ugh! These are all horrid. If something on the lines of
>> calc = accumArray (+) 0 (minBound, maxBound) . (map (\x->(x,1)))
> isn't fast enough, complain to the implementors! What's the
> point of functional programming if one has to twist into a
> shape that allows inspection of one's own fundament to get
> stuff to run in decent time?
To make that go fast, you'd want to use a UArray. Could the compiler
figure this out for itself? Not really, it might be able to figure out
certain special cases where changing an Array to a UArray doesn't affect
the semantics, but not in general (and GHC certainly makes no attempt to
do it). Sometimes, you just need to give the compiler a bit of help.
And you really want to read the file as a packed string, not a String.
Or hope that GHC manages to deforest calc with whatever is generating
the String. It's possible that calc deforests, but highly unlikely that
hGetContents does, so I'm afraid you'll have to do the packed string
To make it go even faster, you need to inline & specialise accumArray.
I'm not sure how much of this GHC currently does.
More information about the Haskell-Cafe