[Haskell-cafe] Nice way to calculate character frequency in a string

Udo Stenzel u.stenzel at web.de
Wed Oct 26 08:35:23 EDT 2005


Pedro Baltazar Vasconcelos wrote:
> 
> Two solutions using immutable and mutable arrays and no unsafe operations:

Both solutions certainly count as nice, but both exhibit an ugly memory
leak.  As usual, this is due to too much laziness: no intermediate
result is ever evaluated until it is too late.  GHCi dies on both (freq1
$ replicate 1000000 'x') and (freq2 $ replicate 1000000 'x') with a
stack overflow.

Both versions can be fixed by using unboxed arrays, which is fine when
counting Ints, but already impossible with Intergers.  The mutable
version has an easy general fix:

> -- using mutable ST arrays
> hist2 :: String -> STArray s Char Int -> ST s ()
> hist2 str arr = sequence_ [do { i<-readArray arr c; writeArray arr c $! 1+i }
>                           | c<-str]

Note the strict application                                            ^^

What I find unsettling, is that the nice solution, the only one not to
rely on GHC specific extensions, cannot be fixed:

> hist1 str = accumArray (+) 0 ('\0','\255') [(c,1) | c<-str]

No amount of strictness annotations can fix this, the correct place
would be *inside* accumArray.  The same problem arises with other
containers that provide accum-like functions, notably
Data.Map.insertWith and Data.Map.mapAccum.

That raises the question: Should combining functions on containers be
provided in a strict variant?  Should strict application be the default?
If something turns out to be too strict, I can always wrap it in a data
type; if it turns out to be too lazy, I'm hosed.  Or am I overlooking
something?



Udo.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
Url : http://www.haskell.org//pipermail/haskell-cafe/attachments/20051026/ca3b6dec/attachment.bin


More information about the Haskell-Cafe mailing list