[Haskell-cafe] Nice way to calculate character frequency in a
string
Udo Stenzel
u.stenzel at web.de
Wed Oct 26 08:35:23 EDT 2005
Pedro Baltazar Vasconcelos wrote:
>
> Two solutions using immutable and mutable arrays and no unsafe operations:
Both solutions certainly count as nice, but both exhibit an ugly memory
leak. As usual, this is due to too much laziness: no intermediate
result is ever evaluated until it is too late. GHCi dies on both (freq1
$ replicate 1000000 'x') and (freq2 $ replicate 1000000 'x') with a
stack overflow.
Both versions can be fixed by using unboxed arrays, which is fine when
counting Ints, but already impossible with Intergers. The mutable
version has an easy general fix:
> -- using mutable ST arrays
> hist2 :: String -> STArray s Char Int -> ST s ()
> hist2 str arr = sequence_ [do { i<-readArray arr c; writeArray arr c $! 1+i }
> | c<-str]
Note the strict application ^^
What I find unsettling, is that the nice solution, the only one not to
rely on GHC specific extensions, cannot be fixed:
> hist1 str = accumArray (+) 0 ('\0','\255') [(c,1) | c<-str]
No amount of strictness annotations can fix this, the correct place
would be *inside* accumArray. The same problem arises with other
containers that provide accum-like functions, notably
Data.Map.insertWith and Data.Map.mapAccum.
That raises the question: Should combining functions on containers be
provided in a strict variant? Should strict application be the default?
If something turns out to be too strict, I can always wrap it in a data
type; if it turns out to be too lazy, I'm hosed. Or am I overlooking
something?
Udo.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
Url : http://www.haskell.org//pipermail/haskell-cafe/attachments/20051026/ca3b6dec/attachment.bin
More information about the Haskell-Cafe
mailing list