[Haskell-cafe] Efficient functional idiom for histogram

Fri Jul 31 17:14:28 EDT 2009

Am Freitag 31 Juli 2009 22:39:53 schrieb Paul Moore:
> How would I efficiently write a function in Haskell to count
> occurrences of unique elements in a (potentially very large) list? For
> example, given the list [1,2,3,4,5,3,4,2,4] I would like the output
> [[1,1], [2,2], [3,2], [4,3], [5,1]] (or some equivalent
> representation).
>
> Clearly, this won't be possible for infinite input, but I would like
> it to be as efficient as possible for (lazy) input lists containing
> many millions of elements. So an implementation based on group . sort
> is not going to work.
>
> In an imperative language like Python, I'd use a dictionary as an
> accumulator - something like
>
>     for el in input:
>         accums[i] = accums.get(i, 0) + 1
>
> That's about as efficient as you can hope for (assuming an efficient
> dictionary implementation). How would I code something equivalent in
> Haskell?

If the elements come from a relatively small range and are suitable for array indexing,

import Data.Array.IArray
import Data.Array.Unboxed

accumArray :: (IArray a e, Ix i)
           => (e -> e' -> e)     -- ^ An accumulating function
           -> e                  -- ^ A default element
           -> (i,i)              -- ^ The bounds of the array
           -> [(i, e')]          -- ^ List of associations
           -> a i e              -- ^ Returns: the array

accumArray (+) 0 (mini,maxi) $ zip list (repeat 1)

is pretty fast (beats the hell out of Data.Map). If your elements can't be unboxed, the 
accumArray function from Data.Array does it too, albeit much slower (still faster than 
Data.Map, in my experience).

>
> Thanks,
> Paul.