[Haskell-cafe] Efficient functional idiom for histogram

Sat Aug 1 09:44:39 EDT 2009

2009/7/31 Paul Moore <p.f.moore at gmail.com>:
> 2009/7/31 Gregory Collins <greg at gregorycollins.net>:
>> Paul Moore <p.f.moore at gmail.com> writes:
>>
>>> How would I efficiently write a function in Haskell to count
>>> occurrences of unique elements in a (potentially very large) list? For
>>> example, given the list [1,2,3,4,5,3,4,2,4] I would like the output
>>> [[1,1], [2,2], [3,2], [4,3], [5,1]] (or some equivalent
>>> representation).
>>
>>    import qualified Data.Map as Map
>>    import           Data.Map (Map)
>>
>>    histogram :: Ord a => [a] -> [(a,Int)]
>>    histogram = Map.assocs . foldl f Map.empty
>>      where
>>        f m k = Map.insertWith (+) k 1 m
>
> Right. I see how that works, and can work out how to think about this
> sort of thing from your example. Thanks very much.
>
> BTW, I did know that Haskell had an efficient map implementation, I
> just wasn't sure how to use it "functionally" - I probably should have
> searched a bit harder for examples before posting. Thanks for the help
> in any case.

Hmm, I'm obviously still mucking up the performance somehow. My full
program (still a toy, but a step on the way to what I'm aiming at) is
as follows. It's rolling 3 6-sided dice 100000 times, and printing a
summary of the results.

import System.Random
import qualified Data.Map as Map
import Data.Map (Map)
import Data.List

dice :: Int -> Int -> IO Int
dice 0 n = return 0
dice m n = do
  total <- dice (m - 1) n
  roll <- randomRIO (1, n)
  return (total + roll)

simulate count m n = do
  mapM (dice m) (replicate count n)

histogram :: Ord a => [a] -> [(a,Int)]
histogram = Map.assocs . foldl f Map.empty
  where
    f m k = Map.insertWith (+) k 1 m

simulation = do
  lst <- simulate 100000 3 6
  return (histogram lst)

main = do
  s <- simulation
  putStrLn (show s)

When compiled, this takes over twice as long as a naively implemented
Python program.

What am I doing wrong here? I'd have expected compiled Haskell to be
faster than interpreted Python, so obviously my approach is wrong. I'm
expecting the answer to be that I've got unnecessary laziness - which
is fine, but ultimately my interest is in ease of expression and
performance combined, so I'm looking for beginner-level improvements
rather than subtle advanced techniques like unboxing.

Thanks,
Paul.

PS I know my code is probably fairly clumsy - I'd appreciate style
suggestions, but my main interest here is whether a beginner, with a
broad programming background, a basic understanding of Haskell, and
access to Google, put together a clear, efficient, program (ie, the
case where my usual scripting language is too slow and I want to knock
something up quickly in a high-level, high-performance language).