[Haskell-cafe] Histogram creation

Bertram Felgenhauer bertram.felgenhauer at googlemail.com
Mon Nov 10 15:25:07 EST 2008

Alexey Khudyakov wrote:
> Hello!
> I'm tryig to write efficient code for creating histograms. I have following
> requirements for it:
> 1. O(1) element insertion
> 2. No reallocations. Thus in place updates are needed.
> accumArray won't go because I need to fill a lot of histograms (hundrends) from
> vely long list of data (possibly millions of elements) and it will traverse
> input data for each histogram.

Sorry, Duncan is right. I misread here.

My first idea would still be to use accumArray though, or rather, accum,
processing the input data in chunks of an appropriate size (which depends
on the histogram sizes.)

But actually, the ST code isn't bad (I hope this isn't homework):

import Control.Monad.ST
import Control.Monad
import Data.Array.ST
import Data.Array.Unboxed

stuArray :: ST s (STUArray s i e) -> ST s (STUArray s i e)
stuArray = id

-- Create histograms.
-- Each histogram is described by a pair (f, (l, u)), where 'f' maps
-- a data entry to an Int index, and l and u are lower and upper bounds
-- of the indices, respectively.
mkHistograms :: [(a -> Int, (Int, Int))] -> [a] -> [UArray Int Int]
mkHistograms hs ds = runST collect where
    -- Why is the type signature on 'collect' required here?
    collect :: ST s [UArray Int Int]
    collect = do
        -- create histogram arrays of appropriate sizes
        histograms <- forM hs $ \(_, range) -> do
            stuArray $ newArray range 0

        -- iterate through the data
        forM_ ds $ \d -> do
            -- iterate through the histograms
            forM_ (zip hs histograms) $ \((f, _), h) -> do
	    	-- update appropriate entry
                writeArray h (f d) . succ =<< readArray h (f d)

        -- finally, freeze the histograms and return them
        -- (using unsafeFreeze is ok because we're done modifying the
        -- arrays)
        mapM unsafeFreeze histograms

test = mkHistograms [((`mod` 3), (0,2)), ((`mod` 5), (0,4))] [0..10]

-- test returns
-- [array (0,2) [(0,4),(1,4),(2,3)],
--  array (0,4) [(0,3),(1,2),(2,2),(3,2),(4,2)]]


P.S. Ryan is right, too - I'm not sure where I got confused there.
  runST $ foo  didn't work in ghc 6.6; I knew that it works in
  ghc 6.8.3, but I thought this was changed again.

More information about the Haskell-Cafe mailing list