[Haskell-cafe] Histogram creation
Bertram Felgenhauer
bertram.felgenhauer at googlemail.com
Mon Nov 10 15:25:07 EST 2008
Alexey Khudyakov wrote:
> Hello!
>
> I'm tryig to write efficient code for creating histograms. I have following
> requirements for it:
>
> 1. O(1) element insertion
> 2. No reallocations. Thus in place updates are needed.
>
>
> accumArray won't go because I need to fill a lot of histograms (hundrends) from
> vely long list of data (possibly millions of elements) and it will traverse
> input data for each histogram.
Sorry, Duncan is right. I misread here.
My first idea would still be to use accumArray though, or rather, accum,
processing the input data in chunks of an appropriate size (which depends
on the histogram sizes.)
But actually, the ST code isn't bad (I hope this isn't homework):
------------------------------------------------------------------------
import Control.Monad.ST
import Control.Monad
import Data.Array.ST
import Data.Array.Unboxed
stuArray :: ST s (STUArray s i e) -> ST s (STUArray s i e)
stuArray = id
-- Create histograms.
--
-- Each histogram is described by a pair (f, (l, u)), where 'f' maps
-- a data entry to an Int index, and l and u are lower and upper bounds
-- of the indices, respectively.
--
mkHistograms :: [(a -> Int, (Int, Int))] -> [a] -> [UArray Int Int]
mkHistograms hs ds = runST collect where
-- Why is the type signature on 'collect' required here?
collect :: ST s [UArray Int Int]
collect = do
-- create histogram arrays of appropriate sizes
histograms <- forM hs $ \(_, range) -> do
stuArray $ newArray range 0
-- iterate through the data
forM_ ds $ \d -> do
-- iterate through the histograms
forM_ (zip hs histograms) $ \((f, _), h) -> do
-- update appropriate entry
writeArray h (f d) . succ =<< readArray h (f d)
-- finally, freeze the histograms and return them
-- (using unsafeFreeze is ok because we're done modifying the
-- arrays)
mapM unsafeFreeze histograms
test = mkHistograms [((`mod` 3), (0,2)), ((`mod` 5), (0,4))] [0..10]
-- test returns
-- [array (0,2) [(0,4),(1,4),(2,3)],
-- array (0,4) [(0,3),(1,2),(2,2),(3,2),(4,2)]]
------------------------------------------------------------------------
Bertram
P.S. Ryan is right, too - I'm not sure where I got confused there.
runST $ foo didn't work in ghc 6.6; I knew that it works in
ghc 6.8.3, but I thought this was changed again.
More information about the Haskell-Cafe
mailing list