[Haskell-beginners] histogram over large data

Wed Jun 6 16:54:08 CEST 2012

Ian Knopke <ian.knopke at gmail.com> writes:

> Hi,
>
> I'd like to build a histogram, or even just a frequency count of some
> categorical data from a large database. In a language such as Perl I'd
> do something like this:
>
> my %freq; # hash
> while (my $item = get_next_from_db()){
>    $freq{$item}++;
> }
>
> and then sum the hash values and divide the value of each key by the
> sum to get the histogram.
>
> Is there an easy way to do the same thing in Haskell? It looked like
> an easy task but I seem to be having a lot of trouble getting this to
> work properly, as it doesn't seem to be behaving very lazily. I'm
> guessing I should be doing something with the State monad, but I'm not
> very good at using that yet.
>
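For the plain frequency count, the Perl loop translates fairly directly
to a strict left fold over a Data.Map. Trouble of the kind you describe
often comes from a lazy accumulator piling up unevaluated (+1) thunks,
though I can't tell from here whether that is what you are hitting.
A minimal sketch, assuming the items can be produced as a (possibly
lazily streamed) list; `frequencies` and `normalize` are names made up
for illustration, not from any library:

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as M

-- Count occurrences of each item. foldl' forces the map at every step
-- and Data.Map.Strict forces the counts, so no chain of (+1) thunks
-- builds up while the input is consumed.
frequencies :: Ord a => [a] -> M.Map a Int
frequencies = foldl' (\m x -> M.insertWith (+) x 1 m) M.empty

-- Divide each count by the total to get relative frequencies.
normalize :: M.Map a Int -> M.Map a Double
normalize m = M.map (\c -> fromIntegral c / total) m
  where
    total = fromIntegral (sum (M.elems m)) :: Double
```

No State monad is needed for this; the map is simply threaded through
the fold as the accumulator.
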
I have a package[1], which I'll eventually put up on Hackage, for
plotting histograms with Chart[2]. The underlying histogram
implementation (uniform bin widths only) is intended for dense data and
internally keeps its accumulator in a mutable vector in the ST
monad. Feel free to borrow the code
(chart-histogram/Numeric/Histogram.hs).
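
To give a rough idea of the technique (this is not the code from
chart-histogram, only a sketch of the same idea): the bins live in a
mutable array inside ST and each sample bumps one counter in place.
The sketch uses STUArray from the array package as a stand-in for a
mutable vector, so that it only needs libraries shipped with GHC. The
name `histogram` and its parameters are assumptions made up for this
example.

```haskell
import Control.Monad (forM_, when)
import Data.Array.ST (newArray, readArray, runSTUArray, writeArray)
import Data.Array.Unboxed (UArray, elems)

-- Histogram with nBins equal-width bins over the half-open range [lo, hi).
-- Samples outside the range are dropped.
histogram :: Int -> Double -> Double -> [Double] -> UArray Int Int
histogram nBins lo hi xs = runSTUArray $ do
  bins <- newArray (0, nBins - 1) 0
  let width = (hi - lo) / fromIntegral nBins
  forM_ xs $ \x ->
    when (x >= lo && x < hi) $ do
      -- Clamp the index in case rounding pushes a sample just below hi
      -- into a bin past the last one.
      let i = min (nBins - 1) (floor ((x - lo) / width))
      n <- readArray bins i
      writeArray bins i (n + 1)
  return bins
```

For example, `elems (histogram 4 0 4 [0.5, 1.5, 1.7, 3.9, 4.0, -1])`
evaluates to `[1,2,0,1]`: the last two samples fall outside [0, 4) and
are ignored.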

Cheers,

- Ben

[1] https://github.com/bgamari/chart-histogram
[2] http://hackage.haskell.org/package/Chart