[Haskell-beginners] histogram over large data

Stephen Tetley stephen.tetley at gmail.com
Tue Jun 5 22:48:38 CEST 2012


Hi Ian

You really don't want to use a sort as it needs the whole list in
memory (or length as that counts its way through a list...).

If your data is integers and sparse - IntMap (provided you don't
overflow Int size) or the HashMaps suggested by Radoslaw should work.

If the data isn't sparse there are mutable arrays or (probably)
Data.Vector - I haven't got round to using the latter yet so can't
really comment on its suitability with respect to mutability.

If your data isn't integers you need to work out how to make a good
key from it so you can map key to count.

You can "fold" a Map or mutable array through a traversal of the data
- lists provide the usual fold (foldr) but if you have data stored in
a database there should be libraries for folding with an "iteraratee"
- again I haven't got round to using any of the iteratee packages yet
so couldn't comment on which one is the most appropriate.

Best wishes

Stephen



More information about the Beginners mailing list