[Haskell-cafe] hamming distance allocation

Daniel Fischer daniel.is.fischer at web.de
Mon Apr 19 13:31:35 EDT 2010

Am Montag 19 April 2010 17:17:11 schrieb Arnoldo Muller:
> The strings will not be longer than 30 characters.

For 20 -30 character strings, using ByteStrings should be better, in my 
tests about 40% faster, allocation figures slightly lower, resident memory 
much lower and bytes copied during GC very much lower.
For a sample of english text (many short words, few long), ByteStrings were 
about 25% faster, allocation figures very slightly lower, resident memory 
much lower, bytes copied much lower (relative difference not as large as 
for longer strings).

> I am doing sets of 2000  (total of 2000^2 distance computations)

That shouldn't give memory problems either way.

> I am expecting that all the operations will be lazyly performed but at
> some point I get a memory error.

My guess is a bad consumption pattern.

> Most of the memory is being allocated for the hamming distance and I am
> still unable to find the source of my memory leak.

Allocation as such is not a problem, resident memory is the important 
Try heap profiling to see what holds on to memory (+RTS -hc would be a good 
first run).

> Regards,
> Arnoldo
> On Mon, Apr 19, 2010 at 3:47 PM, Daniel Fischer 
<daniel.is.fischer at web.de>wrote:
> > Am Montag 19 April 2010 14:37:33 schrieb John Lato:
> > > Is it really necessary to use Strings?  I think a packed type, e.g.
> > > Vector or ByteString, would be much more efficient here.
> >
> > Not very much if the strings are fairly short (and the list isn't too
> > long, so there's not a big difference in cache-friendliness).
> > If eight-bit characters aren't enough, packing the strings into
> > UArray Int Char gives performance quite close to ByteStrings.
> >
> > > Of course this is only likely to be a benefit if you can move away
> > > from String entirely.
> > >
> > > I suspect that "hamming2" would perform better then.
> > >
> > > John

More information about the Haskell-Cafe mailing list