[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

Claus Reinke claus.reinke at talk21.com
Thu Jun 24 16:40:30 EDT 2010


>> I'll work with Simon to investigate the runtime, but would welcome any
>> ideas on further speeding up cafe4.
> 
> An update on this: with the help of Alex I tracked down the problem (an 
> integer overflow bug in GHC's memory allocator), and his program now 
> runs to completion.

So this was about keeping the program largely unchanged in 
order to keep the GHC issue repeatable for tracking? Or have 
you also looked into removing space leaks in the code (there 
still seemed to be some left in the intern/cafe5 version, iirc)?

Alexy: what does the latest version of the code look like - is 
there an up-to-date text connecting all the versions/branches/tags, 
so that one can find the latest version, and is there a small/tiny 
data source for profiling purposes?

> This is the largest program (in terms of memory requirements) I've ever 
> seen anyone run using GHC.  In fact there was no machine in our building 
> capable of running it, I had to fire up the largest Amazon EC2 instance 
> available (68GB) to debug it - this bug cost me $26.  Here are the stats 
> from the working program:
> 
>  392,908,177,040 bytes allocated in the heap

Ouch! If you keep on doing that, we Haskellers will be paged 
out of reality to make room for the heaps of GHC's executables!-)

Claus
 

