[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

Simon Marlow marlowsd at gmail.com
Thu Jun 24 08:10:32 EDT 2010

On 17/06/2010 06:23, braver wrote:
> With @dafis's help, there's a version tagged cafe3 on the master
> branch which performs better with ByteString.  I also went ahead
> and interned ByteString as Int, converting the structure to IntMap
> everywhere.  That's reflected on the new "intern" branch at tag cafe4.
> Still it can't do the full 35 days for all users.  It comes close,
> however, to 30 days under ghc 6.12 with the IntMap -- just where 6.10
> was with Map ByteString.  Some profiling is in prof/ subdirectory,
> with the tag responsible and RTS profiling option in the file
> name; .prof are -P, and the rest are -hX.
> When I downsize the sample data to 1 million users, the whole run,
> with -P profiling, is done in 7.5 minutes.  Something happens when
> tripling that amount.  For instance, setting -A10G may cause a segfault,
> after a fast run up to 10 days, then a seeming stall, and a dump of
> days up to 28 before the segfault.  -A5G comes closest, to 30 days,
> when coupled with -H1G.  It's not clear to me how to work -A and -H
> together.
> I'll work with Simon to investigate the runtime, but would welcome any
> ideas on further speeding up cafe4.
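
For readers unfamiliar with the interning described above, here is a 
minimal sketch of the idea.  It is not the code at the cafe4 tag: the 
names InternTable, intern and countByUser are hypothetical, and it 
assumes the bytestring and containers packages.  Each distinct 
ByteString is assigned a small Int once, and everything downstream is 
keyed by that Int in an IntMap.

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.IntMap.Strict as IM
import qualified Data.Map.Strict as M
import Data.List (foldl')

-- Hypothetical intern table: the next unused id, plus the ids handed
-- out so far.  Strict fields keep thunks from piling up inside it.
data InternTable = InternTable
  { nextId :: !Int
  , ids    :: !(M.Map B.ByteString Int)
  }

emptyTable :: InternTable
emptyTable = InternTable 0 M.empty

-- Return the id for a name, allocating a fresh one on first sight.
intern :: B.ByteString -> InternTable -> (Int, InternTable)
intern name t@(InternTable n m) =
  case M.lookup name m of
    Just i  -> (i, t)
    Nothing -> (n, InternTable (n + 1) (M.insert name n m))

-- Illustrative use: count occurrences per user, keyed by interned id.
countByUser :: [B.ByteString] -> (InternTable, IM.IntMap Int)
countByUser = foldl' step (emptyTable, IM.empty)
  where
    step (t, counts) name =
      case intern name t of
        (i, t') ->
          let counts' = IM.insertWith (+) i 1 counts
          -- Force both halves so the accumulator stays evaluated.
          in t' `seq` counts' `seq` (t', counts')
```

The ByteString keys are then stored only once, in the intern table, 
while the per-user structures hold plain Ints.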

An update on this: with the help of Alex I tracked down the problem (an 
integer overflow bug in GHC's memory allocator), and his program now 
runs to completion.

This is the largest program (in terms of memory requirements) I've ever 
seen anyone run using GHC.  In fact there was no machine in our building 
capable of running it, so I had to fire up the largest Amazon EC2 
instance available (68GB) to debug it; this bug cost me $26.  Here are 
the stats from the working program:

  392,908,177,040 bytes allocated in the heap
  174,455,211,920 bytes copied during GC
   24,151,940,568 bytes maximum residency (6 sample(s))
   36,857,590,520 bytes maximum slop
            64029 MB total memory in use (1000 MB lost due to fragmentation)

   Generation 0:    62 collections,     0 parallel, 352.35s, 357.13s elapsed
   Generation 1:     6 collections,     0 parallel, 180.63s, 209.19s elapsed

   INIT  time    0.00s  (  0.11s elapsed)
   MUT   time  1201.47s  (1294.29s elapsed)
   GC    time  532.98s  (566.33s elapsed)
   EXIT  time    0.00s  (  5.34s elapsed)
   Total time  1734.46s  (1860.74s elapsed)

   %GC time      30.7%  (30.4% elapsed)

   Alloc rate    327,020,156 bytes per MUT second

   Productivity  69.3% of total user, 64.6% of total elapsed
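
As a sanity check, the derived lines (%GC time, Alloc rate, 
Productivity) can be recomputed from the raw figures above.  The sketch 
below assumes the formulas that the numbers themselves suggest (GC over 
total, MUT CPU time over total, bytes allocated per MUT CPU second); it 
is not taken from the RTS source, and all names in it are made up for 
illustration.

```haskell
-- Figures copied from the statistics above.
bytesAllocated :: Double
bytesAllocated = 392908177040

mutCpu, gcCpu, totalCpu, gcElapsed, totalElapsed :: Double
mutCpu       = 1201.47
gcCpu        = 532.98
totalCpu     = 1734.46
gcElapsed    = 566.33
totalElapsed = 1860.74

percent :: Double -> Double -> Double
percent part whole = 100 * part / whole

-- Round to one decimal place, as the report prints percentages.
roundTo1 :: Double -> Double
roundTo1 x = fromIntegral (round (x * 10) :: Integer) / 10

gcPercentCpu, gcPercentElapsed :: Double
gcPercentCpu     = percent gcCpu totalCpu
gcPercentElapsed = percent gcElapsed totalElapsed

-- Assumed: both productivity figures use MUT CPU time as numerator.
productivityUser, productivityElapsed :: Double
productivityUser    = percent mutCpu totalCpu
productivityElapsed = percent mutCpu totalElapsed

-- Bytes allocated per second of mutator CPU time.
allocRate :: Double
allocRate = bytesAllocated / mutCpu
```

Rounded to one decimal place, the four percentages come out as 30.7, 
30.4, 69.3 and 64.6, matching the report.  The recomputed allocation 
rate differs slightly from the printed one because the times shown are 
themselves rounded to two decimals.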

The slop calculation is off a bit because slop for pinned objects 
(ByteStrings) isn't being calculated properly; I should really fix that.


