[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

Tue Jun 15 15:07:55 EDT 2010

deliverable:
> > If you just want to optimize it and not compare exactly equal idiomatic code,
> > you should stop using functional data structures and use a structure that fits
> > your problem (the ST monad has been designed for that in Haskell), because
> > compilers do not detect single-threaded usage and rewrite all your code to
> > something mutable and thereby avoid O(log n) costs.
> >
> > In practice it is probably easier to write the whole thing against the parallel
> > Boost Graph Library (a C++ library), since that library provides the
> > abstractions that you would want. If you go this path, it will probably end up
> > being 10-100 times faster.
> 
> Surely I can rewrite it in C++ or just C and gain speedup, but not
> maintainability and ease of prototyping algorithms -- the main reasons
> I switched to FP, with added reliability, conciseness, reasonable
> syntax bonuses.  It turned out Clojure is quite capable of doing it
> well, but has Null Pointer Exceptions and "debug data shape by
> running" annoying to an ML programmer; trying Haskell for speedup is
> worth it!  It proves less predictable than expected -- e.g. with OCaml
> if it typechecks and runs, it usually runs fast.  The point of this
> exercise is to identify typical performance bottlenecks in the domain
> and see how they can be avoided.  The jury's still out on this one.

Following the algorithm for performance tuning in the previous email
should be enough. Have you obtained heap profiles yet?

-- Don