[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

Claus Reinke claus.reinke at talk21.com
Mon Jun 28 15:27:04 EDT 2010

> Claus -- cafe5 is pretty much where it's at.  You're right, the proggy
> was used as the bug finder, actually at cafe3, still using ByteString.

It would be useful to have a really tiny data source: no more than
100 entries per Map should be enough to confirm or reject hunches
about potential leaks through profiling. As it stands, my poor old
laptop with a 32-bit GHC won't be much use with your sample data,
and now that the GHC bug is fixed, the size of the samples would
only hide the interesting aspects (from a profiling perspective).

> Having translated it from Clojure to Haskell to OCaml, 

Translating quickly between strict-by-default and non-strict-by-default
languages is always a warning sign: not only is it unlikely to make
best use of each language's strengths, but typical patterns in one
class of languages simply don't translate directly into the other.
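As a small, hypothetical illustration (not code from this thread), a
running mean is a pattern that does not carry over directly. A
strict-language accumulator pair becomes a lazy tuple in Haskell,
and foldl' only forces the outer pair, so the fields inside may
still pile up thunks (depending on optimisation). A data type with
strict fields avoids that. The names meanLazyPair, meanStrict and
Acc are made up for this sketch:

```haskell
import Data.List (foldl')

-- Direct transliteration: foldl' forces the pair only to WHNF, so the
-- running sum and count inside it can still grow as thunk chains.
meanLazyPair :: [Double] -> Double
meanLazyPair xs = s / fromIntegral (n :: Int)
  where
    (s, n) = foldl' step (0, 0) xs
    step (a, c) x = (a + x, c + 1)

-- Hypothetical strict accumulator: the bang-annotated fields are
-- evaluated every time an Acc is built, so no thunks accumulate.
data Acc = Acc !Double !Int

meanStrict :: [Double] -> Double
meanStrict xs = finish (foldl' step (Acc 0 0) xs)
  where
    step (Acc s n) x = Acc (s + x) (n + 1)
    finish (Acc s n) = s / fromIntegral n
```

Both return the same result; the difference only shows up in a heap
profile on large inputs.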

> I'm now debugging the logic and perhaps the conceptual 
> data structures.  Then better maps will be tried.  

No matter what Maps you try, if they are strict in keys and
non-strict in values, translating code from a strict language
needs careful inspection. Most of the higher-order functions
in Maps have issues here (e.g., repeated use of insertWith
builds up unevaluated thunks in the values, and so on). I'm
not even sure how well the binary package fares with nested
IntMaps (not to mention the occasional "too few bytes" error,
depending on strictness or package version; it would be
useful to have a cabal file, or a README listing the versions
of libraries you used).
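A minimal sketch of the insertWith issue for nested IntMaps, assuming
containers 0.5 or later (which provides Data.IntMap.Strict and
postdates this thread). The (user, mentioned user) data shape and the
names countLazy and countStrict are hypothetical. The lazy version
stores every combined value as a thunk; the strict module evaluates
each inner map and each count as it goes:

```haskell
import Data.List (foldl')
import qualified Data.IntMap as IL         -- lazy in values
import qualified Data.IntMap.Strict as IS  -- strict in values (containers >= 0.5)

-- Hypothetical data: (user id, mentioned user id) pairs, counted in a
-- nested map  user -> (mentioned user -> count).
type Counts = IL.IntMap (IL.IntMap Int)

-- Direct transliteration: the lazy insertWith stores each combined
-- inner map as an unevaluated thunk, and the lazy unionWith does the
-- same for every (+), so repeated updates build thunk chains.
countLazy :: [(Int, Int)] -> Counts
countLazy = foldl' step IL.empty
  where
    step m (u, v) = IL.insertWith (IL.unionWith (+)) u (IL.singleton v 1) m

-- Value-strict version: IS.insertWith evaluates the combined inner map
-- (IntMaps are spine-strict, so WHNF forces its whole structure), and
-- IS.unionWith evaluates each summed count before storing it.
countStrict :: [(Int, Int)] -> Counts
countStrict = foldl' step IS.empty
  where
    step m (u, v) = IS.insertWith (IS.unionWith (+)) u (IS.singleton v 1) m
```

Both modules share the same IntMap type, so switching is mostly a
matter of changing imports; older containers releases offered primed
variants such as insertWith' for the same purpose.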

To binary package users/authors: is there a typed version 
of binary (that is, one that records and checks a representation 
of the serialized type before actual (de-)serialization)? It
would be nice to have such a type check, even though it
wouldn't protect against missing bytes or strictness changes. 
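For illustration only, such a check can be approximated on top of
binary by prefixing the payload with a textual type representation
and comparing it on decode. This is a sketch, not an existing package
API: encodeTyped and decodeTyped are hypothetical names, and it
assumes binary 0.7 or later (for decodeOrFail) and base 4.7 or later
(for Proxy in Data.Typeable). As noted above, it would not help with
missing bytes or strictness changes, and a shown TypeRep is a fragile
tag (module renames or compiler changes can alter it):

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Data.Binary (Binary, encode, decodeOrFail)
import Data.Typeable (Typeable, Proxy(..), typeRep)
import qualified Data.ByteString.Lazy as BL

-- Hypothetical helper: serialize (type tag, value); the tuple instance
-- writes the tag first, then the payload.
encodeTyped :: forall a. (Binary a, Typeable a) => a -> BL.ByteString
encodeTyped x = encode (show (typeRep (Proxy :: Proxy a)), x)

-- Hypothetical helper: decode the tag, compare it with the expected
-- type, and only then decode the payload from the remaining bytes.
decodeTyped :: forall a. (Binary a, Typeable a) => BL.ByteString -> Either String a
decodeTyped bs = case decodeOrFail bs of
  Left (_, _, err) -> Left ("bad type tag: " ++ err)
  Right (rest, _, tag)
    | tag /= expected ->
        Left ("type mismatch: expected " ++ expected ++ ", found " ++ tag)
    | otherwise -> case decodeOrFail rest of
        Left (_, _, err) -> Left ("bad payload: " ++ err)
        Right (_, _, x) -> Right x
  where
    expected = show (typeRep (Proxy :: Proxy a))
```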

> Then a giant shootout will ensue, now that
> Haskell finishes!  I'll post here when it's ready.

Just make sure Haskell isn't running with brakes on!-)
