[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure
Claus Reinke
claus.reinke at talk21.com
Mon Jun 28 15:27:04 EDT 2010
> Claus -- cafe5 is pretty much where it's at. You're right, the proggy
> was used as the bug finder, actually at cafe3, still using ByteString.
It would be useful to have a really tiny data source - no more than
100 entries per Map should be sufficient to confirm or reject hunches
about potential leaks by profiling. As it stands, my poor old laptop
with a 32bit GHC won't be much use with your sample data, and
now that the GHC bug is fixed, the size of the samples would only
hide the interesting aspects (from a profiling perspective).
> Having translated it from Clojure to Haskell to OCaml,
Translating quickly between strict-by-default and non-strict-by-default
languages is always a warning sign: not only is it unlikely to make
best use of each language's strengths, but typical patterns in one
class of languages simply don't translate directly into the other.
> I'm now debugging the logic and perhaps the conceptual
> data structures. Then better maps will be tried.
No matter what Maps you try, if they are strict in keys and
non-strict in values, code translated from a strict language
needs careful inspection. Most of the higher-order functions
in Maps have issues here (e.g., repeated use of insertWith
is going to build up unevaluated thunks in the values). I'm
not even sure how well binary fares with nested IntMaps
(not to mention the occasional "too few bytes" error
depending on strictness or package version - it would be
useful to have a cabal file, or a README listing the versions
of libraries you used).
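
To make the insertWith point concrete, here is a minimal sketch of a
word count written twice. The function names and the sample input are
made up for illustration, not taken from the Twitter code. It assumes a
containers version that provides Data.Map.Strict, which postdates this
message; older versions of Data.Map offered insertWith' for the same
purpose.

```haskell
import Data.List (foldl')
import qualified Data.Map as L          -- non-strict in values
import qualified Data.Map.Strict as S   -- forces values to WHNF when stored

-- Hypothetical example: the fold is strict in the Map itself, but the
-- stored counts can remain chains of (+) thunks until something
-- demands them.
countLazy :: [String] -> L.Map String Int
countLazy = foldl' (\m w -> L.insertWith (+) w 1 m) L.empty

-- Same logic, but each updated count is evaluated before it is stored,
-- so no chain of thunks accumulates behind a key.
countStrict :: [String] -> S.Map String Int
countStrict = foldl' (\m w -> S.insertWith (+) w 1 m) S.empty

main :: IO ()
main = do
  let ws = concat (replicate 1000 ["haskell", "clojure", "ocaml"])
  print (L.toList (countLazy ws))
  print (S.toList (countStrict ws))
```

Both versions give the same result. The difference should only show up
in a heap profile, which is why a tiny data source is enough to check
for this kind of leak.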
To binary package users/authors: is there a typed version
of binary (that is, one that records and checks a representation
of the serialized type before actual (de-)serialization)? It
would be nice to have such a type check, even though it
wouldn't protect against missing bytes or strictness changes.
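
As a rough sketch of what I mean (not an existing binary API): prefix
the payload with a textual rendering of its type and compare on the way
back in. The names encodeTyped and decodeTyped are hypothetical, and
the sketch assumes a binary version that provides decodeOrFail. Using
show on the TypeRep is a weak fingerprint, since it omits module names,
so take it as an illustration only.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Data.Binary (Binary, decodeOrFail, encode)
import qualified Data.ByteString.Lazy as BL
import qualified Data.IntMap as IM
import Data.Typeable (Typeable, typeOf)

-- Hypothetical helper: serialize a type tag, then the value itself.
encodeTyped :: (Binary a, Typeable a) => a -> BL.ByteString
encodeTyped x = encode (show (typeOf x), x)

-- Hypothetical helper: read the tag first and refuse to decode the
-- payload if the tag does not match the type the caller asked for.
decodeTyped :: forall a. (Binary a, Typeable a)
            => BL.ByteString -> Either String a
decodeTyped bs =
  case decodeOrFail bs of
    Left (_, _, err) -> Left err
    Right (rest, _, tag)
      | tag /= expected ->
          Left ("type mismatch: expected " ++ expected ++ ", found " ++ tag)
      | otherwise ->
          case decodeOrFail rest of
            Left (_, _, err) -> Left err
            Right (_, _, v)  -> Right v
  where
    -- typeOf does not evaluate its argument, so undefined is safe here.
    expected = show (typeOf (undefined :: a))

main :: IO ()
main = do
  let m     = IM.fromList [(1, IM.fromList [(2, 3 :: Int)])]
      bytes = encodeTyped m
  -- Asking for the type that was written succeeds.
  print (decodeTyped bytes :: Either String (IM.IntMap (IM.IntMap Int)))
  -- Asking for a different type is rejected with a Left.
  print (decodeTyped bytes :: Either String (IM.IntMap Int))
```

This catches reading a file back at the wrong type, but as said above
it does nothing about truncated input or strictness changes.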
> Then a giant shootout will ensue, now that
> Haskell finishes! I'll post here when it's ready.
Just make sure Haskell isn't running with brakes on!-)
Claus