[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure
wren ng thornton
wren at freegeek.org
Tue Jun 15 16:18:16 EDT 2010
braver wrote:
> On Jun 14, 11:40 am, Don Stewart <d... at galois.com> wrote:
>> Oh, you'll want insertWith'.
>>
>> You might also consider bytestring-trie for the Graph, and IntMap for
>> the AdJList ?
>
> Yeah, I saw jsonb using Trie and thought there's a reason for it. But
> it's very API-poor compared with Map, e.g. there's not even a fold --
> should one toListBy first?
I find that surprising. Have you looked in Data.Trie.Convenience? The
API of Data.Map is rather bloated so I've pushed most of it out of the
main module in order to clean things up. There are only a small number
of functions in the Data.Map interface I haven't had a chance to
implement yet.
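As a rough illustration of what lives in Data.Trie.Convenience, here is a minimal sketch of counting occurrences with `insertWith'`. It assumes the bytestring-trie package is installed and that `insertWith'` takes the same argument order as its Data.Map namesake; the sample words and the names `countWords` and `sampleCounts` are made up for the example.

```haskell
module Main where

import           Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BC
import           Data.List (foldl')
import qualified Data.Trie as T
import qualified Data.Trie.Convenience as TC

-- Count how often each key occurs, forcing the counter on every
-- insertion so that no chain of (+) thunks builds up.
countWords :: [ByteString] -> T.Trie Int
countWords = foldl' (\t w -> TC.insertWith' (+) w 1 t) T.empty

-- Hypothetical sample input, not taken from the original thread.
sampleCounts :: T.Trie Int
sampleCounts = countWords (map BC.pack ["a", "b", "a", "c", "a"])

main :: IO ()
main = do
  print (T.lookup (BC.pack "a") sampleCounts)
  print (T.lookup (BC.pack "z") sampleCounts)
```

The strict variant matters here for the same reason it does with Data.Map: a lazy `insertWith` would defer every addition until the count is finally demanded.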
For folding, the `foldMap`, `foldr`, and `foldl` functions are provided
via the Data.Foldable interface. The Data.Traversable class is also
implemented if you need to make changes to the trie along the way. These
all give generic folding over the values stored in the trie. If you need
access to the keys during folding you can use `foldrWithKey`, though it
has to reconstruct the keys, which doesn't sound good for your use case.
`toListBy` is a convenience wrapper around `foldrWithKey` which supports
list fusion, so it has the same advantages and disadvantages compared to
the Foldable/Traversable functions.
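To make the two styles of folding concrete, here is a small sketch, assuming the bytestring-trie package is installed. The trie contents and the names `followers`, `totalFollowers`, and `keyedPairs` are invented for illustration. The first fold goes through the Foldable instance and only touches values; the second uses `toListBy`, which has to rebuild each key.

```haskell
module Main where

import           Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BC
import qualified Data.Foldable as F
import qualified Data.Trie as T

-- Hypothetical sample data: user name -> follower count.
followers :: T.Trie Int
followers = T.fromList
  [ (BC.pack "alice", 3)
  , (BC.pack "bob",   5)
  , (BC.pack "carol", 2)
  ]

-- Values only, via the Foldable instance; keys are never reconstructed.
totalFollowers :: Int
totalFollowers = F.foldl' (+) 0 followers

-- Keys and values together, via toListBy; each key is rebuilt on the way.
keyedPairs :: [(ByteString, Int)]
keyedPairs = T.toListBy (\k v -> (k, v)) followers

main :: IO ()
main = do
  print totalFollowers
  mapM_ print keyedPairs
```

If only the values matter, the Foldable route is the cheaper of the two, since it skips key reconstruction entirely.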
If there's a particular function you still need, let me know and I can
add an implementation for it.
In terms of optimizing your code, one thing you'll surely want to do is
to construct an intern table (Trie Int, IntMap ByteString) so that you
only have to deal with Ints internally rather than ByteStrings. I
haven't looked at your code yet to see how this would fit in, but it's
almost always a requisite trick for handling large text corpora.
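An intern table of that shape could look roughly like the following. This is only a sketch under stated assumptions: the record `Intern` and the functions `intern`, `unintern`, and `internAll` are hypothetical names, ids are simply handed out in order of first appearance, and bytestring-trie plus containers are assumed to be available.

```haskell
module Main where

import           Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BC
import qualified Data.IntMap as IM
import           Data.List (mapAccumL)
import qualified Data.Trie as T

-- The pair (Trie Int, IntMap ByteString) plus the next unused id.
data Intern = Intern
  { toId   :: T.Trie Int            -- string -> id
  , fromId :: IM.IntMap ByteString  -- id -> string
  , nextId :: !Int
  }

emptyIntern :: Intern
emptyIntern = Intern T.empty IM.empty 0

-- Return the id for a string, allocating a fresh one on first sight.
intern :: ByteString -> Intern -> (Int, Intern)
intern s tbl =
  case T.lookup s (toId tbl) of
    Just i  -> (i, tbl)
    Nothing ->
      let i = nextId tbl
      in  ( i
          , Intern (T.insert s i (toId tbl))
                   (IM.insert i s (fromId tbl))
                   (i + 1)
          )

-- Recover the original string, e.g. for final output.
unintern :: Int -> Intern -> Maybe ByteString
unintern i = IM.lookup i . fromId

-- Intern a whole list, threading the table through left to right.
internAll :: [ByteString] -> Intern -> (Intern, [Int])
internAll ss tbl0 = mapAccumL step tbl0 ss
  where
    step tbl s = let (i, tbl') = intern s tbl in (tbl', i)

main :: IO ()
main = do
  let names      = map BC.pack ["alice", "bob", "alice"]
      (tbl, ids) = internAll names emptyIntern
  print ids
  print (unintern 1 tbl)
```

After the input has been interned once, the rest of the program can key its graph on Int (for instance with IntMap) and only consult `fromId` when names need to be printed.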
--
Live well,
~wren