[Haskell-cafe] Layered maps
rpglover64 at gmail.com
Sat Oct 9 02:02:05 EDT 2010
This came up as I was doing homework for natural language processing.
I'm constructing a trigram model from training data, but I also need the
bigram and unigram counts. I could, for each triple of characters, add the
3 tuple to a trigram map and increment its count (I know I'm not actually
mutating anything), add the last two letters to a bigram map, and add the
last character to a unigram map.
But this looks very much like a tree... each node contains the character
(the key) and list of characters that can appear after it. (I think this is
a trie, but I don't know very much about tries.) The problem is that lookup
is more horribly inefficient than I can stand, if I use lists.
As for what constitutes convenient, I haven't thought about it that much; I
just noted that my naive attempt was full of boilerplate and decided that
since there was already python code provided I used that. I'm planning to
port the code *after* I have the assignment finished.
On Fri, Oct 8, 2010 at 11:18 PM, wren ng thornton <wren at freegeek.org> wrote:
> On 10/8/10 5:46 PM, Thomas DuBuisson wrote:
>> The containers library can do this already - there are no constraints
>> on the elements of a Map. For example:
>> type TripleNestedMap a = Map Int (Map Char (Map String a))
>> But this is rather silly as you can just do:
>> type MapOfTriples a = Map (Int ,Char, String) a
>> for most uses.
> However, Map is a lot less efficient than IntMap when dealing with Ints.
> And the IntMap (IntMap ... a) type requires you to write (lookup m <=< ...
> <=< lookup n) instead of just one lookup. Unfortunately, when you're
> interested in performance issues, the standard tricks for implementing
> polyvariadic functions aren't very useful.
> FWIW, the monadic combinators are usually sufficient to create your own
> functions legiblely (e.g., using (<=<) for lookup), but it's still a lot
> noiser than it could be--- especially if you want a trie instead of a
> product map, since you have to add fst and snd everywhere. I've been playing
> around with some ad-hoc tries like these a lot lately, both for HMM tagging
> and for hunting down performance issues in bytestring-trie.
> Live well,
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Haskell-Cafe