[Haskell] Re: performance tuning Data.FiniteMap
oleg at pobox.com
oleg at pobox.com
Tue Mar 2 23:00:17 EST 2004
[BTW, should we move to Haskell-Cafe?]
> Because updates are not so infrequent that I want to pay the cost of
> replicating the entire array every update (or every ten!). I'm
> willing to exchange *some* read time for faster update. Also, because
> small array copies may be sufficiently faster than tree traversals
> that I may pay very little extra for faster reads.
> FYI, my current code looks like this:
I'm afraid I'm somewhat confused. The hash-table related code makes
the copy of the whole array every purgatory-size times. Thus, given the
sequence of 'n' unique inserts (which don't trigger the
major_rebuild), at most 2*n/|purgatory| elements will be moved.
As I understand your code, in particular,
> insert (ArrMap proto toBase ht) key elt = ArrMap proto toBase newHT
> where newHT= insert' proto ht (toBase key) elt
> insert' _ (HT x _)  = HT x
> insert' proto (HT Nothing y) key = insert' proto (HT (Just proto) y) key
> insert' p (HT (Just ar) y) (k:ey) = \val -> HT (Just $ newArray val) y
> where newArray val = ar//[(k,insert' p (ar!k) ey val)]
you make a copy of an array of the size |base| |key| times. _If_ the
tree is kept balanced and filled, then the sequence of n inserts will
copy (log n)/(log |base|)*|base| elements. For small n and large
|base|, that can be a lot. For example,
> testMap=newMap (chr 0) (chr 255) id
> main = do print $ lookup (insert testMap "abc" (Just "def")) "abc"
involves copying a 256-element array three times. Right?
I guess we have come to the point where we really need to know the
distribution of reads and writes, the length of the key (and if it is
bounded), and the distribution of key values. We must also be sure of
the cost basis. So far, we have concentrated only on the traversal
through and moving of elements as the function of the size of the
map. This is clearly not sufficient, as Andrew Bromage pointed out.
More information about the Haskell