[Haskell] performance tuning Data.FiniteMap

S. Alexander Jacobson alex at alexjacobson.com
Tue Feb 24 18:32:23 EST 2004


OK.  I just looked more carefully at the FiniteMap
and Data.HashTable documentation and coded up what
I had incorrectly imagined would already be there.
Isn't the following more efficient than FiniteMap,
without requiring the IO monad?

  -------------------------------------------------------------
  import Data.Array
  import Data.Char (chr)

  class MaxRange a where maxRange :: (a,a)

  data HashTable key elt =
        HashTable (Maybe (Array key (HashTable key elt))) (Maybe elt)

  emptyHT :: HashTable key elt
  emptyHT = HashTable Nothing Nothing

  hLookup :: Ix key => HashTable key elt -> [key] -> Maybe elt
  hLookup (HashTable _ y)         []     = y
  hLookup (HashTable Nothing _)   _      = Nothing
  hLookup (HashTable (Just ar) _) (k:ey) = hLookup (ar!k) ey

  insert :: (Ix key, MaxRange key)
         => HashTable key elt -> [key] -> elt -> HashTable key elt
  insert (HashTable x _)         []     val = HashTable x (Just val)
  insert (HashTable Nothing y)   (k:ey) val = HashTable (Just initArray) y
        where
        initArray = array maxRange
                      [ (x, if x /= k then emptyHT else insert emptyHT ey val)
                      | x <- range maxRange ]
  insert (HashTable (Just ar) y) (k:ey) val =
        HashTable (Just $ ar // [(k, insert (ar!k) ey val)]) y

  -- support String keys
  instance MaxRange Char where maxRange = (chr 0, chr 255)

  --------------------------------------------------------------
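
For concreteness, here is a small usage sketch
(the keys and values are just made-up examples):

  exampleTable :: HashTable Char Int
  exampleTable = insert (insert emptyHT "foo" 1) "bar" 2

  -- hLookup exampleTable "foo"  ==>  Just 1
  -- hLookup exampleTable "baz"  ==>  Nothing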

It seems like the depth of the tree, and therefore
the speed of lookups, depends on the size of
maxRange, and that lookups should be faster than
the repeated comparisons FiniteMap does.  I don't
know how lookups compare to Data.HashTable.

It also seems like updates could be very fast,
because I assume // is implemented with a fast
memcpy... (though not as fast as the destructive
updates in Data.HashTable).

Note: I don't know how to avoid the namespace
conflict with GHC.List.lookup, so it's hLookup.
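A possible workaround would be to hide the
Prelude's lookup in the defining module, or to
import this module qualified at the call site
(the module name below is just a placeholder):

  import Prelude hiding (lookup)     -- in the defining module

  -- or, at the call site:
  import qualified TrieMap           -- hypothetical module name
  -- ... TrieMap.lookup table key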

-Alex-

_________________________________________________________________
S. Alexander Jacobson                  mailto:me at alexjacobson.com
tel:917-770-6565                       http://alexjacobson.com

On Tue, 24 Feb 2004, JP Bernardy wrote:

>
> > I believe FiniteMap works by representing the data
> > in binary trees.  It is therefore O(log2(n)) to
> > read and update.
> >
> > However, if my application reads many more times
> > than it writes, then perhaps I can get a
> > substantial performance boost by increasing the
> > branch factor on the tree.  For example, if each
> > node was an array of 256 elements, reads would be
> > O(log256(n)), a 128x improvement!
>
> Not quite.
>
> In fact, O(log256(n)) is equivalent to O(log2(n)),
> because there is only a constant factor between the
> two.  That's why the base of the logarithm is
> usually omitted in O() expressions.
>
> Besides, the ratio between log2(n) and log256(n) is
> more like 8 than 128.  (And you'd lose this factor
> in searching the right subtree, as Ketil pointed out.)
>
> Tuning Data.FiniteMap probably is not what you want.
>
> I don't know, but you can have a look at
> Data.HashTable.
>
> Just my 2 cents,
> JP.
>


