DiffArray Performance

Mon Oct 27 18:21:20 EST 2003

On Monday 27 Oct 2003 3:17 pm, Alastair Reid wrote:
> You don't mention _how_ you use diffarrays so I'll guess you're keeping the
> array in ascending order, finding the insertion point by binary search and
> inserting by copying bigger elements one place to the right.

> Using binary trees (the code you copied), the cost of N insertions will be
> O(N * log N) assuming the input is in random order.

Thanks for your reply. Maybe I'm being dense, but I'm affraid I don't
understand the relevance of your answer to my problem :-(

I guess I should have been a bit clearer in my explanation of what the
code is doing. Here's the code again..

> -- Search Tree data type
> newtype STree = STree (Array Int (STree,[Match]))
> -- Initial value for Search Tree
> sTree0 :: STree
> sTree0 = STree (array (0,9) [(n,(sTree0,[]))| n <- [0..9]])
>
> -- Make the search tree from a list of words
> makeSTree :: [String] -> STree
> makeSTree ws = foldl' putWord sTree0 pairs where
> pairs = [let ps = packString w in ps `seq` (word2keys w, MatchW ps) | w<-ws]
> word2keys cs = [getKey (toUpper c) | c <- cs, c /= '"' , c /= '-' ]
> putWord stree (keys,m) = put keys stree
>     where put []     _         = error "makeSTree: empty Keys"
>           put [k]    (STree a) = let (t,ms) = a ! k
>                                  in STree (a // [(k,(t,m:ms))])
>           put (k:ks) (STree a) = let (t,ms) = a ! k
>                                      t'     = put ks t
>                                  in t' `seq` STree (a // [(k,(t',ms))])

This generates a 10 way (Decary?) search tree, each vertex of which is an
ordinary Haskell98 Array of 10 elements. Each edge is annotated by any
string which encode a particular sequence of decimal digits (keys), which
give the path to the vertex in question.

To find any words which encode a particular sequence of digits the
corresponding search function (not included above) just looks down the
relevant branches (1 for each digit), until no more digits are left.
Whatever annotations are on the last edge are the words that encode
the digit sequence (if any). The search is very fast, it's the
construction of the search tree in the first place that seems a bit
slow (extremely slow using DiffArrays).

The test used DiffArray instead of a Haskell98 array..

> -- Search Tree data type
> newtype STree = STree (DiffArray Int (STree,[Match]))

The code is otherwise identical, so any difference in execution time
must be caused the difference between reading/writing the respective
arrays. I wouldn't expect them to be identical but for DiffArrays
to be over 100 times slower seems a bit strange (especially for
a relatively small array of 10 elements). That's an O(n^2) difference
somewhere (neglecting any constant factors).

I'm just wondering if there's a bug in their implementation?
or am I using them incorrectly? and generally seeking advice
about any faster arrays I could try.

Regards
--
Adrian Hey