# mkSet on sorted lists

Tomasz Zielonka t.zielonka at students.mimuw.edu.pl
Sun Jan 11 22:15:56 EST 2004

On Wed, Jan 07, 2004 at 07:30:35PM +0100, Tomasz Zielonka wrote:
> On Wed, Jan 07, 2004 at 05:41:04PM +0100, Wolfgang Jeltsch wrote:
> > Hello,
> >
> > does Data.Set.mkSet run in linear time when applied to a sorted list?
>
> No, at least not in the version from GHC 6.0. This implementation is
> based on FiniteMap. mkSet translates to listToFM, which incrementally
> constructs the result map using foldl on the input list.
>
> Hmmm, I may try to fix it in a day or two... it shouldn't be too
> difficult.

Well, implementing the algorithm wasn't difficult, but keeping memory
usage low was quite tricky. In the first version the 'build' function
returned a pair with a tree and the rest of the input list. The cost
of this was so high that the linear version was still slower than
traditional listToFM for lists with 100000 elements.

I checked that using unboxed tuples helps, but I wasn't satisfied with
such solution. In the end I found that transforming the function to use
continuation passing style gives similar performance. Now the speed gain
is easily noticable (3x on a 100000 element list and a couple of
lookups).

The code given below won't immediately affect mkSet. This is only a
prototype, so the name and signature are probably not the best.

The function could also cope with non-decreasing lists by omitting
duplicated keys, but I didn't wanted to make it too complex yet.

listToFM could work as follows - take the sorted prefix of the list,
build a tree for this prefix in linear time and then fall back to
inserting elements one by one.

It would be nicer to extract the longest sorted sublist from the list,
but I am afraid that this has Theta(N log N) complexity already.

Well, what do you think about it?
Would you like such improvements to FiniteMap and Set?

Best regards,
Tom

---------------------------------------------------------------------------
-- Building tree from a sorted list

trySortedListToFM :: (Ord key OUTPUTABLE_key)
=> [(key, elt)] -> Maybe (FiniteMap key elt)
trySortedListToFM list
| sorted    = Just (build (length list) list const)
| otherwise = Nothing
where
sorted  = and (zipWith (\(l,_) (r,_) -> l < r) list (tail list))

build 0 xs cont =
cont EmptyFM xs
build 1 ((key, elt) : xs) cont =
cont (unitFM key elt) xs
build n xs cont =
build leftSize xs (\left ((key, elt) : ys) ->
build rightSize ys (\right zs ->
cont (mkBranch 14{-which-} key elt left right) zs))
where
leftSize = n div 2
rightSize = n - leftSize - 1

--
.signature: Too many levels of symbolic links