Linear time IntSet / IntMap construction from sorted input

Scott Dillard sedillard at ucdavis.edu
Tue May 20 19:24:21 EDT 2008


I've gone ahead and made a patch, which is attached. I don't have
access to the full test suite, but it passes the QuickCheck test for
just the prop_Ordered property, which is no longer a tautology. Who
wrote that test anyway? :)

Scott


On Mon, May 19, 2008 at 5:43 PM, Scott Dillard <sedillard at ucdavis.edu> wrote:
> Hi,
>
> It's always bugged me that IntSet.fromAscList is an alias for
> fromList, which is basically (foldr insert), even though a sorted list
> is (almost) already in tree order. Is there a reason for this? Or was
> it just that fromList is good enough? (which it is.)
>
> Assuming the latter, and somewhat bored, I wrote a function to build
> an IntSet from sorted input in linear time. This goes in
> Data.IntSet.hs :
>
>
>
> data Stack = Push {-# UNPACK #-} !Prefix !IntSet !Stack | Nada
>
> fromAsc xs = make Nada xs
>  where
>    make stk []      = finish stk
>    make Nada (z:zs) = make (Push z (Tip z) Nada) zs
>    make stk@(Push _ y Nada) (z:zs)  = make (Push z (Tip z) stk) zs
>    make stk@(Push py y stk') zs@(z:zs') =
>      make (Push z (Tip z) (reduce (branchMask py z) stk)) zs'
>
>    reduce _ stk@(Push _ x Nada) = stk
>    reduce m stk@((Push px x (Push py y stk'))) =
>      let !mxy = branchMask px py
>          !pxy = mask px mxy
>      in  if shorter m mxy
>            then reduce m (Push pxy (Bin pxy mxy y x) stk')
>            else stk
>
>    finish Nada = Nil
>    finish (Push _ x Nada) = x
>    finish (Push px x (Push py y stk)) = finish (Push p (join px x py y) stk)
>      where m  = branchMask px py
>            p  = mask px m
>
>
> Every trie node is allocated exactly once, and so the burden on the
> heap is significantly less, even if its only a little bit faster. I
> tested it by building a list from -n to n for big n, and computing the
> sum. It alloc's 50% less and the MUT time is like 40% less, but the GC
> time is more (why would that be?) so it runs in maybe 7/8ths the time,
> versus fromList.
>
> I have another, prettier version that uses [] instead of this hacky
> Stack type, but its slower, breaking even with fromList on speed,
> still allocating much less.
>
> Is this something you guys feel would be worth including? Should I
> make a proper patch? One could even use this to write an O(n)
> mapKeysMonotonic, bringing the IntMap and Map APIs further into
> alignment. Or is 8/7x speedup not worth the extra code weight?
>
> Scott
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fromAscList.patch
Type: text/x-diff
Size: 71718 bytes
Desc: not available
Url : http://www.haskell.org/pipermail/libraries/attachments/20080520/81c3e37c/fromAscList-0001.bin


More information about the Libraries mailing list