A different approach to nubOrd (Re: Proposal #2629)
Bart Massey
bart at cs.pdx.edu
Wed Oct 1 04:19:41 EDT 2008
Bart Massey <bart <at> cs.pdx.edu> writes:
> Any suggestions from anyone about how to proceed? Or are we just done at this
> point? So close...
OK, I've got a new version of nubOrd together. This one is just nubOrd, no
nubWith, no nubInt, nothing fancy. There's two variants, nubOrd' and nubOrd''.
The latter gets a slight performance win by omitting some work that would
preserve the order of output with respect to nub.
The basic strategy of nubOrd' is to pull off a prefix l of the input list as
long as the current stoplist s, nubSort l, merge l with s to get a new stoplist
s', use s in a merge step to filter l, then sort the remainder of l back to the
original order. Finally, the resulting l' and s' are prepended to a recursive
call on the rest of the list.
nubOrd' has roughly the same time complexity as the set-based nubOrd did. This
is because each step of nubOrd requires log m time to search the stoplist, and
each collection of steps in nubOrd' requires amortized log m time for the sorts.
Let's do some side by side comparison:
nub nubSort nubOrd nubOrd' nubOrd''
laziness full no full spine spine
worst-case O(mn) O(n log n) O(n log m) O(n log m) O(n log m)
complexity
(m symbols,
length n)
relative 1x 50x 1.1x 5x 5x
runtime with
m = 1
absolute 84s 0.14s 0.16s 0.35s 0.21s
runtime with
m = n = 10^5
dependencies no no Data.Set no no
outside (containers)
Data.List
outputs in yes no yes yes no
order of
first mention
code small tiny smallish wtf wtf-lite
complexity
This table suggests to me that nubOrd' is viable, assuming spine-laziness is
sufficient. It solves the time problems of nub and nubSort, solves the
dependency problem of nubOrd, and preserves the order of its outputs. I don't
like the 5x slowdown for small m, or the non-laziness on elements, but I guess I
am willing to take them to get the other stuff. Maybe someone or the compiler
can do some optimization later.
The code for nubOrd', in all its hideous glory, is at the end of this post.
Comments, corrections and improvements welcome.
Bart Massey
bart <at> cs.pdx.edu
merge :: Ord e => [e] -> [e] -> [e]
merge l1 [] = l1
merge [] l2 = l2
merge l1@(e1 : e1s) l2@(e2 : e2s)
| e1 < e2 = e1 : merge e1s l2
| e1 > e2 = e2 : merge l1 e2s
| otherwise = merge l1 e2s
nubOrd' :: Ord e => [e] -> [e]
nubOrd' [] = []
nubOrd' (e : es) = e : go [e] es where
go _ [] = []
go s l = l1' ++ go s' l2 where
(l1, l2) = splitAt (length s) l
curl = filterDups $ sort $ zip l1 ([1..] :: [Integer])
s' = merge s (map fst curl)
l1' = map fst $ sortBy flipcmp $ stopMerge s curl
(a, b) `flipcmp` (c, d) = (b, a) `compare` (d, c)
filterDups [] = []
filterDups (s1@(e1, _) : (e2, _) : ss) | e1 == e2 = filterDups (s1 : ss)
filterDups (s1 : ss) = s1 : filterDups ss
stopMerge _ [] = []
stopMerge [] m2 = m2
stopMerge m1@(e1 : e1s) m2@(s2@(e2, _) : s2s)
| e1 < e2 = stopMerge e1s m2
| e1 > e2 = s2 : stopMerge m1 s2s
| otherwise = stopMerge m1 s2s
More information about the Libraries
mailing list