powerset
Graham Klyne
GK@ninebynine.org
Wed, 04 Jun 2003 21:04:35 +0100
At 14:00 04/06/03 +0100, Keith Wansbrough wrote:
>Looks fine, but you're right - the use of "combinations" is
>inefficient. It's better to iterate over the elements, rather than
>the length. Then as you add each element, you consider all the sets
>you have so far, and either add the new element to each or not. This
>doubles the number of sets. Hence:
>
>powerset :: [a] -> [[a]]
>powerset [] = [[]]
>powerset (x:xs) = xss ++ map (x:) xss
> where xss = powerset xs
Neat!
It happens that for the application I have in mind, it is important to
generate shorter subsets first, as the required results are almost always
going to come from smaller subsets of a possibly large base set, and I'm
aiming to exploit lazy evaluation to keep things tractable.
Looking at your code, I'm wondering if it would be possible to use some
alternate form of composition instead of ++, and some auxiliary functions
to pull the results out in the short-to-long sequence.
I'm thinking in terms of building a list of trees..
[[
data NTree a = NTree { nthead::a, ntbody::[NTree a] }
instance Functor NTree where
fmap f (NTree h ts) = NTree (f h) (map (fmap f) ts)
powerset1 :: [a] -> [NTree [a]]
powerset1 (x:xs) = (NTree [x] (map (fmap (x:)) xss)) : xss
where xss = powerset1 xs
powerset1 [] = []
listPowerset :: [NTree [a]] -> [[a]]
listPowerset [] = []
listPowerset ts = (map nthead ts) ++
listPowerset bodylist
where
bodylist = concat $ filter (not . null) $ map ntbody ts
testN1 = listPowerset $ powerset1 [1,2,3,4]
testN2 = listPowerset $ powerset1 "abcdefgh"
]]
The list/tree structure looks something like this:
[1]
[2] [2,1]
[3] [3,1]
[3,2] [3,2,1]
[4] [4,1]
[4,2] [4,2,1]
[4,3] [4,3,1]
[4,3,2] [4,3,2,1]
etc.
The list function picks off the members by columns (w.r.t. to above diag)
>Notice how important it is to include the empty set in the set of
>subsets - it won't work at all if you omit it.
Yes, I noticed something similar in my original version.
I've chosen not include the empty subset in my results, but that's easily
adjusted.
>This formulation is particularly nice because in memory, you *share*
>all of the lists from the previous iteration, rather than making
>copies.
I *think* my revised formulation achieves this.
[...]
>My solution isn't perfect, though - the use of append (++) is
>inefficient; if this could be avoided, it would be faster.
I didn't see any easy way to avoid (++) in my list readout, but I think I
can claim that the length of the leading list if never more than O(log N)
the tree size.
If I'm evaluating and using the list lazily, using a typical recursive
traversal pattern (like the powerset function itself), is there any cause
for the "++" to actually be evaluated? I suspect not, but can't be sure.
#g
-------------------
Graham Klyne
<GK@NineByNine.org>
PGP: 0FAA 69FF C083 000B A2E9 A131 01B9 1C7A DBCA CB5E