[Haskell-beginners] Re: randomize the order of a list
Heinrich Apfelmus
apfelmus at quantentunnel.de
Wed Sep 8 07:55:17 EDT 2010
Gabriel Scherer wrote:
> I must apologize : a part of my post about quicksort didn't make sense
> : if one choose the pivot position randomly, elements shouldn't be
> splitted with even probability, because there would be no control over
> the size of the results list.
No worries, my statement about the probability of the length of the left
part being k doesn't make sense, either, since the probabilities 1/(n
`over` k) don't even add up to a total of one. What is clear is that
there is no *a priori* reason that 1/2 should work.
> If I understand it correctly, your solution doesn't pick a pivot
> position, but dynamically adapt list size probabilities during
> traversal.
Yes. Note, however, that I intended to pick a pivot *element*, i.e. an
element which, just like in ordinary quick sort, will not be permuted
subsequently. This is subtly different from a pivot *position* that
divides the list into two parts but has no element associated to it. I
think this is important when trying to analyze the naive 1/2 scheme, but
it's immaterial in your proposal:
> I have a different solution, that pick a pivot position, then choose
> the elements with carefully weighted probability to get the right
> left-hand and right-hand sizes. The key idea comes from your analysis
> (more precisely, from the presence of the n `over` k probabilities) :
> for a given k (the pivot), choose a random subset of k elements as the
> left-hand side of the pivot.
>
> import Random (Random, StdGen, randomRIO)
> import Control.Monad (liftM)
>
> quickshuffle :: [a] -> IO [a]
> quickshuffle [] = return []
> quickshuffle [x] = return [x]
> quickshuffle xs = do
> (ls, rs) <- partition xs
> sls <- quickshuffle ls
> srs <- quickshuffle rs
> return (sls ++ srs)
>
> -- The idea is that to partition a list of length n, we choose a pivot
> -- position randomly (1 < k < n), then choose a subset of k elements
> -- in the list to be on the left side, and the other n-k on the right
> -- side.
> --
> -- To choose a random subset of length k among n, scan the list and
> -- add each element with probability
> -- (number of elements left to pick) / (number of elements left to scan)
> --
> partition :: [a] -> IO ([a], [a])
> partition xs = do
> let n = length xs
> k <- randomRIO (1, n-1)
> split n k ([], []) xs
> where split n k (ls, rs) [] = return (ls, rs)
> split n k (ls, rs) (x:xs) = do
> p <- randomRIO (1, n)
> if p <= k
> then split (n - 1) (k - 1) (x:ls, rs) xs
> else split (n - 1) k (ls, x:rs) xs
Yes, this algorithm should work. Of course, while the probability
(number of elements left to pick) / (number of elements left to scan)
for picking an element x seems to be the only sensible choice, one still
has to prove that this gives a uniform distribution. (Embarrassingly, my
article about "merge shuffle" lacks a proof, too, I plan to rewrite it
at some point.)
Proving uniformity proceeds in two steps:
First, we have to argue that picking the first k elements uniformly
and then permuting them does give a uniform *total* permutation. This is
fairly obvious, though. Namely, the set of possible permutations of n
elements can be partitioned into (n `over` k) classes, where two
permutations belong to the same class if they have the same first k
elements (in any order). For instance,
k = 3, n = 5
[1,2,3,5,4] is the same class as [3,2,1,4,5]
because the first k elements are {1,2,3}
[1,2,3,5,4] is in a different class than [4,2,1,3,5]
because the first k elements are {1,2,3} and {1,2,4} resp.
Now, to pick a random permutation, we first pick a class at random and
then pick a permutation from this class. The point is that for reasons
of symmetry, all classes have the same size! Namely, there are k! *
(n-k)! permutations in every class. That means we should pick the class
with uniform probability, and that's exactly what the split function
intends to do.
Second, we have to argue that split does indeed pick a class (= a set
of k elements) *uniformly*. The reasoning for that is as follows:
Consider a particular element x . There are
(n-1 `over` k-1) classes that contain x
(n-1 `over` k ) classes that don't contain x
(This correctly adds up to (n `over` k) classes in total). Hence, the
probability that the class contains x is
classes with x / total classes
= (n-1 `over` k-1) / (n `over` k)
= ( (n-1)! / ((k-1)!(n-k)!) ) / ( n! / (k!(n-k)!) )
= (n-1)! / n! * k! / (k-1)!
= k / n
= elements left to pick / elements left to scan
This shows that split does indeed use the correct probability for
including x .
Regards,
Heinrich Apfelmus
--
http://apfelmus.nfshost.com
More information about the Beginners
mailing list