Random Permutations

oleg@pobox.com oleg@pobox.com
Thu, 6 Mar 2003 17:39:20 -0800 (PST)


> 1. Generate N random numbers r_k, say, uniform between 0 and 1.
> 2. Form N pairs (1,r_1), (2,r_2), (3, r_3) ... (N,r_n).
> 3. Sort this list/vector wrt the *secon* (random) number.
> 4. In the sorted list the firs values (indices) give you the result.

> This is of course quite general, not restricted to any Haskell
> peculiarities.

I'm sorry but this algorithm does NOT in general provide the perfect
random permutation. Here's an excerpt from

	http://pobox.com/~oleg/ftp/Haskell/perfect-shuffle.txt
that deals extensively with this issue:

A commonly used shuffle algorithm attaches random tags to elements
of a sequence and then sorts the sequence by the tags. Although this
algorithm performs well in practice, it does not achieve the perfect
shuffling. 
 
Let us consider the simplest example (which happens to be the worst
case): a sequence of two elements, [a b]. According to the
shuffle-by-random-keys algorithm, we pick two binary random numbers,
and associate the first number with the 'a' element, and the second
number with the 'b' element. The two numbers are the tags (keys) for
the elements of the sequence. We then sort the keyed sequence in the
ascending order of the keys. We assume a stable sort algorithm. There
are only 4 possible combinations of two binary random
numbers. Therefore, there are only four possible tagged sequences:
        [(0,a) (0,b)]
        [(0,a) (1,b)]
        [(1,a) (0,b)]
        [(1,a) (1,b)]
After the sorting, the sequences become:
        [(0,a) (0,b)]
        [(0,a) (1,b)]
        [(0,b) (1,a)]
        [(1,a) (1,b)]
As we can see, after the sorting, the sequence [a, b] is three times
more likely than the sequence [b, a]. That can't be a perfect shuffle.

Furthermore, if we have a sequence of N elements and associate with
each element a key -- a random number uniformly distributed within [0,
M-1] (where N!>M>=N), we have the configurational space of size M^N
(i.e., M^N ways to key the sequence). There are N! possible
permutations of the sequence. Alas, for N>2 and M<N!, M^N is not
evenly divisible by N!. Therefore, certain permutations are bound to
be a bit more likely than the others.


The referred article presents two algorithms -- a naive one and a more
advanced one. The latter has a complexity O(n log(N)) and is _proven_
to give the perfect shuffling. The complexity is also proven.