Random Permutations

Fri, 07 Mar 2003 08:51:27 +0100

oleg@pobox.com comments my suggestion:

>>1. Generate N random numbers r_k, say, uniform between 0 and 1.
>>2. Form N pairs (1,r_1), (2,r_2), (3, r_3) ... (N,r_n).
>>3. Sort this list/vector wrt the *second* (random) number.
>>4. In the sorted list the first values (indices) give you the result.

> I'm sorry but this algorithm does NOT in general provide the perfect
> random permutation. Here's an excerpt from
> 
> 	http://pobox.com/~oleg/ftp/Haskell/perfect-shuffle.txt
> that deals extensively with this issue:

Oleg, I have read that, I know also the comments here:
http://www.nist.gov/dads/HTML/perfectShuffle.html
This is a known issue, and I am not *so* ignorant.

> Let us consider the simplest example (which happens to be the worst
> case): a sequence of two elements, [a b]. According to the
> shuffle-by-random-keys algorithm, we pick two binary random numbers,
> and associate the first number with the 'a' element, and the second
> number with the 'b' element. The two numbers are the tags (keys) for
> the elements of the sequence. We then sort the keyed sequence in the
> ascending order of the keys. We assume a stable sort algorithm. There
> are only 4 possible combinations of two binary random
> numbers. Therefore, there are only four possible tagged sequences:
>         [(0,a) (0,b)]
>         [(0,a) (1,b)]
>         [(1,a) (0,b)]
>         [(1,a) (1,b)]
...

In this context the generator of the random tags should be a little
more serious than choosing randomly two binary digits, don't you think?
Your criticism is perfectly valid, as most hair-splitting objections
are, but it is really not practical.

> Furthermore, if we have a sequence of N elements and associate with
> each element a key -- a random number uniformly distributed within [0,
> M-1] (where N!>M>=N), we have the configurational space of size M^N
> (i.e., M^N ways to key the sequence). There are N! possible
> permutations of the sequence. Alas, for N>2 and M<N!, M^N is not
> evenly divisible by N!. Therefore, certain permutations are bound to
> be a bit more likely than the others.

    I am still waiting to see *any* practical consequences of this
    theoretical conclusion, when N is of order of dozens, hundreds or
    more. Nobody interested in practical generation of those
    permutations would dream of generating *all* of them; if this is
    the case, the exhaustive enumeration would be better.... So,
    the fact that some permutations are slightly more likely than
    others is practically not too meaningful.

The swapping perfect shuffle method is obviously quite fine. Why did I
suggested the sorting method? Because I just glimpsed over the original
query, and I was not sure whether the author of the posting used
vectors or lists. With lists the swapping algorithm becomes inefficient,
as you may clearly see. With mergesort I believe that you get your
permutations faster. Anyway, I will not defend the sorting approach
as something ideal.

Jerzy Karczmarczuk