[Haskell-beginners] randomize the order of a list

Gabriel Scherer gabriel.scherer at gmail.com
Thu Sep 2 09:45:23 EDT 2010


A small remark I forgot.

First, It seems to me that the choice of k need not be random, but
only arbitrary. I believe that any fixed k would work. In particular,
it is interesting to compare this algorithm with k = n/2, and Heinrich
Apfelmus's algorithm based on mergesort (
http://apfelmus.nfshost.com/articles/random-permutations.html ):

- mergeshuffle split the unshuffled list directly, shuffle each part,
then merge them carefully
- quickshuffle split the unshuffled list carefully, shuffle each part,
then merge them directly

mergeshuffle's `merge` and quickshuffle's `split` are very similar
and, in a sense, dual.
`merge` reason on the probability (number of elements in the left list
to merge) / (total number of elements to merge)
`split` reason on the probability (number of elements left to put in
the left list) / (total number of elements left to split)

On Thu, Sep 2, 2010 at 3:34 PM, Gabriel Scherer
<gabriel.scherer at gmail.com> wrote:
> I must apologize : a part of my post about quicksort didn't make sense
> : if one choose the pivot position randomly, elements shouldn't be
> splitted with even probability, because there would be no control over
> the size of the results list.
>
> If I understand it correctly, your solution doesn't pick a pivot
> position, but dynamically adapt list size probabilities during
> traversal.
>
> I have a different solution, that pick a pivot position, then choose
> the elements with carefully weighted probability to get the right
> left-hand and right-hand sizes. The key idea comes from your analysis
> (more precisely, from the presence of the n `over` k probabilities) :
> for a given k (the pivot), choose a random subset of k elements as the
> left-hand side of the pivot.
>
> import Random (Random, StdGen, randomRIO)
> import Control.Monad (liftM)
>
> quickshuffle :: [a] -> IO [a]
> quickshuffle [] = return []
> quickshuffle [x] = return [x]
> quickshuffle xs = do
>             (ls, rs) <- partition xs
>             sls <- quickshuffle ls
>             srs <- quickshuffle rs
>             return (sls ++ srs)
>
> -- The idea is that to partition a list of length n, we choose a pivot
> -- position randomly (1 < k < n), then choose a subset of k elements
> -- in the list to be on the left side, and the other n-k on the right
> -- side.
> --
> -- To choose a random subset of length k among n, scan the list and
> -- add each element with probability
> --   (number of elements left to pick) / (number of elements left to scan)
> --
> partition :: [a] -> IO ([a], [a])
> partition xs = do
>          let n = length xs
>          k <- randomRIO (1, n-1)
>          split n k ([], []) xs
>  where split n k (ls, rs) [] = return (ls, rs)
>        split n k (ls, rs) (x:xs) = do
>            p <- randomRIO (1, n)
>            if p <= k
>               then split (n - 1) (k - 1) (x:ls, rs) xs
>               else split (n - 1) k       (ls, x:rs) xs
>
>
>
> I have also written an implementation for my former algorithm :
>
> import Data.List (mapAccumL, sortBy)
> import System.Random (RandomGen, split, randoms)
> import Data.Ord (Ordering)
> import Data.Function (on)
> -- compare two real numbers as infinite sequences of booleans
> real_cmp :: [Bool] -> [Bool] -> Ordering
> real_cmp (True:_) (False:_) = LT
> real_cmp (False:_) (True:_) = GT
> real_cmp (_:xs) (_:ys) = real_cmp xs ys
> -- weight each element with a random real number
> weight_list :: RandomGen g => g -> [a] -> [([Bool], a)]
> weight_list g = snd . mapAccumL weight g
>                where weight g x =
>                                 let (g1, g2) = split g in
>                                 (g1, (randoms g2, x))
> -- shuffle by sorting on weights
> shuffle :: RandomGen g => g -> [a] -> [a]
> shuffle g = map snd . sort_on_weights . weight_list g
>        where sort_on_weights = sortBy (real_cmp `on` fst)
>
>> Interesting question! Adapting quick sort is not that easy, though.
>>
>> First, you can skip choosing the pivot position because it is already
>> entailed by the choices of elements left and right to it.
>>
>> Second, probability 1/2 won't work, it doesn't give a uniform
>> distribution. In order to get that, the remaining input  xs  has to be
>> partitioned into two lists  xs = (ls,rs)  such that
>>
>>      probability that  length ys == k  is   1/(n `over` k)
>>
>> where
>>
>>      n `over` k = n! / (k! * (n-k)!)
>>
>> is the binomial coefficient. After all, calling "quickrandom"
>> recursively on each of the two lists will give two permutations with
>> probability  1/k!  and  1/(n-k)!  and the probability for a compound
>> permutation is
>>
>>     1/(n `over` k) * 1/k! * 1/(n-k)! = 1/n!
>>
>> as desired. In contrast, distributing elements with probability 1/2
>> would give
>>
>>     probability that  length ys == k  is  (n `over` k) * 2^(-n)
>>
>> which would skew the distribution heavily towards permutations where the
>> pivot element is in the middle.
>>
>>
>> However, it is possible to divide the list properly, though I haven't
>> worked out the exact numbers. The method would be
>>
>>      divide (x:xs) = do
>>           (ls,rs) <- divide xs
>>           random  <- uniform (0, 1) :: Random Double
>>           if random <= p (length xs) (length ls)
>>               then return (x:ls, rs)
>>               else return (ls, x:rs)
>>
>> where the probability  p  of putting the element  x  into the left part
>> has to be chosen such that
>>
>>     1/(n `over` k) =
>>         1/(n-1 `over` k-1) * p (n-1) (k-1)
>>       + 1/(n-1 `over` k  ) * (1 - p (n-1) k)
>>
>>
>> Regards,
>> Heinrich Apfelmus
>


More information about the Beginners mailing list