[Haskell-beginners] randomize the order of a list

Thu Sep 2 09:34:07 EDT 2010

I must apologize : a part of my post about quicksort didn't make sense
: if one choose the pivot position randomly, elements shouldn't be
splitted with even probability, because there would be no control over
the size of the results list.

If I understand it correctly, your solution doesn't pick a pivot
position, but dynamically adapt list size probabilities during
traversal.

I have a different solution, that pick a pivot position, then choose
the elements with carefully weighted probability to get the right
left-hand and right-hand sizes. The key idea comes from your analysis
(more precisely, from the presence of the n `over` k probabilities) :
for a given k (the pivot), choose a random subset of k elements as the
left-hand side of the pivot.

import Random (Random, StdGen, randomRIO)
import Control.Monad (liftM)

quickshuffle :: [a] -> IO [a]
quickshuffle [] = return []
quickshuffle [x] = return [x]
quickshuffle xs = do
             (ls, rs) <- partition xs
             sls <- quickshuffle ls
             srs <- quickshuffle rs
             return (sls ++ srs)

-- The idea is that to partition a list of length n, we choose a pivot
-- position randomly (1 < k < n), then choose a subset of k elements
-- in the list to be on the left side, and the other n-k on the right
-- side.
--
-- To choose a random subset of length k among n, scan the list and
-- add each element with probability
--   (number of elements left to pick) / (number of elements left to scan)
--
partition :: [a] -> IO ([a], [a])
partition xs = do
          let n = length xs
          k <- randomRIO (1, n-1)
          split n k ([], []) xs
  where split n k (ls, rs) [] = return (ls, rs)
        split n k (ls, rs) (x:xs) = do
            p <- randomRIO (1, n)
            if p <= k
               then split (n - 1) (k - 1) (x:ls, rs) xs
               else split (n - 1) k       (ls, x:rs) xs

I have also written an implementation for my former algorithm :

import Data.List (mapAccumL, sortBy)
import System.Random (RandomGen, split, randoms)
import Data.Ord (Ordering)
import Data.Function (on)
-- compare two real numbers as infinite sequences of booleans
real_cmp :: [Bool] -> [Bool] -> Ordering
real_cmp (True:_) (False:_) = LT
real_cmp (False:_) (True:_) = GT
real_cmp (_:xs) (_:ys) = real_cmp xs ys
-- weight each element with a random real number
weight_list :: RandomGen g => g -> [a] -> [([Bool], a)]
weight_list g = snd . mapAccumL weight g
               where weight g x =
                                let (g1, g2) = split g in
                                (g1, (randoms g2, x))
-- shuffle by sorting on weights
shuffle :: RandomGen g => g -> [a] -> [a]
shuffle g = map snd . sort_on_weights . weight_list g
       where sort_on_weights = sortBy (real_cmp `on` fst)

> Interesting question! Adapting quick sort is not that easy, though.
>
> First, you can skip choosing the pivot position because it is already
> entailed by the choices of elements left and right to it.
>
> Second, probability 1/2 won't work, it doesn't give a uniform
> distribution. In order to get that, the remaining input  xs  has to be
> partitioned into two lists  xs = (ls,rs)  such that
>
>      probability that  length ys == k  is   1/(n `over` k)
>
> where
>
>      n `over` k = n! / (k! * (n-k)!)
>
> is the binomial coefficient. After all, calling "quickrandom"
> recursively on each of the two lists will give two permutations with
> probability  1/k!  and  1/(n-k)!  and the probability for a compound
> permutation is
>
>     1/(n `over` k) * 1/k! * 1/(n-k)! = 1/n!
>
> as desired. In contrast, distributing elements with probability 1/2
> would give
>
>     probability that  length ys == k  is  (n `over` k) * 2^(-n)
>
> which would skew the distribution heavily towards permutations where the
> pivot element is in the middle.
>
>
> However, it is possible to divide the list properly, though I haven't
> worked out the exact numbers. The method would be
>
>      divide (x:xs) = do
>           (ls,rs) <- divide xs
>           random  <- uniform (0, 1) :: Random Double
>           if random <= p (length xs) (length ls)
>               then return (x:ls, rs)
>               else return (ls, x:rs)
>
> where the probability  p  of putting the element  x  into the left part
> has to be chosen such that
>
>     1/(n `over` k) =
>         1/(n-1 `over` k-1) * p (n-1) (k-1)
>       + 1/(n-1 `over` k  ) * (1 - p (n-1) k)
>
>
> Regards,
> Heinrich Apfelmus