[Haskell-beginners] randomize the order of a list
Gabriel Scherer
gabriel.scherer at gmail.com
Thu Sep 2 09:34:07 EDT 2010
I must apologize : a part of my post about quicksort didn't make sense
: if one choose the pivot position randomly, elements shouldn't be
splitted with even probability, because there would be no control over
the size of the results list.
If I understand it correctly, your solution doesn't pick a pivot
position, but dynamically adapt list size probabilities during
traversal.
I have a different solution, that pick a pivot position, then choose
the elements with carefully weighted probability to get the right
left-hand and right-hand sizes. The key idea comes from your analysis
(more precisely, from the presence of the n `over` k probabilities) :
for a given k (the pivot), choose a random subset of k elements as the
left-hand side of the pivot.
import Random (Random, StdGen, randomRIO)
import Control.Monad (liftM)
quickshuffle :: [a] -> IO [a]
quickshuffle [] = return []
quickshuffle [x] = return [x]
quickshuffle xs = do
(ls, rs) <- partition xs
sls <- quickshuffle ls
srs <- quickshuffle rs
return (sls ++ srs)
-- The idea is that to partition a list of length n, we choose a pivot
-- position randomly (1 < k < n), then choose a subset of k elements
-- in the list to be on the left side, and the other n-k on the right
-- side.
--
-- To choose a random subset of length k among n, scan the list and
-- add each element with probability
-- (number of elements left to pick) / (number of elements left to scan)
--
partition :: [a] -> IO ([a], [a])
partition xs = do
let n = length xs
k <- randomRIO (1, n-1)
split n k ([], []) xs
where split n k (ls, rs) [] = return (ls, rs)
split n k (ls, rs) (x:xs) = do
p <- randomRIO (1, n)
if p <= k
then split (n - 1) (k - 1) (x:ls, rs) xs
else split (n - 1) k (ls, x:rs) xs
I have also written an implementation for my former algorithm :
import Data.List (mapAccumL, sortBy)
import System.Random (RandomGen, split, randoms)
import Data.Ord (Ordering)
import Data.Function (on)
-- compare two real numbers as infinite sequences of booleans
real_cmp :: [Bool] -> [Bool] -> Ordering
real_cmp (True:_) (False:_) = LT
real_cmp (False:_) (True:_) = GT
real_cmp (_:xs) (_:ys) = real_cmp xs ys
-- weight each element with a random real number
weight_list :: RandomGen g => g -> [a] -> [([Bool], a)]
weight_list g = snd . mapAccumL weight g
where weight g x =
let (g1, g2) = split g in
(g1, (randoms g2, x))
-- shuffle by sorting on weights
shuffle :: RandomGen g => g -> [a] -> [a]
shuffle g = map snd . sort_on_weights . weight_list g
where sort_on_weights = sortBy (real_cmp `on` fst)
> Interesting question! Adapting quick sort is not that easy, though.
>
> First, you can skip choosing the pivot position because it is already
> entailed by the choices of elements left and right to it.
>
> Second, probability 1/2 won't work, it doesn't give a uniform
> distribution. In order to get that, the remaining input xs has to be
> partitioned into two lists xs = (ls,rs) such that
>
> probability that length ys == k is 1/(n `over` k)
>
> where
>
> n `over` k = n! / (k! * (n-k)!)
>
> is the binomial coefficient. After all, calling "quickrandom"
> recursively on each of the two lists will give two permutations with
> probability 1/k! and 1/(n-k)! and the probability for a compound
> permutation is
>
> 1/(n `over` k) * 1/k! * 1/(n-k)! = 1/n!
>
> as desired. In contrast, distributing elements with probability 1/2
> would give
>
> probability that length ys == k is (n `over` k) * 2^(-n)
>
> which would skew the distribution heavily towards permutations where the
> pivot element is in the middle.
>
>
> However, it is possible to divide the list properly, though I haven't
> worked out the exact numbers. The method would be
>
> divide (x:xs) = do
> (ls,rs) <- divide xs
> random <- uniform (0, 1) :: Random Double
> if random <= p (length xs) (length ls)
> then return (x:ls, rs)
> else return (ls, x:rs)
>
> where the probability p of putting the element x into the left part
> has to be chosen such that
>
> 1/(n `over` k) =
> 1/(n-1 `over` k-1) * p (n-1) (k-1)
> + 1/(n-1 `over` k ) * (1 - p (n-1) k)
>
>
> Regards,
> Heinrich Apfelmus
More information about the Beginners
mailing list