[Haskell-beginners] randomize the order of a list
Gabriel Scherer
gabriel.scherer at gmail.com
Thu Sep 2 09:45:23 EDT 2010
A small remark I forgot.
First, It seems to me that the choice of k need not be random, but
only arbitrary. I believe that any fixed k would work. In particular,
it is interesting to compare this algorithm with k = n/2, and Heinrich
Apfelmus's algorithm based on mergesort (
http://apfelmus.nfshost.com/articles/random-permutations.html ):
- mergeshuffle split the unshuffled list directly, shuffle each part,
then merge them carefully
- quickshuffle split the unshuffled list carefully, shuffle each part,
then merge them directly
mergeshuffle's `merge` and quickshuffle's `split` are very similar
and, in a sense, dual.
`merge` reason on the probability (number of elements in the left list
to merge) / (total number of elements to merge)
`split` reason on the probability (number of elements left to put in
the left list) / (total number of elements left to split)
On Thu, Sep 2, 2010 at 3:34 PM, Gabriel Scherer
<gabriel.scherer at gmail.com> wrote:
> I must apologize : a part of my post about quicksort didn't make sense
> : if one choose the pivot position randomly, elements shouldn't be
> splitted with even probability, because there would be no control over
> the size of the results list.
>
> If I understand it correctly, your solution doesn't pick a pivot
> position, but dynamically adapt list size probabilities during
> traversal.
>
> I have a different solution, that pick a pivot position, then choose
> the elements with carefully weighted probability to get the right
> left-hand and right-hand sizes. The key idea comes from your analysis
> (more precisely, from the presence of the n `over` k probabilities) :
> for a given k (the pivot), choose a random subset of k elements as the
> left-hand side of the pivot.
>
> import Random (Random, StdGen, randomRIO)
> import Control.Monad (liftM)
>
> quickshuffle :: [a] -> IO [a]
> quickshuffle [] = return []
> quickshuffle [x] = return [x]
> quickshuffle xs = do
> (ls, rs) <- partition xs
> sls <- quickshuffle ls
> srs <- quickshuffle rs
> return (sls ++ srs)
>
> -- The idea is that to partition a list of length n, we choose a pivot
> -- position randomly (1 < k < n), then choose a subset of k elements
> -- in the list to be on the left side, and the other n-k on the right
> -- side.
> --
> -- To choose a random subset of length k among n, scan the list and
> -- add each element with probability
> -- (number of elements left to pick) / (number of elements left to scan)
> --
> partition :: [a] -> IO ([a], [a])
> partition xs = do
> let n = length xs
> k <- randomRIO (1, n-1)
> split n k ([], []) xs
> where split n k (ls, rs) [] = return (ls, rs)
> split n k (ls, rs) (x:xs) = do
> p <- randomRIO (1, n)
> if p <= k
> then split (n - 1) (k - 1) (x:ls, rs) xs
> else split (n - 1) k (ls, x:rs) xs
>
>
>
> I have also written an implementation for my former algorithm :
>
> import Data.List (mapAccumL, sortBy)
> import System.Random (RandomGen, split, randoms)
> import Data.Ord (Ordering)
> import Data.Function (on)
> -- compare two real numbers as infinite sequences of booleans
> real_cmp :: [Bool] -> [Bool] -> Ordering
> real_cmp (True:_) (False:_) = LT
> real_cmp (False:_) (True:_) = GT
> real_cmp (_:xs) (_:ys) = real_cmp xs ys
> -- weight each element with a random real number
> weight_list :: RandomGen g => g -> [a] -> [([Bool], a)]
> weight_list g = snd . mapAccumL weight g
> where weight g x =
> let (g1, g2) = split g in
> (g1, (randoms g2, x))
> -- shuffle by sorting on weights
> shuffle :: RandomGen g => g -> [a] -> [a]
> shuffle g = map snd . sort_on_weights . weight_list g
> where sort_on_weights = sortBy (real_cmp `on` fst)
>
>> Interesting question! Adapting quick sort is not that easy, though.
>>
>> First, you can skip choosing the pivot position because it is already
>> entailed by the choices of elements left and right to it.
>>
>> Second, probability 1/2 won't work, it doesn't give a uniform
>> distribution. In order to get that, the remaining input xs has to be
>> partitioned into two lists xs = (ls,rs) such that
>>
>> probability that length ys == k is 1/(n `over` k)
>>
>> where
>>
>> n `over` k = n! / (k! * (n-k)!)
>>
>> is the binomial coefficient. After all, calling "quickrandom"
>> recursively on each of the two lists will give two permutations with
>> probability 1/k! and 1/(n-k)! and the probability for a compound
>> permutation is
>>
>> 1/(n `over` k) * 1/k! * 1/(n-k)! = 1/n!
>>
>> as desired. In contrast, distributing elements with probability 1/2
>> would give
>>
>> probability that length ys == k is (n `over` k) * 2^(-n)
>>
>> which would skew the distribution heavily towards permutations where the
>> pivot element is in the middle.
>>
>>
>> However, it is possible to divide the list properly, though I haven't
>> worked out the exact numbers. The method would be
>>
>> divide (x:xs) = do
>> (ls,rs) <- divide xs
>> random <- uniform (0, 1) :: Random Double
>> if random <= p (length xs) (length ls)
>> then return (x:ls, rs)
>> else return (ls, x:rs)
>>
>> where the probability p of putting the element x into the left part
>> has to be chosen such that
>>
>> 1/(n `over` k) =
>> 1/(n-1 `over` k-1) * p (n-1) (k-1)
>> + 1/(n-1 `over` k ) * (1 - p (n-1) k)
>>
>>
>> Regards,
>> Heinrich Apfelmus
>
More information about the Beginners
mailing list