Bug in the scaling of randoms ...

Ronny Wichers Schreur ronny@cs.kun.nl
Tue, 07 May 2002 10:28:45 +0200


Dimitre Novatchev writes (to the Haskell mailing list):

> In his excellent book Simon Thompson defines scaling of the elements of
> a sequence of random numbers from the interval [0, 65535] to an
> interval [s,t] in the following way (top of page 368):

> > scaleSequence :: Int -> Int -> [Int] -> [Int]
> > scaleSequence s t
> >   = map scale
> >     where
> >       scale n = n `div` denom + s
> >       range   = t - s + 1
> >       denom   = modulus `div` range
> > where modulus = 65536
> However, the following expression:
>
> e4000 = (scaleSequence 1 966 ([65231] ++ [1..965]))!!0
> evaluates to 974 -- clearly outside of the specified interval.

The function fails in this case because the range doesn't divide the
modulus. In the book (I checked the Miranda version), there's a remark
after the function that "[The] range of numbers 0 to modulus-1 is split
into range blocks, each of the same length."

When the range doesn't divide the modulus, it's not possible to get
a uniform distribution if you scale each value individually. I assume
this is acceptable, otherwise you'll have to discard or combine values
from the original sequence.

Oleg answers:

>We aim to find a function sc(n) defined over integers 0 <= n <= M so
>that
>    sc(0) -> s (given integral number)
>    sc(M) -> t (given integral number), t>=s
>and
>    0<=a < b ==> sc(a) <= sc(b)
>
>Such a function sc(n) is given by the following expression:
>    sc(n) = (n*(t-s)) `div` M + s


(M = modulus - 1)

I think a more natural function would be

    sc(n) = (n * (t-s+1)) `div` modulus + s

This satifies the conditions (assuming range = t-s+1 <= modulus), and
distributes the values more evenly, in particular when the range divides
the modulus.



Cheers,

Ronny Wichers Schreur