[Haskell-cafe] Re: generalized shuffle
Manlio Perillo
manlio_perillo at libero.it
Wed Mar 25 08:23:46 EDT 2009
oleg at okmij.org ha scritto:
> Hello!
>
>> The reason is that in a project I need to extract random elements
>> from a list (and I need to extract a lot of them), and using "normal"
>> methods [1] seems to be rather inefficient.
Note that I have used an incorrect term, sorry.
What I need, in detail, is to:
Given a population of 100480507 elements, extract 17770 random samples,
where each sample has a size between 1 and 480189.
Yes, it is again my Netflix Prize project; here I'm trying to write a
program to generate the entire training data set using random data.
>> [1] by normal method I mean: extract a random number, in the range
>> [0, n] (where n is the length of the sequence), and get the element
>> indexed by that number.
>
> BTW, have you tried to use Data.IntMap? That is, put the sequence into
> an IntMap (each element being indexed by its index) and then extract
> elements at random; IntMap seems to be quite efficient...
>
This should be ok if I need independent random choices from a
population, but I need random samples, instead, sorry for the confusion.
Do you think that it is possible to further optimize your tree
structure? Or to use IntMap, instead?
> Cheers,
> Oleg
Thanks Manlio
More information about the Haskell-Cafe
mailing list