[Haskell-cafe] Re: generalized shuffle
manlio_perillo at libero.it
Wed Mar 25 08:23:46 EDT 2009
oleg at okmij.org ha scritto:
>> The reason is that in a project I need to extract random elements
>> from a list (and I need to extract a lot of them), and using "normal"
>> methods  seems to be rather inefficient.
Note that I have used an incorrect term, sorry.
What I need, in detail, is to:
Given a population of 100480507 elements, extract 17770 random samples,
where each sample has a size between 1 and 480189.
Yes, it is again my Netflix Prize project; here I'm trying to write a
program to generate the entire training data set using random data.
>>  by normal method I mean: extract a random number, in the range
>> [0, n] (where n is the length of the sequence), and get the element
>> indexed by that number.
> BTW, have you tried to use Data.IntMap? That is, put the sequence into
> an IntMap (each element being indexed by its index) and then extract
> elements at random; IntMap seems to be quite efficient...
This should be ok if I need independent random choices from a
population, but I need random samples, instead, sorry for the confusion.
Do you think that it is possible to further optimize your tree
structure? Or to use IntMap, instead?
More information about the Haskell-Cafe