Statistical-logical problem with rand()

2

I will start with an example: I have a box from which you get 5 opportunities to take a ball. There are 4 types of ball: red, blue, green and gold. The overall percentage (that is, per box) of taking a gold one is 15%, red 15%, blue 30% and green 40%.

To program this example, I am not sure whether the overall percentage (per box) should be divided by the number of attempts (5), or whether the randomization should be performed independently on each attempt (that is, a 15% chance of a gold one on every attempt). If we divide the percentages by the number of attempts, we get 3%, 3%, 6% and 8% respectively, but... where does the remaining 80% go?

Translated into code, it would be something like:

int i,d;
for(d=0;d<5;d++){
   rnd=rand() % 100;
   if(rnd<6)i=5; //5%
   if(rnd<21 && rnd>5)i=15; //15%
   if(rnd<36 && rnd>20)i=15; //15%
   if(rnd<66 && rnd>35)i=30; //30%
   if(rnd<101 && rnd>65)i=40; //40%
}
switch(rnd){
   case 15: blabla
   case 30: blabla
   case 40: blabla
}

I am not sure whether this code really reproduces the percentages of the initial problem, or whether my logic is correct. I would appreciate some help.

    
asked by Winebous 12.10.2017 at 02:14

2 answers

4

Do not use rand()!

If you need accurate random distributions, abandon rand(). Using rand() is inappropriate not only because it is a C utility (and your question is about C++), but also because the way you are using it distorts the distribution.

Distribution of rand()

The rand() function returns a pseudorandom integer between 0 and RAND_MAX (both included). The values returned by rand() are uniformly distributed, that is: every number between 0 and RAND_MAX has the same probability of being obtained.

Breaking the distribution of rand()

Assuming that the range of rand() is [0, 32767], applying the modulus operator (%) with a number that does not evenly divide the count of possible values (32768 here; in your case the modulus is 100) skews the distribution. Since 32768 = 327 × 100 + 68, each remainder from 0 to 67 comes from 328 source values, while each remainder from 68 to 99 comes from only 327. The numbers 0 to 67 are therefore slightly more likely to appear than the numbers 68 to 99.
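To make the bias concrete, here is a small sketch that counts how many integers in a given range fall on each remainder modulo 100. The function name `countRemainders` is made up for illustration, and the value 32767 used below is only an assumed RAND_MAX; the real value depends on the implementation.

```cpp
#include <array>

// Hypothetical helper: counts how many integers in [0, maxValue]
// produce each remainder when reduced modulo 100.
std::array<int, 100> countRemainders(int maxValue)
{
    std::array<int, 100> counts{};  // value-initialized, so every entry starts at zero
    for (int value = 0; value <= maxValue; ++value)
        ++counts[value % 100];
    return counts;
}
```

Calling `countRemainders(32767)` gives 328 for every remainder from 0 to 67 and 327 for every remainder from 68 to 99, so the low remainders are roughly 0.3% more likely under that assumed range.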

What should you do?

As I said, you should stop using rand(), which besides not being a C++ utility, you were using incorrectly! The appropriate way to approach your problem is to use <random>, the C++ pseudorandom number library, which lets you choose the probability distribution (uniform, Bernoulli, Poisson, normal, discrete, piecewise constant, piecewise linear...), the underlying type of the generated value and even the engine to be used (minstd_rand, mt19937, ranlux24, knuth_b...).
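As a rough sketch of what replacing `rand() % 100` could look like, the following draws an integer in [0, 99] with `std::uniform_int_distribution`. The function name `rollPercent` is hypothetical, and the engine is passed in so that the caller decides how to seed it.

```cpp
#include <random>

// Hypothetical helper: returns a uniformly distributed integer in [0, 99]
// without the modulo bias of rand() % 100.
int rollPercent(std::mt19937& engine)
{
    std::uniform_int_distribution<int> distribution(0, 99);  // both bounds are inclusive
    return distribution(engine);
}
```

A caller would typically create a single engine, for example `std::mt19937 engine(std::random_device{}());`, and reuse it for every call instead of constructing a new one each time.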

In your case, your probabilities are split in two:

  • Did I get a ball? This will be true 1 in 20 times (5%).
  • If I have taken a ball, what color is it? (15% gold, 15% red, 30% blue and 40% green).

For the first probability, you could use a Bernoulli distribution, which lets you set the probability that an event (taking out a ball) happens:

    // Requires #include <random>
    std::random_device device;
    std::mt19937 generador(device());
    std::bernoulli_distribution distribucion(0.05);
    
    if (distribucion(generador))
    {
        // We enter this if 5% of the time.
    }
    

For the second probability, you could use a discrete distribution, which lets you assign a weight to each outcome (the probability of each ball appearing):

    std::random_device device;
    std::mt19937 generador(device());
    std::discrete_distribution<> distribucion({15, 15, 30, 40});
    
    switch(distribucion(generador))
    {
        case 0: std::cout << "Golden ball\n"; break;
        case 1: std::cout << "Red ball\n"; break;
        case 2: std::cout << "Blue ball\n"; break;
        case 3: std::cout << "Green ball\n"; break;
    }
    

You can see an example of this working in Wandbox.
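Put into a reusable form, the discrete distribution from the second snippet might look like the sketch below. The name `drawBalls` and the color indices 0 to 3 (gold, red, blue, green, in the same order as the weights) are assumptions for illustration. Whether a ball is drawn at all could be decided beforehand with the Bernoulli check.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: draws `attempts` balls and returns their color indices.
// Index 0 = gold (15%), 1 = red (15%), 2 = blue (30%), 3 = green (40%).
std::vector<int> drawBalls(std::mt19937& engine, std::size_t attempts)
{
    std::discrete_distribution<int> colors({15, 15, 30, 40});
    std::vector<int> result;
    result.reserve(attempts);
    for (std::size_t i = 0; i < attempts; ++i)
        result.push_back(colors(engine));  // every attempt uses the same weights
    return result;
}
```

With this approach each of the 5 attempts has the full 15/15/30/40 split, so the percentages are not divided by the number of attempts. Tallying a large number of draws should give frequencies close to those weights.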

        
answered 13.10.2017 at 09:42
    3

The odds, as you say, are:

  • 15% gold
  • 15% red
  • 30% blue
  • 40% green

The percentages add up to 100, so reducing the random number modulo 100 is reasonably correct.

Now, how do we distribute the odds? Very easily:

    enum Color
    {
      Indefinido,
      Dorado,
      Rojo,
      Azul,
      Verde
    };
    
    for (int d = 0; d < 5; d++) {
      Color color = Indefinido;
      int rnd = rand() % 100;  // rand() is declared in <cstdlib>
      if( rnd < 15 )
        color = Dorado;
      else if( rnd < 30 )
        color = Rojo;
      else if( rnd < 60 )
        color = Azul;
      else
        color = Verde;
    
      // Do something with the color
      // ...
    }
    

How does it work? It is simple to understand:

  • If the number is less than 15 (between 0 and 14, that is, 15 values out of 100), the ball is gold.
  • If the number is greater than 14 and less than 30 (that is what the else if ensures), the ball is red (15-29 is another 15 values out of 100).
  • If the number is greater than 29 and less than 60, the ball is blue (range 30-59 -> 30 values).
  • If none of the above is met, that is, if the number is equal to or greater than 60, the ball is green (range 60-99 -> 40 values).
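The same cumulative-threshold idea can be written as a small function and checked exhaustively over the 100 possible values. This is only a sketch: `colorFromRoll` is a hypothetical name, and the enum is repeated here so that the block stands alone.

```cpp
enum Color { Indefinido, Dorado, Rojo, Azul, Verde };

// Hypothetical helper: maps a number in [0, 99] to a color using
// the cumulative thresholds 15, 30 and 60.
Color colorFromRoll(int roll)
{
    if (roll < 0 || roll > 99) return Indefinido;  // outside the expected range
    if (roll < 15) return Dorado;  // 0-14: 15 values
    if (roll < 30) return Rojo;    // 15-29: 15 values
    if (roll < 60) return Azul;    // 30-59: 30 values
    return Verde;                  // 60-99: 40 values
}
```

Counting the result for every roll from 0 to 99 gives 15 gold, 15 red, 30 blue and 40 green, which matches the percentages in the question.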
answered 12.10.2017 at 10:03