Quote:
Ivan Pedroso said:
In fact you could choose any two small positive numbers, epsilon >0 and delta >0, (could be 0.00000001 and 0.000000001) and it will then be possible to find a laaaarge number N that ensures that:
If N dice are rolled then the probability of getting an observed frequency that deviates from 1/8 by more than the small number epsilon, is smaller than delta.
That is:
Probability( |"observed frequency" - 1/8| > epsilon ) < delta
|
Do you know how to prove it? I don't see an obvious proof. Let us consider the frequency of death picks.
Code:
Probability of rolling exactly k out of n: P(k,n) = C(k,n)*p0^k*(1-p0)^(n-k). (p0 = 1/8, C(k,n) = n!/(k!*(n-k)!))
For simplicity, let's consider only exceeding your range on the upper side. The probability of that is P(m+,n) = sum[k=m..n]{P(k,n)},
where m is the smallest integer that satisfies m/n > p0 + epsilon.
Ignoring rounding effects we can write m=a*n, where a = p0 + epsilon.
Then P(m+,n) = sum[k=m..n]{P(k,n)}
= p0^(a*n) * sum[k=a*n..n]{C(k,n)*p0^(k-a*n)*(1-p0)^(n-k)}.
And that's where I get stuck. p0^(a*n) quickly goes to 0 as n grows,
but the sum has a number of terms proportional to n,
with the dominant n! in the numerator, so it will grow very quickly.
Does this P(m+,n) converge to anything? And if it does, to what value?
I tried running a test program to see what happens.
I didn't have a few billion years to wait until the probability of getting within epsilon = 0.00000001 becomes distinguishable from 0, so I took epsilon = 0.002. Unfortunately, at around n=3000 my program runs out of double precision. At that point P(m+,n) is around 40%. Until then it was slowly going down, but the rate of descent was decreasing. So the experiment didn't suggest any conclusion.
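For anyone who wants to push the experiment past n=3000, here is a minimal sketch of how the same sum could be evaluated in log space, so that neither the n! in C(k,n) nor the tiny powers of p0 overflow or underflow a double. This is not the program I ran. The names log_binom_pmf and upper_tail are made up for illustration, and m is taken as floor(n*(p0+epsilon))+1, which is only an approximation of "smallest m with m/n > p0 + epsilon" when the product lands on a rounding boundary.

```python
import math


def log_binom_pmf(k, n, p):
    """Natural log of C(k,n) * p^k * (1-p)^(n-k), using lgamma for the factorials."""
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_comb + k * math.log(p) + (n - k) * math.log1p(-p)


def upper_tail(n, p, eps):
    """P(m+,n): probability that the observed frequency exceeds p + eps."""
    # Assumption: floating-point rounding at the boundary is ignored here.
    m = math.floor(n * (p + eps)) + 1
    if m > n:
        return 0.0
    logs = [log_binom_pmf(k, n, p) for k in range(m, n + 1)]
    # Log-sum-exp: factor out the largest term so the rest are all <= 1.
    biggest = max(logs)
    return math.exp(biggest) * sum(math.exp(v - biggest) for v in logs)


# Same p0 and epsilon as in the experiment above, with larger n.
for n in (3000, 30000, 300000):
    print(n, upper_tail(n, 1 / 8, 0.002))
```

The only trick is that every term is kept as a logarithm until the very end, so the large and small factors cancel before anything is exponentiated.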
Quote:
Ivan Pedroso said:
And then adding up the three largest observed frequencies will then result in a value that is in the interval
[3/8 - 3*epsilon ; 3/8 + 3*epsilon]
|
That looks wrong. You could do this if your frequencies were independent random processes. However, in our case they depend on each other, because the frequencies always add up to 1. And of course, the sum of the three largest frequencies is always >= 3/8, but that isn't a problem.
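To make the dependence concrete, here is a small simulation sketch. It assumes a fair 8-sided die (my reading of the 1/8 above), and the helper names observed_frequencies and top_three_sum are invented for this example. It only illustrates the two facts just mentioned: the frequencies always add up to 1, and the three largest never sum to less than 3/8, because the average of the three largest cannot be below the overall average of 1/8.

```python
import random
from collections import Counter


def observed_frequencies(n, sides=8, seed=None):
    """Roll a fair die with `sides` faces n times; return the frequency of each face."""
    rng = random.Random(seed)
    counts = Counter(rng.randrange(sides) for _ in range(n))
    return [counts[face] / n for face in range(sides)]


def top_three_sum(freqs):
    """Sum of the three largest observed frequencies."""
    return sum(sorted(freqs, reverse=True)[:3])


for n in (100, 10000, 1000000):
    freqs = observed_frequencies(n, seed=n)
    print(n, sum(freqs), top_three_sum(freqs))
```

So the sum of the three largest frequencies can only deviate from 3/8 upward, which is not how a sum of three independent quantities would behave.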
I'm still unsure if your theorem is right or not, but your proof needs fixing.