Quote:
Ivan Pedroso said:
Quote:
alexti said:
Quote:
Ivan Pedroso said:
[snip]
That is:
Probability( |"observed frequency" - 1/8| > epsilon ) < delta
|
Do you know how to prove it? I don't see an obvious proof.
|
The Law of Large Numbers (or whatever it's called, I can't remember) ensures that:
"The more you repeat the random process, the more closely the empirically measured frequencies approach the 'true' underlying probabilities"
- repeatedly roll some eight-sided dice and your observed frequencies will get closer and closer to 1/8 as you go along.
|
Ahh... I think your statement follows from Chebyshev's inequality, so that solves it. I guess I'll have to look it up in the book to learn how to prove it.
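Chebyshev's inequality also gives an explicit N. The observed frequency of one face after N rolls has variance p(1-p)/N, so Probability( |"observed frequency" - p| >= epsilon ) <= p(1-p)/(N*epsilon^2), and it is enough to pick N large enough that this bound drops below delta. A minimal Python sketch, assuming a fair eight-sided die simulated with `random.Random`; the helper names `chebyshev_n` and `max_face_deviation` are hypothetical, made up for illustration:

```python
import math
import random
from fractions import Fraction


def chebyshev_n(p, epsilon, delta):
    """Smallest N with p(1-p) / (N * epsilon^2) < delta.

    By Chebyshev's inequality this guarantees
    Probability(|observed frequency - p| >= epsilon) < delta
    for a single face with probability p. Exact Fractions avoid rounding.
    """
    p, epsilon, delta = Fraction(p), Fraction(epsilon), Fraction(delta)
    bound = p * (1 - p) / (epsilon ** 2 * delta)
    return math.floor(bound) + 1  # first integer strictly above the bound


def max_face_deviation(n_rolls, sides=8, seed=0):
    """Roll a fair `sides`-sided die n_rolls times; return max |freq - 1/sides|."""
    rng = random.Random(seed)
    counts = [0] * sides
    for face in rng.choices(range(sides), k=n_rolls):
        counts[face] += 1
    return max(abs(c / n_rolls - 1 / sides) for c in counts)


n = chebyshev_n("1/8", "0.01", "0.05")
print(n)  # 21876
print(max_face_deviation(n))  # typically well below 0.01
```

Chebyshev is a loose bound, so in practice the observed deviations at this N are usually much smaller than epsilon.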
Quote:
Ivan Pedroso said:
Quote:
alexti said:
Quote:
Ivan Pedroso said:
And then adding up the three largest observed frequencies will result in a value that is in the interval
[3/8 - 3*epsilon ; 3/8 + 3*epsilon]
|
That looks wrong. You could do this if your frequencies were independent random processes. However, in our case they are dependent on each other, because the frequencies always add up to 1. And of course, the sum of the three largest frequencies is always >= 3/8, but that isn't a problem.
I'm still unsure if your theorem is right or not, but your proof needs fixing.
|
Sure... I should have used the interval: [3/8 ; 3/8+3*epsilon]
Adding up any three empirical frequencies will (very likely) result in a value in the interval [3/8-3*epsilon ; 3/8+3*epsilon]. Adding up the three largest will put the value in the smaller interval above.
(hehehe I could be a pedantic arse and point out that the smaller interval is contained in the larger one and still claim my statement is true, but I simply forgot about the obvious lower limit of 3/8, which made me look a bit foolish.)
|

What is wrong is this: if you select N for (epsilon, delta) for the first path, you cannot reuse the same N for the second path, because once one path is fixed, the distribution of the remaining picks among the other paths is not the same as before. The interval didn't really matter; the error came from not taking into account that those variables become dependent.
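The dependence is easy to see in a simulation: the eight frequencies always add up to exactly 1, so the sum of the three largest can never drop below 3/8, and the frequencies of two different faces are negatively correlated (for multinomial counts with equal probabilities p = 1/8 the correlation is -p/(1-p) = -1/7). A minimal Python sketch, assuming a fair eight-sided die; `roll_frequencies` and `correlation` are hypothetical helper names:

```python
import random
from fractions import Fraction
from statistics import fmean


def roll_frequencies(rng, n_rolls, sides=8):
    """Return the exact observed frequency of each face after n_rolls rolls."""
    counts = [0] * sides
    for face in rng.choices(range(sides), k=n_rolls):
        counts[face] += 1
    return [Fraction(c, n_rolls) for c in counts]


def correlation(xs, ys):
    """Sample Pearson correlation of two equally long sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(1)
face0, face1 = [], []
for _ in range(2000):  # 2000 independent experiments of 100 rolls each
    freqs = roll_frequencies(rng, n_rolls=100)
    assert sum(freqs) == 1  # the frequencies are tied together
    assert sum(sorted(freqs)[-3:]) >= Fraction(3, 8)  # top three never below 3/8
    face0.append(float(freqs[0]))
    face1.append(float(freqs[1]))

print(correlation(face0, face1))  # negative, near -1/7 (about -0.143)
```

Because of this negative correlation, the deviations of different faces cannot be treated as if they were independent.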
However, you can prove it by applying your reasoning to the random variables X1...Xm (m = C(8,3) = 56), one for each combination of 3 paths, where Xi represents the roll landing on one of the 3 picks of the i-th combination, so each variable has mean 3/8. Then, applying your original logic, select N such that
Probability( |"observed frequency of Xi" - mean| > epsilon ) < delta
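Here, max("observed frequency of Xi") is exactly the sum of the three largest observed frequencies. There are only 56 such variables, so if you choose N for delta/56 instead of delta, the union bound gives Probability( some Xi is off by more than epsilon ) < delta, and then the maximum is within epsilon of 3/8 as well. So it is a "duck number", and that proves it.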
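The union-bound step can be made concrete. Each Xi is an indicator with p = 3/8, so Chebyshev gives Probability( |"observed frequency of Xi" - 3/8| >= epsilon ) <= (15/64)/(N*epsilon^2); requiring this to be below delta/56 for every one of the 56 combinations gives a single N that works for all of them at once. A minimal Python sketch, assuming a fair eight-sided die; `n_for_all_triples` and `triple_frequencies` are hypothetical names:

```python
import math
import random
from fractions import Fraction
from itertools import combinations

SIDES = 8
TRIPLES = list(combinations(range(SIDES), 3))  # C(8,3) = 56 combinations


def n_for_all_triples(epsilon, delta):
    """Smallest N with 56 * p(1-p) / (N * epsilon^2) < delta, where p = 3/8.

    Chebyshev's inequality for each Xi plus a union bound over the 56 Xi gives
    Probability(some triple frequency is off by >= epsilon) < delta.
    """
    p = Fraction(3, SIDES)
    epsilon, delta = Fraction(epsilon), Fraction(delta)
    bound = len(TRIPLES) * p * (1 - p) / (epsilon ** 2 * delta)
    return math.floor(bound) + 1


def triple_frequencies(counts, n_rolls):
    """Observed frequency of each Xi: rolls landing on one of its 3 faces."""
    return [sum(counts[f] for f in t) / n_rolls for t in TRIPLES]


print(n_for_all_triples("0.01", "0.05"))  # 2625001

# Simulate with fewer rolls than the (conservative) Chebyshev N and check that
# the max over all Xi equals the sum of the three largest face frequencies.
rng = random.Random(2)
n_rolls = 200_000
counts = [0] * SIDES
for face in rng.choices(range(SIDES), k=n_rolls):
    counts[face] += 1
max_triple = max(triple_frequencies(counts, n_rolls))
top3 = sum(sorted(counts)[-3:]) / n_rolls
print(max_triple, top3)  # the two values are identical and close to 0.375
```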
So this case looks closed.
