Random Magic Paths - is it truly random? - Page 6

Ivan Pedroso · #51 February 6th, 2005, 11:58 PM

Quote:

alexti said:

Quote:

Ivan Pedroso said:
[snip]
That is:
Probability( |"observed frequency" - 1/8| > epsilon ) < delta

Do you know how to prove it? I don't see any obvious proof.

"the Law of Large Numbers" (or is it called something else, can't remember it) ensures that:
"The more you repeat the random process the more the empirically measured frequencies approach the values of the ''true'' underlying probabilities"
- repeatedly roll some eight-sided dice and your observed frequencies will get closer and closer to 1/8 as you go along.

Quote:

alexti said:

Quote:

Ivan Pedroso said:
And then adding up the three largest observed frequencies will then result in a value that is in the interval
[3/8 - 3*epsilon ; 3/8 + 3*epsilon]

That looks wrong. You could do this if your frequencies were independent random processes. However, in our case they are dependent on each other, because the total of all frequencies is always 1. And of course, the sum of the three largest frequencies is always >= 3/8, but that isn't a problem.

I'm still unsure if your theorem is right or not, but your proof needs fixing.

Sure... I should have used the interval: [3/8 ; 3/8+3*epsilon]
Adding up any three empirical frequencies will (very likely) result in a value that is included in the interval [3/8-3*e ; 3/8+3*e]. Adding the three largest will make the value end up in the smaller interval above.

(hehehe I could be a pedantic arse and state that the smaller interval is contained in the larger and still claim my statement to be true - but I simply forgot about the obvious lower limit of 3/8 - made me look a bit foolish.)

I made a crude program to empirically calculate the Duck_Number (I like MatLab):

Code:

clear                            %%% Clears the workspace.
rng('shuffle');                  %%% Seeds the generator from the clock
                                 %%% (newer form of rand('state',sum(100*clock))).
m = zeros(1,8);                  %%% A vector of counts is created:
                                 %%% the first entry could be FIRE,
                                 %%% the second AIR, and so on.
N = 100000                       %%% Number of mages to be generated.
for I = 1:N
    roll = randi(8);             %%% Random integer between 1 and 8
                                 %%% (randi replaces the obsolete randint(1,1,8)+1).
    m(roll) = m(roll) + 1;       %%% The appropriate entry (FIRE, AIR...)
end                              %%% gets bumped up by one.
f = m/N                          %%% The generated frequencies are computed.
sf = sort(f);                    %%% Frequencies are sorted (ascending order).
duck = sf(8) + sf(7) + sf(6)     %%% The three highest values are added.

Doing a handful of runs (N=100000) resulted in:
Duck_Number ~ 0.377
as a reference: 3/8 = 0.375
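
For anyone without MatLab, the same experiment can be sketched in Python. This is only a rough equivalent of the run above, using the standard library; the function name duck_number and the fixed seed are made up for this example.

```python
import random

def duck_number(n_mages, rng):
    """Sum of the three largest observed path frequencies among n_mages uniform picks."""
    counts = [0] * 8                      # one slot per path: FIRE, AIR, ...
    for _ in range(n_mages):
        counts[rng.randrange(8)] += 1     # uniform pick among the 8 paths
    freqs = sorted(c / n_mages for c in counts)
    return sum(freqs[-3:])                # the three highest frequencies

if __name__ == "__main__":
    rng = random.Random(2005)             # fixed seed so runs are repeatable
    for n in (100, 3000, 100000):
        print(n, round(duck_number(n, rng), 4))
```

Running it for several values of N shows how quickly (or slowly) the number settles toward 3/8 from above.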

alexti · #52 February 7th, 2005, 02:55 AM

Quote:

Ivan Pedroso said:

Quote:

alexti said:

Quote:

Ivan Pedroso said:
[snip]
That is:
Probability( |"observed frequency" - 1/8| > epsilon ) < delta

Do you know how to prove it? I don't see any obvious proof.

"the Law of Large Numbers" (or is it called something else, can't remember it) ensures that:
"The more you repeat the random process the more the empirically measured frequencies approach the values of the ''true'' underlying probabilities"
- repeatedly roll some eight-sided dice and your observed frequencies will get closer and closer to 1/8 as you go along.

Ahh... I think your statement comes from Chebyshev's inequality. So that solves it. I guess to learn how to prove it I'll have to look it up in the book.
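
For reference, the Chebyshev route can be written out as below. This is a sketch only: f_N stands for the observed frequency of one fixed path after N independent picks and p = 1/8 for the assumed true probability (both symbols are introduced just for this note).

```latex
\[
  \Pr\bigl(|f_N - p| > \varepsilon\bigr)
  \;\le\; \frac{\operatorname{Var}(f_N)}{\varepsilon^{2}}
  \;=\; \frac{p(1-p)}{N\varepsilon^{2}}
  \;=\; \frac{7}{64\,N\,\varepsilon^{2}}
  \qquad \bigl(p = \tfrac{1}{8}\bigr)
\]
```

So any N larger than 7/(64 * epsilon^2 * delta) pushes the right-hand side below delta.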

Quote:

Ivan Pedroso said:

Quote:

alexti said:

Quote:

Ivan Pedroso said:
And then adding up the three largest observed frequencies will then result in a value that is in the interval
[3/8 - 3*epsilon ; 3/8 + 3*epsilon]

That looks wrong. You could do this if your frequencies were independent random processes. However, in our case they are dependent on each other, because the total of all frequencies is always 1. And of course, the sum of the three largest frequencies is always >= 3/8, but that isn't a problem.

I'm still unsure if your theorem is right or not, but your proof needs fixing.

Sure... I should have used the interval: [3/8 ; 3/8+3*epsilon]
Adding up any three empirical frequencies will (very likely) result in a value that is included in the interval [3/8-3*e ; 3/8+3*e]. Adding the three largest will make the value end up in the smaller interval above.

(hehehe I could be a pedantic arse and state that the smaller interval is contained in the larger and still claim my statement to be true - but I simply forgot about the obvious lower limit of 3/8 - made me look a bit foolish.)

The problem is this: if you pick N to satisfy epsilon and delta for the first path, you can't reuse the same N for the second path, because once one path is fixed, the distribution of the remaining picks among the other paths is no longer the same as before. The interval didn't really matter; it was just a consequence of not considering that those variables became dependent.

However, you can prove it by applying your reasoning to the random variables X1...Xm (m = C(8,3) = 56), one for each combination of 3 paths, where Xi represents a pick landing on one of those 3 paths, so each variable has mean = 3/8. Then, applying your original logic, select N such that
Probability( |"observed frequency of Xi" - mean| > epsilon ) < delta
Here, max("observed frequency of Xi") is the "duck number", so that proves it.
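
Spelled out a little more, as a sketch: write f_T for the observed frequency of picks landing in a given triple T of paths, and D for the duck number (both symbols are just notation for this note). D is the largest of the 56 values f_T and is never below 3/8, so a union bound over the triples gives:

```latex
\[
  D = \max_{T} f_T, \qquad
  \Pr\bigl(D - \tfrac{3}{8} > \varepsilon\bigr)
  \;\le\; \sum_{T} \Pr\bigl(|f_T - \tfrac{3}{8}| > \varepsilon\bigr)
  \;<\; \binom{8}{3}\,\delta \;=\; 56\,\delta
\]
```

In other words, choosing N for delta/56 instead of delta is enough to control all the triples at once.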

So this case looks closed

Ivan Pedroso · #53 February 7th, 2005, 07:22 AM

(Sure - "choosing a large N" was always meant as: "as large as needed for the stuff to be correct".)

Back to the original question - has anyone found any significant deviations that suggest any hokey-pokey in the randomness ?!?

alexti · #54 February 7th, 2005, 11:47 AM

It doesn't look likely to happen; the sample needed to find anything with a reasonable probability of correctness seems to be too big. My test with a sample of 3000 wasn't even close to the point where the distribution significantly condenses, and in your 100000 sample the result is still 0.5% off.

What is the size of the sample required to show that there's a problem (using duck number indicator) with a 99.99% probability?
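
A rough way to get a feel for the sample sizes involved (a back-of-envelope sketch, not an answer to the 99.99% question): under the uniform hypothesis, the observed frequency of a single path has standard deviation sqrt(p(1-p)/N) with p = 1/8. The helper name freq_std_error is invented for this example.

```python
import math

def freq_std_error(n, p=1 / 8):
    """Standard deviation of an observed frequency after n independent picks."""
    return math.sqrt(p * (1 - p) / n)

if __name__ == "__main__":
    for n in (3000, 100000, 10_000_000):
        print(n, round(freq_std_error(n), 5))
```

For N = 100000 this comes out at roughly 0.001, so per-path frequencies wobbling by about a tenth of a percentage point is ordinary noise at that sample size.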

I think we'd need the code of RNG to test it on large samples...

Alneyan · #55 February 7th, 2005, 01:33 PM

I suppose there would be no way to create a program able to count the magic paths on Spectres? Getting a high number of mages is not difficult, but counting them is (I currently have 4,000 spectres or so, with 128 spectres summoning a spectre every month). Of course, the game *will* crash sooner or later, but I think having as many mages as the game can handle should be good enough for this purpose. Or perhaps several such tests could be run?

Now, the question would be how to count all those magic paths without spending a lifetime on the matter. With further experimentation, I noticed the Nation Overview wouldn't work with that many commanders, and only one hundred commanders can be moved at once to another province. The game also hangs up sometimes when running a turn, but brute force seems to solve the problem, for now at least.

Arralen · #56 February 7th, 2005, 02:22 PM

Err.. folks, I don't think, and the original claim wasn't about it either, that the RNG code is broken per se.

It's just that you e.g. get the "same" mage three times in a row in one game, say with earth/air as random pick, and in the next game, you only get nature and something, but never an air mage (while playing the same nation again).
And sometimes the behaviour changes suddenly mid-game.

Were these isolated occurrences, I wouldn't bother, but I have seen this way too often to write it off as Murphy's law or statistical effects or whatever.

But the error is not with the RNG, but with either the seed that is fed to it (and not re-set for some turns, essentially resulting in the same random number again) or with the code that turns that random number into the actual commander (maybe using old data from the previous turn, or copying the last built commander or something).

Wild guess: That bug about strangely resurrected commanders/heroes, which turn up with other nations, isn't fixed either, as no one can figure out how and when it happens, right? Maybe there's some connection ???

Ivan Pedroso · #57 February 7th, 2005, 03:09 PM

@Arralen

(As stated by others)
Just because something looks a bit funny doesn't mean that it isn't random.

Ivan Pedroso · #58 February 7th, 2005, 03:15 PM

The Duck_Number is not an easy tool to utilize in order to check if the distribution of random picks is in fact uniform. It is much easier to just count the different number of FIRE, AIR, WATER,… picks and then do a reduced_Chi^2 test to see if we can uphold the notion that the random picks are uniformly distributed.
(Chi^2 is the square of the Greek letter Chi - looks like an X and is pronounced "Kai")

How to do it:
(1) Take your data-sample and group the measurements in some "bins" (intervals).
(2) Decide on a distribution (or model) that you would like to check your measurements against.
(3) Use your distribution to calculate the expected number of "hits" in each of the bins.
(4) Calculate the reduced_chi^2 value.

Formula:
Reduced_chi^2 = (1/d) * SUM[ (O(i) - E(i))^2 / E(i) ]
d = the number of degrees of freedom
O(i) = observed number of hits in the i'th bin.
E(i) = expected number of hits in the i'th bin.

Depending on the reduced_chi^2 you can now determine the following:
"How likely is it that my chosen distribution could be responsible for the observed data-sample?"
If the reduced_chi^2 is close to one, then the agreement is satisfactory. If it is larger, then the observed results do not fit the assumed distribution. "Larger" depends on the value of d, but it usually means larger than 2 or 3.

I'll do an example to illustrate the method:
I use the sample that Alneyan showed some posts above.

Fire:22, Air:22, Water:25, Earth:28, Astral:22, Death:31, Nature:17, Blood:27

Going through the steps above:
(1) Well... an obvious choice of bins would be bin1=FIRE, bin2=AIR, bin3=WATER and so on. (You could test the ratio of elemental vs. sorcery by choosing just two bins. There could be other interesting binnings.)
(2) I choose a model with equal probability (1/8) of getting the different paths.
(3) The total number of mages generated was 194. So the expected value in each bin is 194/8 = 24.25
(4) We first need to explain the "d" in the formula. Here d = 7. That is because of the following constraint: O(1)+O(2)+...+O(8) = 194. There are eight different bins. If you fill up the first 7, then the rest will just go into the last one.

Reduced_Chi^2 = (1/7)*[ (22 - 24.25)^2 / 24.25 + (22 - 24.25)^2 / 24.25 + (25 - 24.25)^2 / 24.25 + ... + (27 - 24.25)^2 / 24.25 ] = 0.798

Apparently there is a nice level of agreement between Alneyan's data and the hypothesis (probability of 1/8 of getting any of the paths).

But beware of jumping to conclusions!!! This result just means that we can't reject the hypothesis. We can't claim that the probabilities are in fact (1/8) for the different paths, just that the data-sample doesn't give us any reason to discard it.

An example of a distribution that would be rejected by this data-sample is this:

FIRE = AIR = WATER = EARTH = 0.25 * 65/100 = 0.1625
ASTRAL = DEATH = NATURE = BLOOD = 0.25 * 35/100 = 0.0875
(Corresponding to a 65%-35% distribution between elemental and sorcery)

E( i = 1,2,3,4 ) = 194*0.1625 = 31.525
E( i = 5,6,7,8 ) = 194*0.0875 = 16.975

Reduced_Chi^2 = (1/7)*[ (22 - 31.525)^2 / 31.525 + (22 - 31.525)^2 / 31.525 + (25 - 31.525)^2 / 31.525 + ... + (27 - 16.975)^2 / 16.975 ] = 3.7852

With d=7 a result of 3.7852 indicates that it is highly unlikely that this new distribution is responsible for Alneyan's data.
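
The two calculations above can be checked with a few lines of Python. This is a minimal sketch of the reduced chi-square formula from this post; the function name reduced_chi2 and the variable names are chosen for this example only.

```python
def reduced_chi2(observed, probs):
    """Reduced chi-square of observed counts against model probabilities.

    Uses d = (number of bins - 1) degrees of freedom, as in the post.
    """
    total = sum(observed)
    expected = [total * p for p in probs]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2 / (len(observed) - 1)

# Alneyan's sample: Fire, Air, Water, Earth, Astral, Death, Nature, Blood
alneyan = [22, 22, 25, 28, 22, 31, 17, 27]

uniform = [1 / 8] * 8
skewed = [0.1625] * 4 + [0.0875] * 4   # the 65%-35% elemental/sorcery model

if __name__ == "__main__":
    print(round(reduced_chi2(alneyan, uniform), 3))
    print(round(reduced_chi2(alneyan, skewed), 4))
```

The two printed values should match the 0.798 and 3.7852 worked out by hand. Other samples can be tested by swapping in a different list of counts.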

alexti · #59 February 7th, 2005, 09:12 PM

Quote:

Ivan Pedroso said:
The Duck_Number is not an easy tool to utilize in order to check if the distribution of random picks is in fact uniform. It is much easier to just count the different number of FIRE, AIR, WATER,… picks and then do a reduced_Chi^2 test to see if we can uphold the notion, that the random picks are uniformly distributed.

It is not as difficult to test for distribution as to test for independence. Consider a pseudo-RNG that produces uniformly distributed numbers from 1 to 8. If we implement it as x(i) = 1+(i%8), it will generate very well distributed samples; however, those x(i) are not independent at all. In fact every x(i+1) is completely determined by x(i).

This lack of independence is a typical problem in pseudo-RNGs. They often tend to repeat certain sequences more often than others. So if we're looking for a problem in the RNG, I'd expect to find something like: if you've sequentially rolled 1, 2 and 8, there's about a 50% probability that the next number will be 3 or 5.
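
The x(i) = 1+(i%8) example is easy to try out. The sketch below (helper names counter_rng and distinct_pairs are made up for this illustration) shows that the per-value counts come out perfectly uniform, while a look at consecutive pairs immediately exposes the dependence.

```python
from collections import Counter

def counter_rng(n):
    """The deliberately bad generator from the post: x(i) = 1 + (i % 8)."""
    return [1 + (i % 8) for i in range(n)]

def distinct_pairs(seq):
    """Number of distinct (x(i), x(i+1)) pairs that occur in the sequence."""
    return len(set(zip(seq, seq[1:])))

if __name__ == "__main__":
    xs = counter_rng(8000)
    print(Counter(xs))          # every value 1..8 appears exactly 1000 times
    print(distinct_pairs(xs))   # only 8 of the 64 possible pairs ever occur
```

A count-based test such as chi-square on the eight bins sees nothing wrong here, which is why a test on pairs (or longer sequences) is needed to catch this kind of flaw.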

Bummer_Duck · #60 February 15th, 2005, 11:25 PM

Ok I'm done being sick...

I finished my test to 100, the totals were:

F 13
A 7 (Air)
W 10
E 16
A 11 (Astral)
D 14
N 17
B 12

It took a little while, cause I wrote them down in the order I recruited them.

The only thing that bothers me is the streakiness of some picks. For example, I got my first air random on dwarf 17, then no more till #57. Or, I get 7 Earth randoms from Dwarf #8 to #21, and another grouping of 5 between #64 and #75. Astral and Water distribution look a little fishy to me also...
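
One way to judge whether a dry spell like the one between dwarf 17 and #57 is unusual is to simulate many 100-dwarf tests with a fair 8-way pick and see how often some path goes missing for that long. The sketch below does that; the function names and the gap length of 39 (the dwarves strictly between #17 and #57) are assumptions made for this example, and no result is quoted here.

```python
import random

def longest_gap(rolls, path):
    """Longest run of consecutive rolls in which `path` does not appear."""
    longest = current = 0
    for r in rolls:
        current = 0 if r == path else current + 1
        longest = max(longest, current)
    return longest

def share_with_long_gap(n_rolls, gap, trials, rng):
    """Fraction of simulated tests where some path is absent for at least `gap` rolls in a row."""
    hits = 0
    for _ in range(trials):
        rolls = [rng.randrange(8) for _ in range(n_rolls)]
        if any(longest_gap(rolls, p) >= gap for p in range(8)):
            hits += 1
    return hits / trials

if __name__ == "__main__":
    rng = random.Random(60)     # fixed seed so runs are repeatable
    print(share_with_long_gap(100, 39, 2000, rng))
```

If the printed share is not small, a gap of that length somewhere among the eight paths is something a fair generator produces fairly often.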

Fire, Nature, Death and Blood *look* random, I guess. So maybe it has just been my perception. Plus everyone else seems to think everything is OK, for the most part.

Thanks for the responses!