Path: illuminati.io.com!uunet!cs.utexas.edu!not-for-mail
From: ritter@io.com (Terry Ritter)
Newsgroups: sci.crypt

Subject: Re: multi-sided dice for RNGs
Date: 6 Aug 1994 21:38:39 -0500
Organization: UTexas Mail-to-News Gateway
Lines: 126
Sender: nobody@cs.utexas.edu
Message-ID: <199408070238.VAA15098@pentagon.io.com>
NNTP-Posting-Host: news.cs.utexas.edu

 In 
 ppearson@netcom.com (PKJS Pearson family) writes:


>In article <31n5l6$7ja@xmission.xmission.com>,
>Berzerk  wrote:
>>Steve_Allen (stevalln@dorsai.org) wrote:
>>:    Plus, the 10-sided dice I have are all somewhat crooked. Here's
>>: a result of some trial tosses:
>>: Die face:     #hits:
>>: 1              4
>>: 2              19
>>: 3              9
>>: 4              11
>>: 5              9
>>: 6              12
>>: 7              12
>>: 8              5
>>: 9              12
>>: 0              10
>>
>>: After 103 tosses, the mean was 10.3 and the std was 3.95. Not as
>>: random as I'd like.
>>AHHHHH, I think that this is about as random as it gets.  The expected
>>result is 10.3(and can't vary, as sumation number of rolls is 103:-) and
>>the expected standard deviation is about the square root of this.
>>
>
>The normal statistical test applied to this situation is the chi-squared
>test, in which one calculates the sum of (observed-expected)^2/expected
>and asks how often one would expect to get such a large value from a
>genuinely random source. In the present case, the value is 15.155,
>with "9 degrees of freedom" (corresponding to the ten bins, less the
>one constraint that their sum be fixed). A truly random source would
>produce a chi-squared value greater than this (i.e., a distribution more
>uneven than this) in 8.7% of trials. This does not qualify as "statistically
>significant" (the customary threshold for which is 5%) unless you are the
>United States Environmental Protection Agency and are hard up to reach a
>predetermined conclusion. ((The moved the finish line to 10% for the
>second-hand tobacco smoke study.))

 A better statistical test for this situation is the old binomial
 equation:

                 (n)  k  n-k
      B(k;n,p) = (k) p  q

 This gives the probability of finding exactly k successes out of
 n trials with success probability p.

 We hope that p = 1/10, so the probability of finding exactly k = 4
 values which match in n = 103 samples is:

                 (103)   4    99
               = ( 4 ) .1   .9


               = 4421275 * 1e-4 * 2.9513e-5


               = 0.013 for any particular value.

 But we have 10 possible values, so we should get a value with
 4 occurrences of some value in about 13 out of every 100 trials.

 If we sum the probability of 0..4 matches, we get about 0.0192
 per value (and we have 10 values), so we should have a value with
 4 or fewer matches in about 2 out of every 10 trials.

 Similarly, we can sum the binomial probabilities for 0..18 matches,
 and subtract from 1.0, yielding 0.0064, then multiply by 10.  Thus,
 we should get some value which occurs 19 or more times in about
 1 out of every 20 trials.


 The original experiment suffers from not conducting multiple
 trials of a particular size.  Anything can happen in one trial.
 The issue is whether the results from one trial reflect the larger
 reality.  We discover this by conducting multiple trials.

 If we accumulate the results into a common distribution (that is,
 graph the number of occurrences of each exact repetition value), we
 will find them developing a particular distribution.  Ideally, it
 will be approximately Poisson, which is easy to deal with.


 Another thing we can do with this experiment is predict population
 from the experimental evidence, using augmented doubles.  First
 we compute the number of equivalent augmented doubles from the
 given exact matches:

          (19)         (12)         (11)         (10)
      1 * ( 2)  +  3 * ( 2)  +  1 * ( 2)  +  1 * ( 2)  +

                       ( 9)         ( 5)         ( 4)
                   2 * ( 2)  +  1 * ( 2)  +  1 * ( 2)  =

      171  +  3 * 66  +  55  +  45  +  2 * 36  +  10  +  6 =


      557 augmented doubles.
      ===

          (19)
 [ Note:  ( 2) is the number of combinations of 19 things taken
           2 at a time.  In this case, it is the number of effective
           doubles produced by 19 occurrences of the same value. ]


 Now we want to predict the population (N), given sample size
 (s) = 103, and augmented doubles (ad) = 557:

      N(s,ad) = s(s-1)/2ad

              = 103 * 102 / 2 * 557

              = 9.43

 This is obviously a little low, be we can't say much for sure until
 we have more trials.

 ---
 Terry Ritter   ritter@io.com