Computations of combinatoric and statistics functions and inverses which deliver good accuracy over a wide range of values. Accuracy tests allow the functions to be checked in any computing environment.

Please send comments and suggestions for improvement to: ritter@ciphersbyritter.com. You may wish to help support this work by patronizing Ritter's Crypto Bookshop.

- **Hypothesis Testing**
- **Continuous Statistic Distributions**
- **The Meaning of Statistic Distributions**
- **Normal**
- **Chi Square** -- compare binned distribution counts
- **Kolmogorov-Smirnov** -- compare distributions without using bins

- **Base Conversion** -- base 2..64 to decimal and decimal to base 2..64
- **Logs**
- **Powers**
- **Permutations** -- n things taken k at a time, order matters
- **Combinations** -- n things taken k at a time, order does not matter
- **Binomial** -- for success probability p and n trials, the probability of k successes
- **Bit Changes** -- bit changes from keyed invertible substitution tables or ciphers
- **Poisson** -- given mean u, the probability of k successes

One role of statistics
is to develop evidence that something
*is* or *is not* performing as expected. Typically we
have many objects, any one of which may be selected at
random
and then measured. Over many
samples,
measured values may recur with some fixed probability, and we can
construct a frequency
distribution
or *density function* experimentally. If we can form a theory
or *hypothesis* of what that distribution *should* be,
we can compare the theoretical distribution to the experimental
one. And if the distributions seem to be the same, we can have
some confidence that we know what is going on: We have "proven"
the hypothesis of a particular distribution.

To compare distributions, we compute some sort of
"goodness of fit"
statistic
on the samples. Now, the experimental
objects will have a distribution of values, but the test statistic
has its own *different* distribution. In "goodness of fit," the
statistic distribution is usually what we expect if we compare any
sample distribution to itself; it is thus the variation of random
sampling alone. So if we take a number of experimental samples, we
can compute a statistic value. Then we can calculate, from the
known distribution for that statistic, how often that result should
occur by chance alone, assuming the two distributions really are
the same. And if we repeatedly find statistic values which are
particularly unlikely, we can conclude that the distribution of
experimental objects is *not* what we expected it to be.

The normal,
chi-square and
Kolmogorov-Smirnov
statistics are *continuous*
distributions
(in contrast to the
binomial and
Poisson
*discrete* distributions). Because continuous statistics are
not limited to discrete values, there is almost *no* probability
that a particular precise value will occur. We ask, therefore,
about the probability of getting a particular value *or less,*
or the value *or more.*

The probability of a particular value or less is just the area under the probability "density" curve to the left of that value; this is the "left tail." These statistics are normalized so that the total probability is 1; if we have the area to the left of a value, we can find the area to the right simply by subtracting what we have from 1. And we can find the area between any two values by taking the area to the left of the larger value and subtracting the area to the left of the smaller value. The same statistic probability distributions can be described from either tail, but I try to uniformly use the left tail, which is also the "cumulative distribution function" or c.d.f. The c.d.f. is just the accumulated sum or integral of the density function; it gives the area to the left of the parameter value. Whether we describe a distribution from the left tail or the right tail really does not matter, as long as we understand what the associated probability means.
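The tail arithmetic above can be sketched in Python, using the standard normal as a convenient example distribution (the function name here is illustrative, not part of the calculator):

```python
import math

def normal_cdf(x, mean=0.0, sd=1.0):
    """Left-tail area (c.d.f.) of the normal distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

# Area to the left of z = 1.0 in the standard normal (the "left tail").
left = normal_cdf(1.0)

# The total probability is 1, so the right tail is what remains.
right = 1.0 - left

# Area between two values: left-tail area at the larger value
# minus left-tail area at the smaller value.
between = normal_cdf(1.0) - normal_cdf(-1.0)   # about 0.6827
```

Note that only subtraction is needed once we have the c.d.f.; no separate right-tail function is required.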

The probability distribution for a statistic is usually what we
expect to see when we sample a known distribution. Over many
independent trials which sample ideal data for that statistic,
we expect to find that about half of the trials will have a
statistic value *below* the 0.5 probability value. Similarly,
about half of the trials will have a value *above* the 0.5
probability value.

"Critical" statistic values are often on the right tail, and may
indicate, for example, that 95 out of 100 trials should be below
some statistic value. But this means that we *expect* to get
5 trials out of 100 *above* that critical value, as a result
of random sampling. And even though a 5 percent chance is not very
high, it *is* to be expected, and might even occur on the very
first experiment.

To the extent that any statistic value can occur with some (even
if fantastically unlikely) random sampling, I claim that simply
finding a relatively unlikely statistic value yields no conclusion
in itself. Of course, a high cost of experimentation may simply
demand *some* decision which is more often right than wrong.
We rarely have that situation in
cryptography.
But if we *repeatedly* get many more than 5
percent of the trials in the upper 5 percent of the distribution
area, this is statistical *evidence* that the distribution is
not what we thought it was.

In addition to worrying about the upper 5 percent or 1 percent
of the distribution, we *also* expect to see about a quarter of
the samples in each quarter of the distribution. It is important to
check this, because a valid experimental procedure *must*
behave this way. If we concentrate solely on statistic values which
may be too far out on the right tail (so-called "failures"), we could
miss very serious problems in the experiment itself. If we repeatedly
do not find about a quarter of the samples in each quarter of the
distribution, we *know* the distribution is not what we expected
it to be, *even if* we have very few "failures." Any deviation
from what we expect means that we do not know what is going on.
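The quarter-count check can be illustrated in a few lines of Python. Here made-up uniform "p-values" stand in for the c.d.f. values of real experimental statistics; if the assumed distribution is right, these should be uniform on [0, 1):

```python
import random

random.seed(1)

# Hypothetical experiment: 1000 c.d.f. values from independent trials.
p_values = [random.random() for _ in range(1000)]

# Count how many samples fall into each quarter of the distribution.
quarters = [0, 0, 0, 0]
for p in p_values:
    quarters[min(int(p * 4), 3)] += 1

print(quarters)   # each count should be near 250
```

Counts far from one quarter in *any* bin, not just the top one, signal that the distribution is not what we expected.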

A big part of the worth of statistics occurs when we can analyze
a problem, predict the results, then have measurements agree with
our predictions. Repeated agreement between theory and practice
means that a lot of analysis and a lot of measurement instrumentation
must be working properly. But if the results do *not* agree,
we have only proven that we have not described what is really
happening. The difference could be the result of any one of many
issues in the analysis, the design of the experiment, or its conduct.
So in "goodness-of-fit" testing, we only get "proof" when the two
distributions are measurably the same, and *no proof at all*
when they differ.

The normal distribution is the "bell shaped" curve we associate with grading curves and other population models. The normal distribution -- or something close to it -- appears often in nature. This probably happens because the sum of many "independent identically distributed" random variables has a normal distribution, independent of the distribution of the variables taken one-by-one.
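A quick simulation suggests this effect; summing uniform variables is just one convenient choice of "independent identically distributed" source:

```python
import random
import statistics

random.seed(2)

# A uniform variable on [0,1) has mean 0.5 and variance 1/12.
k = 48      # uniform variables per sum
n = 2000    # number of sums collected

sums = [sum(random.random() for _ in range(k)) for _ in range(n)]

mean = statistics.fmean(sums)   # expect about k * 0.5 = 24
sd = statistics.stdev(sums)     # expect about sqrt(k / 12) = 2
```

Even though each uniform variable is flat, the distribution of the sums is close to the normal bell shape.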

The normal is actually a whole family of curves, which differ in their mean value and in their standard deviation, the square root of the variance. Here the user can enter either the variance or the standard deviation, and the other will be updated automatically. The standardized result appears in z.
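The standardization step can be sketched as follows (the function name and the example numbers are illustrative, not the calculator's own):

```python
import math

def standardize(x, mean, variance=None, sd=None):
    """Convert x to the standardized z value, given either the
    variance or the standard deviation (sd = sqrt(variance))."""
    if sd is None:
        sd = math.sqrt(variance)
    return (x - mean) / sd

# Hypothetical example: x = 110 against mean 100, variance 225 (sd = 15).
z = standardize(110.0, mean=100.0, variance=225.0)   # z = 2/3
```

With z in hand, a single standard normal c.d.f. table or routine covers the whole family of normal curves.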

Chi-square is the best known goodness of fit statistic. Chi-square is a way to numerically compare two sampled distributions:

- Some number of "bins" is selected, each typically covering a different but similarly sized range of values.
- Some much larger number of independent observations are taken. Each is measured and classified in some bin, and a count for that bin is incremented.
- Each resulting bin count is compared to an expected value
for that bin, based on the expected distribution. Each
expectation is subtracted from the corresponding bin count,
the difference is squared, then divided by the expectation:
    X^2  =  SUM( (Observed[i] - Expected[i])^2 / Expected[i] )
             i

The sum of all the squared normalized differences is the chi-square
statistic,
and the distribution depends upon the number of bins through the
*degrees of freedom*
or *df.* The df value is normally one less than the number of
bins (though this will vary with different test structures).
Ideally, we choose the number of bins and the number of samples to
get at least ten counts in each bin. For distributions which trail
off, it may be necessary to collect the counts (and the expectations)
for some number of adjacent bins.
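The steps above can be sketched directly in Python; the die-roll counts are a made-up example, chosen so each bin expectation is at least ten:

```python
def chi_square(observed, expected):
    """Chi-square statistic over paired bin counts and expectations:
    sum of (O - E)^2 / E across all bins."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical example: 60 rolls of a die, expecting 10 counts per face.
observed = [8, 12, 9, 11, 10, 10]
expected = [10.0] * 6

x2 = chi_square(observed, expected)   # df = 6 - 1 = 5
```

A value on the order of the df (here, 5) is unremarkable; values far above it suggest the observed and expected distributions differ.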

The chi-square c.d.f. tells us how often a particular value or lower would be seen when sampling the expected distribution. Ideally we expect to see chi-square values on the same order as the df value, but often we see huge values for which there really is little point in evaluating a precise probability.

Kolmogorov-Smirnov is another goodness of fit test for comparing two distributions. Here the measurements need not be collected into "bins," but are instead re-arranged and placed in order of magnitude:

- *n* independent samples are collected and arranged in numerical order in array *X* as *x*[0]..*x*[*n*-1].
- *S*(*x*[*j*]) is the cumulative distribution of the sample: the fraction of the *n* observations which are less than or equal to *x*[*j*]. In the ordered array this is just ((*j*+1)/*n*).
- *F*(*x*) is the reference cumulative distribution: the probability that a random value will be less than or equal to *x*. Here we want *F*(*x*[*j*]), the fraction of the distribution to the left of *x*[*j*], which is a value from the array.

There are actually at least three different K-S statistics, and two different distributions:

The "one-sided" statistics are:

    Dn+ = MAX( S(x[j]) - F(x[j]) ) = MAX( ((j+1)/n) - F(x[j]) )
    Dn- = MAX( F(x[j]) - S(x[j]) ) = MAX( F(x[j]) - (j/n) )

where "MAX" is computed over all j.

The "two-sided" K-S statistic is:

    Dn* = MAX( ABS( S(x[j]) - F(x[j]) ) ) = MAX( Dn+, Dn- )

and this has a somewhat different distribution.
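A minimal sketch of the one-sided statistics, assuming samples compared against a known reference c.d.f. F (the function name and the uniform example are illustrative):

```python
def ks_one_sided(samples, F):
    """Compute Dn+ and Dn- for a list of samples against a
    reference c.d.f. F, taking the MAX over all ordered positions j."""
    x = sorted(samples)               # arrange in numerical order
    n = len(x)
    dn_plus = max(((j + 1) / n) - F(x[j]) for j in range(n))
    dn_minus = max(F(x[j]) - (j / n) for j in range(n))
    return dn_plus, dn_minus

# Hypothetical check against the uniform reference F(x) = x on [0, 1].
dp, dm = ks_one_sided([0.1, 0.26, 0.51, 0.76], lambda x: x)
```

Note that no binning is needed; the samples are simply sorted and compared position by position against the reference c.d.f.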

Knuth II multiplies Dn+, Dn- and Dn* by
SQRT(*n*) and calls them Kn+, Kn- and Kn*,
so we might say that there are at least *six* different K-S
statistics. So what do we use?

It turns out that the Dn* distribution is hard to compute with accuracy over the full range. There is a good transformation between the two distributions for values well out on the right tail, but this means we lose values for quarters of the distribution, and this is a significant loss. We also lose the ability to compute an arbitrary inverse. So, just like Knuth II, we support the "one-sided" versions.

The c.d.f. computation should give 4 good decimal places. The
critical value and inverse computations iterate the c.d.f. with
bisection for *n* under 100, which gives full accuracy, but
is very, very slow. The critical values might take half a minute
to come up, and the tests might well take a couple of minutes.
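The iterate-and-bisect approach described here can be sketched as follows; a normal c.d.f. stands in for the K-S distribution, and the names are illustrative:

```python
import math

def normal_cdf(x):
    """Left-tail area of the standard normal at x."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def inverse_by_bisection(cdf, p, lo=-40.0, hi=40.0, steps=200):
    """Invert a monotone c.d.f. by bisection: find x with cdf(x) ~ p.
    Slow but reliable, much like the n < 100 case noted above."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Critical value for the upper 5 percent of the distribution.
x95 = inverse_by_bisection(normal_cdf, 0.95)   # about 1.6449
```

Each bisection step halves the search interval, so the cost is one c.d.f. evaluation per bit of accuracy; with an expensive c.d.f., the slowness described above follows directly.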