Statistical Distributions

Binomial Distributions

The basic distributions within the general linear model of statistics are the binomial distribution, of which the normal distribution is the limiting case; the t and F distributions; and the chi-square distribution. Let us begin the discussion of these distributions with the binomial distribution.

Galton's Quincunx

Probably the best way to introduce the binomial distribution is via Galton's Quincunx, an apparatus with a single top compartment that contains a handful of marbles and a maze of ducts leading to several compartments at the bottom of the instrument.

If this apparatus is set upright, the marbles fall through the ducts and mimic the pattern of probabilities contained in Pascal's triangle. When the marbles reach the bottom compartments, the upper contour of the stack of marbles bears a strong resemblance to a normal curve. Galton's original device used pins positioned on a board in a pattern resembling an ornamental arrangement of five bushes, hence the name quincunx.
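The behaviour of the falling marbles can be imitated with a short simulation. The following is only a minimal sketch in Python: the function name `simulate_quincunx`, the number of marbles, the number of rows, and the fixed random seed are all assumptions made for illustration and are not part of Galton's device or of the original text.

```python
import random
from collections import Counter


def simulate_quincunx(n_marbles, n_rows, seed=0):
    """Drop marbles through n_rows of left/right splits.

    Returns a list with the number of marbles in each of the
    n_rows + 1 bottom compartments (hypothetical helper).
    """
    rng = random.Random(seed)  # fixed seed so that a run can be repeated
    counts = Counter()
    for _ in range(n_marbles):
        # At every row the marble goes right (1) or left (0) with equal probability;
        # the number of moves to the right identifies the bottom compartment.
        compartment = sum(rng.random() < 0.5 for _ in range(n_rows))
        counts[compartment] += 1
    return [counts[i] for i in range(n_rows + 1)]


bins = simulate_quincunx(n_marbles=2000, n_rows=10)
for index, count in enumerate(bins):
    # Crude text histogram: one '#' per 10 marbles.
    print(f"{index:2d} {'#' * (count // 10)}")
```

With enough marbles the printed histogram is tallest in the middle and falls away toward both ends, which is the bell-shaped contour described above.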

 

The probabilities of a falling marble entering a given path help to explain the binomial distribution. In the middle of the top compartment that contains the marbles there is an opening with a middle partition. A marble falling past this partition has an equal probability, .50, of falling into either of two lower compartments. Each of these compartments has an opening with a partition in its middle. Below this, there are four more compartments. The probability of a marble falling into the leftmost compartment is half of the .50 associated with the compartment above, i.e., .25. The probability of a marble falling into the two inner compartments is the sum of .25 from the first compartment above and .25 from the second, i.e., .50. The probability of a marble falling into the rightmost compartment is again .25. This branching pattern is repeated over the rows of compartments. For the above example, these probabilities are

               1
           .50   .50
        .25   .50   .25

 

Plenum of Binary Events

Binary events have two outcomes, and the number of rows n of the n by k plenum, where k is the number of events, is

$$n = 2^k$$

Thus, e.g., three true-false items X1, X2, and X3 can be answered in $2^3 = 8$ ways. Within a three-item true-false test we can, theoretically, observe 8 response patterns. To construct this plenum, consider that half of these 8 different outcomes will be false (0) and the other half will be true (1); thus the algorithm for the construction of this plenum is as follows:

 

Enter four zeroes followed by four ones in the first column. In the second column, record two zeroes and two ones next to the four zeroes of the first column, and do the same next to the four ones. In the last column, record alternating zeroes and ones.
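The same plenum can be generated programmatically. The sketch below uses Python's standard `itertools.product`; the helper name `build_plenum` is an assumption introduced here for illustration. It relies on the fact that `product` varies the last position fastest, which yields the same column pattern as the manual algorithm.

```python
from itertools import product


def build_plenum(k):
    """Return all 2**k response patterns to k binary items as a list of tuples."""
    # product((0, 1), repeat=k) changes the last item fastest, so the first
    # column is a block of zeroes followed by a block of ones, and the last
    # column alternates 0, 1, 0, 1, ...
    return list(product((0, 1), repeat=k))


plenum = build_plenum(3)
for row in plenum:
    print(*row)
```

For k = 3 this prints eight rows, from 0 0 0 to 1 1 1.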

 

The resulting n x k plenum is shown below.

    X1  X2  X3
     0   0   0
     0   0   1
     0   1   0
     0   1   1
     1   0   0
     1   0   1
     1   1   0
     1   1   1

 

Note that the variables X1, X2, and X3 are not correlated, i.e.,

$$r_{12} = r_{13} = r_{23} = 0$$

and that the different response patterns are permutations (Lat. permutatio, a thorough change) of 1s and 0s, in which their order is important.
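The absence of correlation among the three columns of the plenum can be verified numerically. This is a minimal sketch: the helper `pearson_r` is a hypothetical function written out here from the usual product-moment formula, not something taken from the original text.

```python
from itertools import product
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Columns of the 8 x 3 plenum of three binary items.
plenum = list(product((0, 1), repeat=3))
x1, x2, x3 = (list(column) for column in zip(*plenum))

print(pearson_r(x1, x2), pearson_r(x1, x3), pearson_r(x2, x3))
```

All three printed correlations are zero, because every combination of values occurs equally often in each pair of columns.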

 

Suppose that questions X1, X2, and X3 are items on the commercial pilot license test where item X1 pertains to FAA regulations, item X2 pertains to navigation and item X3 to the operation of the aircraft. Such a test is scored as shown in the table below,

 

 

                                             

 

because giving credit to responses that reflect knowledge of administrative regulations without knowledge of navigation and operation of the aircraft, or to knowledge of navigation without the know-how to fly a commercial airliner, could result in the deaths of hundreds of people. The rectangular distribution of the test scored in this manner is shown below.

 

                                                        

 

Now, suppose that questions X1, X2, and X3 are items on a geography test where items X1, X2, and X3 pertain to the capitals of Taiwan, South Korea, and Japan. Such a test is scored as a combination of correct responses, without regard to the order of their response patterns, as shown in the table below,

    X1  X2  X3   X
     0   0   0   0
     0   0   1   1
     0   1   0   1
     0   1   1   2
     1   0   0   1
     1   0   1   2
     1   1   0   2
     1   1   1   3

where the frequencies of the values of the composite score X form a binomial distribution

    X           0   1   2   3
    Frequency   1   3   3   1

 

The mean of the composite score X is defined as

$$\mu_X = kp$$

computed for the X in the above table as 3(.5), which equals 1.5. The variance of this variable can be defined as

$$\sigma_X^2 = kpq$$

computed for the above variable X as 3(.5)(.5), which equals .75. Note that the length n of the argument of the binomial frequency distribution equals k+1, and that the variance of the binomial distribution refers to the variance of its expanded frequencies [0 1 1 1 2 2 2 3], which is identical to the variance of the composite variable X [0 1 1 2 1 2 2 3].
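The composite score, its frequencies, and the two formulas can be checked against each other with a few lines of Python. This is a sketch using only the standard library; the variable names are chosen here for illustration, and the population variance (`pvariance`, dividing by the number of patterns) is assumed to be the variance the text has in mind.

```python
from collections import Counter
from itertools import product
from statistics import mean, pvariance

k, p = 3, 0.5
q = 1 - p

plenum = list(product((0, 1), repeat=k))
X = [sum(row) for row in plenum]   # composite score of each response pattern
frequencies = Counter(X)           # how often each score value occurs

print(X)                           # [0, 1, 1, 2, 1, 2, 2, 3]
print(sorted(frequencies.items())) # [(0, 1), (1, 3), (2, 3), (3, 1)]

# Mean and variance computed from the scores and from the formulas kp and kpq.
print(mean(X), k * p)
print(pvariance(X), k * p * q)
```

The sorted expanded frequencies [0 1 1 1 2 2 2 3] contain the same values as X, so both give the same mean and variance.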

 

Pascal's Triangle

We could also get the above frequencies directly, by constructing Pascal's triangle, as follows:

Start with 1 at the top of the triangle, and append two 1s on the next line with the top 1 in between. Continue to append 1s at the extremes of the following lines while constructing the in-between terms by summing the adjacent terms above them.

              1
            1   1
          1   2   1
        1   3   3   1
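The construction rule translates directly into code. The following is a minimal sketch; the function name `pascal_rows` is an assumption, and the function expects at least one row to be requested.

```python
def pascal_rows(n_rows):
    """Return the first n_rows rows of Pascal's triangle (n_rows >= 1)."""
    rows = [[1]]  # the single 1 at the top
    for _ in range(n_rows - 1):
        previous = rows[-1]
        # In-between terms are sums of the adjacent terms in the row above.
        inner = [a + b for a, b in zip(previous, previous[1:])]
        # Append 1s at both extremes of the new row.
        rows.append([1] + inner + [1])
    return rows


for row in pascal_rows(4):
    print(" ".join(str(term) for term in row).center(20))
```

The fourth printed row, 1 3 3 1, reproduces the frequencies of the composite score of three binary items.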

 

 

                                                  

 

Pascal's triangle was originally developed as a mnemonic aid for expanding binomials, as expansions of binomials such as $(a+b)^2$ or $(a+b)^3$ are easy to memorize, but expansions to higher powers become increasingly difficult to remember. If we arrange expanded binomials in increasing order of their exponents

$$
\begin{aligned}
(a+b)^0 &= 1 \\
(a+b)^1 &= a + b \\
(a+b)^2 &= a^2 + 2ab + b^2 \\
(a+b)^3 &= a^3 + 3a^2b + 3ab^2 + b^3
\end{aligned}
$$

it becomes obvious that these expansions can be separated into a component containing progressions of decreasing and increasing exponents, for the bottom line highlighted as

$$a^3b^0, \quad a^2b^1, \quad a^1b^2, \quad a^0b^3$$

and their coefficients, which can be recalled by using the mental image of Pascal's triangle. For the example, as

$$1 \quad 3 \quad 3 \quad 1$$

and

$$(a+b)^3 = 1a^3b^0 + 3a^2b^1 + 3a^1b^2 + 1a^0b^3$$

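Pascal's triangle can also be written in terms of factorials, with the term in position i of the row for k binary variables (i = 0, 1, …, k and j = k – i) given by

$$\binom{k}{i} = \frac{k!}{i!\,j!}$$

and in terms of its associated gamma functions

$$\binom{k}{i} = \frac{\Gamma(k+1)}{\Gamma(i+1)\,\Gamma(j+1)}$$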

 

 

Note that the length of the binomial variable is k+1, the argument of the gamma function. When this argument is an integer, the gamma function is just the factorial function offset by one,

$$\Gamma(k+1) = k!$$

However, the gamma function differs from the factorial in that it can be evaluated not only for integers but also for real numbers. The gamma function is a key component of the family of distributions that includes the binomial, normal, t, F, and chi-square distributions.
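The relation between factorials and the gamma function can be checked with Python's standard `math` module, which provides `factorial`, `comb`, and `gamma`. The sketch below is illustrative only; the helper name `coefficient_from_gamma` and the non-integer argument 3.5 are assumptions chosen for the example.

```python
from math import comb, factorial, gamma


def coefficient_from_gamma(k, i):
    """Term i of row k of Pascal's triangle, written with gamma functions."""
    j = k - i
    return gamma(k + 1) / (gamma(i + 1) * gamma(j + 1))


k = 3
# The same row of Pascal's triangle from factorial-based and gamma-based forms.
print([comb(k, i) for i in range(k + 1)])
print([coefficient_from_gamma(k, i) for i in range(k + 1)])

# For an integer argument the gamma function is the factorial offset by one.
print(gamma(k + 1), factorial(k))

# Unlike the factorial, the gamma function also accepts non-integer arguments.
print(gamma(3.5))
```

Calling `factorial(3.5)` would raise an error, whereas `gamma(3.5)` returns a value between 2! and 3!.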

 

Binomial Distribution as the Precursor of the Normal Distribution

Let us define the binomial distribution within the context of the theoretical model underlying this distribution, which has the following properties. The n x k plenum consists of $2^k$ possible response patterns (permutations) to a set of k binary variables, assembled into the variable X without regard to their order, i.e., the variable X is the combination of the response patterns contained in the k variables and is defined as their sum. The probabilities associated with the 1 (p) and 0 (q) values of the binary variables are equal, i.e., p = q. The means of the binary variables equal p and their variances equal pq. The binary variables are orthogonal (uncorrelated), and thus the variance of their sum does not contain covariance terms. Thus the mean of the variable X is defined as kp and its variance as kpq, as indicated in the diagram below.

 

                            

 

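The binomial distribution $f(i)$ associated with this model can be defined as

$$f(i) = \frac{\Gamma(k+1)}{\Gamma(i+1)\,\Gamma(j+1)}\, p^i q^j$$

where i = 0, 1, …, k (k+1 values in all) and j = k – i. For the example of the three items X1, X2, and X3,

$$k = 3, \qquad p = q = .5$$

the ordinate $f(i)$ corresponding to the abscissa $i$ equals

$$f(0) = \frac{\Gamma(4)}{\Gamma(1)\,\Gamma(4)}\,(.5)^0 (.5)^3 = .125$$

$$f(1) = \frac{\Gamma(4)}{\Gamma(2)\,\Gamma(3)}\,(.5)^1 (.5)^2 = .375$$

$$f(2) = \frac{\Gamma(4)}{\Gamma(3)\,\Gamma(2)}\,(.5)^2 (.5)^1 = .375$$

$$f(3) = \frac{\Gamma(4)}{\Gamma(4)\,\Gamma(1)}\,(.5)^3 (.5)^0 = .125$$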

 

and can be plotted as

 

                                                        

 

These results are identical to the results obtained by computing probabilities directly from Pascal's triangle, dividing each row of the triangle by its row sum, as

$$\frac{1}{8},\ \frac{3}{8},\ \frac{3}{8},\ \frac{1}{8} \;=\; .125,\ .375,\ .375,\ .125$$

 

As k approaches infinity, the above probabilities converge to the probabilities given by the areas under the normal distribution. When p is not equal to q, the binomial distribution for a finite k is skewed, and thus mimics skewed forms of the normal distribution rather than serving as its precursor.
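The ordinates of the binomial distribution, and their approach to the normal curve as k grows, can be illustrated numerically. The sketch below is written under stated assumptions: the function names `binomial_ordinate`, `normal_ordinate`, and `max_gap` are hypothetical, the normal curve is given the mean kp and variance kpq, and k = 100 is an arbitrary choice of a "large" k.

```python
from math import exp, gamma, pi, sqrt


def binomial_ordinate(i, k, p=0.5):
    """Binomial ordinate at abscissa i for k binary variables, via gamma functions."""
    q, j = 1.0 - p, k - i
    coefficient = gamma(k + 1) / (gamma(i + 1) * gamma(j + 1))
    return coefficient * p**i * q**j


def normal_ordinate(x, mu, var):
    """Ordinate of the normal density with mean mu and variance var."""
    return exp(-((x - mu) ** 2) / (2 * var)) / sqrt(2 * pi * var)


def max_gap(k, p=0.5):
    """Largest absolute difference between binomial and normal ordinates."""
    mu, var = k * p, k * p * (1 - p)
    return max(
        abs(binomial_ordinate(i, k, p) - normal_ordinate(i, mu, var))
        for i in range(k + 1)
    )


# Ordinates for the three-item example.
print([binomial_ordinate(i, 3) for i in range(4)])

# The gap to the normal curve shrinks as k increases.
print(max_gap(3), max_gap(100))
```

Because `math.gamma` overflows for arguments above roughly 171, this direct form is suitable only for moderate k; a logarithmic form would be needed for larger values.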

 

Binomial Distribution within the Microsoft Excel Framework

In Microsoft Excel, one has to specify the location of the argument, the number of trials, the probability p, and whether the distribution should be cumulative or not. For the example where the argument of the binomial function was entered in cells A1:A4 as 0, 1, 2, 3, with three trials, the probability p equal to .5, and the standard, non-cumulative binomial function, the formula was written as

 

Microsoft Excel

=BINOMDIST(A1,3,0.5,FALSE)

Binomial Distribution

 

 

Plotted
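For readers without Excel, the role of the four arguments can be illustrated in Python. The function below is a hypothetical stand-in that mirrors the argument order of BINOMDIST (value, trials, probability, cumulative flag); it is a sketch for illustration, not a replacement for the Excel function, and it does no input validation.

```python
from math import comb


def binomdist(number_s, trials, probability_s, cumulative):
    """Sketch of a BINOMDIST-like function (hypothetical helper)."""

    def ordinate(i):
        # Probability of exactly i successes in the given number of trials.
        return comb(trials, i) * probability_s**i * (1 - probability_s) ** (trials - i)

    if cumulative:
        # Probability of at most number_s successes.
        return sum(ordinate(i) for i in range(number_s + 1))
    return ordinate(number_s)


argument = [0, 1, 2, 3]  # the values entered in cells A1:A4

print([binomdist(x, 3, 0.5, False) for x in argument])  # [0.125, 0.375, 0.375, 0.125]
print([binomdist(x, 3, 0.5, True) for x in argument])   # [0.125, 0.5, 0.875, 1.0]
```

The first line corresponds to the non-cumulative formula used above; the second shows what the cumulative option would return for the same argument.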

 

 

Binomial Distribution within the Cruise Scientific Visual Statistics Studio

In the Visual Statistics Studio, select (Distributions, Ordinates of Binomial Distributions), specify the length of the argument as 4, and click on the Standard command. On Descriptive Statistics, mark the Sum. As the sum of the values of the bDist variable equals 8, select (Operations, Divide by a Constant), mark the bDist, Name the Result Binomial, enter the Constant as 8, and click the Append command.

Select (Graphs, Spline Graphs), mark the X and the Binomial variables, and click on the Accept command. As the top of the spline is slightly depressed, right-click on the graph, select (Properties, Scale), and increase the ordinate from .40 to .50.

To visualize a binomial distribution of the 10th order, repeat the previous steps, substituting 10 for 4 in the n input box. Click on (Stereographs, Define Stereograph), define Diameter, Contours, and Colors, and click the C, Z, and Y commands on the stereogram display.