Basic Statistics in Matrix Algebra Notation

The central statistics of the general linear model are the arithmetic mean and the variance, followed by the coefficients of covariance and correlation. The concepts of correlation analysis can be extended to include statistics of significance such as t and F, obtained in the course of computing the t-test and the analysis of variance. These univariate and bivariate statistical methods are usually expressed in summation notation, but they can equally be expressed in the notation of matrix algebra.

Arithmetic Mean

Using summation notation, the arithmetic mean of a variable X can be written as

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

where n signifies the number of cases. In the Visual Statistics Studio, arithmetic mean is calculated as

In the notation of matrix algebra, the mean of the vector X can be written as

where 0 is a null vector and n is the length of both the X and 0 vectors.

Consider the vector X = [1 2 3 4 5]. Its length equals 5, and its arithmetic mean is computed as

$$\bar{X} = \frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3$$

Alternatively,

$$\bar{X} = \frac{\mathbf{1}'X}{n}$$

where 1 is a unit vector. Thus,

$$\bar{X} = \frac{\mathbf{1}'X}{n} = \frac{15}{5} = 3$$
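The two ways of obtaining the mean can be compared in a short Python sketch. It assumes that the unit-vector form is the inner product of X with a vector of ones, divided by n; the function names mean_by_summation and mean_by_unit_vector are hypothetical and are not part of the Visual Statistics Studio.

```python
def mean_by_summation(x):
    """Arithmetic mean in summation notation: sum of the scores divided by n."""
    return sum(x) / len(x)


def mean_by_unit_vector(x):
    """Mean as the inner product 1'X divided by n (assumed unit-vector form)."""
    ones = [1] * len(x)  # unit vector of the same length as x
    inner_product = sum(o * xi for o, xi in zip(ones, x))
    return inner_product / len(x)


X = [1, 2, 3, 4, 5]
print(mean_by_summation(X))    # 3.0
print(mean_by_unit_vector(X))  # 3.0
```

Both functions return the same value because multiplying by a vector of ones only adds up the elements of X.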

True Variance

Using summation notation in obtained (raw) scores, the true variance of a variable X is

$$\sigma^2 = \frac{n\sum X^2 - \left(\sum X\right)^2}{n^2}$$

For the example of the variable X = [1 2 3 4 5], the variance can be computed using the obtained scores and their squares:

   X    X^2
   1      1
   2      4
   3      9
   4     16
   5     25
  --     --
  15     55

With n = 5, the sum of X equal to 15, and the sum of X^2 equal to 55, the variance of the variable X is (5(55) - (15)^2) / 25, which equals (275 - 225) / 25, which, in turn, equals 2.
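The raw-score computation can be checked with a short Python sketch. The function name true_variance_raw is hypothetical; the cross-check uses statistics.pvariance from the Python standard library, which also divides by n.

```python
import statistics


def true_variance_raw(x):
    """True variance from raw scores: (n * sum(X^2) - (sum X)^2) / n^2."""
    n = len(x)
    sum_x = sum(x)                    # 15 for the example
    sum_x2 = sum(v * v for v in x)    # 55 for the example
    return (n * sum_x2 - sum_x ** 2) / n ** 2


X = [1, 2, 3, 4, 5]
print(true_variance_raw(X))                             # 2.0
print(true_variance_raw(X) == statistics.pvariance(X))  # True
```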

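In the notation of matrix algebra, the same expression can be written as

$$\sigma^2 = \frac{\mathbf{1}'\,[\Delta(X - X')]^{2}\,\mathbf{1}}{n^2}$$

where 1 is a unit vector, the square is applied to each matrix element, and the Greek letter delta (Δ) signifies triangulation of the skew-symmetric matrix X - X' into a skew-positive matrix, that is, removal of its negative elements. For the vector X = [1 2 3 4 5], the above expression is written as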

   

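Subtracting the X - X' expression within the parentheses gives the skew-symmetric matrix of pairwise differences, whose element in row i and column j is X_i - X_j,

$$X - X' = \begin{bmatrix} 0 & -1 & -2 & -3 & -4 \\ 1 & 0 & -1 & -2 & -3 \\ 2 & 1 & 0 & -1 & -2 \\ 3 & 2 & 1 & 0 & -1 \\ 4 & 3 & 2 & 1 & 0 \end{bmatrix}$$

and triangulating the skew-symmetric matrix, so that its negative elements are replaced by zeros,

$$\Delta(X - X') = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 & 0 \\ 3 & 2 & 1 & 0 & 0 \\ 4 & 3 & 2 & 1 & 0 \end{bmatrix}$$

Squaring the matrix elements

$$[\Delta(X - X')]^{2} = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 4 & 1 & 0 & 0 & 0 \\ 9 & 4 & 1 & 0 & 0 \\ 16 & 9 & 4 & 1 & 0 \end{bmatrix}$$

and summing them gives 50, so the true variance of the variable X can be computed as 50/25, that is, 2.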
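The subtraction, triangulation, and squaring steps can be traced in a few lines of plain Python. This is a minimal sketch, not the Visual Statistics Studio implementation: the function names pairwise_differences, triangulate, and variance_by_differences are hypothetical, and triangulation is assumed to mean replacing the negative elements with zeros.

```python
def pairwise_differences(x):
    """Matrix of differences: element (i, j) is x[i] - x[j] (skew-symmetric)."""
    return [[xi - xj for xj in x] for xi in x]


def triangulate(matrix):
    """Assumed Delta step: replace every negative element with zero."""
    return [[v if v > 0 else 0 for v in row] for row in matrix]


def variance_by_differences(x):
    """Sum of the squared elements of the triangulated matrix, divided by n^2."""
    n = len(x)
    delta = triangulate(pairwise_differences(x))
    total = sum(v * v for row in delta for v in row)  # 50 for the example
    return total / n ** 2


X = [1, 2, 3, 4, 5]
print(variance_by_differences(X))  # 2.0
```

The result agrees with the raw-score formula because each pair of cases contributes its squared difference exactly once after the negative elements are removed.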

Using the Visual Statistics Studio, right click the matrix module and enter from Data Prototypes the (5,1) column vector X, transpose it (X'), click on the Delta X command, and from the Means menu select the Grand Mean of Squared Elements command.


The Delta X command equals the X-Y command with the negative values of the resulting skew-symmetric matrix removed. The variance in Matrix Cell 4 equals 2.00.

Covariance

Using summation notation, the covariance of variables X and Y can be written as

$$\mathrm{cov}_{XY} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n}$$

Consider the following example

Using the deviation scores, covariance can be computed as 16/4, which equals 4.0.

In the notation of matrix algebra, the covariance of the matrix X can be written as

$$C = \frac{D'D}{n}$$

where D signifies the matrix X, linearly transformed into deviation scores, and n is the number of rows in the matrix D. For the above example,

Consider another example, where the matrix X equals 

 

and its corresponding matrix of deviation scores D is

The covariance of the matrix is computed as

The matrix C is also called the variance-covariance matrix, since the variances of the variables lie on its principal diagonal and the covariances between pairs of variables are its off-diagonal elements.
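The construction of the variance-covariance matrix from deviation scores can be sketched in plain Python. The data matrix below is hypothetical and is not the example from the text; the function names are likewise illustrative. The sketch assumes the divisor n (the number of rows), in line with the true variance used earlier.

```python
def deviation_scores(rows):
    """Subtract each column mean from the elements of that column."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [[r[j] - means[j] for j in range(k)] for r in rows]


def covariance_matrix(rows):
    """Variance-covariance matrix computed as D'D / n."""
    d = deviation_scores(rows)
    n, k = len(d), len(d[0])
    return [[sum(r[i] * r[j] for r in d) / n for j in range(k)]
            for i in range(k)]


# Hypothetical data: 5 cases (rows) measured on 2 variables (columns).
data = [[1, 2], [2, 4], [3, 6], [4, 8], [5, 10]]
print(covariance_matrix(data))  # [[2.0, 4.0], [4.0, 8.0]]
```

In the printed matrix, 2.0 and 8.0 are the variances of the two columns and 4.0 is their covariance.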

In the Visual Statistics Studio, enter the data matrix. Select the Transformations, Deviations from the Mean commands and replace the data on the vector display with the deviation scores.

Transfer the deviation scores to Matrix Cell 1. Transpose the deviation scores. Select the matrix multiplication X*Y command and store the result in Matrix Cell 3. Enter the number of cases (5) into Matrix Cell 4. Select the X/c command and divide the values in Matrix Cell 3 by n. Store the result in Matrix Cell 5.

 

Correlation

Using summation notation, the correlation of variables X and Y can be written as

$$r_{XY} = \frac{\sum z_X z_Y}{n}$$

where $z_X$ and $z_Y$ are standard scores, obtained from the deviation scores corresponding to variables X and Y by the linear transformations $z_X = (X - \bar{X}) / \sigma_X$ and $z_Y = (Y - \bar{Y}) / \sigma_Y$, and n signifies the number of cases. In the notation of matrix algebra, the correlation matrix R corresponding to the data matrix X can be written as

$$R = \frac{Z'Z}{n}$$

where Z signifies the matrix X, linearly transformed into standard scores, and n is the number of rows in the matrix Z. Consider matrix X

and its corresponding matrix of standard scores Z

The correlation matrix R corresponding to the matrix X is computed as

Multiplying matrices in the numerator

and dividing by the scalar number in the denominator

Note that all discussed operations, done with respect to columns (attributes) of the data matrix, can be also done with respect to its rows (entities).
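The computation of the correlation matrix from standard scores can be sketched in plain Python. The data matrix is hypothetical and is not the example from the text, and the function names are illustrative. The sketch assumes that each standard deviation is the square root of the true variance (divisor n), and it does not handle a column with zero variance.

```python
import math


def standard_scores(rows):
    """Convert each column to standard scores: deviation / standard deviation."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    # Assumption: standard deviation based on the true variance (divisor n).
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
           for j in range(k)]
    return [[(r[j] - means[j]) / sds[j] for j in range(k)] for r in rows]


def correlation_matrix(rows):
    """Correlation matrix computed as Z'Z / n."""
    z = standard_scores(rows)
    n, k = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(k)]
            for i in range(k)]


# Hypothetical data: 5 cases (rows) measured on 2 variables (columns).
data = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
R = correlation_matrix(data)
print([[round(v, 3) for v in row] for row in R])  # [[1.0, 0.8], [0.8, 1.0]]
```

To work with rows (entities) instead of columns (attributes), the same functions can be applied to the transposed data, for example list(zip(*data)).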

Within the Visual Statistics Studio, the steps for computing the matrix of correlations are the same as the steps described for obtaining the variance-covariance matrix, except that the standard scores, rather than the deviation scores, are transferred from the vector display.

Summary

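The statistical formulae for means and variances in both the summation and the matrix notation are summarized as

            Summation Notation                             Matrix Algebra Notation

Mean        $\bar{X} = \sum X / n$                          $\bar{X} = \mathbf{1}'X / n$

Variance    $\sigma^2 = (n\sum X^2 - (\sum X)^2) / n^2$     $\sigma^2 = \mathbf{1}'[\Delta(X - X')]^{2}\mathbf{1} / n^2$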

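For covariance and correlation, the key formulae are summarized as

              Summation Notation                                           Matrix Algebra Notation

Covariance    $\mathrm{cov}_{XY} = \sum (X - \bar{X})(Y - \bar{Y}) / n$     $C = D'D / n$

Correlation   $r_{XY} = \sum z_X z_Y / n$                                   $R = Z'Z / n$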