The general linear model subsumes most of the methods of statistical analysis. It has been used across disciplines with few modifications for about a century. Its tenets are simple: orthogonal coordinates in multidimensional space define elements of data matrices. They are linearly related, and are analyzed in such a manner as to minimize error. Since it is difficult to visualize hyperspace, it is helpful to begin by analyzing elements of the general linear model in minute detail for the cases of one, two, and three variables. The properties seen in these linear, planar, and three-dimensional representations can then be generalized to multidimensional space. In the present chapter, some of the formal aspects of the general model will be introduced. We begin with the description of a line.
One of the simplest equations of analytic geometry is the equation of a line

$$Y = BX + A$$

where B is the slope and A is the intercept. To illustrate the use of linear equations in data analysis, let us consider the obtained scores for two variables, X and Y, shown in the table below and plotted as a graph.

| X | Y |
|---|---|
| 0 | 1 |
| 0.5 | 2 |
| 1 | 3 |
| 1.5 | 4 |
| 2 | 5 |

The line connecting the plotted data points intersects the
ordinate (Y axis) at Y = 1.00. This point is known as the intercept, A. The
intercept is measured by the distance from the origin (0,0) to the location of
the point of intersection. As you move along the abscissa (the X axis) and
observe changes in the Y scores, you will notice a systematic change. For each
unit change in X, there are two unit changes in Y. This systematic change,
written B in the equation of the line, is the slope of the line.
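The slope and intercept described above can be read directly off the example data. The following is a minimal sketch, not part of the original text; the variable names are chosen here for illustration only.

```python
# Obtained scores from the example.
X = [0, 0.5, 1, 1.5, 2]
Y = [1, 2, 3, 4, 5]

# Slope between each pair of adjacent points: change in Y per unit change in X.
slopes = [(Y[i + 1] - Y[i]) / (X[i + 1] - X[i]) for i in range(len(X) - 1)]

# For a perfect line every adjacent pair yields the same slope.
B = slopes[0]

# The intercept is the value of Y at the point where X = 0.
A = Y[X.index(0)]

print(slopes)  # [2.0, 2.0, 2.0, 2.0]
print(B, A)    # 2.0 1
```

Because every adjacent pair of points gives the same ratio, a single slope describes the whole line.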
The equation of a line, written in obtained scores, is

$$Y = BX + A$$

For the example, Y = 2X + 1. To simplify the above equation, transform the obtained scores to deviation scores. This transformation preserves the slope of the line and transfers the intercept to the origin of the coordinate system. The equation for a line in deviation scores is

$$y = bx$$
In the above equation b = B and a = 0. Since the intercept of a
line in deviation scores is always equal to zero, it falls out of the above
equation which, in its full form, reads y = bx + a. Note that the notation for
X and Y has changed to x and y. This reflects the use of deviation scores as
transformed from the obtained scores.
The equation in
obtained scores was Y = 2X + 1. In deviation scores, the intercept equals 0,
the slope remains unchanged, and the new equation for the same line is y = 2x.
This is summarized and plotted in the table and figure below.

Compare the line
plotted using deviation scores with the line plotted using obtained scores.
Notice that the intercept of the line plotted using deviation scores was
transformed to the origin of the system of coordinates while the slope remained
the same. The linear transformation of obtained to deviation scores thus may be
visualized as a shift of a line to the origin of the Cartesian system of
coordinates, preserving the slope of the line.
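The shift to deviation scores can be checked numerically. This is a small sketch under the assumption that a deviation score is simply the obtained score minus the mean of its variable; the helper name `mean` is introduced here for illustration.

```python
# Obtained scores from the example.
X = [0, 0.5, 1, 1.5, 2]
Y = [1, 2, 3, 4, 5]

def mean(values):
    """Arithmetic mean of a list of scores."""
    return sum(values) / len(values)

# Deviation scores: subtract each variable's mean from its obtained scores.
x = [v - mean(X) for v in X]
y = [v - mean(Y) for v in Y]

print(x)  # [-1.0, -0.5, 0.0, 0.5, 1.0]
print(y)  # [-2.0, -1.0, 0.0, 1.0, 2.0]

# The slope is still 2 and the intercept is now 0, so y = 2x holds exactly.
print(all(yi == 2 * xi for xi, yi in zip(x, y)))  # True
```

The point (0, 0) now lies on the line, which is what moving the intercept to the origin means.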
Standardization of a linear relationship preserves the zero intercept and standardizes the slope to unity. The equation for a line in standard form can be written as

$$z_y = \beta z_x + \alpha$$

where the slope, beta, is equal to one, and the intercept, alpha, is always equal to zero. The above equation is frequently written as

$$z_y = z_x$$

The data points for a line in standard scores are computed in the table below.
The standard scores for this example, which exemplifies a perfect linear relationship, are plotted below.

Since the scores defining both coordinates are identical standard
scores with means of zero, the line has zero intercept with slope equal to one.
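The claim that both variables reduce to identical standard scores can be verified with a short sketch. It assumes the sample standard deviation (n - 1 in the denominator), which the text does not specify; the population form gives the same conclusion. The function name `standard_scores` is hypothetical.

```python
import math
import statistics

# Obtained scores from the example.
X = [0, 0.5, 1, 1.5, 2]
Y = [1, 2, 3, 4, 5]

def standard_scores(values):
    """Convert obtained scores to standard scores: z = (value - mean) / sd."""
    m = statistics.mean(values)
    s = statistics.stdev(values)  # assumption: sample (n - 1) standard deviation
    return [(v - m) / s for v in values]

z_x = standard_scores(X)
z_y = standard_scores(Y)

# For a perfect positive linear relationship the two sets of z scores coincide.
identical = all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(z_x, z_y))

print([round(z, 3) for z in z_x])  # [-1.265, -0.632, 0.0, 0.632, 1.265]
print(identical)                   # True
```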
Linear relationships, stated in the analytic form as the equations of a line, indicate perfect relationships. Perfect linear relationships are conceptualized within the framework of statistical theory by expressing the slope of a line as a ratio of the standard deviations of variables X and Y. The equation of the line, using statistical notation in the form of standard scores, is written as

$$z_y = z_x$$

Its slope could have been written as a ratio of two standard deviations. However, since the standard deviation of standard scores always equals one, the slope equals one and is implied (not written). Substituting deviation scores for standard scores ($z_x = x / s_x$ and $z_y = y / s_y$),

$$\frac{y}{s_y} = \frac{x}{s_x}$$

the equation of a line in deviation score form can be written as

$$y = \frac{s_y}{s_x}\, x$$

In the above equation, the slope is equal to

$$b = \frac{s_y}{s_x}$$

and the intercept, located at the origin of the Cartesian coordinates, is equal to zero.
Further substitutions can convert this equation from deviation scores to obtained score form. By substituting x = X - Mx and y = Y - My, the line can be written in terms of means and standard deviations

$$Y - M_y = \frac{s_y}{s_x}\,(X - M_x)$$

and, moving My to the right side while changing its sign, as

$$Y = \frac{s_y}{s_x}\,(X - M_x) + M_y$$

The slopes of the lines in deviation scores and in obtained scores are equal (i.e., b = B), and thus

$$B = \frac{s_y}{s_x}$$

To define the intercept, substitute B for the ratio $s_y / s_x$ and multiply the term in parentheses by B as

$$Y = BX - BM_x + M_y$$

Compare the above equation with the analytical equation for the line

$$Y = BX + A$$

by equating their right sides as

$$BX + A = BX - BM_x + M_y$$

Canceling the BX terms on both sides defines the intercept as

$$A = M_y - BM_x$$
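The statistical slope and intercept derived above can be checked against the earlier example, whose line is Y = 2X + 1. The sketch below assumes a perfect positive linear relationship, as the derivation does; for imperfect data these formulas do not give the prediction line. Sample standard deviations are used, an assumption that does not affect the ratio.

```python
import statistics

# Obtained scores from the example (a perfect line, Y = 2X + 1).
X = [0, 0.5, 1, 1.5, 2]
Y = [1, 2, 3, 4, 5]

# Slope as the ratio of standard deviations: B = s_y / s_x.
B = statistics.stdev(Y) / statistics.stdev(X)

# Intercept from the means: A = M_y - B * M_x.
A = statistics.mean(Y) - B * statistics.mean(X)

# Rounded to absorb floating-point noise from the square roots.
print(round(B, 6), round(A, 6))  # 2.0 1.0
```

The values recovered from means and standard deviations agree with the slope and intercept read off the graph.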
Linear transformations are frequently employed to convert measurements between different units. For example, the analytic equation for conversion of pounds (X) to kilograms (Y) is Y = .45 X, and the linear analytic equation for changing miles (X) to kilometers (Y) is Y = 1.6 X. The analytical equation for translation from degrees Celsius (X) to degrees Fahrenheit (Y) is Y = 1.8 X + 32. Knowledge of conversion equations between measurement systems can sometimes become a matter of urgency. Imagine finding yourself on a tropical island with a sick child running a high fever. A thermometer you bought from a local drug store is calibrated in degrees Celsius. After giving up the frantic search for the misplaced dictionary that may or may not have had the conversion equation, you happen to spot the travel brochure open to the page listing the annual mean temperatures on the island in both degrees Celsius and Fahrenheit:
From the statistics course you took before going on the summer vacation you remember that, for perfectly linear relationships,

$$z_y = z_x$$

This knowledge, complemented by the knowledge of the equations for translation of obtained scores to deviation scores

$$x = X - M_x, \qquad y = Y - M_y$$

and of deviation scores to standard scores

$$z_x = \frac{x}{s_x}, \qquad z_y = \frac{y}{s_y}$$

together with the knowledge of elementary algebra, will allow you to reconstruct the necessary conversion equations.
The unknown values in the equation for statistical conversion of units of measurement can be obtained from the data in the travel brochure by simply computing the means and standard deviations of the temperatures expressed in degrees Celsius and in degrees Fahrenheit. The computed means and standard deviations, for any season, can be entered into the statistical conversion equation. We selected temperatures for the Fall, as they show the greatest variability. We entered the values into the conversion equation as Y = (5.79 / 3.3)(X - 26.67) + 80.33 which, simplified, equals Y = 1.75X + 33.54. The thermometer you used on your child reads 39 degrees Celsius. Substituting 39 for X results in Y = 1.75(39) + 33.54 = 101.8, the sought-after degrees Fahrenheit.
To contrast linear transformations within the statistical and analytical frameworks, compare the result obtained from the analytical conversion equation Y = 1.8 X + 32 (102.2) with the result obtained from the empirical, statistical approach (101.8). The results are close, well within the expected measurement and rounding errors.
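The brochure calculation can be reproduced in a few lines. The sketch below uses only the Fall figures quoted in the text (26.67 and 3.3 for Celsius, 80.33 and 5.79 for Fahrenheit) and treats 3.3 and 5.79 as standard deviations, which is an assumption made here because their ratio is used as the slope. The function name is hypothetical.

```python
def statistical_c_to_f(c, mean_c, sd_c, mean_f, sd_f):
    """Statistical conversion: Y = (s_y / s_x) * (X - M_x) + M_y."""
    return (sd_f / sd_c) * (c - mean_c) + mean_f

# Fall figures quoted in the text (assumed to be means and standard deviations).
MEAN_C, SD_C = 26.67, 3.3
MEAN_F, SD_F = 80.33, 5.79

slope = SD_F / SD_C                  # about 1.75
intercept = MEAN_F - slope * MEAN_C  # about 33.54

unrounded = statistical_c_to_f(39, MEAN_C, SD_C, MEAN_F, SD_F)
rounded = 1.75 * 39 + 33.54          # the text's version, with rounded coefficients
analytic = 1.8 * 39 + 32             # the exact conversion equation

print(round(slope, 2), round(intercept, 2))  # 1.75 33.54
print(round(unrounded, 2))                   # 101.96
print(round(rounded, 1))                     # 101.8
print(round(analytic, 1))                    # 102.2
```

Carrying the unrounded slope through gives about 101.96, slightly closer to the analytic value than the 101.8 obtained with the slope rounded to 1.75. Either way the discrepancy is a fraction of a degree.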
Analytic and statistical equations of a line are summarized in the
table below. The properties of a line, described by using notation and concepts
of analytic geometry, are summarized in the upper part of this table. The key
relationships pertaining to the statistical equations of a line are summarized
in the lower part of the same table.
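| Equations of a Line | Obtained Scores | Deviation Scores | Standard Scores |
|---|---|---|---|
| Analytic | $Y = BX + A$ | $y = bx$ | $z_y = \beta z_x$ |
| Statistical | $Y = \frac{s_y}{s_x}(X - M_x) + M_y$ | $y = \frac{s_y}{s_x}\, x$ | $z_y = z_x$ |
| Slope | $B = \frac{s_y}{s_x}$ | $b = \frac{s_y}{s_x}$ | $\beta = 1$ |
| Intercept | $A = M_y - BM_x$ | $a = 0$ | $\alpha = 0$ |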
These equations are special cases of equations for statistical
prediction to be discussed in subsequent chapters.