Regression: Structural Assumptions


Linear Relationship

The general linear model subsumes most of the methods of statistical analysis. It has been used across disciplines with few modifications for about a century. Its tenets are simple: orthogonal coordinates in multidimensional space define elements of data matrices. They are linearly related, and are analyzed in such a manner as to minimize error.

Since it is difficult to visualize hyperspace, it helps to begin by examining elements of the general linear model in detail for the cases of one, two, and three variables. Properties established for their linear, planar, and three-dimensional representations can then be generalized to multidimensional space. This chapter introduces some of the formal aspects of the general model, beginning with the description of a line.

 

A Line

One of the simplest equations of analytic geometry is the equation of a line

Y = BX + A

where B is the slope and A is the intercept. To illustrate the use of linear equations in data analysis, consider the obtained scores for two variables, X = [0, 0.5, 1, 1.5, 2] and Y = [1, 2, 3, 4, 5], listed in the table and plotted in the graph below.


Data Set

X      Y
0.0    1
0.5    2
1.0    3
1.5    4
2.0    5


Data Visualization     

Intercept

The line connecting the plotted data points intersects the ordinate (Y axis) at Y = 1.00. This point is known as the intercept, A. The intercept is measured by the distance from the origin (0,0) to the location of the point of intersection.

Slope

As you move along the abscissa (the X axis) and observe the Y scores, you will notice a systematic change. This systematic change, written B in the equation of the line, is the slope of the line. For our example, B = 2. That is, for each one-unit change in X, Y changes by two units.
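The intercept and slope described above can be checked numerically. The following is a minimal sketch in Python that uses only the five data points of the example; the function name `line_from_points` is introduced here for illustration and assumes the points lie exactly on one straight line.

```python
# Data from the example: obtained scores for X and Y.
X = [0.0, 0.5, 1.0, 1.5, 2.0]
Y = [1.0, 2.0, 3.0, 4.0, 5.0]


def line_from_points(xs, ys):
    """Return (slope, intercept) of the line through the first and last points.

    Assumption: all points lie on a single straight line, as in this example.
    """
    slope = (ys[-1] - ys[0]) / (xs[-1] - xs[0])  # change in Y per unit change in X
    intercept = ys[0] - slope * xs[0]            # value of Y where X = 0
    return slope, intercept


B, A = line_from_points(X, Y)

# In a perfect linear relationship every point satisfies Y = B*X + A.
all_on_line = all(abs(y - (B * x + A)) < 1e-12 for x, y in zip(X, Y))
print(B, A, all_on_line)
```

For these data the sketch recovers B = 2 and A = 1, and confirms that all five points fall on the line.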

 

Line in the Obtained, Deviation, and Standard Scores

Line in Obtained Scores

The equation of a line, written in obtained scores, is

Y = BX + A

For the example, Y = 2X + 1.

Line in Deviation Scores

To simplify the above equation, transform the obtained scores to deviation scores by subtracting the mean of each variable from its scores (x = X - Mx and y = Y - My). This transformation preserves the slope of the line and transfers the intercept to the origin of the coordinate system. The equation for a line in deviation scores is

y = bx

In the above equation b = B and a = 0. Since the intercept of a line in deviation scores is always equal to zero, it falls out of the above equation which, in its full form, reads y = bx + a. Note that the notation for X and Y has changed to x and y. This reflects the use of deviation scores as transformed from the obtained scores.

Comparisons

The equation in obtained scores was Y = 2X + 1. In deviation scores, the intercept equals 0, the slope remains unchanged, and the new equation for the same line is y = 2x. This is summarized in the table and plotted in the figure below.

 

 

Intercept and Slope

Compare the line plotted using deviation scores with the line plotted using obtained scores. Notice that the intercept of the line plotted using deviation scores was transformed to the origin of the system of coordinates while the slope remained the same. The linear transformation of obtained to deviation scores thus may be visualized as a shift of a line to the origin of the Cartesian system of coordinates, preserving the slope of the line.
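The shift to the origin can be illustrated with a short computation. The sketch below, written in Python for illustration only, converts the example's obtained scores to deviation scores and recomputes the slope and intercept; the helper name `deviation_scores` is hypothetical.

```python
# Obtained scores from the example.
X = [0.0, 0.5, 1.0, 1.5, 2.0]
Y = [1.0, 2.0, 3.0, 4.0, 5.0]


def deviation_scores(scores):
    """Subtract the mean from every score (x = X - Mx)."""
    mean = sum(scores) / len(scores)
    return [s - mean for s in scores]


x = deviation_scores(X)  # centered on Mx = 1
y = deviation_scores(Y)  # centered on My = 3

# Slope and intercept of the line through the deviation scores.
b = (y[-1] - y[0]) / (x[-1] - x[0])
a = y[0] - b * x[0]
print(x, y, b, a)
```

The slope stays at 2 while the intercept becomes 0, which is the shift of the line to the origin described above.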

Line in Standard Scores

Standardization of a linear relationship preserves the zero intercept and standardizes the slope to unity. The equation for a line in standard form can be written as

zy = beta zx + alpha

where zx and zy are the standard scores of X and Y, the slope, beta, is equal to one, and the intercept, alpha, is always equal to zero. The above equation is frequently written as

zy = zx

The data points for a line in standard scores are computed in the table below.

 

 

The standard scores for this example, which exemplifies a perfect linear relationship, are plotted below.

 

 

Since the scores defining both coordinates are identical standard scores with means of zero, the line has zero intercept with slope equal to one.
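The identity of the two sets of standard scores can be verified directly. The following Python sketch is an illustration under one stated assumption: it uses the population standard deviation (dividing by N), which the text does not specify. Using the sample standard deviation instead would rescale both variables equally and lead to the same line.

```python
from statistics import mean, pstdev

# Obtained scores from the example.
X = [0.0, 0.5, 1.0, 1.5, 2.0]
Y = [1.0, 2.0, 3.0, 4.0, 5.0]


def standard_scores(scores):
    """Convert obtained scores to standard scores: z = (score - mean) / SD.

    Assumption: population standard deviation (pstdev), not stated in the text.
    """
    m = mean(scores)
    s = pstdev(scores)
    return [(v - m) / s for v in scores]


zx = standard_scores(X)
zy = standard_scores(Y)

# For a perfect positive linear relationship the two sets of standard scores
# coincide, so the line through them has slope one and intercept zero.
identical = all(abs(u - v) < 1e-12 for u, v in zip(zx, zy))
print(identical)
```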

 

Statistical Equations for Perfect Linear Relationships

Linear relationships, stated in analytic form as equations of a line, indicate perfect relationships. Perfect linear relationships are conceptualized within the framework of statistical theory by expressing the slope of a line as a ratio of the standard deviations of variables Y and X. The equation of the line, using statistical notation in the form of standard scores, is written as

zy = zx

Its slope could have been written as a ratio of two standard deviations. However, since the standard deviation (like the variance) of a standard variable always equals one, the slope equals one and is implied (not written). Substituting deviation scores for standard scores,

zx = x / Sx (and zy = y / Sy),

where Sx and Sy are the standard deviations of X and Y, the equation of a line in deviation score form can be written as

y = (Sy / Sx) x

In the above equation, the slope is equal to

b = Sy / Sx

and the intercept, located at the origin of the Cartesian coordinates, is equal to zero. Further substitutions convert this deviation-score equation to obtained score form. By substituting x = X - Mx and y = Y - My, the line can be written in terms of means and standard deviations

Y - My = (Sy / Sx)(X - Mx)

and, moving the My to the right side while changing its sign, as

Y = (Sy / Sx)(X - Mx) + My

The slopes of the lines in both deviation scores and obtained scores are equal (i.e., b = B), and thus

B = Sy / Sx

To define the intercept, substitute B for the ratio and multiply the term in parentheses by B as

Y = BX - B(Mx) + My

Compare the above equation with the analytical equation for the line

Y = BX + A

by equating their right sides as

BX + A = BX - B(Mx) + My

Canceling the BX terms on both sides defines the intercept as

A = My - B(Mx)
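The result of this derivation, B = Sy / Sx and A = My - B(Mx), can be tried on the example data. The sketch below is a hedged Python illustration: it assumes population standard deviations (the text does not say which form is meant, and the ratio is the same either way), and it applies only to a perfect positive linear relationship such as this one.

```python
from statistics import mean, pstdev

# Obtained scores from the example; they satisfy Y = 2X + 1 exactly.
X = [0.0, 0.5, 1.0, 1.5, 2.0]
Y = [1.0, 2.0, 3.0, 4.0, 5.0]

Mx, My = mean(X), mean(Y)      # means of the obtained scores
Sx, Sy = pstdev(X), pstdev(Y)  # standard deviations (population form assumed)

B = Sy / Sx      # slope as a ratio of standard deviations
A = My - B * Mx  # intercept from the means and the slope

print(round(B, 6), round(A, 6))
```

Up to floating-point rounding, the statistical formulas return the same slope (2) and intercept (1) as the analytic equation of the line.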

About an Island Bisected by the Tropic of Cancer

Linear transformations are frequently employed to convert measurements between different units. For example, the analytic equation for conversion of pounds (X) to kilograms (Y) is Y = 0.45X, and the analytic equation for changing miles (X) to kilometers (Y) is Y = 1.6X. The analytic equation for translation from degrees Celsius (X) to degrees Fahrenheit (Y) is Y = 1.8X + 32. Knowledge of conversion equations between measurement systems can sometimes become a matter of urgency.

Imagine finding yourself on a tropical island with a sick child running a high fever. The thermometer you bought at a local drug store is calibrated in degrees Celsius. After giving up the frantic search for the misplaced dictionary that may or may not have had the conversion equation, you happen to spot the travel brochure opened to the page listing the annual mean temperatures on the island in both degrees Celsius and degrees Fahrenheit:

 

 

From the statistics course you took before going on summer vacation you remember that, for perfectly linear relationships,

zy = zx

This knowledge, complemented by the knowledge of the equations for translation of obtained scores to deviation scores

x = X - Mx and y = Y - My

and of deviation scores to standard scores

zx = x / Sx and zy = y / Sy

together with the knowledge of elementary algebra, will allow you to reconstruct the necessary conversion equation.

Y = (Sy / Sx)(X - Mx) + My

The unknown values in the equation for statistical conversion of units of measurement can be obtained from the data in the travel brochure by simply computing the means and standard deviations of the temperatures expressed in degrees Celsius and in degrees Fahrenheit.

The computed means and standard deviations, for any season, can be entered into the statistical conversion equation. We selected the temperatures for the Fall, as they show the greatest variability. Entering the values gives Y = (5.79 / 3.3)(X - 26.67) + 80.33 which, simplified, equals Y = 1.75X + 33.54. The thermometer you used on your child reads 39 degrees Celsius. Substituting 39 for X results in Y = 1.75(39) + 33.54 = 101.8, the sought-after degrees Fahrenheit.

To contrast linear transformations within the statistical and analytical frameworks, compare the result obtained from the analytical conversion equation Y = 1.8X + 32 (102.2) with the result obtained from the empirical, statistical approach (101.8). The results are close, well within the expected measurement and rounding errors.
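The arithmetic of this example can be retraced with a short Python sketch. It takes the four numbers quoted in the text as given (the brochure table itself is not recomputed here) and treats 3.3 and 5.79 as the spreads Sx and Sy of the Celsius and Fahrenheit temperatures; the function names are hypothetical.

```python
# Values quoted in the text for the Fall temperatures (taken as given).
mean_c, sd_c = 26.67, 3.3    # Celsius: Mx and Sx
mean_f, sd_f = 80.33, 5.79   # Fahrenheit: My and Sy

slope = sd_f / sd_c                  # B = Sy / Sx
intercept = mean_f - slope * mean_c  # A = My - B * Mx


def statistical_f(celsius, digits=None):
    """Empirical conversion; optionally round the coefficients first."""
    b = slope if digits is None else round(slope, digits)
    a = intercept if digits is None else round(intercept, digits)
    return b * celsius + a


def analytic_f(celsius):
    """Exact conversion Y = 1.8X + 32."""
    return 1.8 * celsius + 32


fever_c = 39
print(round(statistical_f(fever_c, digits=2), 1))  # coefficients rounded to two decimals
print(round(statistical_f(fever_c), 1))            # coefficients at full precision
print(round(analytic_f(fever_c), 1))               # analytic equation
```

With the coefficients rounded to two decimals, as in the text, the statistical estimate is 101.8. Keeping full precision gives about 102.0, slightly closer to the analytic value of 102.2.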

 

Summary

Analytic and statistical equations of a line are summarized in the table below. The properties of a line, described by using notation and concepts of analytic geometry, are summarized in the upper part of this table. The key relationships pertaining to the statistical equations of a line are summarized in the lower part of the same table.

 

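Equations of a Line

               Obtained Scores               Deviation Scores    Standard Scores
Analytic       Y = BX + A                    y = bx + a          zy = beta zx + alpha
Statistical    Y = (Sy / Sx)(X - Mx) + My    y = (Sy / Sx) x     zy = zx
Slope          B = Sy / Sx                   b = Sy / Sx         beta = 1
Intercept      A = My - B(Mx)                a = 0               alpha = 0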

 

These equations are special cases of equations for statistical prediction to be discussed in subsequent chapters.