Krus, D. J. & Ko, H. O. (1983) Algorithm
for autocorrelation of secular trends. Educational
and Psychological Measurement, 43, 821-828.
ALGORITHM FOR AUTOCORRELATION ANALYSIS OF SECULAR TRENDS
David J. Krus and Hyon O. Ko
An algorithm for autocorrelation analysis of time series data was presented and compared with two alternative computational methods. The algorithm was based on continuous variance adjustment following withdrawal of an element from the lagged series. The algorithm not only reduces the number of computations during the repeated passes through the data vector but also avoids a dampening effect toward the end of the series.
Computation of autocorrelations is an integral part of a majority of analyses of secular trends. The autocorrelation analysis is used to determine whether cyclic components are present in the series or whether apparent fluctuations are random.
There are two standard types of autocorrelation routines, constant
variance routines and product-moment routines. The frequently used constant
variance routines (Makridakis and Wheelwright, 1978; Southworth, 1960) estimate
the autocorrelation function from
the formula
|
|
|
where
is an element of a time series vector at time t.
M is the mean of all data, k is the degree of time lag, and n is
the number of observations. As commented by McCleary and Hay (1980, p. 66),
“for a given lag k, however, variance (the denominator of the formula) s estimated over all n observations while covariance (the numerator of the formula) is estimated over only n-k pairs of observations. Strictly speaking, then, the autocorrelation function is not the familiar Pearson product-moment correlation coefficient between time series observations units apart.”
The
product-moment routines (Veldman, 1967) compute the auto-correlation
function using
formula
where
are true Pearson’s product-moment coefficients
of correlation.
As contrasted with the product-moment routines, the constant variance
routines are fast, as only the covariance term has to be computed during the
lagged passes through the data. However, divergence of the values of and
coefficients rapidly increases toward the end
of the series; Makridakis and Wheelwright (1978. p. 35) have not recommended
the use of the
coefficients for
.
The purpose of the present communication is to describe an algorithm for computation of the true product-moment autocorrelation coefficients throughout the full range of the lagged series, at speeds comparable to those possible with the constant variance routines.
Adjustment of Mean and Variance Following the Withdrawal of an Element from a Lagged Series
A schematic representation of a lagged relationship is shown below:
The
data vector is slipped and the first and the last elements of the time series
data are removed. The Pearson product-moment coefficients of correlation (Eq. (2)) are
computed for the remaining elements of the lagged secular data. This process
can be repeated up to n-3 times when the set is reduced to two elements.
Computation of product-moment coefficients for autocorrelation analysis by using Eq. (2) necessitates recalculation of means, variances, and covariances of the lagged series during each pass through the data. The intent of the present communication is to suggest that the means and variances of the lagged series instead of being recalculated n-3 times can be updated, following principles used in the partitioning of variance in the regression analysis. A schematic representation of relationships of the combined point and continuous distributions which must be considered is shown below:
|
|
|
The
parent vector P indexes the withdrawal vector (of
the point size, containing a single element). The vector
contains values of the lagged series. Vector
represents the time series before each lagging
step. Vectors
and
consist of scores which can and cannot be
determined from the knowledge of the values of the parent vector P, respectively.
The relationships under consideration conform to the rules for combination of
distributions, adjusted for the fact that one distribution is a point
distribution (Krus and Ceurvorst, 1978. pp. 816-817) and that the distributions
are subtracted, not combined.
Adjustment of the Mean of the Lagged Series
Computation
of the mean of the lagged series is straightforward. Following withdrawal of
the top (or bottom) element from the series, the mean M,
is updated as
|
(3) |
|
|
Adjustment of the Variance of the Lagged Series
To
adjust the variance of the lagged series, relationships between variances of
the data vector and its determined
and alienated
components must be considered. Since elements
of
consist of means of
and
distributions, its variance can be written as
|
|
|
Variance
of the component of
can be written as
|
|
|
The
variances of both the and
components of
are additive, thus
|
|
|
Since
the point distribution contains only a single element, its variance equals
zero and the variance of the
can be written (Eq. (5)) as
|
|
|
Transposing
and
in Equation (6) and
substituting its left side with the right side of Equation (7) results in
|
(8) |
|
|
Substituting
the right side of Equation (4)
for and simplifying allows for computation of the
variance of the lagged distribution
as
|
|
|
Using Equation (9), one can update the variance of the lagged distribution and can avoid re-computing the repeated passes through the time series data vector.
To
compute the autocorrelation, two variances have to be updated: variance of the series
with its first element removed and variance of the series with its last element
removed. Only the product has to be cumulated during the pass
through a lagged series to obtain all the information necessary to compute
autocorrelation from using Pearson’s product-moment formula.
Code for the Autocorrelation Coefficients
A subroutine for computation of the product-moment autocorrelation coefficients is presented in Table 3. The arguments of the AUTO subroutine are the input data array N, the output array of autocorrelations R, and the number of observations, N. The mnemonics for variables used in the AUTO subroutine are A for the mean (average), V for variance, and CV for covariance. Single indices A and V, stand for initial values, doubled indices AA and VV stand for updated values, and the trailing numerals as in A1, V1, AAI. VV1, A2, V2, AA2, and VV2 indicate whether the updating pertained to removal of the top (index 1) or the bottom (index 2) element of the array X. The variable K indexes the number of lags.
Table 3. Subroutine for computation
of the autocorrelations by using the
updated
means and variances.
|
IF
(V1.LE.0.0) STOP ‘Zero variance in data.’ C Lagged means 2 VV2=(N*V2-(X(N)-A2)**2)I(N- I)-(AA2-A2)**2 |
Discussion
The described procedure allows for a minimum number of computations during the repeated passes through the data. To compare its performance with the constant variance routine and the standard product-moment routine, a three-cycle sinusoid wave was generated and analyzed by each of the three discussed algorithms. Analysis with the constant variance formula (Eq. (1)) took 17 seconds, and the product-moment routine with recomputed variances (Eq. (2)) used 24 seconds for analysis of the same data. The product-moment routine with updated variances analyzed the data in 15 seconds. The time differences in performance of each discussed routine tend to magnify with larger time series studies. Thus, for a time series analysis of 464 elements, the constant variance method required about 6 minutes. The calculation of the product-moment coefficients with recomputed variances took about 9 minutes. Calculation of the autocorrelation function using updated variances lasted about 3 minutes. However, an issue far more important than the speed of computation is in question pertains to the conceptualization of variances to be used in the denominator of the autocorrelation function and in the formula for the estimation of its standard error. The common practice is to adapt the constant variance for both the autocorrelation function and its standard error. This procedure results in parallel lines indicating a confidence interval extended throughout the series; autocorrelation coefficients are damped throughout the middle and end of the series. An implicit assumption behind this practice is that the dampening of the autocorrelation function is equivalent to the wedge-like opening of the confidence interval so that it would envelop nearly the whole -1 to +1 range of the autocorrelation function toward the end of the series.
However, this assumption was not substantiated when errorless trend data (as, e.g., a sinusoid wave) were compared with trends generated at random. Unlike the product-moment routine, the constant variance method was unable to differentiate among errorless and random trends in the latter parts of the series. This shortcoming of the constant variance method is particularly pronounced in the case of analysis of complicated and lengthy secular trends. Thus, e.g., it was not possible to replicate Moyal’s (1949) autocorrelation study of Wright’s (1942) list of wars for the 1480-1942 interval by using the constant variance method (cf., also Denton and Phillips, 1968 and Krus and Blackman, 1980).
The analysis of time series within the interactive computer environment is part of the recent theoretical and methodological developments within data analysis pertaining to social science research and brings the ideal of the causal modeling within the grasp of the social scientists.
REFERENCES
Krus, D. J. and Blackman, H. S. Time-scale factor as related to theories of societal change. Psychological Reports,1980, 46, 95-102.
Krus, D. J. and Ceurvorst, R. W. Computed assisted construction of variable norms. Educational and Psychological Measurement, 1978. 38, 815-818.
Makridakis,
S. and Wheelwright, S. C. Interactive forecasting: Univariate and
multivariate methods. (2nd ed.).
McCleary, R. and Hay, R. A. Applied time
series analysis for the social sciences.
Moyal, J. E. The distribution of wars in time. Journal of the Royal Statistical Society, 1949, 112, 446-458.
Southworth, R. W. Autocorrelation and
spectral analysis. In A. Ralston and H. S. Wilf (Eds.) Mathematical methods
for digital computers.
Veldman, D. J. Fortran programming for
the behavioral sciences.
Wright, Q. A study of war.