Cruise Scientific      Visual Statistics Studio      Visual Statistics

Krus, D. J. & Ko, H. O. (1983) Algorithm for autocorrelation of secular trends. Educational and Psychological Measurement, 43, 821-828.


ALGORITHM FOR AUTOCORRELATION ANALYSIS OF SECULAR TRENDS

David J. Krus and Hyon O. Ko
Arizona State University

 

An algorithm for autocorrelation analysis of time series data was presented and compared with two alternative computational methods. The algorithm was based on continuous variance adjustment following withdrawal of an element from the lagged series. The algorithm not only reduces the number of computations during the repeated passes through the data vector but also avoids a dampening effect toward the end of the series.

 

Computation of autocorrelations is an integral part of a majority of analyses of secular trends. The autocorrelation analysis is used to determine whether cyclic components are present in the series or whether apparent fluctuations are random.

There are two standard types of autocorrelation routines, constant variance routines and product-moment routines. The frequently used constant variance routines (Makridakis and Wheelwright, 1978; Southworth, 1960) estimate the autocorrelation function  from the formula 

(1)

 

 

where  is an element of a time series vector at time t. M is the mean of all data, k is the degree of time lag, and n is the number of observations. As commented by McCleary and Hay (1980, p. 66),

“for a given lag k, however, variance (the denominator of the formula) s estimated over all n observations while covariance (the numerator of the formula) is estimated over only n-k pairs of observations. Strictly speaking, then, the autocorrelation function is not the familiar Pearson product-moment correlation coefficient between time series observations units apart.”

The product-moment routines (Veldman, 1967) compute the auto-correlation function  using formula

(2)                                                                                                                     

where  are true Pearson’s product-moment coefficients of correlation.

As contrasted with the product-moment routines, the constant variance routines are fast, as only the covariance term has to be computed during the lagged passes through the data. However, divergence of the values of  and  coefficients rapidly increases toward the end of the series; Makridakis and Wheelwright (1978. p. 35) have not recommended the use of the  coefficients for .

The purpose of the present communication is to describe an algorithm for computation of the true product-moment autocorrelation coefficients throughout the full range of the lagged series, at speeds comparable to those possible with the constant variance routines.

 

Adjustment of Mean and Variance Following the Withdrawal of an Element from a Lagged Series

A schematic representation of a lagged relationship is shown below:

 

The data vector is slipped and the first and the last elements of the time series data are removed. The Pearson product-moment coefficients of correlation  (Eq. (2)) are computed for the remaining elements of the lagged secular data. This process can be repeated up to n-3 times when the set is reduced to two elements.

Computation of product-moment coefficients for autocorrelation analysis by using Eq. (2) necessitates recalculation of means, variances, and covariances of the lagged series during each pass through the data. The intent of the present communication is to suggest that the means and variances of the lagged series instead of being recalculated n-3 times can be updated, following principles used in the partitioning of variance in the regression analysis. A schematic representation of relationships of the combined point and continuous distributions which must be considered is shown below:

 

 

The parent vector P indexes the withdrawal vector  (of the point size, containing a single element). The vector  contains values of the lagged series. Vector  represents the time series before each lagging step. Vectors  and  consist of scores which can and cannot be determined from the knowledge of the values of the parent vector P, respectively. The relationships under consideration conform to the rules for combination of distributions, adjusted for the fact that one distribution is a point distribution (Krus and Ceurvorst, 1978. pp. 816-817) and that the distributions are subtracted, not combined.

 

Adjustment of the Mean of the Lagged Series

Computation of the mean of the lagged series is straightforward. Following withdrawal of the top  (or bottom) element from the series, the mean M, is updated as

(3)

 

 

 

Adjustment of the Variance of the Lagged Series

To adjust the variance of the lagged series, relationships between variances of the data vector  and its determined  and alienated  components must be considered. Since elements of  consist of means of  and  distributions, its variance can be written as

(4)

 

 

Variance of the  component of  can be written as

(5)

 

 

The variances of both the  and  components of  are additive, thus

(6)

 

 

Since the point distribution  contains only a single element, its variance equals zero and the variance of the  can be written (Eq. (5)) as

(7)

 

 

Transposing  and  in Equation (6) and substituting its left side with the right side of Equation (7) results in

(8)

 

 

Substituting the right side of Equation (4) for  and simplifying allows for computation of the variance of the lagged distribution  as

(9)

 

 

Using Equation (9), one can update the variance of the lagged distribution and can avoid re-computing the repeated passes through the time series data vector.

To compute the autocorrelation, two variances have to be updated: variance of the series with its first element removed and variance of the series with its last element removed. Only the  product has to be cumulated during the pass through a lagged series to obtain all the information necessary to compute autocorrelation from using Pearson’s product-moment formula.

Code for the Autocorrelation Coefficients

A subroutine for computation of the product-moment autocorrelation coefficients is presented in Table 3. The arguments of the AUTO subroutine are the input data array N, the output array of autocorrelations R, and the number of observations, N. The mnemonics for variables used in the AUTO subroutine are A for the mean (average), V for variance, and CV for covariance. Single indices A and V, stand for initial values, doubled indices AA and VV stand for updated values, and the trailing numerals as in A1, V1, AAI. VV1, A2, V2, AA2, and VV2 indicate whether the updating pertained to removal of the top (index 1) or the bottom (index 2) element of the array X. The variable K indexes the number of lags.

 

Table 3. Subroutine for computation
of the autocorrelations by using the
  updated means and variances.


         
SUBROUTINE AUTO (X,R,N)
          REAL*4 X(N),R(N)

    C   Mean and variance of the series
          A1 =0.0
          V1 =0.0
          DO 10 I=1,N
          A1
= A1 +X(I)
   10  V1= V1 +X(I)**2
          A1= A1/N
          V1=V1/N-A1**2

          IF (V1.LE.0.0) STOP ‘Zero variance in data.’
         
          A2 = A1
          V2=V1
C       Calling sequence
          K=0
          N=N+1
      
1 K=K+I
           N=N-1
           IF (N.LE.3) GOTO 7

C        Lagged means
          AA1=(N*A1-X(K))/(N- I)
          AA2=(N*A2- X(N))/(N -1)

C       Lagged variances
           VV1=(N*V1-(X(K)-A1)**2)/(n-1)-(AA1-A1)**2
           IF(VV1)7,7,2

2  VV2=(N*V2-(X(N)-A2)**2)I(N- I)-(AA2-A2)**2
    IF(VV2)7.7,3

C   Covariance
     3    CV=0.0
           DO 12 I=1, N-1
   12   CV=CV+X(I)*X(I+K)
           CV=CV/(N-1)

C        Correlation
            R(K)=(CV=AA1*AA2)/SQRT(VV1*VV2)

C        Replace means and variances
           A1=AA1
           A2=AA2
           V1=VV1
           V2=VV2
           GOTO 1

C        Closing sequence
    7     N=K-1
           RETURN
           END

 

Discussion

The described procedure allows for a minimum number of computations during the repeated passes through the data. To compare its performance with the constant variance routine and the standard product-moment routine, a three-cycle sinusoid wave was generated and analyzed by each of the three discussed algorithms. Analysis with the constant variance formula (Eq. (1)) took 17 seconds, and the product-moment routine with recomputed variances (Eq. (2)) used 24 seconds for analysis of the same data. The product-moment routine with updated variances analyzed the data in 15 seconds. The time differences in performance of each discussed routine tend to magnify with larger time series studies. Thus, for a time series analysis of 464 elements, the constant variance method required about 6 minutes. The calculation of the product-moment coefficients with recomputed variances took about 9 minutes. Calculation of the autocorrelation function using updated variances lasted about 3 minutes. However, an issue far more important than the speed of computation is in question pertains to the conceptualization of variances to be used in the denominator of the autocorrelation function and in the formula for the estimation of its standard error. The common practice is to adapt the constant variance for both the autocorrelation function and its standard error. This procedure results in parallel lines indicating a confidence interval extended throughout the series; autocorrelation coefficients are damped throughout the middle and end of the series. An implicit assumption behind this practice is that the dampening of the autocorrelation function is equivalent to the wedge-like opening of the confidence interval so that it would envelop nearly the whole -1 to +1 range of the autocorrelation function toward the end of the series.

However, this assumption was not substantiated when errorless trend data (as, e.g., a sinusoid wave) were compared with trends generated at random. Unlike the product-moment routine, the constant variance method was unable to differentiate among errorless and random trends in the latter parts of the series. This shortcoming of the constant variance method is particularly pronounced in the case of analysis of complicated and lengthy secular trends. Thus, e.g., it was not possible to replicate Moyal’s (1949) autocorrelation study of Wright’s (1942) list of wars for the 1480-1942 interval by using the constant variance method (cf., also Denton and Phillips, 1968 and Krus and Blackman, 1980).

The analysis of time series within the interactive computer environment is part of the recent theoretical and methodological developments within data analysis pertaining to social science research and brings the ideal of the causal modeling within the grasp of the social scientists.

REFERENCES

Denton, F. H. and Phillips, W. Some patterns in the history of violence. Journal of Conflict Resolution, 1968, 12, 182-15.

Krus, D. J. and Blackman, H. S. Time-scale factor as related to theories of societal change. Psychological Reports,1980, 46, 95-102.

Krus, D. J. and Ceurvorst, R. W. Computed assisted construction of variable norms. Educational and Psychological Measurement, 1978. 38, 815-818.

Makridakis, S. and Wheelwright, S. C. Interactive forecasting: Univariate and multivariate methods. (2nd ed.). San Francisco: Holden-Day, 1978.

McCleary, R. and Hay, R. A. Applied time series analysis for the social sciences. Beverly Hills, Ca: Sage, 1980.

Moyal, J. E. The distribution of wars in time. Journal of the Royal Statistical Society, 1949, 112, 446-458.

Southworth, R. W. Autocorrelation and spectral analysis. In A. Ralston and H. S. Wilf (Eds.) Mathematical methods for digital computers. New York: Wiley, 1960.

Veldman, D. J. Fortran programming for the behavioral sciences. New York: Holt, Rinehart and Winston, 1967.

Wright, Q. A study of war. Chicago: University of Chicago Press, 1942.