The t-Test for Two Independent Samples

 

Consider a hypothetical experiment pertaining to a change in attitudes following a persuasive communication.

Random selection and random assignment

Prior to the onset of the experiment, ten subjects were randomly selected and randomly assigned to one of the two groups. Assume that there is no difference between the two groups in the initial attitudes toward animal slaughter.
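Random assignment can be sketched in a few lines of code. The example below is only an illustration: the subject IDs are hypothetical placeholders, and the fixed seed is there only so the result can be repeated.

```python
import random

# Hypothetical subject IDs; the original data set is not reproduced here.
subjects = [f"S{i:02d}" for i in range(1, 11)]

rng = random.Random(42)   # fixed seed only so the sketch is repeatable
shuffled = subjects[:]    # copy, so the original roster is left untouched
rng.shuffle(shuffled)

# First five shuffled subjects -> group 1 (seal movie),
# remaining five -> group 0 (caribou movie).
assignment = {sid: (1 if i < 5 else 0) for i, sid in enumerate(shuffled)}

for sid in subjects:
    print(sid, assignment[sid])
```

Because every ordering of the shuffled list is equally likely, each subject has the same chance of landing in either group.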

Why random selection? Why random assignment? 

Optional Reading: Producing Data by Surfstat Australia

Manipulate the independent variable and measure the dependent variable

One group watched a movie about killing baby seals in the Arctic (the experimental group), and the second group watched a movie about the migration of caribou (the placebo group). After viewing the movies, both groups responded to a questionnaire measuring agreement or disagreement with various arguments justifying and opposing the hunting and killing of animals. Higher scores on the questionnaire represent rationalization of the animal slaughter.

If all factors except one are kept constant and the events systematically change as a result of manipulation of that factor, then the change can be ascribed to that particular factor (experimental condition or treatment) and some degree of causality can be inferred. 

 

Data set

Research Question

Does watching the movie about killing baby seals change viewers' attitudes toward the hunting and killing of animals? Specifically, will the group watching a movie about killing baby seals score lower on the questionnaire?

Conduct an independent-samples t test. Use a .05 significance level.

 

SPSS for Windows

1. Input data. The variable `group` represents the group membership. The variable `score` is the dependent variable. 

 

2. From the menus choose: Analyze \ Compare Means \ Independent-Samples T Test.

3. Select the test variable `score`. Select the group variable `group`.

4. Define the categories of the grouping variable.

Click on Define Groups. Enter 0 and 1. Click on Continue.

5. Click on OK to get the default independent-samples t test with two-tailed probabilities and a 95% confidence interval for the difference in means.

Both equal-variance and unequal-variance t values are provided, as well as the Levene test for equality of variances.

 

Outputs

Group Statistics

Examine the group means and standard deviations.

  • Group Mean

Higher scores on the questionnaire represent rationalization of the animal slaughter.

The group watching a movie about killing baby seals scored lower (M1 = 28, SD = 3.24) on the questionnaire than did the group viewing a movie about the migration of caribou (M0 = 32.8, SD = 3.11).

  • Equality of Variances

Most computer programs routinely check for equality of variances for both groups before computing a t-test.  Levene's test is used to test the null hypothesis that the two population variances are equal.

Levene's Test for Equality of Variances: F = .076, p = .79

The observed significance level for Levene's F test was larger than .05 (the preset significance level), so the null hypothesis of equal variances was not rejected, F = .076, p = .79. There is no evidence that the two population variances differ, and the assumption of homoscedasticity was treated as met.
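To make the logic of Levene's test concrete, here is a minimal sketch. It assumes the mean-centered version of the test: each score is replaced by its absolute deviation from its group mean, and an ordinary one-way ANOVA F ratio is computed on those deviations. The function name and the demo values are hypothetical; they are not the data from this experiment.

```python
def levene_f(*groups):
    """Levene's F statistic using absolute deviations from each group mean."""
    # Step 1: replace each score by its absolute deviation from its group mean.
    devs = []
    for g in groups:
        m = sum(g) / len(g)
        devs.append([abs(x - m) for x in g])

    # Step 2: one-way ANOVA F ratio computed on the deviations.
    k = len(devs)
    n_total = sum(len(d) for d in devs)
    grand = sum(sum(d) for d in devs) / n_total
    means = [sum(d) / len(d) for d in devs]
    ss_between = sum(len(d) * (m - grand) ** 2 for d, m in zip(devs, means))
    ss_within = sum((z - m) ** 2 for d, m in zip(devs, means) for z in d)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical demo values (NOT the study data).
same_spread_a = [1, 2, 3, 4, 5]
same_spread_b = [11, 12, 13, 14, 15]   # shifted, but identical spread
wider_spread = [1, 5, 9, 13, 17]       # four times the spread

print(levene_f(same_spread_a, same_spread_b))
print(levene_f(same_spread_a, wider_spread))
```

A shift in location alone leaves the statistic at zero; only a difference in spread makes it grow.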

(When group variances are unequal, separate-variance estimates can be used to compensate for the lack of homoscedasticity; methods were proposed by Cochran and Cox, by Behrens and Fisher, and by Welch. Welch's method gained wide recognition, perhaps because it is implemented by most computer packages for statistical analysis, notably SPSS.)

  • Independent Samples Test

    An independent-samples t test was conducted to evaluate whether watching the movie about killing baby seals would change viewers' attitudes toward animal slaughter. Higher scores on the questionnaire represent rationalization of the animal slaughter.
     

 

Using the Traditional Formula to Compute the t Value

Suppose all possible samples of size 5 were taken in independent pairs from the two populations, and the difference between the two sample means was computed for each pair.

The resulting distribution of mean differences would be called the sampling distribution of the difference.

Mean

If the null hypothesis is true (the two population means are equal), the mean of the sampling distribution of the difference is zero.
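The idea of a sampling distribution of the difference can be approximated by simulation. The sketch below draws many random pairs of samples rather than all possible samples, and it assumes a normal population with a hypothetical mean of 30 and standard deviation of 3; those values are for illustration only.

```python
import random
import statistics

rng = random.Random(0)         # fixed seed for a repeatable illustration
MU, SIGMA, N = 30.0, 3.0, 5    # hypothetical population mean, SD; sample size

diffs = []
for _ in range(20000):
    # Both samples come from the same population, i.e. the null hypothesis holds.
    sample_a = [rng.gauss(MU, SIGMA) for _ in range(N)]
    sample_b = [rng.gauss(MU, SIGMA) for _ in range(N)]
    diffs.append(statistics.fmean(sample_a) - statistics.fmean(sample_b))

mean_of_diffs = statistics.fmean(diffs)    # should be close to 0
sd_of_diffs = statistics.stdev(diffs)      # empirical standard error
theoretical_se = SIGMA * (2 / N) ** 0.5    # sigma * sqrt(1/n + 1/n)

print(round(mean_of_diffs, 3), round(sd_of_diffs, 3), round(theoretical_se, 3))
```

The simulated mean difference hovers near zero, and the spread of the differences approaches the theoretical standard error of the difference.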

Standard Deviation

The standard deviation of the sampling distribution of the difference is called the standard error of the difference. Choose one of the following formulas to compute it.

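a. Assume that the two populations have the same standard deviation. The estimated standard error of the difference can be computed from the pooled variance as

\[ s_{\bar{X}_1 - \bar{X}_0} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_0 - 1)s_0^2}{n_1 + n_0 - 2}\left(\frac{1}{n_1} + \frac{1}{n_0}\right)} \]

The standard error of the difference equals 2.01.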
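As a check on the value 2.01, the pooled (equal-variance) standard error can be recomputed from the group statistics reported above. This sketch assumes five subjects per group, which follows from ten subjects split into two samples of size 5.

```python
import math

# Summary statistics reported in the Group Statistics output.
n1, mean1, sd1 = 5, 28.0, 3.24   # group 1: seal movie
n0, mean0, sd0 = 5, 32.8, 3.11   # group 0: caribou movie

# Pooled variance: weighted average of the two sample variances.
pooled_var = ((n1 - 1) * sd1**2 + (n0 - 1) * sd0**2) / (n1 + n0 - 2)

# Standard error of the difference between the two means.
se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n0))

print(round(se_pooled, 2))
```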

b. Using separate variance estimates, the standard error of the difference can be computed as

\[ s_{\bar{X}_1 - \bar{X}_0} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_0^2}{n_0}} \]


Recall that the variance of a difference between two variables is defined as the sum of the variances of their constituent variables minus twice their covariance. Since the two variables are uncorrelated (independent), the covariance term disappears from the above formula. 
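The separate-variance approach can be sketched the same way. The degrees-of-freedom line uses the Welch-Satterthwaite approximation, which is not spelled out in this tutorial and is included here as an assumption about how the unequal-variance row of the output is obtained.

```python
import math

n1, sd1 = 5, 3.24   # group 1: seal movie
n0, sd0 = 5, 3.11   # group 0: caribou movie

# Variance of each sample mean.
v1 = sd1**2 / n1
v0 = sd0**2 / n0

# With independent groups the covariance term is zero,
# so the variances of the two means simply add.
se_separate = math.sqrt(v1 + v0)

# Welch-Satterthwaite approximate degrees of freedom (assumed formula).
welch_df = (v1 + v0) ** 2 / (v1**2 / (n1 - 1) + v0**2 / (n0 - 1))

print(round(se_separate, 3), round(welch_df, 2))
```

With equal group sizes the separate-variance standard error matches the pooled one, and the approximate degrees of freedom fall just below 8 because the two sample variances are nearly equal.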

Compute the t ratio.

t = (mean difference) / (standard error of the difference)
 


 

t = 4.8 / 2.01 = 2.388.

Degrees of Freedom: 10 - 2 = 8.
 

Probability Associated with the t ratio.
 

Visual Table

Locate the position of the calculated t value in the t distribution with 8 degrees of freedom. The probability associated with a t-ratio of 2.39 or more was .022 (one-tailed). 
 

 

 

If the null hypothesis were true, a t ratio of 2.39 or more would occur by chance approximately 2 percent of the time.
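The tail probability can also be approximated without a table. The sketch below integrates the density of Student's t distribution numerically with Simpson's rule; the helper names `t_pdf` and `upper_tail` are made up for this illustration, and a statistics package would normally be used instead.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0, via Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 0.5 - area_0_to_t   # half the distribution lies above 0


t_ratio = 4.8 / 2.01           # mean difference / standard error
p_one_tailed = upper_tail(t_ratio, 8)
p_two_tailed = 2 * p_one_tailed

print(round(p_one_tailed, 3), round(p_two_tailed, 3))
```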


SPSS output

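The alternative hypothesis states that the group watching a movie about killing baby seals scores lower on the questionnaire than the group viewing a movie about the migration of caribou. It is a one-sided test.

SPSS reports a two-tailed probability of .044. Because the observed difference is in the predicted direction, the one-tailed probability is half of that: .044 / 2 = .022 (one-tailed)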
 

Since the probability associated with the t-ratio is less than .05, the researcher would reject the null hypothesis and declare the result to be statistically significant.

The group watching a movie about killing baby seals scored significantly lower (M1 = 28, SD = 3.24) on the questionnaire than did the group viewing a movie about the migration of caribou (M0 = 32.8, SD = 3.11), t(8) = 2.39, p = .022 (one-tailed).


Percentage of variance accounted for

It is advisable to report both the t-value and the strength of the relationship (r) or the percentage of variation (r squared).

About 42% of variance in the dependent variable was accounted for by the experimental treatments.
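The 42% figure is consistent with the usual conversion from a t ratio to r squared, shown here as a worked equation. The tutorial does not state the formula explicitly, so treat it as the presumed source of the value.

```latex
r^2 = \frac{t^2}{t^2 + df}
    = \frac{(2.388)^2}{(2.388)^2 + 8}
    = \frac{5.703}{13.703}
    \approx .42
```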


95% Confidence Interval of the Difference

The 95% confidence interval of the difference between the two population means was from .165 to 9.435.
 



Notice that the 95% confidence interval of the difference between the two population means did not include 0 (the null value); that is, the null hypothesis value falls outside the interval. (Recall that the null hypothesis of no difference was rejected.)
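The interval limits can be reproduced from the mean difference and its standard error. The critical value 2.306 is the standard two-tailed .05 value of t for 8 degrees of freedom; it is taken from a t table rather than from this tutorial.

```python
mean_diff = 4.8    # 32.8 - 28
se_diff = 2.01     # standard error of the difference (pooled)
t_crit = 2.306     # two-tailed .05 critical value of t, df = 8 (from a t table)

margin = t_crit * se_diff
ci_lower = mean_diff - margin
ci_upper = mean_diff + margin

print(round(ci_lower, 3), round(ci_upper, 3))
```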
 

Summary

An independent-samples t test was conducted to evaluate whether watching the movie about killing baby seals would change viewers' attitudes toward animal slaughter. Higher scores on the questionnaire represent rationalization of the animal slaughter.

The results indicated that the group watching a movie about killing baby seals scored significantly lower (M = 28, SD = 3.24) on the questionnaire than did the group viewing a movie about the migration of caribou (M = 32.8, SD = 3.11), t(8) = 2.39, p = .022 (one-tailed). Approximately 42 percent of variance was accounted for.

 

Data Visualization

Create error bar graphs and boxplots.

  • Error Bar Graph

Produce an error bar chart that shows a 95% confidence interval around the mean.

Choose Graphs \ Error Bar. Simple error bar and Summaries for groups of cases are the default. 

Click the Define button. Variable: Select score. Category Axis: Select group. 

In the Bars Represent area: Confidence Interval for mean is the default. Right click on it to view the information.
 


Level: 95% is the default. The error bar for each group can be shown below. 
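The limits drawn by each error bar can be approximated from the group statistics. This sketch assumes the bars are the usual t-based interval, mean plus or minus t times SD divided by the square root of n, with 4 degrees of freedom per group; 2.776 is the table value for the two-tailed .05 level.

```python
import math

T_CRIT_DF4 = 2.776   # two-tailed .05 critical value of t, df = 4 (from a t table)

# group label -> (mean, SD, n), taken from the Group Statistics output
groups = {0: (32.8, 3.11, 5), 1: (28.0, 3.24, 5)}

intervals = {}
for label, (mean, sd, n) in groups.items():
    half_width = T_CRIT_DF4 * sd / math.sqrt(n)
    intervals[label] = (mean - half_width, mean + half_width)

for label, (low, high) in intervals.items():
    print(label, round(low, 2), round(high, 2))
```

The two intervals overlap slightly even though the t test is significant; overlapping error bars for separate group means do not by themselves imply a nonsignificant difference.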

 

 

  • Boxplot

Choose Graphs \ Boxplot. Simple boxplot and Summaries for groups of cases are the default. 

Click the Define button. Variable: Select score. Category Axis: Select group. Click OK.

There is not much overlap in the distributions of scores for the two groups.

A boxplot displays the 25th percentile, the median (the 50th percentile), the 75th percentile, and outlying or extreme values.

1. The boundaries of the box indicate the 25th percentile and the 75th percentile. From the length of the box, you can judge the variability.

2. The horizontal line inside the box represents the median. If the median is not in the center of the box, the distribution may be skewed.

3. Lines are drawn from the ends of the box to the largest and smallest values that are not outliers. These lines are called whiskers.

4. Case numbers are used to label outliers and extremes. Extreme values are cases with values more than 3 box-lengths from the 75th percentile or 25th percentile. Outliers are cases with values between 1.5 and 3 box-lengths from the 75th percentile or 25th percentile. Note that the boxplot did not detect any outliers or extremes.
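The outlier and extreme rules can be expressed as a short sketch. The quartile method used here (Python's "inclusive" interpolation) is an assumption and may differ slightly from the hinges SPSS computes, and the scores are hypothetical values chosen to trigger each label.

```python
import statistics


def classify(values):
    """Label each value as inside, outlier, or extreme by box-length rules."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    box = q3 - q1   # box length (interquartile range)
    result = []
    for v in values:
        # Distance beyond the nearer edge of the box (0 if inside the box).
        dist = max(q1 - v, v - q3, 0)
        if dist > 3 * box:
            label = "extreme"
        elif dist > 1.5 * box:
            label = "outlier"
        else:
            label = "inside"
        result.append((v, label))
    return result


# Hypothetical scores (NOT the study data).
scores = [10, 11, 12, 13, 14, 15, 16, 17, 18, 30, 60]
labels = dict(classify(scores))

for value, label in classify(scores):
    print(value, label)
```

For these scores the box runs from 12.5 to 17.5, so 30 falls between 1.5 and 3 box-lengths above the box and 60 falls more than 3 box-lengths above it.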

 

Reading