The t-Test for Two Independent Samples

 

Randomly select two groups of patients, diagnosed with different diseases (disease0 and disease1). Their body temperatures are recorded as shown below.

 

 

Variables

The independent variable is the type of disease patients are suffering from. It can be coded as 0 and 1. 

The dependent variable is body temperature. To be specific, column Y0 indicates that the temperatures were obtained from the patients diagnosed with disease0. Column Y1 indicates that the temperatures were obtained from the patients diagnosed with disease1.

Research Question

Is there a significant difference in body temperature between the two groups of patients?
 

The Null Hypothesis

There is no difference in body temperature between two groups. 

 

The Alternative Hypothesis

There is a significant difference in body temperature between two groups. 
 



You are unsure about the direction in which a mean should differ from the other. A two-sided test is recommended. 

Set a significance level

Use a .05 significant level.


Definitional Formula


The t Square Ratio 

Express the t square ratio in terms of the coefficients of determination and alienation. 
 

Advantages: The t square ratio directly provides the information about the effect size (r2) and the knowledge of the t square ratio can be readily transferred to learning the properties of the F test.

The t square ratio shown above is more parsimonious than the traditional formula shown below
 


Note that both formulae will result in an identical t value.

Data Set

First, create a coded predictor variable X to index the type of disease (Disease 0 and Disease 1). Second, combine body temperatures from two groups into one total group.


a. Independent Variable: X (Type of Illness).

b. Dependent Variable: Y (Body Temperature)  


Regression on Categories Approach

Predict Y from X. We will split the criterion variable Y into predicted part (Y') and the error part (Y^). 

a. Predicted Values

With group membership information, we are able to compute the predicted group means.

Group 0: (99+100+101)/3 = 100
Group 1: (101+102+103)/3 = 102

The predicted variable Y' consists of group means, 100 and 102.

b. Error Scores

The error variable Y^ is computed as Y - Y'. 

Compute the Variances for Variables Y, Y' and Y^

The variability carries information about phenomena that the data describe.

Total Variance

In the absence of any relevant information, the best prediction of the outcome is the mean. The overall mean of the total group is 101.

The total variance of the criterion variable (Y) is _____ (1.667).

How to compute?

The variance is computed as the mean of the squared deviation scores. 

Predictable Variance

With group membership information, we are able to compute the predicted group means. The predicted variable Y' consists of group means, 100 and 102.

The variance of the predicted variable (Y') is ______. (1)

How to compute?

The variance is computed as the mean of the squared deviation scores. 

The variance due to different group means (type of disease) is 1.

Error Variance

The error variable Y^ is computed as Y - Y'. The variance which can not be explained by the predictor X is called error variance.

The variance of the error variable (Y^) is equal to _____ (.667). The variance is computed as the mean of the squared deviation scores. 
 

The error variance is equal to .667.

Variance Components

Total Variance  = Predictable Variance + Error Variance

1.667 = 1 + .667


Examine the Standard Variances

Divide the variance components by the total variance of the criterion variable Y. 

1.667/1.667 = 1/1.667 + .667/1.667

1 = .60 + .40

About 60 percent of the variance in body temperature was accounted for by knowing the type of disease. Approximately 40 percent of the variance in body temperature was likely due to other, unidentifiable factors.

Form a t-square ratio


t
 (4) = 2.45


Locate the position of the calculated t value in the t distribution with 4 degrees of freedom. Approximately 7% of the area under the t distribution lies at or beyond t's of +-2.45. The observed probability is larger than .05.

Report the results.

An independent-samples t test was conducted to determine whether there was a significant difference in body temperature between the two groups of patients. The mean body temperature was 100 degrees Fahrenheit (SD = 1.00) for patients diagnosed with Disease 0  and 102 degrees Fahrenheit  (SD = 1.00)  for Disease 1. The difference between the means was not statistically significant, t(4) = 2.45, p > .05. About 60 percent of the variance in body temperature was accounted for by knowing the type of disease.
 

Computational Formula


SPSS For Windows

Input

Output

Group Statistics

Independent Samples t Test


Levene's test is used to test the null hypothesis that the two population variances are equal. The null hypothesis was not rejected, F = 0, p = 1. The assumption of equality of variances was met.

 

Using the Traditional Formula to Compute the t Value

Suppose all possible random samples of size 3 have been drawn in independent pairs from each population. First, compute sample means. Next, the difference between each pair of means can be computed. Third, form a sampling distribution of the difference

  • The mean of the sampling distribution of the difference would be zero.
     
  • The estimated standard error of the difference can be computed as 


The standard error of the difference equals .8165.

  • Compute the t ratio. 
     




t = -2 / .8165 = -2.449

df = 6 - 2 = 4

  • Locate the position of the calculated t value in the t distribution with df = 4. The observed significance level was .07 and was larger than .05. What would you conclude?
     

Compute a 95% Confidence Interval of the Difference

 

Output

 


 


How to Compute a 95% Confidence Interval of the Difference

1. Obtain the critical values

Set the degrees of freedom to 4 and the significance to .025. The cutoff t values for the bottom 2.5% and the top 2.5% was -2.78 and +2.78.

 
2. A 95% confidence interval reaches approximately 2.78 standard errors on either side of the difference. 




We knew that the mean difference was -2 and the standard error of the difference was .8165. Thus, the 95% confidence interval for the difference between two population means was ________. 

 

Note that the 95% confidence interval for the difference between two population means included zero. (Recall that we failed to reject the null hypothesis of no difference.)

 


 

Optional Reading

Correlation and Causation by Jason Newsom

Point-biserial correlation by Jason Newsom