The t-Test for Two Independent Samples
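Two groups of patients, each diagnosed with a different disease (Disease 0
or Disease 1), are randomly selected, and their body temperatures are
recorded in degrees Fahrenheit. The three patients with Disease 0 have
temperatures of 99, 100, and 101; the three patients with Disease 1 have
temperatures of 101, 102, and 103.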
Variables
The independent variable is the type of disease patients are suffering from.
It can be coded as 0 and 1.
The dependent variable is body temperature. Column Y0 holds the
temperatures of the patients diagnosed with Disease 0, and column Y1 holds
the temperatures of the patients diagnosed with Disease 1.
Research Question
Is there a significant difference in body temperature between the two groups
of patients?
The Null Hypothesis
There is no difference in body temperature between the two groups.
The Alternative Hypothesis
There is a significant difference in body temperature between the two
groups.
Because the direction in which one mean should differ from the other is not
known in advance, a two-sided test is recommended.
Set a significance level
Use a .05 significance level.
Definitional Formula
The t Square Ratio
Express the t square ratio in terms of the coefficients of determination
and alienation, that is, the proportion of variance in Y accounted for by X
and the proportion left unaccounted for.
Advantages: The t square ratio directly conveys the effect size (r2), and
knowledge of the t square ratio transfers readily to learning the
properties of the F test.
The t square ratio is more parsimonious than the traditional formula, which
divides the difference between the two sample means by the estimated
standard error of the difference.
Note that both formulae yield the same t value in absolute value.
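One standard way to write the two formulas is sketched below. The notation is an assumption made for illustration: N is the total number of patients, n_0 and n_1 are the group sizes, SS_0 and SS_1 are the within-group sums of squared deviations, and s_p^2 is the pooled variance. With the values used in this example (r2 = .60, N = 6), both forms give |t| = 2.45.

```latex
% t square ratio: proportion of variance accounted for, over the proportion
% not accounted for, each divided by its degrees of freedom
t^2 = \frac{r^2 / 1}{(1 - r^2) / (N - 2)}

% traditional formula: mean difference over its estimated standard error
t = \frac{\bar{Y}_0 - \bar{Y}_1}{\sqrt{s_p^2 \left( \frac{1}{n_0} + \frac{1}{n_1} \right)}},
\qquad
s_p^2 = \frac{SS_0 + SS_1}{n_0 + n_1 - 2}
```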
Data Set
First, create a coded predictor variable X to index the type of disease
(0 for Disease 0, 1 for Disease 1). Second, combine the body temperatures
from the two groups into one total group.
a. Independent Variable: X (Type of Illness).
b. Dependent Variable: Y (Body Temperature)
Regression on Categories Approach
Predict Y from X. We will split the criterion variable Y into a predicted
part (Y') and an error part (Y^). Note that Y^ denotes the error scores
here, not the predicted scores.
a. Predicted Values
With group membership information, we are able to compute the predicted
group means.
Group 0: (99+100+101)/3 = 100
Group 1: (101+102+103)/3 = 102
The predicted variable Y' consists of group means, 100 and 102.
b. Error Scores
The error variable Y^ is computed as Y - Y'.
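The split of Y into a predicted part and an error part can be sketched in a few lines of Python. This is only an illustration: the scores are the six temperatures used in the group means above, and the names `x`, `y`, `group_means`, `y_pred`, and `y_err` are chosen here rather than taken from the original.

```python
# Six body temperatures (degrees Fahrenheit), combined into one total group.
x = [0, 0, 0, 1, 1, 1]              # coded predictor: type of disease
y = [99, 100, 101, 101, 102, 103]   # criterion: body temperature

# The predicted value for each patient is the mean of that patient's group.
group_means = {}
for g in set(x):
    scores = [yi for xi, yi in zip(x, y) if xi == g]
    group_means[g] = sum(scores) / len(scores)

y_pred = [group_means[xi] for xi in x]            # Y'
y_err = [yi - pi for yi, pi in zip(y, y_pred)]    # error part: Y - Y'

print(y_pred)  # [100.0, 100.0, 100.0, 102.0, 102.0, 102.0]
print(y_err)   # [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
```

Every original score is recovered by adding its predicted value and its error score.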
Compute the Variances for Variables Y, Y' and Y^
The variability carries information about the phenomena that the data
describe.
Total Variance
In the absence of any relevant information, the best prediction of the
outcome is the mean. The overall mean of the total group is 101.
The total variance of the criterion variable (Y) is _____ (1.667).
How to compute?
The variance is computed as the mean of the squared deviation scores: the
squared deviations from the overall mean are summed and divided by N = 6.
(4 + 1 + 0 + 0 + 1 + 4)/6 = 10/6 = 1.667
Predictable Variance
With group membership information, we are able to compute the predicted
group means. The predicted variable Y' consists of group means, 100 and 102.
The variance of the predicted variable (Y') is ______ (1).
How to compute?
The variance is computed as the mean of the squared deviation scores. Each
predicted score lies 1 degree from the overall mean of 101.
(1 + 1 + 1 + 1 + 1 + 1)/6 = 1
The variance due to different group means (type of disease) is 1.
Error Variance
The error variable Y^ is computed as Y - Y'. The variance that cannot be
explained by the predictor X is called error variance.
The variance of the error variable (Y^) is equal to _____ (.667). The
variance is computed as the mean of the squared deviation scores.
(1 + 0 + 1 + 1 + 0 + 1)/6 = 4/6 = .667
Variance Components
Total Variance = Predictable Variance + Error Variance
1.667 = 1 + .667
Examine the Standardized Variances
Divide the variance components by the total variance of the criterion
variable Y.
1.667/1.667 = 1/1.667 + .667/1.667
1 = .60 + .40
About 60 percent of the variance
in body temperature was accounted for by knowing the type of disease.
Approximately 40 percent of the variance in body temperature was likely
due to other, unidentifiable factors.
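The variance components and their proportions can be checked numerically. The sketch below assumes the six temperatures from the worked group means; the helper `variance` is a name introduced here, and it divides by N (not N - 1) to match the "mean of the squared deviation scores" definition used in this section.

```python
def variance(scores):
    """Mean of the squared deviations from the mean (divides by N, not N - 1)."""
    m = sum(scores) / len(scores)
    return sum((s - m) ** 2 for s in scores) / len(scores)

y = [99, 100, 101, 101, 102, 103]            # Y
y_pred = [100, 100, 100, 102, 102, 102]      # Y' (group means)
y_err = [a - b for a, b in zip(y, y_pred)]   # Y - Y'

total = variance(y)              # 10/6
predictable = variance(y_pred)   # 6/6
error = variance(y_err)          # 4/6

print(round(total, 3), round(predictable, 3), round(error, 3))  # 1.667 1.0 0.667
print(round(predictable / total, 2), round(error / total, 2))   # 0.6 0.4
```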
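Form a t-square ratio
Divide the proportion of predictable variance (.60, with 1 degree of
freedom) by the proportion of error variance (.40) per error degree of
freedom (N - 2 = 4).
t square = (.60/1) / (.40/4) = 6.00
t(4) = 2.45
Locate the position of the calculated t value in the t distribution with
4 degrees of freedom. Approximately 7% of the area under the t distribution
lies at or beyond t values of ±2.45.
The observed probability (.07) is larger than .05, so the null hypothesis
is not rejected.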
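The t value and its two-sided probability can be reproduced with the standard library alone. This is a sketch under stated assumptions: the helper `t_cdf_df4` is a name introduced here and uses the closed-form cumulative distribution of Student's t that holds only for exactly 4 degrees of freedom. For other degrees of freedom a statistics package would be needed.

```python
import math

r2 = 0.60      # proportion of variance in Y accounted for by X
n = 6          # total number of patients
df = n - 2     # error degrees of freedom for two groups

t_sq = r2 / ((1 - r2) / df)   # t square ratio
t = math.sqrt(t_sq)

def t_cdf_df4(t_value):
    """Cumulative probability of Student's t with exactly 4 degrees of freedom."""
    u = 1 + t_value * t_value / 4
    return 0.5 + (3 / 8) * (t_value / math.sqrt(u)) * (1 - (t_value * t_value / u) / 12)

# Area at or beyond +t and at or beyond -t.
p_two_sided = 2 * (1 - t_cdf_df4(t))

print(round(t_sq, 2), round(t, 2), round(p_two_sided, 2))  # 6.0 2.45 0.07
```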
Report the results.
An independent-samples t
test was conducted to determine whether there was a significant difference
in body temperature between the two groups of patients. The mean body
temperature was 100 degrees Fahrenheit (SD = 1.00) for patients
diagnosed with Disease 0 and 102 degrees Fahrenheit (SD = 1.00)
for Disease 1. The difference between the means was not statistically
significant, t(4) = 2.45, p > .05.
About 60 percent of the variance in body temperature was accounted for by
knowing the type of disease.
Computational Formula
SPSS For Windows
Input
Output
Group Statistics
Independent Samples t Test
Levene's test is used to test the null hypothesis that the two population
variances are equal. The null hypothesis was not rejected, F = 0,
p = 1. The assumption of equality of variances was met.
Using the Traditional Formula to Compute the t Value
Suppose all possible random samples of size 3 have been drawn in
independent pairs from each population. First, compute sample means. Next,
the difference between each pair of means can be computed. Third, form
a sampling distribution of the difference.
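- Under the null hypothesis, the mean of the sampling distribution of the
difference would be zero.
- The estimated standard error of the difference is computed from the
pooled variance and the two sample sizes. Each group has a sum of squared
deviations of 2, so the pooled variance is (2 + 2)/(6 - 2) = 1, and the
standard error of the difference equals the square root of
1 x (1/3 + 1/3), or .8165.
- Divide the difference between the sample means (100 - 102 = -2) by the
standard error: t = -2 / .8165 = -2.449, with df = 6 - 2 = 4.
- Locate the position of the calculated t value in the t distribution with
df = 4. The observed significance level was .07, which was larger than
.05. What would you conclude?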
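The traditional computation can be sketched as follows. The variable and helper names (`group0`, `group1`, `sum_sq`, `pooled_var`, `se_diff`) are introduced here for illustration, and the pooled-variance form assumes equal population variances, as supported by Levene's test above.

```python
import math

group0 = [99, 100, 101]    # Disease 0
group1 = [101, 102, 103]   # Disease 1

def mean(scores):
    return sum(scores) / len(scores)

def sum_sq(scores):
    """Sum of squared deviations from the group mean."""
    m = mean(scores)
    return sum((s - m) ** 2 for s in scores)

n0, n1 = len(group0), len(group1)
df = n0 + n1 - 2

# Pool the within-group sums of squares, then scale by the sample sizes.
pooled_var = (sum_sq(group0) + sum_sq(group1)) / df
se_diff = math.sqrt(pooled_var * (1 / n0 + 1 / n1))
t = (mean(group0) - mean(group1)) / se_diff

print(df, round(se_diff, 4), round(t, 3))  # 4 0.8165 -2.449
```

Squaring this t gives 6, the same value as the t square ratio.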
Compute a 95% Confidence Interval of the Difference
Output
How to Compute a 95% Confidence Interval of the Difference
1. Obtain the t critical values
Set the degrees of freedom to 4 and the probability in each tail to .025.
The cutoff t values for the bottom 2.5% and the top 2.5% are -2.78 and
+2.78.
2. A 95% confidence interval reaches approximately 2.78 standard errors on
either side of the difference.
We knew that the mean difference was -2 and the standard error of the
difference was .8165. Thus, the 95% confidence interval for the difference
between two population means was ________.
Note that the 95% confidence interval for the difference between two
population means included zero. (Recall that we failed to reject the null
hypothesis of no difference.)
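The interval can be worked out as a short sketch. The critical value 2.776 is the tabled two-sided .05 value for 4 degrees of freedom (rounded to 2.78 in the text); the variable names are chosen here for illustration.

```python
mean_diff = -2.0     # Disease 0 mean minus Disease 1 mean
se_diff = 0.8165     # estimated standard error of the difference
t_crit = 2.776       # two-sided .05 critical t for df = 4 (2.78 in the text)

# The interval reaches t_crit standard errors on either side of the difference.
lower = mean_diff - t_crit * se_diff
upper = mean_diff + t_crit * se_diff

print(round(lower, 2), round(upper, 2))  # -4.27 0.27
print(lower < 0 < upper)                 # True
```

Because zero lies inside the interval, the result agrees with the non-significant t test.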
Optional Reading
Correlation and Causation by Jason Newsom
Point-biserial correlation by Jason Newsom