Procedure Crosstabs

The CROSSTABS procedure creates tables showing the joint distribution of two or more categorical variables (e.g., cell counts, cell percentages, expected values, and residuals). Various measures of association can be obtained.

Example 1

Twenty married men and twenty married women were asked whether they would marry their current spouse if they had to do it all over again. 
 

Code Data

Binary Variables

1. Gender: Assign 0 for `married women` and 1 for `married men`.
2. Choice: Assign 0 for `Yes` and 1 for `No`.  

Four Combinations
 

            Yes (0)      No (1)
Women (0)   Women/Yes    Women/No
Men (1)     Men/Yes      Men/No

 

Data Set

Their responses are shown below.
 

ID   Gender   Choice
01   1        1
02   1        1
03   1        1
04   1        1
05   1        1
06   1        1
07   1        1
08   1        0
09   1        0
10   1        0
11   1        0
12   1        0
13   1        0
14   1        0
15   1        0
16   1        0
17   1        0
18   1        0
19   1        0
20   1        0
21   0        1
22   0        1
23   0        1
24   0        1
25   0        1
26   0        1
27   0        1
28   0        1
29   0        1
30   0        1
31   0        1
32   0        1
33   0        1
34   0        1
35   0        1
36   0        0
37   0        0
38   0        0
39   0        0
40   0        0

 


Response Patterns

There are four binary response patterns: [0,0], [0,1], [1,0], and [1,1].

Frequency count of the four binary response patterns

Count the frequencies and fill in the four cells.  
 

            Yes (0)      No (1)
Women (0)   5  [0,0]     15 [0,1]
Men (1)     13 [1,0]     7  [1,1]
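
The tally can be double-checked outside SPSS. The following is a minimal sketch in Python, assuming the 40 responses are entered in ID order exactly as listed in the data set; the names `gender`, `choice`, and `pattern_counts` are illustrative and not part of the SPSS file.

```python
from collections import Counter

# Rebuild the 40 (gender, choice) pairs in ID order, as listed in the data set:
# IDs 01-20 are men (gender = 1), IDs 21-40 are women (gender = 0).
gender = [1] * 20 + [0] * 20
choice = [1] * 7 + [0] * 13 + [1] * 15 + [0] * 5

# Tally each (gender, choice) response pattern.
pattern_counts = Counter(zip(gender, choice))

for pattern in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(pattern, pattern_counts[pattern])
```

The four counts should add up to the 40 respondents.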

 

Research Question

There are two categorical variables (gender and choice). A chi-square test of independence is called for. The null hypothesis is that the two variables (gender and choice) are unrelated (independent). The alternative hypothesis is that there is an association between gender and choice.

Conduct a two-way contingency table analysis to find out if gender and choice are related.

   

SPSS for Windows

A. Enter data values in the Data View window.


B. Define variable names, labels, and value labels in the Variable View window.
 

Click the Variable View tab to open the Variable View window.




The Name Column

To edit the variable name, double click on var00001. Delete var00001 and type in `gender` to replace the default one. Next, double click on var00002. Delete var00002 and type in `choice` to replace the default one.
 

 

The Values Column

Values: Assign descriptive labels to values.

gender

Double click on the textbox and a gray square will appear. Click on the gray square. A Value Labels dialog box will appear.

            

(1) Enter the value: 0. 

(2) Enter the value label: Women. 

(3) Click on Add. The value label is added to the list. 

(4) Enter the next value: 1. Enter the corresponding value label: Men. Click on Add. Click on OK to end the input. 

choice

Value Labels: `Same` (a `Yes` response) is coded as `0`. `Different` (a `No` response) is coded as `1`.

 

C. Choose the Crosstabs procedure

1. From the menus choose: Analyze \ Descriptive Statistics \ Crosstabs

2. Select the row variable `gender` and the column variable `choice`. 

3. Click on Statistics.

Click on the Chi-square check box. In the Nominal area, click on the `Phi and Cramer's V` check box. (To access the definition of 'Phi and Cramer's V', right-click on the term.) Click on Continue to return to the Crosstabs dialog box.

4. Click on Cells. This opens the Crosstabs: Cell Display dialog box.

In the Counts area: In addition to the default observed count, select Expected. 

In the Percentages area, select Row. The researcher wanted to know the percentage of respondents within each gender who made the same or a different choice.

In the Residuals area: Choose Unstandardized. Unstandardized residuals are the observed minus the expected frequencies. Click Continue. Click OK.

D. Data Visualization. Produce a clustered bar chart.

From the SPSS Data Editor menus choose: Graphs \ Bar 

Type of Bar Chart: Select Clustered.

Data in Chart Are: Summaries for groups of cases. It is the default.

Click on Define to open a dialog box.

Bars Represent: Select % of cases.

Specify a category axis: Click on the variable 'gender'. Click on the right arrow button.

Define Clusters by: Select the variable 'choice'. Click OK. The bar chart will be displayed in the SPSS Viewer window.

 

SPSS Printout


The data can be expressed in a 2 (gender) by 2 (choice) table.   
 

   

Examine the obtained percentages.

Women & Choices

The percentage of women who chose to marry their current spouse (25%) was less than the percentage of women who chose not to (75%). 

Men & Choices

The percentage of men who chose to marry their current spouse (65%) was more than the percentage of men who chose not to (35%). 

Gender & Choices

Most women chose not to marry the same spouse while most men chose to marry the same spouse.
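
The row percentages quoted above follow directly from the observed counts. As a minimal sketch (Python, with an illustrative helper named `row_percentages` that is not an SPSS function), each cell is divided by its row total:

```python
# Observed counts from the 2 x 2 table: rows are gender, columns are choice.
observed = {
    "Women": {"Same": 5, "Different": 15},
    "Men": {"Same": 13, "Different": 7},
}


def row_percentages(table):
    """Return each cell as a percentage of its row total."""
    result = {}
    for row_label, cells in table.items():
        row_total = sum(cells.values())
        result[row_label] = {
            col_label: 100.0 * count / row_total
            for col_label, count in cells.items()
        }
    return result


row_pct = row_percentages(observed)
for row_label, cells in row_pct.items():
    print(row_label, cells)
```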

 

 

1. The percentage of men who chose to marry the same spouse is greater than the percentage of women in the same category.

2. The percentage of men who chose not  to marry the same spouse is less than the percentage of women in the same category.

Examine the above pattern. What do you observe?

 

The statistical significance of the relationship between the two categorical variables, choice and gender, can be investigated using the chi-square test. The expected counts and residuals are used to compute the chi-square statistic.

Examine the expected counts.

The expected counts can be obtained from the row and column totals.

(1) For the women-same cell, the expected count is computed as

(number of women * number of same) / total

(20*18) / 40 = 9

 

               Same (0)            Different (1)   Row Total
Women (0)      Count: 5            Count: 15       Count: 20
               Expected Count: ?
Men (1)        Count: 13           Count: 7        Count: 20
Column Total   Count: 18           Count: 22       Count: 40

 

(2) For the women-different cell, the expected count is computed as

(number of women * number of different) / total

(20*22) / 40 = 11

 

               Same (0)    Different (1)       Row Total
Women (0)      Count: 5    Count: 15           20
                           Expected Count: ?
Men (1)        13          7                   20
Column Total   18          22                  40

 

(3) For the men-same cell, the expected count is computed as ___________

(20*18) / 40 = 9
 

(4) For the men-different cell, the expected count is computed as ___________

(20*22) / 40 = 11
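
All four expected counts can be produced in one pass from the margins. The sketch below (Python; the variable names are illustrative) applies the rule "row total times column total, divided by the grand total" to every cell:

```python
# Observed counts: rows are gender (women, men), columns are choice (same, different).
observed = [[5, 15], [13, 7]]

row_totals = [sum(row) for row in observed]        # women, men
col_totals = [sum(col) for col in zip(*observed)]  # same, different
grand_total = sum(row_totals)

# Expected count for each cell = row total * column total / grand total.
expected = [
    [row_total * col_total / grand_total for col_total in col_totals]
    for row_total in row_totals
]

print(expected)  # [[9.0, 11.0], [9.0, 11.0]]
```

Because both genders have 20 respondents, the two rows of expected counts are identical.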
 

Examine the Residuals.

A residual is the difference between an observed and an expected count. 

Women
 

            Same (0)       Different (1)
Women (0)   observed: 5    observed: 15
            expected: 9    expected: 11
            residual: ?    residual: ?


For the women-same cell, the residual is computed as 5-9= -4. For the women-different cell, the residual is computed as 15-11= 4.

Men
 

            Same (0)       Different (1)
Men (1)     observed: 13   observed: 7
            expected: 9    expected: 11
            residual: ?    residual: ?

 

For the men-same cell, the residual is computed as _____. (13-9=4)
For the men-different cell, the residual is computed as _____. (7-11=-4)
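
As a quick check on the four residuals, here is a short sketch (Python, illustrative names) that subtracts the expected counts from the observed counts. It also shows a property that holds for unstandardized residuals in any contingency table: they sum to zero within each row and each column.

```python
observed = [[5, 15], [13, 7]]          # rows: women, men; columns: same, different
expected = [[9.0, 11.0], [9.0, 11.0]]  # from the row and column totals

# Unstandardized residual = observed count - expected count.
residuals = [
    [o - e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]
print(residuals)  # [[-4.0, 4.0], [4.0, -4.0]]

# Residuals cancel out within every row and every column.
row_sums = [sum(row) for row in residuals]
col_sums = [sum(col) for col in zip(*residuals)]
print(row_sums, col_sums)
```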

 

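The chi-square statistic can be calculated by summing over all cells the squared residuals divided by the expected frequencies:

$\chi^2 = \sum (O - E)^2 / E$, where $O$ is the observed count and $E$ is the expected count of a cell.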
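
The sum over the four cells can be carried out as follows. This is a sketch in Python using the observed and expected counts worked out above; it is meant only to mirror the hand calculation, not to reproduce how SPSS computes the statistic internally.

```python
observed = [[5, 15], [13, 7]]          # rows: women, men; columns: same, different
expected = [[9.0, 11.0], [9.0, 11.0]]

# Pearson chi-square: sum of (observed - expected)^2 / expected over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print(round(chi_square, 3))  # 6.465
```

Each squared residual is 16, so the sum is 16/9 + 16/11 + 16/9 + 16/11.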

 

 

 

Results

 

 

What do you conclude?

The value of the Pearson chi-square equals 6.465. Its probability is .011. The associated degrees of freedom are computed as (2-1)(2-1) = 1. The observed probability is less than .05, so the null hypothesis of independence is rejected.

There is a significant relationship between the two categorical variables. Women and men differ in their choices. Gender and choice are related.
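
The reported probability can be reproduced by hand for this table. With 1 degree of freedom, a chi-square variable is the square of a standard normal variable, so the upper-tail probability can be written with the complementary error function. The sketch below (Python; `chi_square_p_value_df1` is an illustrative helper name) relies on that identity and is valid only for 1 degree of freedom.

```python
import math


def chi_square_p_value_df1(chi_square):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom.

    Uses P(X > x) = erfc(sqrt(x / 2)), which holds only when df = 1.
    """
    return math.erfc(math.sqrt(chi_square / 2.0))


p_value = chi_square_p_value_df1(6.465)
print(round(p_value, 3))  # 0.011
```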

(A problem arises when we do CROSSTABS with a small number of cases. Fisher's exact test is an alternative test for the 2 x 2 table. It is most useful when the total sample size and the expected values are small.)
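
For readers curious about how Fisher's exact test works on a 2 x 2 table, the following is a minimal sketch in Python. It assumes one common two-sided convention: with the row and column totals held fixed, add up the probabilities of every table that is no more probable than the observed one. The function name `fisher_exact_two_sided` is hypothetical, and the value SPSS prints may be computed with a different convention.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher exact p-value for a 2 x 2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1 = a + b
    col1 = a + c
    n = a + b + c + d

    def weight(k):
        # Number of ways to get k in the top-left cell with the margins fixed.
        return comb(col1, k) * comb(n - col1, row1 - k)

    k_min = max(0, row1 - (n - col1))
    k_max = min(row1, col1)
    observed_weight = weight(a)

    # Integer arithmetic keeps the "no more probable than observed" test exact.
    extreme = sum(
        weight(k) for k in range(k_min, k_max + 1) if weight(k) <= observed_weight
    )
    return extreme / comb(n, row1)


p_value = fisher_exact_two_sided([[5, 15], [13, 7]])
print(p_value)
```

For the gender-by-choice table the result is also below .05, in line with the Pearson chi-square test.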
 

The results of a chi-square test of contingency merely indicate whether or not there is a statistically significant relationship between two categorical variables. It is desirable to include both a significance test and a measure of the strength of the relationship.

The phi coefficient equals -.402. Gender and choice are related, p < .05. About 16% of the variance in the variable Choice is accounted for by gender.

Ignore the sign of the phi coefficient. The sign depends only on which level of each variable is coded 1 and which is coded 0.
 

The phi coefficient can be calculated indirectly from the chi-square: divide the chi-square by the sample size $N$ and take the square root of the result, $\phi = \sqrt{\chi^2 / N}$. Here, $\sqrt{6.465 / 40} \approx .402$.
 


The phi coefficient of correlation is associated with the chi-square test of significance. Use the phi coefficient to quantify the strength of the association between gender and choice. In absolute value, phi = .402.
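
Both routes to phi can be compared in a few lines. The sketch below (Python, illustrative names) computes phi from the chi-square value and also from the standard cell formula (ad - bc) / sqrt(row1 * row2 * col1 * col2), which keeps the sign. The cell formula is a textbook identity for 2 x 2 tables and is an addition to this example, not something read from the SPSS output.

```python
import math

# Cell counts with the coding used above: rows women (0), men (1);
# columns same (0), different (1).
a, b = 5, 15
c, d = 13, 7
n = a + b + c + d

# Unsigned phi from the Pearson chi-square reported by SPSS.
phi_from_chi_square = math.sqrt(6.465 / n)

# Signed phi from the cell counts.
phi_signed = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

print(round(phi_from_chi_square, 3))  # 0.402
print(round(phi_signed, 3))           # -0.402
print(round(phi_signed ** 2, 2))      # 0.16
```

Swapping the 0 and 1 codes of either variable flips the sign but leaves the magnitude unchanged.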
 
 

 


 

Example 2

Randomly selected American college graduates and high school graduates were asked how they kept up with the news. The observed frequencies of their responses are shown below.
 

                            Newspaper (1)   Radio (2)   TV (3)
High School Graduates (0)   17              10          80
College Graduates (1)       29              21          80

 

The researcher also collected the same data in Japan. The observed frequencies of responses are shown below.
 

                            Newspaper (1)   Radio (2)   TV (3)
High School Graduates (0)   25              12          70
College Graduates (1)       89              21          20

 

Code Data

For the row variable (education), assign 0 for 'high school graduates' and 1 for 'college graduates'.

For the column variable (preference), assign 1 for 'newspaper', 2 for 'radio', and 3 for 'television'.

For the control variable (country), assign 0 for 'America' and 1 for 'Japan'.
 

SPSS for Windows 

A. Create a new data file.  

From the menus choose: File \ New \ Data

B. There are four variables in the example: education (`ed`), preference (`pref`), country (`country`), and the cell frequency (`freq`).

1. Define all the variable names, labels, and value labels in the Variable View window.



2. Enter data values.  

The Tab key moves from left to right through the variables for each case in the selected area. The Enter key moves from top to bottom through the cases for each variable in the selected area.

C. Weight cases 

From the menus choose: Data \ Weight Cases. Select Weight cases by. Click on the variable `freq` and the > push-button. `freq` is now defined as the frequency variable. Click OK.
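
Weighting lets each data row stand for a whole cell of the table rather than for one respondent. The sketch below (Python) illustrates the idea with the twelve cells of this example. The tuple layout (`ed`, `pref`, `country`, `freq`) is an assumption about how the data file is arranged, based on the variable names used in these steps.

```python
# One row per cell: (ed, pref, country, freq), using the codes defined above.
cases = [
    (0, 1, 0, 17), (0, 2, 0, 10), (0, 3, 0, 80),  # America, high school
    (1, 1, 0, 29), (1, 2, 0, 21), (1, 3, 0, 80),  # America, college
    (0, 1, 1, 25), (0, 2, 1, 12), (0, 3, 1, 70),  # Japan, high school
    (1, 1, 1, 89), (1, 2, 1, 21), (1, 3, 1, 20),  # Japan, college
]

# With weighting, the 12 rows represent this many respondents.
weighted_n = sum(freq for _, _, _, freq in cases)

# Equivalent unweighted file: repeat each row `freq` times.
expanded = [
    (ed, pref, country)
    for ed, pref, country, freq in cases
    for _ in range(freq)
]

print(len(cases), weighted_n, len(expanded))  # 12 474 474
```

Without weighting, the same file would be analyzed as if it contained only 12 cases.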

D. Choose the Crosstabs procedure. 

From the menus choose: Analyze \ Descriptive Statistics \ Crosstabs

 

1. Select the row variable `ed` and the column variable `pref`.

2. Select `country` as the layer 1 control variable.

Click on country in the source variable list. Click on the > push-button. 

By adding a control variable (country) to the TABLES subcommand, SPSS produces a separate table for each country, America and Japan.
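
The effect of a layer variable can be pictured as splitting the weighted cases by country before cross-tabulating. The following is a minimal sketch in Python, again assuming the illustrative (`ed`, `pref`, `country`, `freq`) row layout; it builds one education-by-preference table per country.

```python
# One row per cell: (ed, pref, country, freq).
cases = [
    (0, 1, 0, 17), (0, 2, 0, 10), (0, 3, 0, 80),
    (1, 1, 0, 29), (1, 2, 0, 21), (1, 3, 0, 80),
    (0, 1, 1, 25), (0, 2, 1, 12), (0, 3, 1, 70),
    (1, 1, 1, 89), (1, 2, 1, 21), (1, 3, 1, 20),
]
country_labels = {0: "America", 1: "Japan"}

# One 2 x 3 table (rows: ed 0/1, columns: pref 1/2/3) for each country code.
tables = {}
for ed, pref, country, freq in cases:
    layer = tables.setdefault(country, [[0, 0, 0], [0, 0, 0]])
    layer[ed][pref - 1] += freq

for country, table in tables.items():
    print(country_labels[country], table)
```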

3. Click the Display clustered bar charts check box. 

4. Click on Statistics. Select Chi-square. Select Phi and Cramer's V.

Cramer's V is computed for this 2 by 3 table. Click Continue.  

5. Click on Cells in the Crosstabs dialog box. Select the desired statistics from the check boxes.

Generally, we will choose only observed counts and percentages. The expected counts and residuals are included here for illustration purposes.

The Percentage Area: Percentages within each cell can be based on the row totals, the column totals, or the total number of cases.

The researcher wanted to know the percentage of respondents within each education level who obtain news from each source. Thus, the row (education level) percentage option was chosen.

Click on Continue. Click on OK in the Crosstabs dialog box.

 

SPSS Printout

 

Education by Preference by Country Crosstabulation


Normally, we will report only observed counts and percentages. The expected counts and residuals will not be reported.

America

Order the percentages from the highest to the lowest in each row.

For high school graduates: Television (75%), Newspaper (16%), Radio (9%) 

For College Graduates: Television (62%), Newspaper (22%), Radio (16%) 
 

                            Newspaper (1)       Radio (2)      Television (3)
High School Graduates (0)   2nd highest (16%)   Lowest (9%)    Highest (75%)
College Graduates (1)       2nd highest (22%)   Lowest (16%)   Highest (62%)


 

Clustered bar charts of preference within the education categories 


Examine the above pattern. What do you observe?

 

Japan

1. Order the percentages from the highest to the lowest in each row. What do you observe? 

For high school graduates: Television (65%), Newspaper (23%), Radio (11%)

For College Graduates: Newspaper (69%), Radio (16%), Television (15%)
 

                            Newspaper (1)       Radio (2)           Television (3)
High School Graduates (0)   2nd highest (23%)   Lowest (11%)        Highest (65%)
College Graduates (1)       Highest (69%)       2nd highest (16%)   Lowest (15%)


2. In Japan, the percentage of college graduates who read newspapers (69%) is greater than the percentage of high school graduates in the same category (23%). Also, the percentage of college graduates who watch television (15.4%) is less than the percentage of high school graduates in the same category (65.4%).
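
The row percentages for both countries can be recomputed from the observed frequencies. The sketch below (Python; `row_percentages` is an illustrative helper) divides each count by its education-level total and rounds to one decimal place for display.

```python
# Observed frequencies: rows are high school, college;
# columns are newspaper, radio, television.
tables = {
    "America": [[17, 10, 80], [29, 21, 80]],
    "Japan": [[25, 12, 70], [89, 21, 20]],
}


def row_percentages(table):
    """Express every cell as a percentage of its row total."""
    return [[100.0 * count / sum(row) for count in row] for row in table]


percentages = {country: row_percentages(table) for country, table in tables.items()}

for country, rows in percentages.items():
    for label, row in zip(("High school", "College"), rows):
        print(country, label, [round(p, 1) for p in row])
```

In the Japanese table, the newspaper and television percentages move in opposite directions across the two education levels, which is the pattern described above.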

 

Clustered bar charts of preference within the education categories 


Examine the above pattern. What do you observe?

 

Chi-Square Tests

Are education and preference significantly related?

 

A. Americans 

The value of the Pearson chi-square equals 4.847. Its probability is .089. Education and preference are not significantly related, p > .05.

B. Japanese 

The Pearson chi-square equals 64.538. The probability is less than .001. Education and preference are significantly related, p < .05. 
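
Both chi-square values can be checked from the observed frequencies. The sketch below (Python, with illustrative function names) computes the Pearson chi-square for each 2 x 3 table. For the probability it uses the fact that a chi-square variable with 2 degrees of freedom has upper-tail probability exp(-x / 2); this shortcut applies only because (2-1)(3-1) = 2 here.

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


def p_value_df2(chi_square):
    """Upper-tail probability for 2 degrees of freedom only."""
    return math.exp(-chi_square / 2.0)


chi_america = pearson_chi_square([[17, 10, 80], [29, 21, 80]])
chi_japan = pearson_chi_square([[25, 12, 70], [89, 21, 20]])

print(round(chi_america, 3), round(p_value_df2(chi_america), 3))
print(round(chi_japan, 3), p_value_df2(chi_japan) < 0.001)
```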

 

Strength of the Association: Cramer's V

Cramer's V is applied to contingency tables that are larger than 2x2.

Use Cramer's V to quantify the strength of the association between educational groups (2 levels) and their preferences (3 levels). Cramer's V is computed for each country as shown below.

 

A. Americans 

Use Cramer's V to quantify the strength of the association between educational groups and their preferences. Cramer's V = .143.   Its probability is .089. Education and preference are not significantly related, p > .05.

B. Japanese 

The Cramer's V coefficient for the Japanese sample is .522. The probability is less than .001. Education and preference are significantly related.
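
Cramer's V rescales the chi-square by the sample size and by the smaller of (rows - 1) and (columns - 1). The sketch below (Python; `cramers_v` is an illustrative helper) applies that standard formula to the chi-square values reported above, with 237 respondents in each country.

```python
import math


def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramer's V = sqrt(chi-square / (N * min(rows - 1, columns - 1)))."""
    return math.sqrt(chi_square / (n * min(n_rows - 1, n_cols - 1)))


v_america = cramers_v(4.847, 237, 2, 3)
v_japan = cramers_v(64.538, 237, 2, 3)

print(round(v_america, 3), round(v_japan, 3))  # 0.143 0.522
```

For a 2 x 2 table the divisor min(rows - 1, columns - 1) equals 1, so the same formula gives the absolute value of phi.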

 


 

Optional Reading

 

http://www.davidmlane.com/hyperstat/chi_square.html

 

http://vassun.vassar.edu/~lowry/ch8pt1.html

 

http://bmj.com/statsbk/8.shtml

 

http://web.uccs.edu/lbecker/SPSS/content.htm

Crosstabulation and Measures of Association