Contingency Tables
University of Amsterdam
2025-11-12
In this lecture we aim to:
Reading: Chapter 16
Relation between categorical variables
A “chi-squared test”, also written as \(\chi^2\) test, is any statistical hypothesis test wherein the sampling distribution of the test statistic is a chi-squared distribution when the null hypothesis is true. Without other qualification, ‘chi-squared test’ often is used as short for Pearson’s chi-squared test.
Chi-squared tests are often constructed from a lack-of-fit sum of squared errors. A chi-squared test can be used to attempt rejection of the null hypothesis that the data are independent.
Source: Wikipedia
\(\chi^2 = \sum \frac{(\text{observed}_{ij} - \text{model}_{ij})^2}{\text{model}_{ij}}\)
\(\text{observed}_{ij} = \begin{pmatrix} o_{11} & o_{12} & \cdots & o_{1j} \\ o_{21} & o_{22} & \cdots & o_{2j} \\ \vdots & \vdots & \ddots & \vdots \\ o_{i1} & o_{i2} & \cdots & o_{ij} \end{pmatrix}\)
\(\text{model}_{ij} = \begin{pmatrix} m_{11} & m_{12} & \cdots & m_{1j} \\ m_{21} & m_{22} & \cdots & m_{2j} \\ \vdots & \vdots & \ddots & \vdots \\ m_{i1} & m_{i2} & \cdots & m_{ij} \end{pmatrix}\)
The \(\chi^2\) distribution describes the test statistic under the assumption of \(H_0\), given the degrees of freedom.
\(df = (r - 1) (c - 1)\) where \(r\) is the number of rows and \(c\) the number of columns.
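As a small illustration of how the degrees of freedom select the reference distribution, the sketch below computes \(df\) for a 2 x 2 table and looks up the corresponding critical value and a p-value. The variable names, the \(\alpha\) of 0.05 and the test statistic of 2.5 are illustrative assumptions, not values from the lecture.

```r
# Table dimensions (assumed here: 2 rows and 2 columns)
nRows <- 2
nCols <- 2
df <- (nRows - 1) * (nCols - 1)

# Critical value: the chi-squared value exceeded with probability alpha under H0
alpha <- 0.05                      # illustrative significance level
criticalValue <- qchisq(1 - alpha, df = df)
criticalValue                      # about 3.84 for df = 1

# p-value for a hypothetical test statistic (2.5 is an arbitrary example)
chiSquaredExample <- 2.5
pValueExample <- pchisq(chiSquaredExample, df = df, lower.tail = FALSE)
pValueExample
```

A test statistic larger than the critical value would lead to rejecting \(H_0\) at the chosen \(\alpha\).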

         Personality
Lecture   Extrovert Introvert
  Digital        10        13
  Live           13        16
\(\text{observed}_{ij} = \begin{pmatrix} 10 & 13 \\ 13 & 16 \\ \end{pmatrix}\)
\(\text{model}_{ij} = E_{ij} = \frac{\text{row total}_i \times \text{column total}_j}{n }\)
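# Assumed definitions: marginal totals of the `observed` table shown above
n      <- sum(observed)
totDig <- sum(observed["Digital", ])
totLiv <- sum(observed["Live", ])
totExt <- sum(observed[, "Extrovert"])
totInt <- sum(observed[, "Introvert"])
# Expected count per cell: (row total * column total) / n, filled column by column
modelPredictions <- matrix(c((totExt * totDig) / n,
                             (totExt * totLiv) / n,
                             (totInt * totDig) / n,
                             (totInt * totLiv) / n), 2, 2,
                           byrow = FALSE, dimnames = dimnames(observed))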
modelPredictions
         Personality
Lecture   Extrovert Introvert
  Digital  10.17308  12.82692
  Live     12.82692  16.17308
\(\text{model}_{ij} = \begin{pmatrix} 10.1730769 & 12.8269231 \\ 12.8269231 & 16.1730769 \\ \end{pmatrix}\)
Observed:
         Personality
Lecture   Extrovert Introvert
  Digital        10        13
  Live           13        16
Model:
         Personality
Lecture   Extrovert Introvert
  Digital  10.17308  12.82692
  Live     12.82692  16.17308
Observed - model:
         Personality
Lecture    Extrovert  Introvert
  Digital -0.1730769  0.1730769
  Live     0.1730769 -0.1730769
\(\chi^2 = \sum \frac{(\text{observed}_{ij} - \text{model}_{ij})^2}{\text{model}_{ij}}\)
\(df = (r - 1) (c - 1)\)
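The formula can be applied to the example table as follows. This is a minimal sketch: the variable names are chosen here for illustration, and `outer()` is used as a compact way to form the expected counts from the row and column totals.

```r
# Observed counts from the example (rows: Lecture, columns: Personality)
observed <- matrix(c(10, 13, 13, 16), nrow = 2,
                   dimnames = list(Lecture = c("Digital", "Live"),
                                   Personality = c("Extrovert", "Introvert")))

# Expected counts under independence: row total * column total / n
n <- sum(observed)
model <- outer(rowSums(observed), colSums(observed)) / n

# Chi-squared statistic, degrees of freedom and p-value
chiSquared <- sum((observed - model)^2 / model)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)
pValue <- pchisq(chiSquared, df = df, lower.tail = FALSE)

chiSquared   # about 0.0095
pValue       # about 0.92

# Cross-check with the built-in test, without continuity correction
builtIn <- chisq.test(observed, correct = FALSE)
builtIn$statistic
```

With a statistic this close to zero, the observed counts are almost exactly what independence predicts.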
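Fisher's exact test calculates the exact probability of the \(\chi^2\) statistic for small samples (at least one cell of the table has an expected count smaller than 5), when the \(\chi^2\) approximation does not yet suffice.
It does so by enumerating all possible tables that have the same row and column totals as the observed table.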
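The exact approach described above corresponds to what base R offers as `fisher.test()`. The sketch below applies it to the example table purely for illustration; all expected counts here are above 5, so the exact test is not strictly needed for these data.

```r
# Observed counts from the example (rows: Lecture, columns: Personality)
observed <- matrix(c(10, 13, 13, 16), nrow = 2,
                   dimnames = list(Lecture = c("Digital", "Live"),
                                   Personality = c("Extrovert", "Introvert")))

# Exact test: considers all tables with the same row and column totals
fisherResult <- fisher.test(observed)

fisherResult$p.value    # exact two-sided p-value
fisherResult$estimate   # conditional estimate of the odds ratio;
                        # this generally differs slightly from (a * d) / (b * c)
```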
For 2 x 2 contingency tables, Yates's correction is intended to prevent overestimation of statistical significance in small samples. Unfortunately, Yates's correction may overcorrect (i.e., produce an overly conservative result), and it is no longer generally recommended (see Section 16.3.5).
\(\chi^2 = \sum \frac{ ( | \text{observed}_{ij} - \text{model}_{ij} | - .5)^2}{\text{model}_{ij}}\)
[1] 0.03377921
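The value above can be reproduced by applying the corrected formula to the example table by hand. This is a sketch with illustrative variable names, not necessarily the code used to produce the slide output.

```r
# Observed counts from the example (rows: Lecture, columns: Personality)
observed <- matrix(c(10, 13, 13, 16), nrow = 2,
                   dimnames = list(Lecture = c("Digital", "Live"),
                                   Personality = c("Extrovert", "Introvert")))

# Expected counts under independence
n <- sum(observed)
model <- outer(rowSums(observed), colSums(observed)) / n

# Yates's correction: subtract 0.5 from each absolute difference before squaring
yates <- sum((abs(observed - model) - 0.5)^2 / model)
yates   # 0.03377921
```

Note that R's `chisq.test(observed, correct = TRUE)` limits the correction to at most the absolute difference between observed and expected counts, so its statistic can differ from this hand computation when those differences are smaller than 0.5, as they are here.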
\(\text{standardized residuals} = \frac{ \text{observed}_{ij} - \text{model}_{ij} }{ \sqrt{ \text{model}_{ij} } }\)
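For the example table, the standardized residuals can be computed cell by cell as sketched below. The variable names are illustrative. In R's `chisq.test()` output, this quantity is stored in the `residuals` element (R calls them Pearson residuals), while its `stdres` element uses a different, adjusted definition.

```r
# Observed counts from the example (rows: Lecture, columns: Personality)
observed <- matrix(c(10, 13, 13, 16), nrow = 2,
                   dimnames = list(Lecture = c("Digital", "Live"),
                                   Personality = c("Extrovert", "Introvert")))

# Expected counts under independence
n <- sum(observed)
model <- outer(rowSums(observed), colSums(observed)) / n

# Standardized residual per cell: (observed - model) / sqrt(model)
stdResiduals <- (observed - model) / sqrt(model)
stdResiduals

# The same values as reported by the built-in test
builtInResiduals <- chisq.test(observed, correct = FALSE)$residuals

# Squaring and summing the residuals gives the chi-squared statistic again
sum(stdResiduals^2)
```

Residuals this close to zero indicate that no single cell deviates noticeably from the model.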
Odds ratio based on the observed values
\(\begin{pmatrix} a & b \\ c & d \\ \end{pmatrix}\)
\(OR = \frac{a \times d}{b \times c} = \frac{10 \times 16}{13 \times 13} = 0.9467456\)
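The same calculation in R, as a minimal sketch. The names `cellA` to `cellD` are chosen here to mirror the \(a\), \(b\), \(c\), \(d\) layout of the matrix above.

```r
# Observed counts from the example (rows: Lecture, columns: Personality)
observed <- matrix(c(10, 13, 13, 16), nrow = 2,
                   dimnames = list(Lecture = c("Digital", "Live"),
                                   Personality = c("Extrovert", "Introvert")))

# Cells in the a, b, c, d layout
cellA <- observed[1, 1]   # Digital, Extrovert
cellB <- observed[1, 2]   # Digital, Introvert
cellC <- observed[2, 1]   # Live, Extrovert
cellD <- observed[2, 2]   # Live, Introvert

# Odds ratio as the cross-product ratio
oddsRatio <- (cellA * cellD) / (cellB * cellC)
oddsRatio   # 0.9467456
```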
         Personality
Lecture   Extrovert Introvert
  Digital        10        13
  Live           13        16
The extrovert/introvert ratio for digital and live audiences:
In the digital responses, there are approximately 0.77 times as many extroverts as introverts. In the live responses, there are approximately 0.81 times as many extroverts as introverts.
Alternatively, we can look at the ratios of digital/live for extroverts and introverts:
For the extroverts, there are approximately 0.77 times as many digital viewers as live viewers. For the introverts, there are approximately 0.81 times as many digital viewers as live viewers.
The odds ratio is the ratio of these odds.
\(OR = \frac{\text{digital}}{\text{live}} = \frac{0.7692308}{0.8125} = \frac{\text{extrovert}}{\text{introvert}} = \frac{0.7692308}{0.8125} = 0.9467456\)
For these data, the odds of watching digitally rather than live were approximately 0.95 times as high for extroverts as for introverts. Because the odds ratio is built from all four cells, it accounts for both conditions (watching digitally and watching live) and both personality types at once.
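The two routes to the odds ratio can be checked against each other in R. This is a sketch with illustrative variable names. It computes the odds within each row, then within each column, and confirms that both ratios agree.

```r
# Observed counts from the example (rows: Lecture, columns: Personality)
observed <- matrix(c(10, 13, 13, 16), nrow = 2,
                   dimnames = list(Lecture = c("Digital", "Live"),
                                   Personality = c("Extrovert", "Introvert")))

# Route 1: extrovert/introvert odds within each lecture type
oddsDigital <- observed["Digital", "Extrovert"] / observed["Digital", "Introvert"]  # 10 / 13
oddsLive    <- observed["Live", "Extrovert"] / observed["Live", "Introvert"]        # 13 / 16
orByLecture <- oddsDigital / oddsLive

# Route 2: digital/live odds within each personality type
oddsExtrovert <- observed["Digital", "Extrovert"] / observed["Live", "Extrovert"]   # 10 / 13
oddsIntrovert <- observed["Digital", "Introvert"] / observed["Live", "Introvert"]   # 13 / 16
orByPersonality <- oddsExtrovert / oddsIntrovert

orByLecture       # 0.9467456
orByPersonality   # 0.9467456
```

Both routes reduce to the same cross-product \(\frac{a \times d}{b \times c}\), which is why the result does not depend on which variable is treated as the grouping.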

Scientific & Statistical Reasoning