4. Statistical Reasoning

Introduction and NHST

Author: Johnny van Doorn

Affiliation: University of Amsterdam

Published: September 9, 2025

In this lecture we aim to:

Introduce the S in SSR
Review some statistics concepts from RMS
- The “logic” behind common hypothesis tests
- The four scenarios in statistical decision making

Reading: Chapters 1, 2, 3

Learning

About SSR

Book: Discovering Statistics Using JASP
- Many pages, but light content
- Difficulty indications in each section (A/B/C/D)
- Theory in first half, application in JASP in second half
Companion website of the book
- Data sets

About SSR

Lectures:
- Slightly erratic
- Starts with conceptual understanding
- Ends with JASP demonstration
- Confused?
  - Read the chapter first
  - Rewatch the lecture
  - Ask questions (during the lecture or on the discussion board)

About SSR

Practice:
- Tutorials, WA in Ans
- Smart Alex tasks
- Labcoat Leni examples
- Self-tests

Software

JASP: main tool for analyses and data management
- Freely available at www.jasp-stats.org
- Also available at the libraries and via apps.uva.nl
R: very flexible, 100% optional
- Freely available at https://cran.r-project.org/
- Want a nicer environment for coding? Try RStudio
- Available during the exam, but you are not required to use it (you can also use the Ans calculator)
- Free intro course

The Research Process

Sampling Variability

Null Hypothesis Significance Testing

Neyman-Pearson Paradigm

Neyman

Pearson

Two hypotheses

\(H_0\)

Skeptical point of view
No effect
No preference
No correlation
No difference

\(H_A\)

Refute Skepticism
Effect
Preference
Correlation
Difference

Frequentist probability

Objective Probability
Relative frequency in the long run
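A minimal R sketch of “relative frequency in the long run”: it simulates flips of a fair coin and tracks the running proportion of heads. The seed and the number of flips are arbitrary choices made for illustration.

```r
set.seed(2025)  # Arbitrary seed so the simulation is reproducible

n_flips <- 10000
# Simulate fair coin flips: 1 = heads, 0 = tails
flips <- sample(c(0, 1), size = n_flips, replace = TRUE)

# Running proportion of heads after 1, 2, ..., n_flips flips
running_prop <- cumsum(flips) / seq_len(n_flips)

# Early on the proportion fluctuates; in the long run it settles near 0.5
running_prop[c(10, 100, 1000, 10000)]
```

The probability of heads is not read off any single flip; it is the value this proportion stabilizes at as the number of flips grows.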

Standard Error

95% confidence interval

\[SE = \frac{\text{Standard deviation}}{\text{Square root of sample size}} = \frac{s}{\sqrt{n}}\]

Lower bound = \(\bar{x} - 1.96 \times SE\)
Upper bound = \(\bar{x} + 1.96 \times SE\)
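The two formulas can be checked with a few lines of R. This is a sketch on a small, made-up sample; the 1.96 is the critical value of the standard normal distribution, so the interval below is the large-sample (normal) approximation.

```r
# Hypothetical sample of n = 5 scores (made-up numbers for illustration)
x <- c(4, 6, 5, 7, 3)

n     <- length(x)
x_bar <- mean(x)       # Sample mean
s     <- sd(x)         # Sample standard deviation (n - 1 in the denominator)
se    <- s / sqrt(n)   # Standard error of the mean

# 95% confidence interval using the normal critical value 1.96
ci <- c(lower = x_bar - 1.96 * se,
        upper = x_bar + 1.96 * se)

se  # 0.7071068
ci  # lower 3.614071, upper 6.385929
```

For a sample this small, a \(t\)-based interval (critical value `qt(.975, df = 4)`, about 2.78) would be wider; the normal version is shown only because it matches the formula above.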

Standard Error

Binomial \(H_0\) distribution

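```r
n <- 10   # Sample size (number of coin flips)
k <- 0:n  # Possible outcomes: number of heads
p <- .5   # Probability of heads under H0 (fair coin)

# Number of distinct orderings with k heads: the binomial coefficient "n choose k"
# (equivalent to choose(n, k))
combinations <- factorial(n) / (factorial(k) * factorial(n - k))

p_k  <- p^k * (1 - p)^(n - k)  # Probability of one specific sequence with k heads
p_kp <- p_k * combinations     # Probability of k heads in any order
# (equivalent to dbinom(k, size = n, prob = p))

title <- "Binomial Null distribution"

# Optional: colour the tails {0, 1} and {9, 10} (a two-sided rejection region) red
# col <- c(rep("red", 2), rep("beige", 7), rep("red", 2))

bp <- barplot(p_kp,
              main      = title,
              names.arg = k,
              xlab      = "Number of heads",
              ylab      = "Probability",
              col       = "beige",
              ylim      = c(0, .3))

# Optional: dashed lines at the borders of that rejection region
# abline(v = c(2.5, 10.9), lty = 2, col = "red")

# Print each probability just above its bar (bp holds the bar midpoints)
text(bp, p_kp, round(p_kp, 3), pos = 3, cex = .5)
```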

Binomial \(H_A\) distributions

Decision table

Alpha \(\alpha\)

Incorrectly reject \(H_0\)
Type I error
False Positive
Threshold for “significance”
Conventionally set to 5%, but this convention is heavily criticized
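Because \(\alpha\) is a long-run frequency, it can be checked by simulation. This minimal R sketch assumes the coin example with \(n = 10\) flips and a hypothetical two-sided rule that rejects \(H_0\) when we see 0, 1, 9, or 10 heads; with this discrete rule the exact Type I error rate is \(22/1024 \approx 0.021\), below the nominal 5%.

```r
set.seed(1)  # Arbitrary seed for reproducibility

n_experiments <- 100000
n <- 10                          # Flips per experiment
reject_region <- c(0, 1, 9, 10)  # Hypothetical two-sided rejection region

# Simulate experiments in which H0 is true (fair coin, p = .5)
heads <- rbinom(n_experiments, size = n, prob = 0.5)

# Every rejection here is a Type I error, because H0 is true
type1_sim   <- mean(heads %in% reject_region)
type1_exact <- sum(dbinom(reject_region, size = n, prob = 0.5))

c(simulated = type1_sim, exact = type1_exact)
```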

Power

Correctly reject \(H_0\)
True positive
Power = \(1 - \beta\)
\(\beta\) is the Type II error rate
Conventional target: 80%
Depends on sample size (as well as on effect size and \(\alpha\))
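Power can be computed exactly for the coin example. The sketch below assumes a two-sided exact binomial test of \(H_0: p = .5\) that splits \(\alpha\) equally over the two tails, and an assumed true effect of \(p = .8\); the helper functions `rejection_region()` and `power_binom()` are hypothetical names introduced here.

```r
# Two-sided exact binomial rejection region for H0: p = .5,
# splitting alpha equally over both tails (an assumption for this sketch)
rejection_region <- function(n, alpha = 0.05) {
  k <- 0:n
  lower <- pbinom(k, n, 0.5) <= alpha / 2                          # P(X <= k)
  upper <- pbinom(k - 1, n, 0.5, lower.tail = FALSE) <= alpha / 2  # P(X >= k)
  k[lower | upper]
}

# Power: probability of landing in the rejection region when p = p_true
power_binom <- function(n, p_true, alpha = 0.05) {
  sum(dbinom(rejection_region(n, alpha), n, p_true))
}

rejection_region(10)   # 0 1 9 10
power_binom(10, 0.8)   # about 0.376

# Power grows with sample size for the same true effect (p = .8)
sapply(c(10, 20, 50, 100), power_binom, p_true = 0.8)
```

With 10 flips, power is only about 38%, far below the conventional 80%; doubling the sample to 20 flips brings it to roughly 80%, and larger samples detect this effect almost every time.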

One minus alpha

Correctly accept \(H_0\)
True negative

Beta

Incorrectly accept \(H_0\)
Type II error
False Negative
Conventionally set to 20% (i.e., power of 80%)
Depends on sample size

P-value

The probability of obtaining a test statistic at least as extreme as the one observed, assuming the null hypothesis is true.

Reject \(H_0\) when:

\(p\)-value \(\leq\) \(\alpha\)
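A minimal R sketch of this definition for the coin example, assuming a hypothetical result of 9 heads in 10 flips. Because the \(H_0\) distribution with \(p = .5\) is symmetric, the two-sided \(p\)-value is twice the one-sided one; R's built-in `binom.test()` gives the same value.

```r
n <- 10  # Number of flips
k <- 9   # Hypothetical observed number of heads

# One-sided p-value: P(X >= 9) under H0 (p = .5)
p_one_sided <- pbinom(k - 1, n, 0.5, lower.tail = FALSE)

# Two-sided p-value: the H0 distribution is symmetric, so double it
p_two_sided <- 2 * p_one_sided

p_two_sided                        # 0.02148438 (= 22/1024)
binom.test(k, n, p = 0.5)$p.value  # same value from R's built-in exact test

alpha <- 0.05
p_two_sided <= alpha  # TRUE: reject H0
```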

Test statistics

A statistic that summarizes the data and is used for hypothesis testing, because we know how it is distributed under the hypotheses being compared.

Common test statistics:

Number of heads
Sum of dice
\(t\)-statistic
\(F\)-statistic
\(\chi^2\)-statistic
etc…
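To make “a statistic that summarizes the data” concrete, here is a sketch that computes a one-sample \(t\)-statistic by hand for a small, made-up sample and a hypothetical \(H_0\) mean of 3, then checks it against R's `t.test()`.

```r
# Hypothetical sample (made-up scores) and hypothesized mean under H0
x   <- c(4, 6, 5, 7, 3)
mu0 <- 3

# One-sample t-statistic: distance between sample mean and mu0 in SE units
se     <- sd(x) / sqrt(length(x))
t_stat <- (mean(x) - mu0) / se
t_stat  # 2.828427

# Under H0 this statistic follows a t-distribution with n - 1 df,
# which is what lets us turn it into a p-value
t.test(x, mu = mu0)$statistic
```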

P-value in \(H_{0}\) distribution

P-value and \(\alpha\)

Alpha determines how readily we reject the null hypothesis:

Increasing \(\alpha\) means rejecting the null hypothesis more often
- Increases the Type I error rate
- Decreases the Type II error rate
Historically set to 0.05, but widely criticized (see Jane Superbrain 3.1)

No scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses; he rather gives his mind to each particular case in the light of his evidence and his ideas. (Fisher, 1956)
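The trade-off between the two error rates can be computed exactly for a coin example. This sketch assumes \(n = 20\) flips, \(H_0: p = .5\), a hypothetical true value \(p = .7\), and an equal-tailed exact rejection region; `error_rates()` is a helper introduced here. Because the binomial distribution is discrete, the actual Type I error rate is usually below \(\alpha\) and can stay the same when \(\alpha\) changes only slightly, which is why the sketch compares \(\alpha\) = .01, .05, and .20.

```r
n      <- 20   # Hypothetical number of flips
p_true <- 0.7  # Hypothetical true probability of heads (H0 is false)

error_rates <- function(alpha) {
  k <- 0:n
  # Equal-tailed rejection region under H0: p = .5
  reject <- pbinom(k, n, 0.5) <= alpha / 2 |
            pbinom(k - 1, n, 0.5, lower.tail = FALSE) <= alpha / 2
  c(alpha   = alpha,
    type_I  = sum(dbinom(k[reject], n, 0.5)),      # P(reject | H0 true)
    type_II = sum(dbinom(k[!reject], n, p_true)))  # P(retain | H0 false)
}

# Larger alpha: more Type I errors, fewer Type II errors
rates <- rbind(error_rates(0.01), error_rates(0.05), error_rates(0.20))
rates
```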

Misconceptions about the p-value

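A significant result means that the effect is important
- Significance depends on both effect size and sample size: with a large enough sample, even a tiny effect is significant
A non-significant result means that the null hypothesis is true
- It may instead reflect low power (a Type II error)
A significant result means that the null hypothesis is false
- It may instead be a Type I error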
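Two of these misconceptions can be illustrated with the binomial test in R. The counts below are hypothetical: the same observed proportion of heads (55%) in samples of 100 and 1000 flips, followed by the power of the rule “reject for 0, 1, 9, or 10 heads” when a coin with \(p = .6\) is flipped 10 times.

```r
# Same observed proportion of heads (55%), different sample sizes
small <- binom.test(55, 100, p = 0.5)
large <- binom.test(550, 1000, p = 0.5)

# Same effect size, but only the large sample gives p <= .05
c(small = small$p.value, large = large$p.value)

# A non-significant result is not evidence that H0 is true:
# with n = 10 and a biased coin (p = .6), the rule "reject for 0, 1, 9, or 10 heads"
# rejects H0 only rarely (low power), so most results are non-significant
power_n10 <- sum(dbinom(c(0, 1, 9, 10), size = 10, prob = 0.6))
power_n10  # about 0.048
```

The first comparison shows that the \(p\)-value reflects sample size as well as effect size; the second shows that with power around 5%, a non-significant result is what we would usually expect here even though \(H_0\) is false.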

Decision Table

Play around with this app to get an idea of the probabilities

NHST Reasoning Scheme

Next Time

Visualization in JASP
Correlation
- How it works
- Controlling for a third variable

Bored?

Contact
