Lecture 6 — Reporting Guidelines & Exam Preparation

Author: Johnny van Doorn

Published: June 17, 2026

By the end of this lecture, you will be able to:

  • Apply the van Doorn et al. (2021) reporting checklist to a Bayesian analysis
  • Use calibrated language when describing BFs and posteriors
  • Identify common errors in written Bayesian reports
  • Demonstrate the full analysis pipeline on exam-style questions

Reading: van Doorn et al. (2021), Psychonomic Bulletin & Review. doi: 10.3758/s13423-020-01798-5

Exam: Friday, June 19, 09:00–11:00, IWO 4.04C (Blauw). Open-book ANS exam with R (no internet); the textbook is provided as a PDF and you may bring one A4 cheat sheet (both sides). Covers all lecture material (L1–L5).


Reporting Guidelines (van Doorn et al., 2021)

Why Reporting Matters

A Bayesian analysis is only as good as its communication. Two researchers with the same data but different priors can legitimately reach different conclusions. That’s fine, but readers must be able to evaluate and reproduce the analysis.

Van Doorn et al. identify four stages of a Bayesian study:

\[\text{Planning} \;\rightarrow\; \text{Executing} \;\rightarrow\; \text{Interpreting} \;\rightarrow\; \text{Reporting}\]

Stage 1: Planning

What to decide before you see the data

  • Specify the goal: estimation, hypothesis testing, or both?
  • If testing: one-sided or two-sided? Justify theoretically (before seeing data).
  • Choose a statistical model: likelihood + prior family + prior parameters.
  • Plan a robustness check: which alternative priors will you report?
  • Specify a sampling plan (optional for Bayesians, but transparent).
  • Pre-register for confirmatory research.

Stage 2: Executing

  • Check assumptions before running (outliers, distributional fit).
  • Run the pre-registered analysis; annotate any deviations.
  • Conduct the robustness check: vary the prior and report how the BF changes.

A Bayesian analysis does not safeguard against model misspecification. Anscombe’s quartet applies here too; always inspect your data.

Stage 3: Interpreting

For hypothesis testing (BF):

  • BF is a relative measure: data are \(x\) times more likely under \(H_1\) than \(H_0\)
  • \(BF_{10} = 1/BF_{01}\); always use the subscript to be unambiguous
  • BF does not equal posterior probability of \(H_1\); that requires prior odds: \[\frac{P(H_1 \mid y)}{P(H_0 \mid y)} = BF_{10} \times \frac{P(H_1)}{P(H_0)}\]

For estimation (posterior):

  • Report posterior median (or mean) and a credible interval (ETI or HDI, specify which)
  • The CI has a direct probability interpretation: \(P(\theta \in [L,U] \mid y) = 0.95\)
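The two interval types named above can be compared with a short base-R sketch. The skewed Beta(3, 10) posterior is an arbitrary choice for illustration, and the HDI search via `optimize()` is one simple approach for a unimodal posterior, not necessarily the method used in the course software.

```r
# Illustrative posterior: a skewed Beta(3, 10), chosen only for demonstration
a <- 3; b <- 10
level <- 0.95

# Equal-tailed interval: cut (1 - level)/2 from each tail
eti <- qbeta(c((1 - level) / 2, 1 - (1 - level) / 2), a, b)

# Highest-density interval: among all intervals with `level` coverage,
# pick the narrowest one by searching over the lower tail probability
width <- function(p_low) qbeta(p_low + level, a, b) - qbeta(p_low, a, b)
p_opt <- optimize(width, interval = c(0, 1 - level))$minimum
hdi <- qbeta(c(p_opt, p_opt + level), a, b)

cat("95% ETI:", round(eti, 3), "\n")
cat("95% HDI:", round(hdi, 3), "\n")
```

Both intervals contain 95% of the posterior mass; for a skewed posterior the HDI is shifted toward the mode and is never wider than the ETI, which is why a report should state which one is given.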

Stage 4: Reporting (The Complete Checklist)

Model specification

  • State the likelihood explicitly
  • State the prior with all parameters; justify the choice
  • Report the prior ESS

Checking

  • Prior predictive check: are the prior predictions plausible?
  • Posterior predictive check: does the model fit the data?

Results

  • Posterior median/mean and credible interval (ETI or HDI; state which)
  • If testing: report \(BF_{10}\) (or \(BF_{01}\)) with both hypotheses written out explicitly
  • Sensitivity analysis: repeat under 2–3 alternative priors
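A sensitivity analysis for a Beta-Binomial model can be sketched as a loop over candidate priors. The counts and the three priors below are assumptions made for this illustration; in a real report the alternatives should be the ones named in the analysis plan.

```r
# Illustrative counts chosen for this sketch
y <- 42; n <- 60

# Candidate priors: the names and parameter values are assumptions
priors <- list(uniform = c(1, 1), weak = c(2, 2), informed = c(5, 5))

sensitivity <- t(sapply(priors, function(p) {
  ap <- p[1] + y          # posterior shape 1
  bp <- p[2] + n - y      # posterior shape 2
  c(post_mean    = ap / (ap + bp),
    eti_low      = qbeta(0.025, ap, bp),
    eti_high     = qbeta(0.975, ap, bp),
    p_above_half = pbeta(0.5, ap, bp, lower.tail = FALSE))
}))
print(round(sensitivity, 3))
```

If the rows tell the same story, the conclusion is robust to the prior; if they diverge, the report should say so.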

Transparency

  • Share data, code, and prior specification (OSF, GitHub, or supplementary)
  • Use calibrated language (see below)

Language Guide

Use these

  • “Moderate evidence for \(H_1\) (\(BF_{10} = 7.2\))”
  • “The data are \(x\) times more likely under \(H_1\) than under \(H_0\)”
  • “The posterior mean is 0.65 (95% ETI: [0.51, 0.78])”
  • “Evidence against the null (\(BF_{10} = 12\))”
  • “Anecdotal support for \(H_1\)”

Avoid these

  • “The null hypothesis is false / rejected”
  • “We have proven that the effect exists”
  • “\(\theta\) is definitely in [0.51, 0.78]”
  • “No evidence, so the null is true”
  • “The BF confirms our hypothesis”
  • “The CI shows significance”

A Specimen Paragraph

We modelled the probability of correct beer identification (\(\theta\)) with prior \(\text{Beta}(6, 6)\) (ESS = 12, symmetric, no directional expectation). After observing \(y = 42\) correct identifications out of \(n = 57\), the posterior was \(\text{Beta}(48, 21)\), with mean 0.696 (95% ETI: [0.578, 0.806]). A Bayes factor of \(BF_{10} = 58.3\) indicated very strong evidence that performance exceeds chance (\(\theta = 0.5\)). A sensitivity analysis under \(\text{Beta}(1,1)\) yielded \(BF_{10} = 43.1\), supporting the robustness of this conclusion. Data and analysis code are available at [OSF link].


Four Common Exam Mistakes

Mistake 1: Misreading the Credible Interval

Wrong: “There is a 95% probability that the true \(\theta\) is between 0.51 and 0.78 in repeated sampling.”

Correct: “Given the data and the prior, there is a 95% probability that \(\theta\) is between 0.51 and 0.78.”

The frequentist “repeated sampling” clause belongs to the confidence interval, not the Bayesian credible interval.

Mistake 2: BF = Posterior Probability

Wrong: “\(BF_{10} = 9\) means there is a 90% chance that \(H_1\) is true.”

Correct: “\(BF_{10} = 9\) means the data are 9 times more likely under \(H_1\) than under \(H_0\). Converting to posterior probability requires specifying the prior odds.”

\[P(H_1 \mid y) = \frac{BF_{10} \times P(H_1)}{BF_{10} \times P(H_1) + P(H_0)}\]
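The conversion formula can be checked numerically. This is a minimal sketch; the function name `post_prob_h1` and the sceptical prior probability of 0.2 are assumptions made for the example.

```r
# Posterior probability of H1 from a Bayes factor and a prior probability of H1
post_prob_h1 <- function(bf10, prior_h1) {
  prior_h0 <- 1 - prior_h1
  (bf10 * prior_h1) / (bf10 * prior_h1 + prior_h0)
}

bf10 <- 9
cat("Equal prior odds, P(H1) = 0.5:", round(post_prob_h1(bf10, 0.5), 3), "\n")  # 0.9
cat("Sceptical prior,  P(H1) = 0.2:", round(post_prob_h1(bf10, 0.2), 3), "\n")  # 0.692
```

The "90%" reading of \(BF_{10} = 9\) is only right when the prior odds are exactly 1; with any other prior odds the same Bayes factor gives a different posterior probability.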

Mistake 3: No Evidence ≠ Evidence for the Null

Wrong: “\(BF_{10} = 1.2\), there is no effect.”

Correct: “\(BF_{10} = 1.2\) is anecdotal. The data were uninformative: they did not substantially shift beliefs in either direction.”

Contrast with \(BF_{01} = 14\): this is positive evidence for \(H_0\).

Mistake 4: Using the CI to Test a Hypothesis

Wrong: “The 95% CI excludes 0.5, so \(H_0: \theta = 0.5\) is rejected.”

Correct: Credible intervals are for estimation. Use the Bayes factor for hypothesis testing. The CI and the BF answer different questions.

Question                          Tool
Does an effect exist?             Bayes factor
How large is the effect?          Posterior + CI
What data should we expect next?  Posterior predictive

Error-Spotting Exercises

Report 1: Mindfulness & Memory

We tested a mindfulness intervention on working memory (\(n = 15\)). \(BF_{10} = 4.3\), so we conclude the null is false. The CI [39.8, 42.6] proves the intervention is effective. No robustness analysis was needed since results are clear.

Find 3 errors in this report.

  1. “Null is false”: \(BF_{10} = 4.3\) is moderate evidence; it cannot falsify \(H_0\). Report as “moderate evidence in favour of \(H_1\).”
  2. “CI proves effectiveness”: credible intervals summarise uncertainty, they cannot “prove” anything. We would also need to compare the CI to a meaningful null value.
  3. “No robustness needed”: robustness checks are especially important when making strong claims, not an optional extra.

Report 2: Facial Feedback

One-sided Bayesian \(t\)-test (\(n_1 = 53\), \(n_2 = 57\)). Informed prior: \(t(0.35, 0.102, 3)\). Result: \(BF_{0-} = 11.5\). This means \(H_0\) is 11.5× more probable than \(H_-\).

Find 1 error in this report.

  1. “\(H_0\) is 11.5× more probable”: \(BF_{0-}\) is a ratio of (marginal) likelihoods, not a probability ratio. Converting to a ratio of posterior probabilities requires specifying prior odds.

Mock Exam 1

A full practice paper (16.5 pts) in the style of last year’s exam. It is deliberately longer than the real exam (10 pts, 3 questions) to give you more to work through, and it covers Lectures 3–5 plus a model-checking simulation. (Lectures 1–2 are covered in Mock Exam 2 below.)

Q1–Q3 are by-hand computational questions; Q4 is a with-R simulation question.

Attempt each question under exam conditions before opening its solution.

Mock Q1: Conjugacy & Prior Choice (4.5 pts)

(Lecture 3 / Ch. 5; cf. Beta–Binomial exercises)

A food-delivery app wants to estimate \(\theta\), the proportion of customers who tip their driver. Based on a pilot study, they adopt a \(\text{Beta}(5, 5)\) prior. In the new cohort, 42 out of 60 customers tip.

a. [1] Identify the data model and conjugate prior family. Write the posterior update rule and state the resulting posterior.

b. [1] Compute the prior mean, observed proportion, and posterior mean. Explain the influence of the prior vs data.

c. [0.75] Compute the 95% equal-tailed credible interval and interpret it.

d. [0.75] Compute \(P(\theta > 0.70 \mid y)\).

e. [1] (Describe, max 90 words.) Describe how the posterior has shifted relative to the prior in terms of its mean, mode, and standard deviation, and explain what this shift indicates about how the data updated your beliefs about \(\theta\).

Attempt before looking at the solution.

a. \(Y \mid \theta \sim \text{Binomial}(60, \theta)\); conjugate prior \(\theta \sim \text{Beta}(\alpha,\beta)\). Update: \(\theta \mid y \sim \text{Beta}(\alpha + y,\ \beta + n - y)\).

Code
alpha <- 5; beta <- 5; y <- 42; n <- 60
ap <- alpha + y; bp <- beta + n - y
cat("Posterior: Beta(", ap, ",", bp, ")\n")
Posterior: Beta( 47 , 23 )

b.

Code
cat("Prior mean:     ", round(alpha/(alpha+beta), 3), "\n")
Prior mean:      0.5 
Code
cat("Observed prop:  ", round(y/n, 3), "\n")
Observed prop:   0.7 
Code
cat("Posterior mean: ", round(ap/(ap+bp), 3), "\n")
Posterior mean:  0.671 
Code
cat("Prior ESS:      ", alpha + beta, "\n")
Prior ESS:       10 

The posterior mean (0.671) is a weighted average of the prior mean (0.50) and the observed proportion (0.70). With prior ESS = 10 vs n = 60, the data carry ~6× the weight, so the posterior sits close to the observed proportion \(y/n\), only slightly pulled toward the prior.
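The weighted-average reading can be made explicit. The sketch below uses the standard Beta-Binomial identity that the data weight is \(n/(\alpha+\beta+n)\); the variable names are chosen for this example.

```r
alpha <- 5; beta <- 5; y <- 42; n <- 60

prior_mean <- alpha / (alpha + beta)
obs_prop   <- y / n
w_data     <- n / (alpha + beta + n)   # weight on the data: 60/70
w_prior    <- 1 - w_data               # weight on the prior: 10/70

post_mean_weighted <- w_prior * prior_mean + w_data * obs_prop
post_mean_direct   <- (alpha + y) / (alpha + beta + n)

cat("Weights (prior, data):", round(c(w_prior, w_data), 3), "\n")
cat("Weighted average:", round(post_mean_weighted, 3),
    " direct:", round(post_mean_direct, 3), "\n")
```

Both routes give 0.671, and the 1:6 ratio of the weights is the ratio of prior ESS to sample size.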

c.

Code
round(qbeta(c(0.025, 0.975), ap, bp), 3)
[1] 0.558 0.776

Given the data and prior, there is a 95% probability that \(\theta\) lies in this interval.

d.

Code
pbeta(0.70, ap, bp, lower.tail = FALSE)
[1] 0.3134553

About a 31% posterior probability that more than 70% of customers tip.

e.

Code
prior_mode <- (alpha - 1) / (alpha + beta - 2)
post_mode  <- (ap - 1) / (ap + bp - 2)
prior_sd <- sqrt(alpha*beta / ((alpha+beta)^2 * (alpha+beta+1)))
post_sd  <- sqrt(ap*bp / ((ap+bp)^2 * (ap+bp+1)))
cat("Prior:     mean", round(alpha/(alpha+beta),3), " mode", round(prior_mode,3), " sd", round(prior_sd,3), "\n")
Prior:     mean 0.5  mode 0.5  sd 0.151 
Code
cat("Posterior: mean", round(ap/(ap+bp),3),         " mode", round(post_mode,3),  " sd", round(post_sd,3),  "\n")
Posterior: mean 0.671  mode 0.676  sd 0.056 

The posterior has shifted upward: both the mean (0.50 → 0.671) and the mode (0.50 → 0.676) move about 0.17 toward the observed proportion of 0.70. It has also tightened sharply, with the SD falling roughly threefold (0.151 → 0.056). The 60 observations have refined a fairly diffuse, symmetric Beta(5, 5) prior into a much more precise, slightly left-skewed posterior centred above 0.5: our belief about the tipping rate has moved from “around half, very uncertain” to “about two-thirds, fairly precisely.”

Marking: upward shift in mean and mode (0.25); reduced SD / increased precision (0.25); state the actual prior and posterior values (0.25); interpret the shift as the data sharpening a diffuse prior toward the observed proportion (0.25).

Mock Q2: Grid Approximation & MCMC (3 pts)

(Lecture 4 / Ch. 6–7; cf. grid + Metropolis exercises)

A researcher models \(\theta\), the probability of a new drug being effective. They use a \(\text{Beta}(2, 2)\) prior truncated to \([0.3, 0.7]\). In a trial of 20 patients, 12 show improvement.

a. [0.5] Explain why this truncated prior has no closed-form conjugate posterior.

b. [1] The researcher writes the following recipe on the whiteboard:

  1. Build a fine grid \(\theta_1, \ldots, \theta_K\) on \([0, 1]\).
  2. At each \(\theta_i\), compute \(u_i = f(\theta_i)\, L(y \mid \theta_i)\) using the truncated Beta prior and Binomial likelihood.
  3. Normalise: \(w_i = u_i / \sum_k u_k\).
  4. Report posterior summaries as weighted averages over the grid.
  (i) [0.5] Which numerical method does this describe? Pick one and justify in one sentence: (A) Metropolis–Hastings MCMC · (B) Grid approximation · (C) Conjugate Beta–Binomial update · (D) Posterior-predictive Monte Carlo.

  (ii) [0.5] Applying the recipe gives a posterior mean of \(\bar\theta \approx 0.562\). The untruncated conjugate posterior \(\text{Beta}(2 + 12,\; 2 + 8) = \text{Beta}(14, 10)\) has mean \(\approx 0.583\). What does the comparison tell you about the truncation here?

c. [1] A random-walk Metropolis sampler on the untruncated model is at \(\theta_c = 0.55\) and proposes \(\theta_p = 0.60\). Using the unnormalised posterior, compute the Metropolis acceptance probability \(\alpha\). Is the move accepted with certainty?

d. [0.5] You rerun the sampler with three step sizes. One trace plot is a flat caterpillar around the mode; one drifts slowly and never settles; one gets “stuck” on long plateaus. Match each pattern to the diagnosis: good mixing, step size too small, step size too large.

Attempt before looking at the solution.

a. The Beta–Binomial conjugate update needs a proper Beta prior on \([0, 1]\). Truncating to \([0.3, 0.7]\) takes the prior out of the Beta family, so the posterior is no longer Beta and has no closed form.

b. (i) (B) Grid approximation. No accept/reject step (so not MCMC), no analytical update (so not the conjugate Beta–Binomial), and the recipe produces normalised posterior weights for \(\theta\) rather than predictions for future \(y\) (so not posterior-predictive Monte Carlo).

  (ii) The two means differ by about 0.02 (0.562 vs 0.583). With only \(n = 20\), the conjugate posterior \(\text{Beta}(14, 10)\) still has noticeable mass above 0.7, which the truncation cuts off; that pulls the truncated posterior mean slightly downward. The truncation is doing real work here, unlike in cases with larger \(n\) where the data already concentrate \(\theta\) well inside the truncation bounds.
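The whiteboard recipe translates almost line by line into R. This is a minimal sketch; the grid size of 10,001 points is an arbitrary choice for illustration.

```r
y <- 12; n <- 20
theta <- seq(0, 1, length.out = 10001)            # step 1: fine grid on [0, 1]

# Truncated Beta(2, 2) prior: zero outside [0.3, 0.7]; the normalising
# constant is omitted because it cancels in step 3
prior <- dbeta(theta, 2, 2) * (theta >= 0.3 & theta <= 0.7)
u <- prior * dbinom(y, n, theta)                  # step 2: prior x likelihood
w <- u / sum(u)                                   # step 3: normalise

post_mean <- sum(w * theta)                       # step 4: weighted average
cat("Truncated posterior mean:", round(post_mean, 3), "\n")
cat("Untruncated Beta(14, 10):", round(14 / 24, 3), "\n")
```

Other summaries follow the same pattern, for example `sum(w[theta > 0.6])` for \(P(\theta > 0.6 \mid y)\).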

c.

Code
y <- 12; n <- 20
logpost <- function(t) dbeta(t, 2, 2, log = TRUE) + dbinom(y, n, t, log = TRUE)
cat("Acceptance probability:", round(min(1, exp(logpost(0.60) - logpost(0.55))), 3), "\n")
Acceptance probability: 1 

\(\alpha = 1\): the proposed value \(\theta_p = 0.60\) is more plausible than the current \(\theta_c = 0.55\) (the MLE sits exactly at \(12/20 = 0.60\)), so the unnormalised posterior ratio exceeds 1 and the move is accepted with certainty.

d. Flat caterpillar around the mode = good mixing. Slow drift that never settles = step size too small (high autocorrelation, the chain barely explores). Long stuck plateaus = step size too large (proposals overshoot into low-probability regions and are repeatedly rejected).
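To see the three patterns for yourself, the sketch below runs a random-walk Metropolis sampler on the untruncated model with three step sizes. The step values (0.005, 0.2, 5), the Normal proposal, the chain length, and the helper name `run_metropolis` are all assumptions chosen for illustration.

```r
set.seed(1)
y <- 12; n <- 20

# Log unnormalised posterior for the untruncated Beta(2, 2)-Binomial model;
# values outside (0, 1) get log-density -Inf so they are always rejected
logpost <- function(t) {
  if (t <= 0 || t >= 1) return(-Inf)
  dbeta(t, 2, 2, log = TRUE) + dbinom(y, n, t, log = TRUE)
}

# Random-walk Metropolis with a Normal(0, step^2) proposal
run_metropolis <- function(step, n_iter = 5000, start = 0.5) {
  chain <- numeric(n_iter)
  current <- start
  accepted <- 0
  for (i in seq_len(n_iter)) {
    proposal <- current + rnorm(1, 0, step)
    if (log(runif(1)) < logpost(proposal) - logpost(current)) {
      current <- proposal
      accepted <- accepted + 1
    }
    chain[i] <- current
  }
  list(chain = chain, accept_rate = accepted / n_iter)
}

steps <- c(too_small = 0.005, moderate = 0.2, too_large = 5)
runs  <- lapply(steps, run_metropolis)
print(round(sapply(runs, function(r) r$accept_rate), 3))

# Trace plots: compare against the three patterns described in (d)
op <- par(mfrow = c(3, 1), mar = c(3, 4, 2, 1))
for (nm in names(runs)) plot(runs[[nm]]$chain, type = "l", main = nm, ylab = "theta")
par(op)
```

A very high acceptance rate with a drifting trace points to a step size that is too small; a very low acceptance rate with long plateaus points to one that is too large.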

Mock Q3: Inference, Prediction & Testing (5 pts)

(Lecture 5 / Ch. 8; cf. posterior-summary, prediction & Bayes-factor exercises)

A clinic models \(\lambda\), the mean number of emergency calls per hour, with a \(\text{Gamma}(3, 1)\) prior. Over 8 hours they record 15 calls.

a. [0.75] Give the posterior distribution, its mean, and a 95% equal-tailed credible interval.

b. [1] Predict the number of calls next hour. Compute \(E(Y' \mid y)\) and \(P(Y' \ge 2 \mid y)\) using the posterior-predictive (Negative-Binomial) distribution. Why is this wider than plugging the posterior mean into a single Poisson?

c. [1] Test \(H_+: \lambda > 2\) vs \(H_-: \lambda < 2\). Compute \(BF_{+-}\) from the prior and posterior odds.

d. [0.75] Test \(H_0: \lambda = 2\) using the Savage–Dickey density ratio. What does \(BF_{01}\) say about the value \(\lambda = 2\)?

e. [1.5] (Interpret, max 70 words.) State in words what the Bayes factor from (c) tells you about the call rate, using calibrated language. What does it license you to conclude, and what does it not?

Attempt before looking at the solution.

a. \(Y_i\mid\lambda \overset{iid}{\sim}\text{Poisson}(\lambda)\); posterior \(\lambda\mid y \sim \text{Gamma}(s+\sum y,\ r+n)\).

Code
s <- 3; r <- 1; sumy <- 15; nobs <- 8
sp <- s + sumy; rp <- r + nobs
cat("Posterior: Gamma(", sp, ",", rp, "),  mean =", round(sp/rp, 3), "\n")
Posterior: Gamma( 18 , 9 ),  mean = 2 
Code
cat("95% ETI:", round(qgamma(c(0.025, 0.975), sp, rp), 3), "\n")
95% ETI: 1.185 3.024 

b.

Code
cat("E(Y' | y) =",       round(sp/rp, 3), "\n")
E(Y' | y) = 2 
Code
cat("P(Y' >= 2 | y) =",  round(1 - pnbinom(1, size = sp, prob = rp/(rp+1)), 3), "\n")
P(Y' >= 2 | y) = 0.58 

The posterior-predictive marginalises over the full posterior uncertainty in \(\lambda\). Plugging the posterior mean into a single \(\text{Poisson}(\hat\lambda)\) fixes \(\lambda\) at one value and ignores that uncertainty, so it is too narrow and over-confident; the Negative-Binomial is correctly wider.
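The extra width can be quantified directly. The sketch below compares the Negative-Binomial predictive with the plug-in Poisson; the tail threshold of 5 calls is an arbitrary choice for illustration.

```r
sp <- 18; rp <- 9                    # posterior Gamma(18, 9) from part (a)
p_nb <- rp / (rp + 1)                # Negative-Binomial success probability

plug_in_mean <- sp / rp              # posterior mean used as a fixed lambda
nb_mean  <- sp * (1 - p_nb) / p_nb
nb_var   <- sp * (1 - p_nb) / p_nb^2 # predictive variance
pois_var <- plug_in_mean             # Poisson variance equals its mean

cat("Mean      NB:", round(nb_mean, 3), " plug-in Poisson:", round(plug_in_mean, 3), "\n")
cat("Variance  NB:", round(nb_var, 3),  " plug-in Poisson:", round(pois_var, 3), "\n")
cat("P(Y' >= 5) NB:", round(pnbinom(4, size = sp, prob = p_nb, lower.tail = FALSE), 3),
    " plug-in Poisson:", round(ppois(4, plug_in_mean, lower.tail = FALSE), 3), "\n")
```

The two distributions share the same mean, but the Negative-Binomial has the larger variance and the heavier upper tail, which is the posterior uncertainty in \(\lambda\) showing up in the forecast.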

c.

Code
prior_above <- pgamma(2, s, r, lower.tail = FALSE)    # P(lambda > 2)
post_above  <- pgamma(2, sp, rp, lower.tail = FALSE)   # P(lambda > 2 | y)
BF_plusminus <- (post_above/(1 - post_above)) / (prior_above/(1 - prior_above))
cat("Prior P(l>2):", round(prior_above,3), " Posterior P(l>2):", round(post_above,3), "\n")
Prior P(l>2): 0.677  Posterior P(l>2): 0.469 
Code
cat("BF(+, -) =", round(BF_plusminus, 3),
    "  (equivalently BF(-, +) =", round(1/BF_plusminus, 2), ")\n")
BF(+, -) = 0.421   (equivalently BF(-, +) = 2.37 )

\(BF_{+-} \approx 0.42 < 1\), so the data favour \(H_-\) (rate below 2) over \(H_+\). Equivalently \(BF_{-+} \approx 2.4\): only anecdotal evidence for \(\lambda < 2\) (“not worth more than a bare mention” on the Kass & Raftery scale). The data nudged \(P(\lambda > 2)\) down from 0.68 to 0.47, but not decisively.

d.

Code
cat("BF01 =", round(dgamma(2, sp, rp) / dgamma(2, s, r), 3), "\n")
BF01 = 3.112 
Code
cat("BF10 =", round(dgamma(2, s, r) / dgamma(2, sp, rp), 3), "\n")
BF10 = 0.321 

\(BF_{01} \approx 3.1\): the posterior density at \(\lambda = 2\) is about three times the prior density there, so the data are ~3× more consistent with \(\lambda = 2\) than the diffuse alternative expected — moderate evidence for the value 2, not against it. Note that (c) and (d) answer different questions: a one-sided direction test versus a point test.

e. Model answer (≤70 words). The Bayes factor \(BF_{+-} \approx 0.42\) means the observed data are about 2.4 times more likely under \(H_-\) (\(\lambda < 2\)) than under \(H_+\) (\(\lambda > 2\)) — anecdotal, not compelling, evidence for a call rate below 2 per hour. It licenses a tentative lean toward the lower rate; it does not prove \(\lambda < 2\), nor give the probability that \(H_-\) is true (that would need the prior odds).

Marking: correct direction — evidence favours \(H_-\) (0.5); translates 0.42 into “~2.4× more likely under \(H_-\)” (0.5); uses calibrated/probabilistic language (not “reject”/“significant”) and notes BF \(\neq\) posterior probability (0.5).

Mock Q4: Posterior Prediction by Simulation (4 pts)

(Lecture 5 / Ch. 8; cf. posterior-prediction exercises, with-R question)

A marine ecologist models \(\lambda\), the mean number of whale sightings per one-hour boat survey, with a \(\text{Gamma}(2, 1)\) prior. Over 10 surveys the counts are

\[y = (2,\; 0,\; 3,\; 1,\; 2,\; 4,\; 1,\; 0,\; 3,\; 2).\]

a. [0.5] Derive the posterior analytically.

b. [1.5] Write R code to simulate 50,000 draws from the posterior-predictive distribution for a single future survey. Plot it, and compute \(E(Y' \mid y)\) and \(P(Y' = 0 \mid y)\). Confirm \(P(Y'=0\mid y)\) against the analytic Negative-Binomial value.

c. [1] The team plans a block of 5 future surveys. Simulate the total number of sightings across the block and estimate \(E(Y_\text{total} \mid y)\) and \(P(Y_\text{total} \ge 10 \mid y)\).

d. [1] (Interpret, max 70 words.) A colleague wants to report only the predictive mean from (b) as “the” forecast for the next survey, with no uncertainty. What is wrong with this, and what should they report instead?

Attempt before looking at the solution.

a. \(Y_i \mid \lambda \overset{iid}{\sim} \text{Poisson}(\lambda)\); conjugate update \(\lambda \mid y \sim \text{Gamma}(s + \sum y_i,\ r + n)\).

Code
y  <- c(2, 0, 3, 1, 2, 4, 1, 0, 3, 2)
s0 <- 2; r0 <- 1                          # prior Gamma(2, 1)
sp <- s0 + sum(y); rp <- r0 + length(y)
cat("Posterior: Gamma(", sp, ",", rp, "),  mean =", round(sp/rp, 3), "\n")
Posterior: Gamma( 20 , 11 ),  mean = 1.818 

\(\sum y_i = 18\), \(n = 10\), so \(\lambda \mid y \sim \text{Gamma}(20, 11)\) with mean \(20/11 \approx 1.82\).

b. Draw \(\lambda^{(s)}\) from the posterior, then \(Y'^{(s)} \sim \text{Poisson}(\lambda^{(s)})\) — this propagates posterior uncertainty into the prediction.

Code
set.seed(2026)
N <- 50000
lambda_post <- rgamma(N, shape = sp, rate = rp)
y_next      <- rpois(N, lambda = lambda_post)

cat("E(Y' | y)     =", round(mean(y_next), 3), "\n")
E(Y' | y)     = 1.82 
Code
cat("P(Y' = 0 | y) =", round(mean(y_next == 0), 3),
    "  (analytic NB:", round(dnbinom(0, size = sp, prob = rp/(rp + 1)), 3), ")\n")
P(Y' = 0 | y) = 0.174   (analytic NB: 0.175 )
Code
hist(y_next, breaks = -0.5:(max(y_next) + 0.5), probability = TRUE,
     main = "Posterior predictive: next survey",
     xlab = "Whale sightings", col = "steelblue", border = "white")

The predictive distribution is right-skewed over 0–6 sightings. \(E(Y' \mid y) \approx 1.82\) (it equals the posterior mean of \(\lambda\)), and \(P(Y' = 0 \mid y) \approx 0.17\) — the simulation matches the analytic Negative-Binomial value (0.175).

c.

Code
y_block <- replicate(N, sum(rpois(5, lambda = rgamma(1, sp, rp))))
cat("E(Y_total | y)       =", round(mean(y_block), 2), "\n")
E(Y_total | y)       = 9.09 
Code
cat("P(Y_total >= 10 | y) =", round(mean(y_block >= 10), 3), "\n")
P(Y_total >= 10 | y) = 0.42 

About 9.1 sightings expected over the 5-survey block (\(5 \times\) the per-survey mean), with roughly a 42% chance of 10 or more.
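The simulated block total can be cross-checked analytically, in the same spirit as the Negative-Binomial check in (b). The sketch relies on a standard result that is not stated in the question: if \(Y_\text{total} \mid \lambda \sim \text{Poisson}(k\lambda)\) and \(\lambda \sim \text{Gamma}(s, r)\), then \(Y_\text{total}\) is Negative-Binomial with size \(s\) and success probability \(r/(r+k)\).

```r
sp <- 20; rp <- 11; k <- 5           # posterior Gamma(20, 11); block of 5 surveys

# Given lambda, the block total is Poisson(k * lambda); integrating lambda over
# Gamma(sp, rp) gives a Negative-Binomial with size sp and prob rp / (rp + k)
p_block <- rp / (rp + k)
e_total <- k * sp / rp
p_ge_10 <- pnbinom(9, size = sp, prob = p_block, lower.tail = FALSE)

cat("E(Y_total | y)       =", round(e_total, 2), "\n")
cat("P(Y_total >= 10 | y) =", round(p_ge_10, 3), "\n")
```

Agreement with the simulated values (up to Monte Carlo error) confirms that a single \(\lambda\) draw must be shared across all five surveys in each replicate.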

d. Model answer (≤70 words). Reporting only the mean (≈1.8) discards all predictive uncertainty — and 1.8 is not even an observable count. They should report the full posterior-predictive distribution, or at least a predictive interval (here a 90% interval is roughly \([0, 4]\) sightings) plus informative summaries such as \(P(Y' = 0 \mid y)\). A bare point estimate cannot distinguish a confident forecast from a highly uncertain one.

Marking: identifies that a point estimate discards predictive uncertainty (0.5); proposes reporting the predictive distribution or a predictive interval / tail probabilities instead (0.5).


Mock Exam 2 (Last Year’s Resit)

Last year’s digital resit (8 July 2025), reproduced as a practice paper. Six exercises, 18 points, spanning Lectures 1–5. Same open-book ANS-with-R format as your exam.

Q1: Bayes’ Rule (1 pt)

(Lecture 1 / Ch. 2)

Sophie is trying to guess where a friend spent a particular summer day. She remembers that 40% of her friends stayed in the Netherlands and 60% went on holiday to southern Europe, but not who went where. The chance of a great beach day is only 0.10 in the Netherlands, but 0.90 in southern Europe. One day the friend texts: “Today was a perfect beach day!”

Given the friend had a great beach day, what is the probability they spent the day in the Netherlands?

Attempt before looking at the solution.

By Bayes’ rule, with \(B\) = “stayed in the Netherlands” and \(A\) = “great beach day”:

\[P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A \mid B)\,P(B) + P(A \mid B^c)\,P(B^c)}.\]

Code
(0.10 * 0.40) / (0.10 * 0.40 + 0.90 * 0.60)
[1] 0.06896552

So \(P(\text{Netherlands} \mid \text{beach day}) \approx 0.069\): because great beach days are far more likely in southern Europe, and more of her friends went there, hearing about a great beach day makes the Netherlands quite unlikely.

Q2: The Beta-Binomial Model (4.25 pts)

(Lecture 2 / Ch. 3)

A researcher models a success probability with a \(\text{Beta}(3, 10)\) prior and collects \(n = 100\) trials. The Beta-Binomial update gives the posterior \(\text{Beta}(24, 89)\).

a. [0.25] How many successes and how many failures were observed?

b. [0.5] Write the summarize_beta_binomial() call that produces this prior-to-posterior summary.

c. [0.5] Write the code that plots the prior and posterior in one figure, without the scaled likelihood.

d. [3] (Describe, max 130 words.) Describe how the posterior has shifted relative to the prior in terms of its mean, mode, and standard deviation, and what this shift says about how the data updated your beliefs.

Attempt before looking at the solution.

a. The update is \(\text{Beta}(\alpha+y,\ \beta+n-y) = \text{Beta}(3+y,\ 10+n-y) = \text{Beta}(24, 89)\), so \(y = 21\) successes and \(n - y = 79\) failures.

b.

Code
summarize_beta_binomial(alpha = 3, beta = 10, y = 21, n = 100)

c.

Code
plot_beta_binomial(alpha = 3, beta = 10, y = 21, n = 100, likelihood = FALSE)

d. The posterior mean falls slightly (0.23 to 0.21) while the posterior mode rises slightly (0.18 to 0.21). The two move in opposite directions because the prior is right-skewed, and updating pulls the mean and mode together. The posterior is also far more concentrated: the standard deviation drops from about 0.11 to about 0.04, reflecting much greater certainty. In short the data broadly agreed with the prior; rather than overturning our beliefs, they sharpened them, leaving the estimate near 0.21 but with much less uncertainty.

Marking: mean shifts down and mode shifts up, with the opposite directions attributed to the prior’s skewness; SD shrinks (greater precision); actual prior/posterior values stated; data seen to reinforce rather than overturn the prior.
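The values quoted in the model answer for (d) can be reproduced in base R without `bayesrules`. The helper name `beta_summary` is an assumption for this sketch.

```r
beta_summary <- function(a, b) {
  c(mean = a / (a + b),
    mode = (a - 1) / (a + b - 2),            # valid for a, b > 1
    sd   = sqrt(a * b / ((a + b)^2 * (a + b + 1))))
}

shift <- rbind(prior = beta_summary(3, 10), posterior = beta_summary(24, 89))
print(round(shift, 3))
```

Reading down each column gives the direction of the shift for that summary.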

Q3: Match the Beta Posteriors (2.75 pts)

(Lecture 2 / Ch. 3)

You are modelling a success probability. In each case you start with a Beta prior and observe some successes and failures:

Case  Prior       Observed data
A     Beta(1, 1)  3 successes, 6 failures
B     Beta(2, 2)  6 successes, 4 failures
C     Beta(1, 4)  4 successes, 2 failures

a. [0.75] Determine the posterior distribution for each case.

b. [1.5] Compute the standard deviation of each posterior.

c. [0.5] Rank the posteriors from highest to lowest confidence about the success probability.

Attempt before looking at the solution.

a. Update each with \(\text{Beta}(\alpha + s,\ \beta + f)\):

  • A: \(\text{Beta}(1+3,\ 1+6) = \text{Beta}(4, 7)\)
  • B: \(\text{Beta}(2+6,\ 2+4) = \text{Beta}(8, 6)\)
  • C: \(\text{Beta}(1+4,\ 4+2) = \text{Beta}(5, 6)\)

b.

Code
post_sd <- function(a, b) sqrt(a * b / ((a + b)^2 * (a + b + 1)))
round(c(A = post_sd(4, 7), B = post_sd(8, 6), C = post_sd(5, 6)), 3)
    A     B     C 
0.139 0.128 0.144 

c. Higher confidence means a smaller posterior SD, so the ranking is B (SD 0.128, highest confidence), then A (SD 0.139), then C (SD 0.144, lowest confidence).

Q4: Sequential Updating with the Bechdel Data (2.5 pts)

(Lecture 2 / Ch. 4)

The bechdel data in bayesrules records whether films pass the Bechdel test. John models \(\pi\), the proportion of films that pass, with a symmetric \(\text{Beta}(2, 2)\) prior and analyses one year at a time.

a. [0.5] John analyses the 1995 films. Give the posterior, its mean, and its mode.

b. [0.5] The next day he analyses the 2005 films, building on the previous day’s posterior. Give the posterior, mean, and mode.

c. [0.5] On the third day he analyses the 2013 films, building on the previous two analyses. Give the posterior, mean, and mode.

d. [1] (Explain, max 70 words.) Jenna instead analyses 1995, 2005 and 2013 jointly, all at once. Under what conditions will her posterior be identical to John’s?

Attempt before looking at the solution.

a.–c. Each year adds its passes and failures, and yesterday’s posterior becomes today’s prior. Starting from \(\text{Beta}(2, 2)\):

Code
data(bechdel, package = "bayesrules")
a <- 2; b <- 2
for (yr in c(1995, 2005, 2013)) {
  d <- bechdel[bechdel$year == yr, ]
  y <- sum(d$binary == "PASS"); n <- nrow(d)
  a <- a + y; b <- b + (n - y)
  cat(yr, ": Beta(", a, ",", b, ")  mean =", round(a/(a+b), 3),
      " mode =", round((a-1)/(a+b-2), 3), "\n")
}
1995 : Beta( 20 , 20 )  mean = 0.5  mode = 0.5 
2005 : Beta( 74 , 66 )  mean = 0.529  mode = 0.529 
2013 : Beta( 120 , 119 )  mean = 0.502  mode = 0.502 

So 1995 gives \(\text{Beta}(20, 20)\) (mean and mode 0.50), 2005 gives \(\text{Beta}(74, 66)\) (mean 0.529), and 2013 gives \(\text{Beta}(120, 119)\) (mean 0.502).

d. Model answer (≤70 words). If Jenna uses the same \(\text{Beta}(2, 2)\) prior and the same combined data, the two posteriors are identical. In Bayesian updating, processing the data sequentially or all at once gives the same posterior, as long as the prior and the total data are the same.

Marking: states the conditions (same prior and same data) and recognises that sequential and batch updating coincide.
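The equivalence of sequential and batch updating is easy to verify numerically. The three batches below are hypothetical counts invented for this sketch; they are not the bechdel data.

```r
# Hypothetical batches of (successes, trials); not the bechdel counts
batches <- list(c(y = 7, n = 20), c(y = 12, n = 30), c(y = 4, n = 10))

# Sequential: yesterday's posterior is today's prior
a <- 2; b <- 2
for (bt in batches) {
  a <- a + bt[["y"]]
  b <- b + bt[["n"]] - bt[["y"]]
}
sequential <- c(a, b)

# Batch: pool all data and update the Beta(2, 2) prior once
y_all <- sum(sapply(batches, `[[`, "y"))
n_all <- sum(sapply(batches, `[[`, "n"))
batch <- c(2 + y_all, 2 + n_all - y_all)

cat("Sequential: Beta(", sequential[1], ",", sequential[2], ")\n")
cat("Batch:      Beta(", batch[1], ",", batch[2], ")\n")
```

Both routes end at Beta(25, 39), because the update only ever adds up successes and failures, and addition does not depend on the order or grouping of the batches.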

Q5: Tuning a Gamma Prior (2 pts)

(Lecture 3 / Ch. 5)

Analysts model \(\lambda\), the average number of three-pointers made per NBA game.

a. [1] Based on past seasons they expect a mean of 30 with a variance of 60. Construct a Gamma prior for \(\lambda\).

b. [1] Using the current season (284 games, 858 three-pointers in total), compute the posterior mean.

Attempt before looking at the solution.

a. Moment-matching for \(\text{Gamma}(s, r)\) uses mean \(= s/r\) and variance \(= s/r^2\), so \(r = \text{mean}/\text{variance} = 30/60 = 0.5\) and \(s = \text{mean} \times r = 15\). The prior is \(\text{Gamma}(15, 0.5)\).
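The moment-matching step can be wrapped in a small function and checked by recomputing the moments. The helper name `gamma_from_moments` is hypothetical; it is not part of `bayesrules`.

```r
# Hypothetical helper: Gamma(shape s, rate r) matching a given mean and variance
gamma_from_moments <- function(m, v) {
  r <- m / v        # from mean = s / r and variance = s / r^2
  s <- m * r
  c(shape = s, rate = r)
}

prior <- gamma_from_moments(m = 30, v = 60)
print(prior)

# Check by recomputing the moments from the fitted parameters
cat("mean =", prior[["shape"]] / prior[["rate"]],
    " variance =", prior[["shape"]] / prior[["rate"]]^2, "\n")
```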

b. The Gamma-Poisson update is \(\text{Gamma}(s + \sum y_i,\ r + n)\) with \(\sum y_i = 858\) and \(n = 284\) games:

Code
s <- 15; r <- 0.5
sp <- s + 858; rp <- r + 284
cat("Posterior: Gamma(", sp, ",", rp, "),  mean =", round(sp/rp, 4), "\n")
Posterior: Gamma( 873 , 284.5 ),  mean = 3.0685 

The posterior mean is about 3.07.

Q6: Customer Complaints (5.5 pts)

(Lecture 5 / Ch. 8)

A call centre evaluates its complaint rate \(\lambda\) (average complaints per hour) using a \(\text{Gamma}(2, 1)\) prior (shape-rate). After observing \(y = 12\) complaints in \(t = 4\) hours, they update to the posterior \(\text{Gamma}(14, 5)\).

a. [1] The centre aims to keep the rate below 3 per hour. Compute \(P(\lambda < 3 \mid y)\).

b. [1] Compute the prior odds and posterior odds for \(H_0: \lambda \geq 3\) versus \(H_1: \lambda < 3\).

c. [1] Use part (b) to compute \(BF_{10}\).

d. [1.5] (Interpret, max 70 words.) What does this Bayes factor say about the strength of evidence in the data?

e. [1] Test the point hypothesis \(H_0: \lambda = 3\) versus \(H_1: \lambda \neq 3\) with the Savage-Dickey density ratio. Compute \(BF_{01}\).

Attempt before looking at the solution.

a.

Code
pgamma(3, shape = 14, rate = 5)
[1] 0.6367822

\(P(\lambda < 3 \mid y) \approx 0.637\).

b.

Code
prior_p <- pgamma(3, 2, 1)    # P(lambda < 3) under the prior
post_p  <- pgamma(3, 14, 5)   # P(lambda < 3 | y)
cat("Prior odds  (H1/H0):", round(prior_p/(1 - prior_p), 3), "\n")
Prior odds  (H1/H0): 4.021 
Code
cat("Posterior odds (H1/H0):", round(post_p/(1 - post_p), 3), "\n")
Posterior odds (H1/H0): 1.753 

Prior odds of \(H_1\) over \(H_0\) are about 4.02; posterior odds about 1.75. (Equivalently, \(H_0\) over \(H_1\): 0.25 and 0.57.)

c.

Code
BF10 <- (post_p/(1 - post_p)) / (prior_p/(1 - prior_p))
cat("BF10 =", round(BF10, 3), "\n")
BF10 = 0.436 

d. Model answer (≤70 words). \(BF_{10} \approx 0.44\) is below 1, so the data favour \(H_0\) (\(\lambda \geq 3\)) over \(H_1\) (\(\lambda < 3\)): the observed data are about \(1/0.44 \approx 2.3\) times more likely under \(H_0\) than under \(H_1\). The observed rate of 3 per hour pulled belief toward higher rates, so this is only weak evidence, and it points away from the “below 3” hypothesis.

Marking: states that the evidence favours \(H_0\); translates 0.44 into “about 2.3 times more likely under \(H_0\)”; uses probabilistic language rather than “reject” or “significant”.

e.

Code
cat("BF01 (Savage-Dickey) =", round(dgamma(3, 14, 5) / dgamma(3, 2, 1), 3), "\n")
BF01 (Savage-Dickey) = 3.201 

\(BF_{01} \approx 3.20\): the posterior density at \(\lambda = 3\) is about three times the prior density there, so the data are about three times more consistent with exactly \(\lambda = 3\) than the diffuse alternative expected. Moderate evidence for the point value 3.


Exam Tips

What to Expect

  • Duration: 2 hours (09:00–11:00) · Format: open-book ANS exam with R, no internet. The textbook is provided as a PDF, and you may bring one A4 cheat sheet (written on both sides).
  • Content: All lecture material (Lectures 1–5, Bayes Rules! Ch. 1–8). Questions are re-skinned versions of the end-of-chapter exercises.

Study Checklist (Lectures 1–5)

The exam covers all lecture material. Mock Exam 1 focuses on Lectures 3–5 and Mock Exam 2 (last year’s paper) spans Lectures 1–5; work through every box below regardless.

Lecture 1: Introduction & Bayes’ Rule (Ch. 1–2)

  • Bayesian vs frequentist knowledge-building; the two interpretations of probability (long-run frequency vs plausibility)
  • What the prior, likelihood, and posterior each represent
  • Conditional vs unconditional probability; why order matters (\(P(A\mid B)\neq P(B\mid A)\)); independence (\(P(A\mid B)=P(A)\))
  • Likelihood vs conditional probability: the \(L(B\mid A)\) notation, and why a likelihood need not sum to 1
  • Joint probability \(P(A\cap B)=P(A\mid B)\,P(B)\) and marginal probability (law of total probability)
  • Bayes’ rule for events, \(P(B\mid A)=\dfrac{P(A\mid B)\,P(B)}{P(A)}\), and getting a posterior from a prior-and-likelihood table

Lecture 2: The Beta-Binomial Model & Sequential Testing (Ch. 3–4)

  • From a single number to a continuous prior; properties of a pdf \(f(\pi)\) (integrates to 1, area = probability, \(P(\pi=\pi_0)=0\), may exceed 1)
  • The Beta prior: tuning \(\alpha,\beta\) (successes/failures + 1; \(\alpha=\beta=1\) uniform; below 1 gives a U-shape); reading Beta shapes
  • Beta summaries: mean \(\alpha/(\alpha+\beta)\), mode \((\alpha-1)/(\alpha+\beta-2)\), variance
  • Choosing and justifying a prior; the Binomial data model \(Y\mid\pi\sim\text{Binomial}(n,\pi)\) and its likelihood
  • The Beta posterior (conjugacy): \(\pi\mid y\sim\text{Beta}(\alpha+y,\ \beta+n-y)\)
  • Sequential updating: adding data in batches or one point at a time gives the same final posterior
  • R: plot_beta(), plot_beta_binomial(), summarize_beta_binomial()
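The sequential-updating claim can be checked in a few lines of base R. This is a minimal sketch: `update_beta()` is a hypothetical helper, and the prior and data are made up for illustration.

```r
# Hypothetical helper: Beta-Binomial conjugate update.
# Returns the posterior parameters after y successes in n trials.
update_beta <- function(prior, y, n) {
  c(alpha = prior[["alpha"]] + y, beta = prior[["beta"]] + n - y)
}

prior <- c(alpha = 2, beta = 2)   # hypothetical prior

# Hypothetical data: 7 successes in 10 trials, then 3 successes in 10 trials.
all_at_once <- update_beta(prior, y = 10, n = 20)
step1       <- update_beta(prior, y = 7, n = 10)
step2       <- update_beta(step1, y = 3, n = 10)

all_at_once                     # posterior parameters from a single update
identical(all_at_once, step2)   # same posterior from two batches
```

Yesterday's posterior serves as today's prior, so the order and batching of the data do not matter for the final result.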

Lecture 3: Conjugate Models & Choosing Your Prior (Ch. 5)

  • What conjugacy means and why it yields a closed-form posterior
  • The three conjugate update rules: Beta-Binomial, Gamma-Poisson (\(\lambda\mid y\sim\text{Gamma}(s+\sum y_i,\ r+n)\)), Normal-Normal
  • Prior effective sample size (ESS) and how prior vs data weight the posterior
  • Prior predictive check; prior sensitivity analysis
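A prior sensitivity analysis for the Gamma-Poisson model can be sketched as a small table: apply the update rule under several priors and compare the resulting summaries. The data summary and the alternative priors below are hypothetical, and reading the rate \(r\) as a prior sample size is the usual interpretation rather than something specific to this course.

```r
# Hypothetical data summary: counts summing to 12 over n = 4 periods.
sum_y <- 12
n     <- 4

# Hypothetical Gamma(s, r) priors to compare.
priors <- data.frame(s = c(2, 1, 4, 0.5), r = c(1, 1, 2, 0.5))

# Gamma-Poisson update: shape s + sum(y), rate r + n.
priors$s_post    <- priors$s + sum_y
priors$r_post    <- priors$r + n
priors$post_mean <- priors$s_post / priors$r_post

# Weight of the prior mean in the posterior mean (r acts like a prior n).
priors$w_prior   <- priors$r / (priors$r + n)
priors$p_below_3 <- pgamma(3, shape = priors$s_post, rate = priors$r_post)

print(round(priors, 3))
```

If the conclusions (here, the posterior mean and \(P(\lambda < 3 \mid y)\)) barely move across reasonable priors, the analysis is robust; if they swing, that is worth reporting.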

Lecture 4: Grid Approximation & MCMC (Ch. 6–7)

  • Why a non-conjugate posterior (e.g., a truncated prior) has no closed form
  • Grid approximation: evaluate the unnormalised posterior on a grid, normalise, then summarise or sample
  • Metropolis MCMC: the intuition and the one-step acceptance probability \(\alpha=\min(1,\text{ratio of unnormalised posteriors})\)
  • Chain diagnostics: trace plots (good mixing vs step size too small or too large), effective sample size, \(\hat R\), autocorrelation
  • Using posterior samples for summaries (the workflow is the same however the samples were obtained)
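Both approximation methods can be tried on a model whose exact answer is known, which makes mistakes easy to spot. The sketch below uses a Gamma(2, 1) prior with a hypothetical Poisson data summary (counts summing to 12 over 4 periods), so the exact posterior is Gamma(14, 5). The grid range, the proposal standard deviation of 1, the starting value and the burn-in length are all assumptions chosen for illustration.

```r
set.seed(1)

# Unnormalised log-posterior: Gamma(2, 1) prior times Poisson likelihood
# (hypothetical data summary: sum of counts 12, n = 4).
log_post <- function(lambda) {
  if (lambda <= 0) return(-Inf)
  dgamma(lambda, shape = 2, rate = 1, log = TRUE) + 12 * log(lambda) - 4 * lambda
}

# Grid approximation: evaluate, normalise, summarise.
grid      <- seq(0.001, 15, length.out = 5000)
unnorm    <- exp(sapply(grid, log_post))
grid_post <- unnorm / sum(unnorm)
p_grid    <- sum(grid_post[grid < 3])

# Metropolis with a symmetric Normal proposal.
n_iter   <- 20000
chain    <- numeric(n_iter)
chain[1] <- 2
for (i in 2:n_iter) {
  proposal  <- rnorm(1, mean = chain[i - 1], sd = 1)
  log_ratio <- log_post(proposal) - log_post(chain[i - 1])
  # Accept with probability min(1, ratio of unnormalised posteriors).
  chain[i] <- if (log(runif(1)) < log_ratio) proposal else chain[i - 1]
}
p_mcmc <- mean(chain[-(1:1000)] < 3)   # drop the first 1000 draws as burn-in

c(exact = pgamma(3, 14, 5), grid = p_grid, mcmc = p_mcmc)
```

The three estimates of \(P(\lambda < 3 \mid y)\) should agree closely. Working on the log scale avoids numerical underflow, and a proposal below zero gets a log-posterior of `-Inf` and is therefore always rejected.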

Lecture 5: Inference, Prediction & Hypothesis Testing (Ch. 8)

  • Posterior estimation: location (mean/median/mode) and spread; credible intervals, central/ETI vs HPD/HDI
  • Posterior prediction: the posterior-predictive distribution (Beta-Binomial; Negative-Binomial for Gamma-Poisson) and why it is wider than a plug-in prediction
  • Hypothesis testing: interval (one- and two-sided) hypotheses via prior odds, posterior odds, and Bayes factors
  • Point hypotheses via the Savage-Dickey density ratio
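Why the posterior-predictive distribution is wider than a plug-in prediction can be seen numerically. The sketch below assumes a hypothetical Gamma(14, 5) posterior for a Poisson rate and compares the Negative-Binomial predictive for one new count with a Poisson that plugs in the posterior mean.

```r
s_post <- 14; r_post <- 5   # hypothetical Gamma(14, 5) posterior
y_new  <- 0:200             # support, truncated far out in the tail

# Posterior predictive for one new count: Negative-Binomial.
pp <- dnbinom(y_new, size = s_post, prob = r_post / (r_post + 1))

# Plug-in: Poisson at the posterior mean, ignoring uncertainty about lambda.
plug_in <- dpois(y_new, lambda = s_post / r_post)

pp_mean   <- sum(y_new * pp)
pp_var    <- sum((y_new - pp_mean)^2 * pp)
plug_mean <- sum(y_new * plug_in)
plug_var  <- sum((y_new - plug_mean)^2 * plug_in)

round(c(pp_mean = pp_mean, pp_var = pp_var,
        plug_mean = plug_mean, plug_var = plug_var), 3)
```

Both predictions share the same mean, but the posterior-predictive variance is larger because it adds posterior uncertainty about \(\lambda\) to the sampling variability of the new observation.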

Key Formulas to Have at Hand

Conjugate updates

\(\theta \mid y \sim \text{Beta}(\alpha+y,\;\beta+n-y)\)

\(\lambda \mid \mathbf{y} \sim \text{Gamma}(s+\sum y_i,\;r+n)\)

\(\mu_\text{post} = w_\text{pr}\,\mu_0 + w_\text{d}\,\bar{y}\)
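The weights in the Normal-Normal posterior mean are precision weights. A minimal sketch, assuming a \(N(\mu_0, \tau_0^2)\) prior and a known data standard deviation \(\sigma\); the symbols \(\tau_0\) and \(\sigma\) and all numbers are assumptions made for illustration:

```r
# Hypothetical inputs: prior N(mu0, tau0^2), known data SD sigma.
mu0   <- 100
tau0  <- 10
sigma <- 15
y     <- c(112, 105, 121, 98, 109)   # hypothetical observations
n     <- length(y)
y_bar <- mean(y)

# Precision = 1 / variance; the data precision grows with n.
prec_prior <- 1 / tau0^2
prec_data  <- n / sigma^2

w_pr <- prec_prior / (prec_prior + prec_data)
w_d  <- prec_data  / (prec_prior + prec_data)

mu_post <- w_pr * mu0 + w_d * y_bar
sd_post <- sqrt(1 / (prec_prior + prec_data))

round(c(w_pr = w_pr, w_d = w_d, mu_post = mu_post, sd_post = sd_post), 3)
```

The weights sum to 1, so the posterior mean always lies between the prior mean and the sample mean, and it moves toward \(\bar{y}\) as \(n\) grows.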

Testing

\(\text{Post odds} = BF_{10} \times \text{Prior odds}\)

\(BF_{10} = f(\theta_0)\,/\,f(\theta_0 \mid y)\) (Savage–Dickey)

\(BF_{+1} = \dfrac{P(\theta > \theta_0 \mid y)}{P(\theta > \theta_0)}\) (one-sided \(H_+\) versus the unrestricted \(H_1\)), so \(BF_{+0} = BF_{+1} \times BF_{10}\)

Credible interval: direct probability statement, \(P(\theta \in [L,U] \mid y) = 0.95\).

ETI: qbeta(c(0.025, 0.975), a, b)
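The ETI comes straight from the quantile function; an HDI takes one more step. The following is a sketch for a unimodal posterior, using a hypothetical Beta(3, 9): among all intervals that contain 95% of the mass, it picks the narrowest. Base R's `optimize()` does the search, and `width()` is a hypothetical helper.

```r
a <- 3; b <- 9   # hypothetical Beta(3, 9) posterior

# Equal-tailed interval: 2.5% in each tail.
eti <- qbeta(c(0.025, 0.975), a, b)

# HDI sketch: slide the lower tail probability between 0 and 0.05
# and keep the 95% interval with the smallest width.
width <- function(p_low) qbeta(p_low + 0.95, a, b) - qbeta(p_low, a, b)
p_low <- optimize(width, interval = c(0, 0.05))$minimum
hdi   <- qbeta(c(p_low, p_low + 0.95), a, b)

round(rbind(ETI = eti, HDI = hdi), 3)
```

For a skewed posterior like this one the HDI is shifted toward the mode and is slightly narrower than the ETI; for a symmetric posterior the two coincide.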


Open Q&A: any topic from the past three weeks. Exam is Friday, June 19, 09:00–11:00, IWO 4.04C (Blauw). Good luck!