By the end of this lecture, you will be able to:
Reading: van Doorn et al. (2021), Psychonomic Bulletin & Review. doi: 10.3758/s13423-020-01798-5
Exam: Friday, June 19, 09:00–11:00, IWO 4.04C (Blauw). Open-book ANS exam with R (no internet); the textbook is provided as a PDF and you may bring one A4 cheat sheet (both sides). Covers all lecture material (L1–L5).
A Bayesian analysis is only as good as its communication. Two researchers with the same data but different priors can legitimately reach different conclusions. That’s fine, but readers must be able to evaluate and reproduce the analysis.
Van Doorn et al. identify four stages of a Bayesian study:
\[\text{Planning} \;\rightarrow\; \text{Executing} \;\rightarrow\; \text{Interpreting} \;\rightarrow\; \text{Reporting}\]
What to decide before you see the data
A Bayesian analysis does not safeguard against model misspecification. Anscombe’s quartet applies here too; always inspect your data.
For hypothesis testing (BF):
For estimation (posterior):
Model specification
Checking
Results
Transparency
Use these
Avoid these
We modelled the probability of correct beer identification (\(\theta\)) with prior \(\text{Beta}(6, 6)\) (ESS = 12, symmetric, no directional expectation). After observing \(y = 42\) correct identifications out of \(n = 57\), the posterior was \(\text{Beta}(48, 21)\), with median 0.696 (95% ETI: [0.578, 0.806]). A Bayes factor of \(BF_{10} = 58.3\) indicated very strong evidence that performance exceeds chance (\(\theta = 0.5\)). A sensitivity analysis under \(\text{Beta}(1,1)\) yielded \(BF_{10} = 43.1\), supporting the robustness of this conclusion. Data and analysis code are available at [OSF link].
Wrong: “There is a 95% probability that the true \(\theta\) is between 0.51 and 0.78 in repeated sampling.”
Correct: “Given the data and the prior, there is a 95% probability that \(\theta\) is between 0.51 and 0.78.”
The frequentist “repeated sampling” clause belongs to the confidence interval, not the Bayesian credible interval.
Wrong: “\(BF_{10} = 9\) means there is a 90% chance that \(H_1\) is true.”
Correct: “\(BF_{10} = 9\) means the data are 9 times more likely under \(H_1\) than under \(H_0\). Converting to posterior probability requires specifying the prior odds.”
\[P(H_1 \mid y) = \frac{BF_{10} \times P(H_1)}{BF_{10} \times P(H_1) + P(H_0)}\]
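As a quick check of the formula above, the following minimal R sketch converts \(BF_{10} = 9\) into \(P(H_1 \mid y)\) for three prior probabilities. The helper name `bf_to_post_prob` and the three prior values are illustrative assumptions, not part of the lecture material.

```r
# Hypothetical helper: posterior probability of H1 from BF10 and the prior P(H1)
bf_to_post_prob <- function(bf10, prior_h1) {
  (bf10 * prior_h1) / (bf10 * prior_h1 + (1 - prior_h1))
}

prior_h1 <- c(0.1, 0.5, 0.9)            # sceptical, neutral, optimistic (illustrative)
post_h1  <- bf_to_post_prob(9, prior_h1)
round(post_h1, 3)                       # 0.5, 0.9, 0.988
```

Only the neutral prior \(P(H_1) = 0.5\) turns \(BF_{10} = 9\) into a 90% posterior probability; the same Bayes factor gives 50% under the sceptical prior.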
Wrong: “\(BF_{10} = 1.2\), there is no effect.”
Correct: “\(BF_{10} = 1.2\) is anecdotal. The data were uninformative: they did not substantially shift beliefs in either direction.”
Contrast with \(BF_{01} = 14\): this is positive evidence for \(H_0\).
Wrong: “The 95% CI excludes 0.5, so \(H_0: \theta = 0.5\) is rejected.”
Correct: Credible intervals are for estimation. Use the Bayes factor for hypothesis testing. The CI and the BF answer different questions.
| Question | Tool |
|---|---|
| Does an effect exist? | Bayes factor |
| How large is the effect? | Posterior + CI |
| What data should we expect next? | Posterior predictive |
We tested a mindfulness intervention on working memory (\(n = 15\)). \(BF_{10} = 4.3\), so we conclude the null is false. The CI [39.8, 42.6] proves the intervention is effective. No robustness analysis was needed since results are clear.
Find 3 errors in this report.
One-sided Bayesian \(t\)-test (\(n_1 = 53\), \(n_2 = 57\)). Informed prior: \(t(0.35, 0.102, 3)\). Result: \(BF_{0-} = 11.5\). This means \(H_0\) is 11.5× more probable than \(H_-\).
Find 1 error in this report.
A full practice paper (16.5 pts) in the style of last year’s exam. It is deliberately longer than the real exam (10 pts, 3 questions) to give you more to work through, and it covers Lectures 3–5 plus a model-checking simulation. (Lectures 1–2 are covered in Mock Exam 2 below.)
Q1–Q3 are by-hand computational questions; Q4 is a with-R simulation question.
Attempt each question under exam conditions before opening its solution.
(Lecture 3 / Ch. 5; cf. Beta–Binomial exercises)
A food-delivery app wants to estimate \(\theta\), the proportion of customers who tip their driver. Based on a pilot study, they adopt a \(\text{Beta}(5, 5)\) prior. In the new cohort, 42 out of 60 customers tip.
a. [1] Identify the data model and conjugate prior family. Write the posterior update rule and state the resulting posterior.
b. [1] Compute the prior mean, observed proportion, and posterior mean. Explain the influence of the prior vs data.
c. [0.75] Compute the 95% equal-tailed credible interval and interpret it.
d. [0.75] Compute \(P(\theta > 0.70 \mid y)\).
e. [1] (Describe, max 90 words.) Describe how the posterior has shifted relative to the prior — in terms of its mean, mode, and standard deviation, and explain what this shift indicates about how the data updated your beliefs about \(\theta\).
Attempt before looking at the solution.
a. \(Y \mid \theta \sim \text{Binomial}(60, \theta)\); conjugate prior \(\theta \sim \text{Beta}(\alpha,\beta)\). Update: \(\theta \mid y \sim \text{Beta}(\alpha + y,\ \beta + n - y)\).
alpha <- 5; beta <- 5; y <- 42; n <- 60
ap <- alpha + y; bp <- beta + n - y
cat("Posterior: Beta(", ap, ",", bp, ")\n")
Posterior: Beta( 47 , 23 )
b.
cat("Prior mean: ", round(alpha/(alpha+beta), 3), "\n")
Prior mean: 0.5
cat("Observed prop: ", round(y/n, 3), "\n")
Observed prop: 0.7
cat("Posterior mean: ", round(ap/(ap+bp), 3), "\n")
Posterior mean: 0.671
cat("Prior ESS: ", alpha + beta, "\n")
Prior ESS: 10
The posterior mean (0.671) is a weighted average of the prior mean (0.50) and the observed proportion (0.70). With prior ESS = 10 vs n = 60, the data carry ~6× the weight, so the posterior sits close to \(\bar y\), only slightly pulled toward the prior.
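The weighted-average claim can be checked numerically. This is a minimal sketch: the weights \(\text{ESS}/(\text{ESS}+n)\) and \(n/(\text{ESS}+n)\) follow from the Beta–Binomial update, and the variable names are chosen here for illustration.

```r
alpha <- 5; beta <- 5; y <- 42; n <- 60

ess        <- alpha + beta               # prior effective sample size
w_prior    <- ess / (ess + n)            # weight on the prior mean
w_data     <- n / (ess + n)              # weight on the observed proportion
prior_mean <- alpha / ess
obs_prop   <- y / n

weighted  <- w_prior * prior_mean + w_data * obs_prop
conjugate <- (alpha + y) / (alpha + beta + n)   # mean of Beta(47, 23)

round(c(w_prior = w_prior, w_data = w_data,
        weighted = weighted, conjugate = conjugate), 3)
```

The two routes to the posterior mean agree, and the ratio `w_data / w_prior` is exactly 6, which is where "about 6× the weight" comes from.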
c.
round(qbeta(c(0.025, 0.975), ap, bp), 3)
[1] 0.558 0.776
Given the data and prior, there is a 95% probability that \(\theta\) lies in this interval.
d.
pbeta(0.70, ap, bp, lower.tail = FALSE)
[1] 0.3134553
About a 31% posterior probability that more than 70% of customers tip.
e.
prior_mode <- (alpha - 1) / (alpha + beta - 2)
post_mode <- (ap - 1) / (ap + bp - 2)
prior_sd <- sqrt(alpha*beta / ((alpha+beta)^2 * (alpha+beta+1)))
post_sd <- sqrt(ap*bp / ((ap+bp)^2 * (ap+bp+1)))
cat("Prior: mean", round(alpha/(alpha+beta),3), " mode", round(prior_mode,3), " sd", round(prior_sd,3), "\n")
Prior: mean 0.5 mode 0.5 sd 0.151
cat("Posterior: mean", round(ap/(ap+bp),3), " mode", round(post_mode,3), " sd", round(post_sd,3), "\n")
Posterior: mean 0.671 mode 0.676 sd 0.056
The posterior has shifted upward: both the mean (0.50 → 0.671) and the mode (0.50 → 0.676) move about 0.17 toward the observed proportion of 0.70. It has also tightened sharply, with the SD falling by nearly a factor of three (0.151 → 0.056). The 60 observations have refined a fairly diffuse, symmetric Beta(5, 5) prior into a much more precise, slightly left-skewed posterior concentrated above 0.5: our belief about the tipping rate has moved from “around half, very uncertain” to “about two-thirds, fairly precisely.”
Marking: upward shift in mean and mode (0.25); reduced SD / increased precision (0.25); state the actual prior and posterior values (0.25); interpret the shift as the data sharpening a diffuse prior toward the observed proportion (0.25).
(Lecture 4 / Ch. 6–7; cf. grid + Metropolis exercises)
A researcher models \(\theta\), the probability of a new drug being effective. They use a \(\text{Beta}(2, 2)\) prior truncated to \([0.3, 0.7]\). In a trial of 20 patients, 12 show improvement.
a. [0.5] Explain why this truncated prior has no closed-form conjugate posterior.
b. [1] The researcher writes the following recipe on the whiteboard:
- Build a fine grid \(\theta_1, \ldots, \theta_K\) on \([0, 1]\).
- At each \(\theta_i\), compute \(u_i = f(\theta_i)\, L(y \mid \theta_i)\) using the truncated Beta prior and Binomial likelihood.
- Normalise: \(w_i = u_i / \sum_k u_k\).
- Report posterior summaries as weighted averages over the grid.
(i) [0.5] Which numerical method does this describe? Pick one and justify in one sentence: (A) Metropolis–Hastings MCMC · (B) Grid approximation · (C) Conjugate Beta–Binomial update · (D) Posterior-predictive Monte Carlo.
(ii) [0.5] Applying the recipe gives a posterior mean of \(\bar\theta \approx 0.562\). The untruncated conjugate posterior \(\text{Beta}(2 + 12,\; 2 + 8) = \text{Beta}(14, 10)\) has mean \(\approx 0.583\). What does the comparison tell you about the truncation here?
c. [1] A random-walk Metropolis sampler on the untruncated model is at \(\theta_c = 0.55\) and proposes \(\theta_p = 0.60\). Using the unnormalised posterior, compute the Metropolis acceptance probability \(\alpha\). Is the move accepted with certainty?
d. [0.5] You rerun the sampler with three step sizes. One trace plot is a flat caterpillar around the mode; one drifts slowly and never settles; one gets “stuck” on long plateaus. Match each pattern to the diagnosis: good mixing, step size too small, step size too large.
Attempt before looking at the solution.
a. The Beta–Binomial conjugate update needs a proper Beta prior on \([0, 1]\). Truncating to \([0.3, 0.7]\) takes the prior out of the Beta family, so the posterior is no longer Beta and has no closed form.
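b. (i) (B) Grid approximation. There is no accept/reject step (so not MCMC), no analytical update (so not the conjugate Beta–Binomial), and the recipe approximates the posterior of \(\theta\) with normalised grid weights rather than simulating future \(y\) (so not posterior-predictive Monte Carlo).
(ii) Truncation lowers the posterior mean from about 0.583 to about 0.562. The untruncated \(\text{Beta}(14, 10)\) posterior places noticeable mass above 0.7 and almost none below 0.3, so restricting the prior to \([0.3, 0.7]\) removes mainly the upper tail. The truncation is therefore not innocuous here: it visibly constrains the posterior.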
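The whiteboard recipe in (b) can be sketched in a few lines of R. This is one possible implementation, not the official solution code: the grid size of 2001 points and the variable names are assumptions made here.

```r
y <- 12; n <- 20
theta <- seq(0, 1, length.out = 2001)                 # fine grid on [0, 1]

# Unnormalised truncated Beta(2, 2) prior: zero outside [0.3, 0.7]
prior <- dbeta(theta, 2, 2) * (theta >= 0.3 & theta <= 0.7)
u     <- prior * dbinom(y, n, theta)                  # prior times likelihood
w     <- u / sum(u)                                   # normalised grid weights

post_mean_trunc <- sum(theta * w)                     # weighted average over the grid
post_mean_conj  <- 14 / 24                            # mean of untruncated Beta(14, 10)
round(c(truncated = post_mean_trunc, untruncated = post_mean_conj), 3)
```

Because the weights are normalised, the constant that would turn the truncated prior into a proper density cancels and never has to be computed.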
c.
y <- 12; n <- 20
logpost <- function(t) dbeta(t, 2, 2, log = TRUE) + dbinom(y, n, t, log = TRUE)
cat("Acceptance probability:", round(min(1, exp(logpost(0.60) - logpost(0.55))), 3), "\n")
Acceptance probability: 1
\(\alpha = 1\): the proposed value \(\theta_p = 0.60\) is more plausible than the current \(\theta_c = 0.55\) (the MLE sits exactly at \(12/20 = 0.60\)), so the unnormalised posterior ratio exceeds 1 and the move is accepted with certainty.
d. Flat caterpillar around the mode = good mixing. Slow drift that never settles = step size too small (high autocorrelation, the chain barely explores). Long stuck plateaus = step size too large (proposals overshoot into low-probability regions and are repeatedly rejected).
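To see the three trace-plot patterns from (d) for yourself, a random-walk Metropolis sampler for the untruncated model can be sketched as below. The three step sizes, the chain length, the starting value, and the function name `run_metropolis` are all illustrative assumptions, not values from the lecture.

```r
set.seed(1)
y <- 12; n <- 20

# Log unnormalised posterior under the untruncated Beta(2, 2) prior;
# proposals outside (0, 1) get -Inf and are therefore always rejected
logpost <- function(t) {
  if (t <= 0 || t >= 1) return(-Inf)
  dbeta(t, 2, 2, log = TRUE) + dbinom(y, n, t, log = TRUE)
}

# Random-walk Metropolis with Normal(0, step) proposals
run_metropolis <- function(step, n_iter = 5000, start = 0.5) {
  chain    <- numeric(n_iter)
  current  <- start
  accepted <- 0
  for (i in seq_len(n_iter)) {
    proposal <- current + rnorm(1, mean = 0, sd = step)
    if (log(runif(1)) < logpost(proposal) - logpost(current)) {
      current  <- proposal
      accepted <- accepted + 1
    }
    chain[i] <- current
  }
  list(chain = chain, accept_rate = accepted / n_iter)
}

steps <- c(too_small = 0.005, reasonable = 0.15, too_large = 3)
runs  <- lapply(steps, run_metropolis)
round(sapply(runs, function(r) r$accept_rate), 2)
```

Plotting each chain, for example with `plot(runs$too_large$chain, type = "l")`, should show the slow drift, the caterpillar, and the plateaus respectively. The acceptance rate falls as the step size grows, which is why neither a very high nor a very low rate indicates good mixing.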
(Lecture 5 / Ch. 8; cf. posterior-summary, prediction & Bayes-factor exercises)
A clinic models \(\lambda\), the mean number of emergency calls per hour, with a \(\text{Gamma}(3, 1)\) prior. Over 8 hours they record 15 calls.
a. [0.75] Give the posterior distribution, its mean, and a 95% equal-tailed credible interval.
b. [1] Predict the number of calls next hour. Compute \(E(Y' \mid y)\) and \(P(Y' \ge 2 \mid y)\) using the posterior-predictive (Negative-Binomial) distribution. Why is this wider than plugging the posterior mean into a single Poisson?
c. [1] Test \(H_+: \lambda > 2\) vs \(H_-: \lambda < 2\). Compute \(BF_{+-}\) from the prior and posterior odds.
d. [0.75] Test \(H_0: \lambda = 2\) using the Savage–Dickey density ratio. What does \(BF_{01}\) say about the value \(\lambda = 2\)?
e. [1.5] (Interpret, max 70 words.) State in words what the Bayes factor from (c) tells you about the call rate, using calibrated language. What does it license you to conclude, and what does it not?
Attempt before looking at the solution.
a. \(Y_i\mid\lambda \overset{iid}{\sim}\text{Poisson}(\lambda)\); posterior \(\lambda\mid y \sim \text{Gamma}(s+\sum y,\ r+n)\).
s <- 3; r <- 1; sumy <- 15; nobs <- 8
sp <- s + sumy; rp <- r + nobs
cat("Posterior: Gamma(", sp, ",", rp, "), mean =", round(sp/rp, 3), "\n")
Posterior: Gamma( 18 , 9 ), mean = 2
cat("95% ETI:", round(qgamma(c(0.025, 0.975), sp, rp), 3), "\n")
95% ETI: 1.185 3.024
b.
cat("E(Y' | y) =", round(sp/rp, 3), "\n")
E(Y' | y) = 2
cat("P(Y' >= 2 | y) =", round(1 - pnbinom(1, size = sp, prob = rp/(rp+1)), 3), "\n")
P(Y' >= 2 | y) = 0.58
The posterior-predictive marginalises over the full posterior uncertainty in \(\lambda\). Plugging the posterior mean into a single \(\text{Poisson}(\hat\lambda)\) fixes \(\lambda\) at one value and ignores that uncertainty, so it is too narrow and over-confident; the Negative-Binomial is correctly wider.
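The "wider" claim can be made concrete by comparing the two predictive variances analytically. The sketch below assumes the same shape-rate parameterisation as the solution code; the variable names are introduced here for illustration.

```r
sp <- 18; rp <- 9                        # posterior Gamma(18, 9) from part (a)
lambda_hat <- sp / rp                    # posterior mean, used by the plug-in Poisson
p_nb <- rp / (rp + 1)                    # Negative-Binomial success probability

var_plugin <- lambda_hat                 # a Poisson's variance equals its mean
var_pred   <- sp * (1 - p_nb) / p_nb^2   # Negative-Binomial variance

# P(Y' >= 2) under each predictive distribution
tail_plugin <- ppois(1, lambda_hat, lower.tail = FALSE)
tail_pred   <- pnbinom(1, size = sp, prob = p_nb, lower.tail = FALSE)

round(c(var_plugin = var_plugin, var_pred = var_pred,
        tail_plugin = tail_plugin, tail_pred = tail_pred), 3)
```

Both distributions have mean 2, but the Negative-Binomial variance is \(20/9 \approx 2.22\) against 2 for the plug-in Poisson; the extra spread is exactly the posterior uncertainty about \(\lambda\).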
c.
prior_above <- pgamma(2, s, r, lower.tail = FALSE) # P(lambda > 2)
post_above <- pgamma(2, sp, rp, lower.tail = FALSE) # P(lambda > 2 | y)
BF_plusminus <- (post_above/(1 - post_above)) / (prior_above/(1 - prior_above))
cat("Prior P(l>2):", round(prior_above,3), " Posterior P(l>2):", round(post_above,3), "\n")
Prior P(l>2): 0.677 Posterior P(l>2): 0.469
cat("BF(+, -) =", round(BF_plusminus, 3),
    " (equivalently BF(-, +) =", round(1/BF_plusminus, 2), ")\n")
BF(+, -) = 0.421 (equivalently BF(-, +) = 2.37 )
\(BF_{+-} \approx 0.42 < 1\), so the data favour \(H_-\) (rate below 2) over \(H_+\). Equivalently \(BF_{-+} \approx 2.4\) — only anecdotal evidence for \(\lambda < 2\) on the Kass & Raftery scale. The data nudged \(P(\lambda > 2)\) down from 0.68 to 0.47, but not decisively.
d.
cat("BF01 =", round(dgamma(2, sp, rp) / dgamma(2, s, r), 3), "\n")
BF01 = 3.112
cat("BF10 =", round(dgamma(2, s, r) / dgamma(2, sp, rp), 3), "\n")
BF10 = 0.321
\(BF_{01} \approx 3.1\): the posterior density at \(\lambda = 2\) is about three times the prior density there, so the data are ~3× more consistent with \(\lambda = 2\) than the diffuse alternative expected — moderate evidence for the value 2, not against it. Note that (c) and (d) answer different questions: a one-sided direction test versus a point test.
e. Model answer (≤70 words). The Bayes factor \(BF_{+-} \approx 0.42\) means the observed data are about 2.4 times more likely under \(H_-\) (\(\lambda < 2\)) than under \(H_+\) (\(\lambda > 2\)) — anecdotal, not compelling, evidence for a call rate below 2 per hour. It licenses a tentative lean toward the lower rate; it does not prove \(\lambda < 2\), nor give the probability that \(H_-\) is true (that would need the prior odds).
Marking: correct direction — evidence favours \(H_-\) (0.5); translates 0.42 into “~2.4× more likely under \(H_-\)” (0.5); uses calibrated/probabilistic language (not “reject”/“significant”) and notes BF \(\neq\) posterior probability (0.5).
(Lecture 5 / Ch. 8; cf. posterior-prediction exercises, with-R question)
A marine ecologist models \(\lambda\), the mean number of whale sightings per one-hour boat survey, with a \(\text{Gamma}(2, 1)\) prior. Over 10 surveys the counts are
\[y = (2,\; 0,\; 3,\; 1,\; 2,\; 4,\; 1,\; 0,\; 3,\; 2).\]
a. [0.5] Derive the posterior analytically.
b. [1.5] Write R code to simulate 50,000 draws from the posterior-predictive distribution for a single future survey. Plot it, and compute \(E(Y' \mid y)\) and \(P(Y' = 0 \mid y)\). Confirm \(P(Y'=0\mid y)\) against the analytic Negative-Binomial value.
c. [1] The team plans a block of 5 future surveys. Simulate the total number of sightings across the block and estimate \(E(Y_\text{total} \mid y)\) and \(P(Y_\text{total} \ge 10 \mid y)\).
d. [1] (Interpret, max 70 words.) A colleague wants to report only the predictive mean from (b) as “the” forecast for the next survey, with no uncertainty. What is wrong with this, and what should they report instead?
Attempt before looking at the solution.
a. \(Y_i \mid \lambda \overset{iid}{\sim} \text{Poisson}(\lambda)\); conjugate update \(\lambda \mid y \sim \text{Gamma}(s + \sum y_i,\ r + n)\).
y <- c(2, 0, 3, 1, 2, 4, 1, 0, 3, 2)
s0 <- 2; r0 <- 1 # prior Gamma(2, 1)
sp <- s0 + sum(y); rp <- r0 + length(y)
cat("Posterior: Gamma(", sp, ",", rp, "), mean =", round(sp/rp, 3), "\n")
Posterior: Gamma( 20 , 11 ), mean = 1.818
\(\sum y_i = 18\), \(n = 10\), so \(\lambda \mid y \sim \text{Gamma}(20, 11)\) with mean \(20/11 \approx 1.82\).
b. Draw \(\lambda^{(s)}\) from the posterior, then \(Y'^{(s)} \sim \text{Poisson}(\lambda^{(s)})\) — this propagates posterior uncertainty into the prediction.
set.seed(2026)
N <- 50000
lambda_post <- rgamma(N, shape = sp, rate = rp)
y_next <- rpois(N, lambda = lambda_post)
cat("E(Y' | y) =", round(mean(y_next), 3), "\n")
E(Y' | y) = 1.82
cat("P(Y' = 0 | y) =", round(mean(y_next == 0), 3),
    " (analytic NB:", round(dnbinom(0, size = sp, prob = rp/(rp + 1)), 3), ")\n")
P(Y' = 0 | y) = 0.174 (analytic NB: 0.175 )
hist(y_next, breaks = -0.5:(max(y_next) + 0.5), probability = TRUE,
     main = "Posterior predictive: next survey",
     xlab = "Whale sightings", col = "steelblue", border = "white")
The predictive distribution is right-skewed over 0–6 sightings. \(E(Y' \mid y) \approx 1.82\) (it equals the posterior mean of \(\lambda\)), and \(P(Y' = 0 \mid y) \approx 0.17\); the simulation matches the analytic Negative-Binomial value (0.175).
c.
y_block <- replicate(N, sum(rpois(5, lambda = rgamma(1, sp, rp))))
cat("E(Y_total | y) =", round(mean(y_block), 2), "\n")
E(Y_total | y) = 9.09
cat("P(Y_total >= 10 | y) =", round(mean(y_block >= 10), 3), "\n")
P(Y_total >= 10 | y) = 0.42
About 9.1 sightings expected over the 5-survey block (\(5 \times\) the per-survey mean), with roughly a 42% chance of 10 or more.
d. Model answer (≤70 words). Reporting only the mean (≈1.8) discards all predictive uncertainty — and 1.8 is not even an observable count. They should report the full posterior-predictive distribution, or at least a predictive interval (here a 90% interval is roughly \([0, 4]\) sightings) plus informative summaries such as \(P(Y' = 0 \mid y)\). A bare point estimate cannot distinguish a confident forecast from a highly uncertain one.
Marking: identifies that a point estimate discards predictive uncertainty (0.5); proposes reporting the predictive distribution or a predictive interval / tail probabilities instead (0.5).
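The predictive interval quoted in the model answer to (d) can be obtained directly from the analytic Negative-Binomial predictive, without simulation. This is a minimal sketch; the choice of a central 90% interval mirrors the model answer, and the variable names are assumptions made here.

```r
sp <- 20; rp <- 11                       # posterior Gamma(20, 11) from part (a)
p_nb <- rp / (rp + 1)                    # Negative-Binomial success probability

# Central 90% predictive interval from the Negative-Binomial quantiles
pred_int <- qnbinom(c(0.05, 0.95), size = sp, prob = p_nb)

# Predictive probabilities for 0 to 6 sightings
pred_pmf <- dnbinom(0:6, size = sp, prob = p_nb)
names(pred_pmf) <- 0:6

pred_int                                 # 0 4
round(pred_pmf, 3)
```

Reporting the interval alongside a tail probability such as \(P(Y' = 0 \mid y)\) conveys how uncertain the forecast is, which the bare mean of 1.8 cannot.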
Last year’s digital resit (8 July 2025), reproduced as a practice paper. Six exercises, 18 points, spanning Lectures 1–5. Same open-book ANS-with-R format as your exam.
(Lecture 1 / Ch. 2)
Sophie is trying to guess where a friend spent a particular summer day. She remembers that 40% of her friends stayed in the Netherlands and 60% went on holiday to southern Europe, but not who went where. The chance of a great beach day is only 0.10 in the Netherlands, but 0.90 in southern Europe. One day the friend texts: “Today was a perfect beach day!”
Given the friend had a great beach day, what is the probability they spent the day in the Netherlands?
Attempt before looking at the solution.
By Bayes’ rule, with \(B\) = “stayed in the Netherlands” and \(A\) = “great beach day”:
\[P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A \mid B)\,P(B) + P(A \mid B^c)\,P(B^c)}.\]
(0.10 * 0.40) / (0.10 * 0.40 + 0.90 * 0.60)
[1] 0.06896552
So \(P(\text{Netherlands} \mid \text{beach day}) \approx 0.069\): because great beach days are far more likely in southern Europe, and more of Sophie's friends went there, hearing about a great beach day makes the Netherlands quite unlikely.
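The same calculation can be laid out as a prior-times-likelihood table, which extends to more than two locations. The vector names below are chosen here for illustration.

```r
prior      <- c(Netherlands = 0.40, SouthernEurope = 0.60)  # P(location)
likelihood <- c(Netherlands = 0.10, SouthernEurope = 0.90)  # P(beach day | location)

joint     <- prior * likelihood          # P(location and beach day)
posterior <- joint / sum(joint)          # divide by P(beach day) to normalise
round(posterior, 3)                      # 0.069 and 0.931
```

The denominator `sum(joint)` is the marginal probability of a great beach day, 0.58, which is the same quantity as the denominator of Bayes' rule above.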
(Lecture 2 / Ch. 3)
A researcher models a success probability with a \(\text{Beta}(3, 10)\) prior and collects \(n = 100\) trials. The Beta-Binomial update gives the posterior \(\text{Beta}(24, 89)\).
a. [0.25] How many successes and how many failures were observed?
b. [0.5] Write the summarize_beta_binomial() call that produces this prior-to-posterior summary.
c. [0.5] Write the code that plots the prior and posterior in one figure, without the scaled likelihood.
d. [3] (Describe, max 130 words.) Describe how the posterior has shifted relative to the prior in terms of its mean, mode, and standard deviation, and what this shift says about how the data updated your beliefs.
Attempt before looking at the solution.
a. The update is \(\text{Beta}(\alpha+y,\ \beta+n-y) = \text{Beta}(3+y,\ 10+n-y) = \text{Beta}(24, 89)\), so \(y = 21\) successes and \(n - y = 79\) failures.
b.
summarize_beta_binomial(alpha = 3, beta = 10, y = 21, n = 100)
c.
plot_beta_binomial(alpha = 3, beta = 10, y = 21, n = 100, likelihood = FALSE)
d. The posterior mean falls slightly (0.23 to 0.21) while the posterior mode rises slightly (0.18 to 0.21). The two move in opposite directions because the prior is right-skewed, and updating pulls the mean and mode together. The posterior is also far more concentrated: the standard deviation drops from about 0.11 to about 0.04, reflecting much greater certainty. In short the data broadly agreed with the prior; rather than overturning our beliefs, they sharpened them, leaving the estimate near 0.21 but with much less uncertainty.
Marking: mean shifts down and mode shifts up, with the opposite directions attributed to the prior’s skewness; SD shrinks (greater precision); actual prior/posterior values stated; data seen to reinforce rather than overturn the prior.
(Lecture 2 / Ch. 3)
You are modelling a success probability. In each case you start with a Beta prior and observe some successes and failures:
| Case | Prior | Observed data |
|---|---|---|
| A | Beta(1, 1) | 3 successes, 6 failures |
| B | Beta(2, 2) | 6 successes, 4 failures |
| C | Beta(1, 4) | 4 successes, 2 failures |
a. [0.75] Determine the posterior distribution for each case.
b. [1.5] Compute the standard deviation of each posterior.
c. [0.5] Rank the posteriors from highest to lowest confidence about the success probability.
Attempt before looking at the solution.
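a. Update each with \(\text{Beta}(\alpha + s,\ \beta + f)\):
- A: \(\text{Beta}(1 + 3,\ 1 + 6) = \text{Beta}(4, 7)\)
- B: \(\text{Beta}(2 + 6,\ 2 + 4) = \text{Beta}(8, 6)\)
- C: \(\text{Beta}(1 + 4,\ 4 + 2) = \text{Beta}(5, 6)\)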
b.
post_sd <- function(a, b) sqrt(a * b / ((a + b)^2 * (a + b + 1)))
round(c(A = post_sd(4, 7), B = post_sd(8, 6), C = post_sd(5, 6)), 3)
    A     B     C
0.139 0.128 0.144
c. Higher confidence means a smaller posterior SD, so the ranking is B (SD 0.128, highest confidence), then A (SD 0.139), then C (SD 0.144, lowest confidence).
(Lecture 2 / Ch. 4)
The bechdel data in bayesrules records whether films pass the Bechdel test. John models \(\pi\), the proportion of films that pass, with a symmetric \(\text{Beta}(2, 2)\) prior and analyses one year at a time.
a. [0.5] John analyses the 1995 films. Give the posterior, its mean, and its mode.
b. [0.5] The next day he analyses the 2005 films, building on the previous day’s posterior. Give the posterior, mean, and mode.
c. [0.5] On the third day he analyses the 2013 films, building on the previous two analyses. Give the posterior, mean, and mode.
d. [1] (Explain, max 70 words.) Jenna instead analyses 1995, 2005 and 2013 jointly, all at once. Under what conditions will her posterior be identical to John’s?
Attempt before looking at the solution.
a.–c. Each year adds its passes and failures, and yesterday’s posterior becomes today’s prior. Starting from \(\text{Beta}(2, 2)\):
data(bechdel, package = "bayesrules")
a <- 2; b <- 2
for (yr in c(1995, 2005, 2013)) {
  d <- bechdel[bechdel$year == yr, ]
  y <- sum(d$binary == "PASS"); n <- nrow(d)
  a <- a + y; b <- b + (n - y)
  cat(yr, ": Beta(", a, ",", b, ") mean =", round(a/(a+b), 3),
      " mode =", round((a-1)/(a+b-2), 3), "\n")
}
1995 : Beta( 20 , 20 ) mean = 0.5 mode = 0.5
2005 : Beta( 74 , 66 ) mean = 0.529 mode = 0.529
2013 : Beta( 120 , 119 ) mean = 0.502 mode = 0.502
So 1995 gives \(\text{Beta}(20, 20)\) (mean and mode 0.50), 2005 gives \(\text{Beta}(74, 66)\) (mean 0.529), and 2013 gives \(\text{Beta}(120, 119)\) (mean 0.502).
d. Model answer (≤70 words). If Jenna uses the same \(\text{Beta}(2, 2)\) prior and the same combined data, the two posteriors are identical. In Bayesian updating, processing the data sequentially or all at once gives the same posterior, as long as the prior and the total data are the same.
Marking: states the conditions (same prior and same data) and recognises that sequential and batch updating coincide.
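The equivalence of sequential and batch updating in (d) is easy to verify numerically. The sketch below uses made-up pass and film counts for three batches (they are not the bechdel figures) so that it runs without the bayesrules package.

```r
# Hypothetical (made-up) counts for three batches of films
passes <- c(10, 25, 40)
totals <- c(30, 60, 90)

# Sequential: yesterday's posterior becomes today's prior
a <- 2; b <- 2
for (i in seq_along(passes)) {
  a <- a + passes[i]
  b <- b + (totals[i] - passes[i])
}
sequential <- c(a, b)

# Batch: a single update with the pooled data
batch <- c(2 + sum(passes), 2 + sum(totals) - sum(passes))

rbind(sequential, batch)                 # both rows: 77 107
```

The rows match because the Beta–Binomial update only adds counts to the prior parameters, and addition does not depend on the order or grouping of the batches.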
(Lecture 3 / Ch. 5)
Analysts model \(\lambda\), the average number of three-pointers made per NBA game.
a. [1] Based on past seasons they expect a mean of 30 with a variance of 60. Construct a Gamma prior for \(\lambda\).
b. [1] Using the current season (284 games, 858 three-pointers in total), compute the posterior mean.
Attempt before looking at the solution.
a. Moment-matching for \(\text{Gamma}(s, r)\) uses mean \(= s/r\) and variance \(= s/r^2\), so \(r = \text{mean}/\text{variance} = 30/60 = 0.5\) and \(s = \text{mean} \times r = 15\). The prior is \(\text{Gamma}(15, 0.5)\).
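The moment-matching step in (a) can be wrapped in a small function and checked by recovering the target moments. The helper name `gamma_from_moments` is hypothetical, introduced here for illustration only.

```r
# Hypothetical helper: Gamma(s, r) shape and rate from a target mean and variance
gamma_from_moments <- function(mean, variance) {
  rate  <- mean / variance
  shape <- mean * rate
  c(shape = shape, rate = rate)
}

prior <- gamma_from_moments(mean = 30, variance = 60)
prior                                    # shape 15, rate 0.5

# Check: the implied mean and variance should recover 30 and 60
c(mean     = prior[["shape"]] / prior[["rate"]],
  variance = prior[["shape"]] / prior[["rate"]]^2)
```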
b. The Gamma-Poisson update is \(\text{Gamma}(s + \sum y_i,\ r + n)\) with \(\sum y_i = 858\) and \(n = 284\) games:
s <- 15; r <- 0.5
sp <- s + 858; rp <- r + 284
cat("Posterior: Gamma(", sp, ",", rp, "), mean =", round(sp/rp, 4), "\n")
Posterior: Gamma( 873 , 284.5 ), mean = 3.0685
The posterior mean is about 3.07.
(Lecture 5 / Ch. 8)
A call centre evaluates its complaint rate \(\lambda\) (average complaints per hour) using a \(\text{Gamma}(2, 1)\) prior (shape-rate). After observing \(y = 12\) complaints in \(t = 4\) hours, they update to the posterior \(\text{Gamma}(14, 5)\).
a. [1] The centre aims to keep the rate below 3 per hour. Compute \(P(\lambda < 3 \mid y)\).
b. [1] Compute the prior odds and posterior odds for \(H_0: \lambda \geq 3\) versus \(H_1: \lambda < 3\).
c. [1] Use part (b) to compute \(BF_{10}\).
d. [1.5] (Interpret, max 70 words.) What does this Bayes factor say about the strength of evidence in the data?
e. [1] Test the point hypothesis \(H_0: \lambda = 3\) versus \(H_1: \lambda \neq 3\) with the Savage-Dickey density ratio. Compute \(BF_{01}\).
Attempt before looking at the solution.
a.
pgamma(3, shape = 14, rate = 5)
[1] 0.6367822
\(P(\lambda < 3 \mid y) \approx 0.637\).
b.
prior_p <- pgamma(3, 2, 1) # P(lambda < 3) under the prior
post_p <- pgamma(3, 14, 5) # P(lambda < 3 | y)
cat("Prior odds (H1/H0):", round(prior_p/(1 - prior_p), 3), "\n")
Prior odds (H1/H0): 4.021
cat("Posterior odds (H1/H0):", round(post_p/(1 - post_p), 3), "\n")
Posterior odds (H1/H0): 1.753
Prior odds of \(H_1\) over \(H_0\) are about 4.02; posterior odds about 1.75. (Equivalently, \(H_0\) over \(H_1\): 0.25 and 0.57.)
c.
BF10 <- (post_p/(1 - post_p)) / (prior_p/(1 - prior_p))
cat("BF10 =", round(BF10, 3), "\n")
BF10 = 0.436
d. Model answer (≤70 words). \(BF_{10} \approx 0.44\) is below 1, so the data favour \(H_0\) (\(\lambda \geq 3\)) over \(H_1\) (\(\lambda < 3\)): the observed data are about \(1/0.44 \approx 2.3\) times more likely under \(H_0\) than under \(H_1\). The observed rate of 3 per hour sits exactly on the boundary between the hypotheses and pulled belief toward higher rates than the prior expected. The evidence is only weak, and it points away from the “below 3” hypothesis.
Marking: states that the evidence favours \(H_0\); translates 0.44 into “about 2.3 times more likely under \(H_0\)”; uses probabilistic language rather than “reject” or “significant”.
e.
cat("BF01 (Savage-Dickey) =", round(dgamma(3, 14, 5) / dgamma(3, 2, 1), 3), "\n")
BF01 (Savage-Dickey) = 3.201
\(BF_{01} \approx 3.20\): the posterior density at \(\lambda = 3\) is about three times the prior density there, so the data are about three times more consistent with exactly \(\lambda = 3\) than the diffuse alternative expected. Moderate evidence for the point value 3.
The exam covers all lecture material. Mock Exam 1 focuses on Lectures 3–5 and Mock Exam 2 (last year’s paper) spans Lectures 1–5; work through every box below regardless.
Lecture 1: Introduction & Bayes’ Rule (Ch. 1–2)
Lecture 2: The Beta-Binomial Model & Sequential Testing (Ch. 3–4)
plot_beta(), plot_beta_binomial(), summarize_beta_binomial()
Lecture 3: Conjugate Models & Choosing Your Prior (Ch. 5)
Lecture 4: Grid Approximation & MCMC (Ch. 6–7)
Lecture 5: Inference, Prediction & Hypothesis Testing (Ch. 8)
Conjugate updates
\(\theta \mid y \sim \text{Beta}(\alpha+y,\;\beta+n-y)\)
\(\lambda \mid \mathbf{y} \sim \text{Gamma}(s+\sum y_i,\;r+n)\)
\(\mu_\text{post} = w_\text{pr}\,\mu_0 + w_\text{d}\,\bar{y}\)
Testing
\(\text{Post odds} = BF_{10} \times \text{Prior odds}\)
\(BF_{10} = f(\theta_0)\,/\,f(\theta_0 \mid y)\) (Savage–Dickey)
\(BF_{+0} = BF_{10} \times \dfrac{P(\theta > \theta_0 \mid y)}{P(\theta > \theta_0)}\)
Credible interval: direct probability statement, \(P(\theta \in [L,U] \mid y) = 0.95\).
ETI: qbeta(c(0.025, 0.975), a, b)
Open Q&A: any topic from the past three weeks. Exam is Friday, June 19, 09:00–11:00, IWO 4.04C (Blauw). Good luck!