Results

Repeated Measures ANOVA

To fit the model:

  1. Type a name (I typed Tutor) for the repeated measures factor in the box labelled Repeated Measures Factors
  2. Enter each level of the repeated measures factor (there are 4 tutor names in this case)
  3. Drag the relevant variable to the corresponding cell in Repeated Measures Cells

Within Subjects Effects
Cases      Sphericity Correction  Sum of Squares  df      Mean Square  F      p      ω²ₚ
Tutor      None                   554.125         3.000   184.708      3.700  0.028  0.235
           Greenhouse-Geisser     554.125         1.673   331.245      3.700  0.063  0.235
           Huynh-Feldt            554.125         2.137   259.329      3.700  0.047  0.235
Residuals  None                   1048.375        21.000  49.923
           Greenhouse-Geisser     1048.375        11.710  89.528
           Huynh-Feldt            1048.375        14.957  70.091
Note.  Type III Sum of Squares
ᵃ Mauchly's test of sphericity indicates that the assumption of sphericity is violated (p < .05).

You’ll find in your output that Mauchly’s test indicates a significant violation of sphericity, but I have argued in the book that you should ignore this test and routinely correct for sphericity anyway, so that’s what we’ll do. The table above tells us about the main effect of Tutor. If we look at the Greenhouse-Geisser corrected values, we would conclude that tutors did not significantly differ in the marks they award, F(1.67, 11.71) = 3.70, p = 0.063. If, however, we look at the Huynh-Feldt corrected values, we would conclude that tutors did significantly differ in the marks they award, F(2.14, 14.96) = 3.70, p = 0.047. Which to believe then? Well, this example illustrates just how silly it is to have a categorical threshold like p < 0.05 that leads to completely opposite conclusions. The best course of action here would be to report both results openly, compute some effect sizes and focus more on the size of the effect than its p-value.
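
If you want to see where the three p-values in the table come from, here is a minimal Python sketch (it is not part of the JASP output). It assumes the upper tail of the F-distribution is written as a regularized incomplete beta function and evaluates that integral with Simpson's rule. The function name `f_upper_tail` is my own invention, and the inputs are the rounded values from the table, so expect agreement to roughly three decimal places and no better.

```python
import math

def f_upper_tail(f, df1, df2, n=4000):
    """P(F > f) for an F(df1, df2) distribution (assumes f > 0 and df2 > 2).

    Uses P(F > f) = I_x(df2/2, df1/2) with x = df2 / (df2 + df1 * f),
    integrating the beta density numerically with Simpson's rule.
    """
    a, b = df2 / 2.0, df1 / 2.0
    x = df2 / (df2 + df1 * f)
    h = x / n  # n must be even for Simpson's rule

    def g(t):
        return t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

    total = g(0.0) + g(x)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(i * h)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (total * h / 3.0) / math.exp(log_beta)

F = 3.700  # identical in every row: the correction changes only the df
corrected_df = {
    "None": (3.000, 21.000),
    "Greenhouse-Geisser": (1.673, 11.710),
    "Huynh-Feldt": (2.137, 14.957),
}
p_values = {name: f_upper_tail(F, df1, df2)
            for name, (df1, df2) in corrected_df.items()}

for name, p in p_values.items():
    print(f"{name}: p = {p:.3f}")
```

The point to take away is that F never changes; only the degrees of freedom shrink, and that alone is enough to push p from one side of 0.05 to the other.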


Write it up:

Using Greenhouse-Geisser corrected degrees of freedom, there was no significant difference in the marks awarded by different tutors to the essays, F(1.67, 11.71) = 3.70, p = 0.063. However, this lack of significance most likely reflects the small sample size because the effect of markers on the marks awarded was medium, ω²ₚ = 0.24.

Between Subjects Effects
Cases Sum of Squares df Mean Square F p
Residuals 103.375 7 14.768  
Note.  Type III Sum of Squares
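
The residual mean square in this table is the missing ingredient for checking the effect size reported in the Within Subjects Effects table. As a rough check, the sketch below plugs the three mean squares into one common formula for omega squared in a one-way repeated-measures design. That JASP uses exactly this formula is my assumption, although it does reproduce the reported value to rounding.

```python
k = 4  # levels of the repeated-measures factor (tutors)
n = 8  # number of cases (N in the Descriptives table)

ms_model = 184.708    # Tutor, Within Subjects Effects (no correction)
ms_residual = 49.923  # Residuals, Within Subjects Effects (no correction)
ms_between = 14.768   # Residuals, Between Subjects Effects

# Variance attributable to the effect, scaled by (k - 1) / (n * k)
effect = (k - 1) / (n * k) * (ms_model - ms_residual)

# Omega squared: effect variance over effect + error + between-case variance
omega_sq = effect / (ms_residual + (ms_between - ms_residual) / k + effect)

print(round(omega_sq, 3))
```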

Descriptives

Descriptives
Tutor N Mean SD SE Coefficient of variation
Field 8 68.875 5.643 1.995 0.082
Smith 8 64.250 4.713 1.666 0.073
Scrote 8 65.250 6.923 2.448 0.106
Death 8 57.375 7.909 2.796 0.138

Descriptives plots

Raincloud plots

[Figure: raincloud plot; axis label "Dependent"]

Assumption Checks

Test of Sphericity
       Mauchly's W  Approx. Χ²  df  p-value  Greenhouse-Geisser ε  Huynh-Feldt ε  Lower Bound ε
Tutor  0.131        11.628      5   0.043    0.558                 0.712          0.333
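
The ε estimates in this table are what turn the uncorrected degrees of freedom into the corrected ones in the Within Subjects Effects table. A short sketch of that arithmetic follows; the variable names are mine, and because the ε values are rounded to three decimals the products can differ from the JASP output in the last digit.

```python
k = 4  # levels of Tutor

mauchly_df = k * (k - 1) // 2 - 1  # df for Mauchly's chi-square test
lower_bound = 1 / (k - 1)          # smallest value epsilon can take

df_model, df_residual = k - 1, 21  # uncorrected df from Within Subjects Effects
epsilons = {"Greenhouse-Geisser": 0.558, "Huynh-Feldt": 0.712}

# Corrected df are simply the uncorrected df multiplied by epsilon
corrected = {name: (eps * df_model, eps * df_residual)
             for name, eps in epsilons.items()}

print(mauchly_df, round(lower_bound, 3))
for name, (df1, df2) in corrected.items():
    print(f"{name}: df = ({df1:.2f}, {df2:.2f})")
```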

Post Hoc Tests

Post Hoc Comparisons - Tutor
                                 95% CI for Mean Difference                                95% CI for Cohen's d
                Mean Difference  Lower         Upper         SE     df  t       Cohen's d  Lower      Upper      p (Holm)
Field   Smith   4.625            0.682         8.568         1.085  7   4.264   0.721      -0.211     1.653      0.022 *
        Scrote  3.625            -6.703        13.953        2.841  7   1.276   0.565      -1.136     2.267      0.485
        Death   11.500           -5.498        28.498        4.675  7   2.460   1.793      -1.379     4.965      0.217
Smith   Scrote  -1.000           -10.320       8.320         2.563  7   -0.390  -0.156     -1.617     1.305      0.708
        Death   6.875            -9.039        22.789        4.377  7   1.571   1.072      -1.619     3.763      0.481
Scrote  Death   7.875            -7.572        23.322        4.249  7   1.854   1.228      -1.460     3.916      0.425
 * p < .05
Note.  P-value and confidence intervals adjusted for comparing a family of 6 estimates (confidence intervals corrected using the Bonferroni method).

The table above shows the Holm post hoc tests, which we should ignore if we’re wedded to p-values, because the Greenhouse-Geisser corrected main effect was not significant. The only significant difference between group means is between Prof Field and Prof Smith. Looking at the means of these markers, we can see that I give significantly higher marks than Prof Smith. However, there is a rather anomalous result in that there is no significant difference between the marks given by Prof Death and myself, even though the mean difference between our marks is larger (11.5) than the mean difference between myself and Prof Smith (4.6). The reason is the lack of sphericity in the data. The interested reader might like to run some correlations between the four tutors’ grades. You will find that there is a very high positive correlation between the marks given by Prof Smith and myself (indicating a low level of variability in our data). However, there is a very low correlation between the marks given by Prof Death and myself (indicating a high level of variability between our marks). It is this large variability between Prof Death and myself that has produced the non-significant result despite the average marks being very different (this observation is also evident from the standard errors).
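
To make the point about standard errors concrete, the sketch below rebuilds each t from the mean difference and SE in the post hoc table and then applies the Holm step-down adjustment. It is an illustration and not JASP's own code: the helper `t_two_sided_p` is a name I made up, it gets the p-value by numerically integrating a beta density, and the inputs are the rounded table values, so small discrepancies in the third decimal place are to be expected.

```python
import math

def t_two_sided_p(t, df, n=4000):
    """Two-sided p-value for a t statistic (assumes t != 0 and df > 2).

    Uses p = I_x(df/2, 1/2) with x = df / (df + t**2), integrating the
    beta density numerically with Simpson's rule (n must be even).
    """
    a, b = df / 2.0, 0.5
    x = df / (df + t * t)
    h = x / n

    def g(u):
        return u ** (a - 1.0) * (1.0 - u) ** (b - 1.0)

    total = g(0.0) + g(x)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(i * h)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (total * h / 3.0) / math.exp(log_beta)

# (comparison, mean difference, SE) copied from the post hoc table
rows = [
    ("Field-Smith", 4.625, 1.085),
    ("Field-Scrote", 3.625, 2.841),
    ("Field-Death", 11.500, 4.675),
    ("Smith-Scrote", -1.000, 2.563),
    ("Smith-Death", 6.875, 4.377),
    ("Scrote-Death", 7.875, 4.249),
]
df = 7

# t is the mean difference divided by its standard error
t_values = {name: diff / se for name, diff, se in rows}
raw_p = {name: t_two_sided_p(t, df) for name, t in t_values.items()}

# Holm: sort ascending, multiply the i-th smallest p by (m - i),
# keep the sequence non-decreasing and cap it at 1
m = len(raw_p)
holm_p = {}
running_max = 0.0
for i, (name, p) in enumerate(sorted(raw_p.items(), key=lambda kv: kv[1])):
    running_max = max(running_max, min(1.0, (m - i) * p))
    holm_p[name] = running_max

for name in t_values:
    print(f"{name}: t = {t_values[name]:.3f}, Holm p = {holm_p[name]:.3f}")
```

Notice that the Field-Death difference (11.5) is divided by a standard error more than four times the size of the Field-Smith one, which is why the bigger difference ends up with the smaller t.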