Education ∪ Math ∪ Technology

Error bars on grading

Educators make mistakes when grading. It happens. Sometimes we mark a student’s work lower than we should, compared to their peers, and sometimes we mark it higher than we should. The question is, what effect does this have on a student’s overall mark?

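Here are some sample grades. The sample column is the original grade, the low column is one grading step lower than the sample, and the high column is one grading step higher. A step is 1 mark on quizzes, 2 marks on homework, and 5 marks on tests, and homework marks stay within the 1 to 5 range seen in the table.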

Category          Sample    Low     High
Quizzes           5         4       6
                  6         5       7
                  7         6       8
                  5         4       6
                  6         5       7
                  7         6       8
                  5         4       6
                  6         5       7
                  5         4       6
                  6         5       7
average           5.8       4.8     6.8

Homework          5         3       5
                  5         3       5
                  3         1       5
                  3         1       5
                  1         1       3
                  1         1       3
                  3         1       5
                  5         3       5
                  3         1       3
average           3.22      1.67    4.33

Tests             40        35      45
                  45        40      50
                  35        30      40
                  40        35      45
                  30        25      35
average           38        33      43

Overall Grade (%) 70.1      55.9    82.5

 

The overall grade was calculated by averaging each of the three categories (quizzes, homework, and tests, which are standard categories in many classes), with quizzes worth 20%, homework worth 20%, and tests worth 60% of the overall grade. These aren’t particularly unusual grades. Note, however, how wide the possible error in the final grade is: it could range from 55.9% to 82.5%, a spread of 26.6 percentage points, which is a HUGE amount in any grading system.
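The weighted calculation can be sketched in a few lines of Python. The marks and weights come from the table; the maximum marks per category (10, 5, and 50) are not stated in this post and are an assumption, chosen because they reproduce the overall grades shown. The names `MAX_MARKS`, `WEIGHTS`, and `overall_grade` are made up for this illustration.

```python
# Marks copied from the table: one list per column, per category.
MARKS = {
    "quizzes": {
        "sample": [5, 6, 7, 5, 6, 7, 5, 6, 5, 6],
        "low": [4, 5, 6, 4, 5, 6, 4, 5, 4, 5],
        "high": [6, 7, 8, 6, 7, 8, 6, 7, 6, 7],
    },
    "homework": {
        "sample": [5, 5, 3, 3, 1, 1, 3, 5, 3],
        "low": [3, 3, 1, 1, 1, 1, 1, 3, 1],
        "high": [5, 5, 5, 5, 3, 3, 5, 5, 3],
    },
    "tests": {
        "sample": [40, 45, 35, 40, 30],
        "low": [35, 40, 30, 35, 25],
        "high": [45, 50, 40, 45, 35],
    },
}

# ASSUMPTION: maximum marks are not given in the post; these values
# reproduce the overall grades in the table.
MAX_MARKS = {"quizzes": 10, "homework": 5, "tests": 50}

# Category weights as stated in the post.
WEIGHTS = {"quizzes": 0.20, "homework": 0.20, "tests": 0.60}


def overall_grade(column):
    """Weighted overall percentage for one column of the table."""
    total = 0.0
    for category, columns in MARKS.items():
        scores = columns[column]
        average = sum(scores) / len(scores)
        total += WEIGHTS[category] * average / MAX_MARKS[category]
    return 100 * total


for column in ("sample", "low", "high"):
    print(column, round(overall_grade(column), 1))
```

Under those assumed maximum marks, the three columns come out at 70.1, 55.9, and 82.5, matching the table.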

Of course, teachers aren’t likely to mark everything low, or everything high. If we assume that marking too low and marking too high are equally likely, then instead of using the extreme minimum and maximum marks, we can look at the range within 2 standard deviations of the mean of the possible grading outcomes. In other words, what’s a likely range?

I created a script (warning: takes a while to run in some browsers) which randomly generates a sample of 10,000 overall grades, starting with the baseline above and randomly adding a grading error to each assignment. It assumes that a teacher is equally likely to assign a grade that is too low, too high, or exactly correct (this assumption is probably false, but I had to start somewhere). For one sample of 10,000 grades, the minimum grade is 60.2 and the maximum grade is 77.5, suggesting that the distribution of grades isn’t symmetrical (teachers are more likely to assign a grade which is too low to students who are above 50% overall, and too high to students who are below 50% overall). The standard deviation of these scores is 2.32, which means that roughly 95% of the time, the grade will fall between 64.6% and 73.9% (the mean of the data set was 69.2). This is a range of likely values of over 9 percentage points!
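A simulation along these lines can be sketched in Python. This is not the original script: the step sizes (1, 2, and 5 marks), the floors and caps on each mark, and the maximum marks are all assumptions read off the table, and `simulate` and `simulate_once` are names invented here. Because those details are guesses, the output will be in the same neighbourhood as the figures quoted above but is not expected to match them.

```python
import random
import statistics

# Baseline marks from the table. ASSUMPTIONS (not stated in the post):
# "step" is the size of one grading error, "floor" and "cap" bound each
# mark, and "cap" also serves as the maximum mark for the category.
CATEGORIES = {
    "quizzes": {"marks": [5, 6, 7, 5, 6, 7, 5, 6, 5, 6],
                "step": 1, "floor": 0, "cap": 10, "weight": 0.20},
    "homework": {"marks": [5, 5, 3, 3, 1, 1, 3, 5, 3],
                 "step": 2, "floor": 1, "cap": 5, "weight": 0.20},
    "tests": {"marks": [40, 45, 35, 40, 30],
              "step": 5, "floor": 0, "cap": 50, "weight": 0.60},
}


def simulate_once(rng):
    """One possible overall grade, with a random error on every mark."""
    total = 0.0
    for cat in CATEGORIES.values():
        noisy = []
        for mark in cat["marks"]:
            # Too low, correct, or too high, each with probability 1/3.
            error = rng.choice((-cat["step"], 0, cat["step"]))
            noisy.append(min(cat["cap"], max(cat["floor"], mark + error)))
        average = sum(noisy) / len(noisy)
        total += cat["weight"] * average / cat["cap"]
    return 100 * total


def simulate(trials=10_000, seed=0):
    """Return a list of simulated overall grades (seeded for repeatability)."""
    rng = random.Random(seed)
    return [simulate_once(rng) for _ in range(trials)]


grades = simulate()
mean = statistics.mean(grades)
sd = statistics.stdev(grades)
print(f"min {min(grades):.1f}, max {max(grades):.1f}")
print(f"mean {mean:.1f}, standard deviation {sd:.2f}")
print(f"mean ± 2 sd: {mean - 2 * sd:.1f} to {mean + 2 * sd:.1f}")
```

Bounding each mark is one plausible source of the asymmetry: a homework mark already at the top of its scale can only be marked down, which pulls the simulated mean slightly below the baseline.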

Note that this script doesn’t account for a host of other reasons that the grades for this individual student could be in error. It doesn’t account for lost assignments, misread names, addition errors, etc.

How many teachers know that there are error bars on the percentages they are expected to give to students? Maybe if we reported this student’s grade as 70.1% ± 4.6%, students and parents might recognize that grading is more subjective than they realize? Maybe we could stop the practice of assigning letter grades to students’ work based on strict boundaries?

I remember that in grade 12, I was assigned a grade of 84% overall in English 12, with an A being an 86% in my school. This meant that I missed out on a major award at university (it was my only B in grade 12) and that I had to write an entrance exam to get into my first year English course (I passed). I’ve obviously done fine despite this grade, but I remember it often, and it is a reminder to me of the often arbitrary nature of teacher-assigned grades.