Testing Inter-Rater Reliability in Rubrics for Large Scale Undergraduate Independent Projects

Alan Chong, Lisa Romkey


This work outlines the process of testing
inter-rater reliability in rubrics for large-scale
undergraduate independent projects, specifically
the thesis program within the Division of Engineering
Science at the University of Toronto, in which 200
students work with over 100 supervisors on
independent research projects. Over the last few years,
rubrics have been developed both to guide students in
creating their thesis deliverables and to improve
the consistency of supervisor assessment. To examine
inter-rater reliability, 12 final thesis reports were each
assessed using the course rubric by two generalist
experts, who have worked extensively with the thesis
course and designed the rubrics, and by the project's
supervisor. We found substantial agreement between the
two generalist experts, but only fair agreement between
the generalist experts and the supervisors, suggesting that
while the rubric helps develop a common
set of expectations, there may be other aspects of the
supervisor’s assessment practice that need to be
