Inter-reader Reliability in Analytically Assessing Online Writing Portfolios
In writing assessment, reliability is achieved by having two readers
score each portfolio on a scale from one to six; if the scores are not
adjacent, the portfolio is given to a third or fourth reader. The adjudicated
scores are then added together for a total range of two to twelve. Before
each reading, we hold a calibration session in which we score sample portfolios,
then compare our scores and discuss why we scored each portfolio as
we did. This brings us into close accord during the reading and helps
define our expectations from semester to semester.
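
The two-reader rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, "not adjacent" is taken to mean that the two scores differ by more than one point (identical scores count as agreement), and the way a third or fourth reader's score is reconciled is not modeled because the procedure is not spelled out here.

```python
def needs_additional_reader(score_a: int, score_b: int) -> bool:
    """Return True when two readers' scores are neither identical nor adjacent.

    Assumption: "adjacent" means the scores differ by at most one point.
    """
    for score in (score_a, score_b):
        if not 1 <= score <= 6:
            raise ValueError(f"score {score} is outside the 1-6 scale")
    return abs(score_a - score_b) > 1


def combined_score(score_a: int, score_b: int) -> int:
    """Add two agreeing scores; the total falls between 2 and 12."""
    if needs_additional_reader(score_a, score_b):
        raise ValueError("scores are not adjacent; send to another reader")
    return score_a + score_b
```

For example, scores of 4 and 5 would be summed to 9, while scores of 3 and 5 would be flagged for another reading.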
During the reading, each instructor scores portfolios separately, without
knowledge of the other’s score. Instructors do not score portfolios
from their own students. We score ten separate (independent) predictor
variables and an overall (dependent) outcome portfolio score. This process
gives numerical value to a complex and shifting goal – the quality
of human communication.
To gauge reliability, we used both Pearson’s correlation
and Cronbach’s Alpha (Table 1) to analyze the inter-reader agreement
each semester. The analysis in Table 1 presents the adjudicated scores
over three semesters. In the two-tailed Pearson’s correlation,
we retained the null hypothesis unless the correlation was significant at
the .05 level (95% confidence), a guard against Type I error. The
reliability increased steadily from semester to semester: we were becoming
more comfortable with the assessment process and more calibrated as
a group. An analysis of the unadjudicated scores shows a similar pattern.
These results indicate that our inter-reader reliability is high enough
for us to agree on and sustain standards for evaluating
the online portfolios.
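
For readers who want to see how the two statistics are computed, here is a minimal sketch in Python. It is not the analysis behind Table 1: the scores are invented for illustration, the function names are hypothetical, and Cronbach's Alpha is computed by treating each reader as an "item", which is one common way to apply it to reader agreement. Significance testing of the correlation is left out.

```python
from math import sqrt
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson's correlation between two aligned lists of scores."""
    mean_x, mean_y = mean(x), mean(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cross / (spread_x * spread_y)


def cronbach_alpha(readers):
    """Cronbach's Alpha, treating each reader as one item.

    readers: one list of scores per reader, aligned by portfolio.
    """
    k = len(readers)
    # Total score per portfolio across all readers.
    totals = [sum(scores) for scores in zip(*readers)]
    reader_variance = sum(variance(scores) for scores in readers)
    return k / (k - 1) * (1 - reader_variance / variance(totals))


# Hypothetical scores for eight portfolios, invented for illustration only.
reader_1 = [4, 3, 5, 2, 6, 4, 3, 5]
reader_2 = [4, 4, 5, 3, 5, 4, 2, 5]

print(round(pearson_r(reader_1, reader_2), 3))
print(round(cronbach_alpha([reader_1, reader_2]), 3))
```

Values of both statistics closer to 1 indicate closer agreement between the readers; running the same calculation on each semester's scores would show whether agreement rises over time.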