Interrater reliability (Kappa)
Using SPSS
See
www.stattutorials.com/SPSSDATA
for files mentioned in this tutorial © TexaSoft, 2008
These SPSS statistics tutorials briefly explain the use and
interpretation of standard statistical analysis techniques for Medical,
Pharmaceutical, Clinical Trials, Marketing or Scientific Research. The examples
include how-to instructions for SPSS Software.
Interrater reliability (Kappa)
Interrater reliability is a measure used to
examine the agreement between two people (raters/observers) on the assignment
of categories of a categorical variable. It is an important measure in
determining how well an implementation of some coding or measurement system
works.
A statistical measure of interrater reliability is
Cohen’s Kappa, which generally ranges from 0 to 1.0 (although negative values are possible). Larger values mean better
reliability; values near or below zero suggest that the observed agreement is attributable to
chance alone.
Example
Interrater reliability analysis
Using an example
from Fleiss (1981, p. 213), suppose you have 100 subjects whose diagnosis is
rated by two raters on a scale that rates the subject’s disorder as being
either psychological, neurological, or organic. The data (KAPPA.SAV) are given below:


                             RATER A
RATER B          Psychological   Neurological   Organic
Psychological               75              1         4
Neurological                 5              4         1
Organic                      0              0        10
The data set KAPPA.SAV contains the variables Rater_A,
Rater_B and Count. The figure below shows the data file in count
(summarized) form.
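As a rough illustration of what count (summarized) form means, the sketch below lists one row per combination of ratings, with Count giving the number of subjects in that cell. The string labels are an assumption made for readability; the actual coding inside KAPPA.SAV may differ (for example, numeric codes).

```python
from collections import defaultdict

# Count-form rows: (Rater_A, Rater_B, Count), transcribed from the table above.
# ASSUMPTION: string labels are illustrative; KAPPA.SAV may use other codes.
rows = [
    ("Psychological", "Psychological", 75),
    ("Neurological", "Psychological", 1),
    ("Organic", "Psychological", 4),
    ("Psychological", "Neurological", 5),
    ("Neurological", "Neurological", 4),
    ("Organic", "Neurological", 1),
    ("Psychological", "Organic", 0),
    ("Neurological", "Organic", 0),
    ("Organic", "Organic", 10),
]

# Weighting by Count means each row stands for Count identical subjects.
crosstab = defaultdict(int)
for rater_a, rater_b, count in rows:
    crosstab[(rater_a, rater_b)] += count

n_subjects = sum(crosstab.values())
print(n_subjects)  # 100
```

Nine summarized rows therefore represent all 100 subjects, which is why the Count variable has to be declared as a weight before the analysis is run.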
To analyze these data, follow these steps:
1. Open the file KAPPA.SAV. Before performing the analysis on this
summarized data, you must tell SPSS that the Count variable is a
“weighted” variable. Select Data/Weight Cases... and select the “weight
cases by” option with Count as the Frequency variable.
2. Select Analyze/Descriptive Statistics/Crosstabs.
3. Select Rater A as Row, Rater B as Col.
4. Click on the Statistics button, select Kappa and Continue.
5. Click OK to display the results for the Kappa test shown here:
The results of the interrater analysis are
Kappa = 0.676 with p < 0.001. This measure of agreement, while statistically
significant, is only marginally convincing. As a rule of thumb, values
of Kappa from 0.41 to 0.60 are considered moderate, 0.61 to 0.80
substantial, and above 0.80 almost perfect (Landis & Koch, 1977). Most statisticians
prefer Kappa values to be at least 0.6, and most often higher than 0.7,
before claiming a good level of agreement. Although not displayed in the
output, you can find a 95% confidence interval using the generic formula for
95% confidence intervals:
Estimate ± 1.96 × SE
Using this formula and the
results in the table, an approximate 95% confidence interval on Kappa is
(0.504, 0.848). Some statisticians prefer the use of a weighted Kappa,
particularly if the categories are ordered. The weighted Kappa allows “close”
ratings to not simply be counted as “misses.” However, the SPSS Crosstabs
procedure used here does not calculate weighted Kappas.
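For readers who want to check these figures outside SPSS, the following is a minimal sketch of the unweighted Kappa calculation and the generic 95% confidence interval. The function name cohen_kappa is hypothetical, and the standard error of 0.088 is an assumption: it is not stated in the text above and is simply the value implied by the reported interval.

```python
# Agreement table from the example: rows are Rater B, columns are Rater A.
table = [
    [75, 1, 4],   # Psychological
    [5, 4, 1],    # Neurological
    [0, 0, 10],   # Organic
]

def cohen_kappa(table):
    """Unweighted Cohen's Kappa for a square table of counts."""
    n = sum(sum(row) for row in table)
    k = len(table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Proportion of subjects on the diagonal (both raters agree).
    p_observed = sum(table[i][i] for i in range(k)) / n
    # Agreement expected by chance, from the marginal totals.
    p_expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n**2
    return (p_observed - p_expected) / (1 - p_expected)

kappa = cohen_kappa(table)
print(round(kappa, 3))  # 0.676

# ASSUMPTION: SE = 0.088 is inferred from the reported interval, not given in the text.
se = 0.088
estimate = round(kappa, 3)  # use the reported (rounded) Kappa, as the text does
lower = estimate - 1.96 * se
upper = estimate + 1.96 * se
print(round(lower, 3), round(upper, 3))  # 0.504 0.848
```

Here the observed agreement is 0.89 and the chance-expected agreement is 0.66, so Kappa = (0.89 - 0.66) / (1 - 0.66), which rounds to 0.676.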
A more complete list of how Kappa might be interpreted (Landis & Koch, 1977) is given in the following table:
Kappa          Interpretation
< 0            Poor agreement
0.0 – 0.20     Slight agreement
0.21 – 0.40    Fair agreement
0.41 – 0.60    Moderate agreement
0.61 – 0.80    Substantial agreement
0.81 – 1.00    Almost perfect agreement
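The table can also be expressed as a small lookup, sketched below. The function name interpret_kappa is hypothetical, and treating each band's upper limit as inclusive (so that values such as 0.205 fall into the next band) is an assumption, since the table lists limits to two decimals only.

```python
def interpret_kappa(kappa):
    """Return the Landis & Koch (1977) label for a Kappa value."""
    if kappa < 0:
        return "Poor agreement"
    # (upper limit, label) pairs taken from the table above.
    # ASSUMPTION: upper limits are treated as inclusive.
    bands = [
        (0.20, "Slight agreement"),
        (0.40, "Fair agreement"),
        (0.60, "Moderate agreement"),
        (0.80, "Substantial agreement"),
        (1.00, "Almost perfect agreement"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("Kappa cannot exceed 1.0")

print(interpret_kappa(0.676))  # Substantial agreement
```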

Reporting the results of an interrater reliability analysis
The following illustrates how
you might report this interrater analysis in a publication format.
Narrative for the methods section:
“An interrater reliability analysis using the Kappa statistic was performed
to determine consistency among raters.”
Narrative for the results section:
“The interrater reliability for the raters was found to be Kappa = 0.68 (p
< 0.001), 95% CI (0.504, 0.848).”
Reference
Landis, J. R., & Koch, G. G. (1977). The measurement of
observer agreement for categorical data. Biometrics 33:159–174.