Inter-Rater Reliability/KAPPA
using SAS PROC FREQ
See www.stattutorials.com/SASDATA for files mentioned in this tutorial, © TexaSoft, 2007-10
These SAS statistics tutorials briefly explain the use and interpretation of standard statistical analysis techniques for Medical, Pharmaceutical, Clinical Trials, Marketing or Scientific Research. The examples include how-to instructions for SAS Software.
Cohen’s Kappa coefficient measures the degree of agreement between two raters who classify the same subjects into the same set of categories, corrected for the agreement expected by chance. The weighted Kappa method gives partial, although not full, credit to raters who get “near” the right answer, so it should be used only when the degree of disagreement between categories can be quantified, that is, when the categories are ordered. As an example from Fleiss (1981, p. 213), suppose 100 subjects are rated by two raters on a psychological scale that consists of three categories. The data are given below:
                              RATER A
                      Psych.  Neuro.  Organic
                        1       2        3     Total
RATER B  Psych.   1    75       1        4       80
         Neuro.   2     5       4        1       10
         Organic  3     0       0       10       10
         Total         80       5       15      100
To perform this analysis in SAS, open the file PROCFREQ-KAPPA.SAS, shown here:
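/* Read the nine cell counts row by row: RATER1 indexes rows, RATER2 columns */
DATA;
  DO RATER1 = 1 TO 3;
    DO RATER2 = 1 TO 3;
      INPUT WT @@;      /* WT = number of subjects in this cell */
      OUTPUT;
    END;
  END;
DATALINES;
75 1 4
5 4 1
0 0 10
;
ODS RTF;
/* WEIGHT makes each record count as WT subjects; AGREE requests agreement statistics */
PROC FREQ;
  WEIGHT WT;
  TABLE RATER1*RATER2 / AGREE;
  TEST WTKAP;           /* test of H0: weighted Kappa = 0 */
  TITLE 'KAPPA EXAMPLE FROM FLEISS';
RUN;
ODS RTF CLOSE;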
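The DATA step reads the nine cell counts and creates a data set with one observation per cell of the 3x3 table shown above. The WEIGHT statement tells PROC FREQ to treat WT as the number of subjects in each cell. The analysis is performed using PROC FREQ.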
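For readers who want to check the data layout outside SAS, the nested DO loops can be mirrored in a few lines of Python. This is only an illustrative sketch and is not part of PROCFREQ-KAPPA.SAS; the names `counts`, `records`, and `table` are hypothetical, and Python is used purely as a cross-check.

```python
# Cell counts in the same row-by-row order as the DATALINES block.
counts = [75, 1, 4, 5, 4, 1, 0, 0, 10]

# Mirror the nested DO loops: one record per cell of the 3x3 table.
records = []
values = iter(counts)
for rater1 in range(1, 4):          # outer loop: table row
    for rater2 in range(1, 4):      # inner loop: table column
        records.append({"RATER1": rater1, "RATER2": rater2, "WT": next(values)})

# Rebuild the table from the weighted records, analogous to WEIGHT WT.
table = [[0] * 3 for _ in range(3)]
for rec in records:
    table[rec["RATER1"] - 1][rec["RATER2"] - 1] += rec["WT"]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
print(table)
print(row_totals, col_totals, sum(row_totals))
```

The row and column totals should agree with the margins of the table above (80, 10, 10 and 80, 5, 15, with 100 subjects in all).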
To get the Kappa statistics, use the AGREE option on the TABLE statement. For a square table larger than 2x2, this produces both the simple and the weighted Kappa coefficients. The TEST WTKAP statement additionally requests a test of the hypothesis that the weighted Kappa is zero.
From the code above the following output is created.
1. Bowker’s Test of Symmetry tests the hypothesis that pij = pji for every pair of categories, that is, that the table is symmetric about its diagonal (which implies marginal homogeneity). If r=c=2, this is the same as McNemar’s test. If the test is non-significant, there is no evidence that the two raters differ in their propensity to select categories. If it is significant, it means that the raters are selecting the categories in differing proportions. Here the p-value of 0.0534 falls just short of significance at the 0.05 level.
Test of Symmetry
Statistic (S)            7.6667
DF                            3
Pr > S                   0.0534
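The symmetry statistic can be reproduced by hand from the off-diagonal cells. The sketch below assumes the usual form of Bowker's statistic, the sum over pairs i < j of (nij - nji)^2 / (nij + nji), with r(r-1)/2 degrees of freedom. The function names are hypothetical, and the p-value helper uses a closed form that holds only for 3 degrees of freedom.

```python
import math

table = [[75, 1, 4],
         [5, 4, 1],
         [0, 0, 10]]

def bowker_statistic(t):
    """Return (S, df) for Bowker's test of symmetry on a square table."""
    k = len(t)
    s = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            total = t[i][j] + t[j][i]
            if total > 0:  # an empty pair of cells contributes nothing
                s += (t[i][j] - t[j][i]) ** 2 / total
    return s, k * (k - 1) // 2

def chi2_sf_3df(x):
    """Upper-tail chi-square probability; closed form valid only for 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

s, df = bowker_statistic(table)
p_value = chi2_sf_3df(s)
print(round(s, 4), df, round(p_value, 4))
```

For tables of other sizes the degrees of freedom change, so a general chi-square routine would be needed in place of `chi2_sf_3df`.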
2. The simple Kappa coefficient measures the level of agreement between two raters beyond what chance alone would produce. When Kappa is large (most would say .7 or higher), it indicates a strong level of agreement. Here Kappa is 0.6765, just below that threshold.
Simple Kappa Coefficient
Kappa                    0.6765
ASE                      0.0877
95% Lower Conf Limit     0.5046
95% Upper Conf Limit     0.8484
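As a cross-check, the simple Kappa can be computed directly from the table as (Po - Pe) / (1 - Pe), where Po is the observed proportion of agreement and Pe the proportion expected by chance from the margins. The sketch below also computes an asymptotic standard error using the large-sample variance of Fleiss, Cohen, and Everitt (1969); that this is the formula behind the ASE in the output is an assumption here, and the variable names are hypothetical.

```python
import math

table = [[75, 1, 4],
         [5, 4, 1],
         [0, 0, 10]]

n = sum(sum(r) for r in table)
k = len(table)
p = [[cell / n for cell in r] for r in table]   # cell proportions
row_p = [sum(r) for r in p]                     # row margins p_i.
col_p = [sum(c) for c in zip(*p)]               # column margins p_.j

p_o = sum(p[i][i] for i in range(k))                 # observed agreement
p_e = sum(row_p[i] * col_p[i] for i in range(k))     # chance agreement
kappa = (p_o - p_e) / (1 - p_e)

# Asymptotic variance (assumed: Fleiss, Cohen, and Everitt, 1969).
a = sum(p[i][i] * (1 - (row_p[i] + col_p[i]) * (1 - kappa)) ** 2 for i in range(k))
b = (1 - kappa) ** 2 * sum(
    p[i][j] * (col_p[i] + row_p[j]) ** 2
    for i in range(k) for j in range(k) if i != j
)
c = (kappa - p_e * (1 - kappa)) ** 2
ase = math.sqrt((a + b - c) / ((1 - p_e) ** 2 * n))

# 95% confidence limits: kappa plus or minus 1.96 standard errors.
lower, upper = kappa - 1.96 * ase, kappa + 1.96 * ase
print(round(p_o, 2), round(p_e, 2), round(kappa, 4), round(ase, 4))
```

With Po = 0.89 and Pe = 0.66, Kappa works out to 0.23 / 0.34, and the resulting limits should come close to those reported above, up to rounding.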
3. The weighted Kappa coefficient gives partial, although not full, credit to raters who get “near” the right answer, so it should be used only when the degree of disagreement can be quantified, that is, when the categories are ordinal. The second table below tests the hypothesis that the weighted Kappa is zero.
Weighted Kappa Coefficient
Weighted Kappa           0.7222
ASE                      0.0843
95% Lower Conf Limit     0.5570
95% Upper Conf Limit     0.8874

Test of H0: Weighted Kappa = 0
ASE under H0             0.0879
Z                        8.2201
One-sided Pr > Z         <.0001
Two-sided Pr > |Z|       <.0001
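The weighted coefficient can be checked in the same way. The sketch below assumes linear agreement weights of the form w_ij = 1 - |i - j| / (k - 1), so that adjacent categories earn half credit in a three-category table; this weighting is an assumption made for illustration, and the variable names are hypothetical.

```python
table = [[75, 1, 4],
         [5, 4, 1],
         [0, 0, 10]]

n = sum(sum(r) for r in table)
k = len(table)
p = [[cell / n for cell in r] for r in table]   # cell proportions
row_p = [sum(r) for r in p]                     # row margins
col_p = [sum(c) for c in zip(*p)]               # column margins

# Linear agreement weights: 1 on the diagonal, 0.5 one category apart, 0 two apart.
weights = [[1 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]

# Weighted observed and chance-expected agreement.
p_o_w = sum(weights[i][j] * p[i][j] for i in range(k) for j in range(k))
p_e_w = sum(weights[i][j] * row_p[i] * col_p[j] for i in range(k) for j in range(k))

kappa_w = (p_o_w - p_e_w) / (1 - p_e_w)
print(round(p_o_w, 3), round(p_e_w, 3), round(kappa_w, 4))
```

Under these weights the weighted observed agreement is 0.925 and the weighted chance agreement is 0.73, which gives 0.195 / 0.27 and agrees with the Weighted Kappa value in the output.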
The Kappa and Weighted Kappa results are displayed, along with 95% confidence limits. Kappa generally ranges in value from 0 to 1: a value of 1 means perfect agreement, and a value of 0 means agreement no better than chance. Negative values are possible and indicate agreement worse than chance. The higher the value of Kappa, the stronger the agreement.
End of tutorial


