Inter-Rater Reliability/KAPPA
using SAS PROC FREQ

See www.stattutorials.com/SASDATA for files mentioned in this tutorial, © TexaSoft, 2007-10

These SAS statistics tutorials briefly explain the use and interpretation of standard statistical analysis techniques for Medical, Pharmaceutical, Clinical Trials, Marketing or Scientific Research. The examples include how-to instructions for SAS Software.

Inter-Rater Reliability/KAPPA

Cohen's Kappa coefficient is a method for assessing the degree of agreement between two raters. The weighted Kappa method gives raters partial, though not full, credit for getting "near" the right answer, so it should be used only when the degree of agreement can be quantified.
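
The simple (unweighted) Kappa compares the observed proportion of agreement, Po, with the proportion of agreement expected by chance, Pe, which is computed from the row and column marginal proportions:

   Kappa = (Po - Pe) / (1 - Pe)

A Kappa of 1 indicates perfect agreement, a Kappa of 0 indicates agreement no better than chance, and negative values indicate agreement worse than chance.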

For example, using data from Fleiss (1981, p. 213), suppose you have 100 subjects rated by two raters on a psychological scale that consists of three categories (1 = Psychotic, 2 = Neurotic, 3 = Organic). The data are given below:

    

                                 RATER A
                     Psychotic (1)   Neurotic (2)   Organic (3)   Total
RATER B  Psychotic (1)        75              1             4        80
         Neurotic  (2)         5              4             1        10
         Organic   (3)         0              0            10        10
         Total                80              5            15       100
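
As a quick check on the raw table: the raters agree on 75 + 4 + 10 = 89 of the 100 subjects, so the observed agreement is Po = 0.89. The agreement expected by chance, computed from the marginal totals, is Pe = (0.80)(0.80) + (0.10)(0.05) + (0.10)(0.15) = 0.66. These two quantities are all that is needed to reproduce the simple Kappa reported by SAS below.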

To perform this analysis in SAS, open the file PROCFREQ-KAPPA.SAS, shown here:

DATA;
   * Read the 3x3 table of counts, one observation per cell with the count in WT;
   DO RATER1 = 1 TO 3;
      DO RATER2 = 1 TO 3;
         INPUT WT @@;
         OUTPUT;
      END;
   END;
DATALINES;
75 1 4
5 4 1
0 0 10
;
ODS RTF;
PROC FREQ;
   WEIGHT WT;                       * Count each observation WT times;
   TABLES RATER1*RATER2 / AGREE;    * AGREE requests the Kappa statistics;
   TEST WTKAP;                      * Significance test for weighted Kappa;
   TITLE 'KAPPA EXAMPLE FROM FLEISS';
RUN;
ODS RTF CLOSE;

This DATA step creates a data set representing the 3x3 table shown above, with one observation per cell and the cell count stored in the variable WT. The analysis is then performed with PROC FREQ.

The WEIGHT statement tells PROC FREQ to count each observation WT times, the /AGREE option on the TABLES statement requests the agreement (Kappa) statistics, and the TEST WTKAP statement requests a significance test of the weighted Kappa. (If your ratings are not already tallied into counts, see the raw-data sketch at the end of this tutorial.)

The code above produces the following output.

1. The (Bowker's) Test of Symmetry tests the hypothesis that pij = pji for every pair of categories. When the table is 2x2, this is the same as McNemar's test. If this test is non-significant, it indicates that the two raters have the same propensity to select the categories; if it is significant, it means that the raters are selecting the categories in differing proportions.
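
Bowker's statistic is the sum, over each pair of off-diagonal cells, of (nij - nji)^2 / (nij + nji). For this table that is (1 - 5)^2/6 + (4 - 0)^2/4 + (1 - 0)^2/1 = 2.67 + 4.00 + 1.00 = 7.67 on 3 degrees of freedom, which matches the output shown below.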

Test of Symmetry
Statistic (S)    7.6667
DF                    3
Pr > S           0.0534

2. The simple Kappa coefficient measures the level of agreement between the two raters. When Kappa is large (most would say 0.7 or higher), it indicates a strong level of agreement.
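
Using the observed and chance agreement computed from the table earlier, Kappa = (Po - Pe) / (1 - Pe) = (0.89 - 0.66) / (1 - 0.66) = 0.23 / 0.34 = 0.6765, in agreement with the value PROC FREQ reports: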

Simple Kappa Coefficient
Kappa                   0.6765
ASE                     0.0877
95% Lower Conf Limit    0.5046
95% Upper Conf Limit    0.8484

3. The weighted Kappa method gives raters partial, though not full, credit for getting "near" the right answer, so it should be used only when the degree of agreement can be quantified; that is, the categories must be ordinal.
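
By default PROC FREQ uses Cicchetti-Allison agreement weights; for the integer scores 1-3 these are 1 for exact agreement, 0.5 for categories one step apart, and 0 for categories two steps apart. For the table above, the weighted observed agreement is 0.89 + 0.5(0.07) = 0.925, the weighted chance agreement is 0.66 + 0.5(0.14) = 0.73, and the weighted Kappa is (0.925 - 0.73) / (1 - 0.73) = 0.7222, matching the output below. (Fleiss-Cohen weights can be requested instead with the AGREE(WT=FC) option in more recent SAS releases.)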

Weighted Kappa Coefficient
Weighted Kappa          0.7222
ASE                     0.0843
95% Lower Conf Limit    0.5570
95% Upper Conf Limit    0.8874

 

Test of H0: Weighted Kappa = 0
ASE under H0            0.0879
Z                       8.2201
One-sided Pr >  Z       <.0001
Two-sided Pr > |Z|      <.0001

The Kappa and Weighted Kappa results are displayed, along with 95% confidence limits. Kappa generally ranges from 0 to 1, with a value of 1 meaning perfect agreement (negative values are possible). The higher the value of Kappa, the stronger the agreement.
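
Finally, note that the WEIGHT statement was needed above only because the data had already been tallied into cell counts. If your ratings are stored one record per subject, the same PROC FREQ statements apply and the WEIGHT statement is simply omitted. A minimal sketch is shown below; the data set name RATINGS and the four illustrative records are hypothetical:

/* Hypothetical raw layout: one record per subject with each    */
/* rater's category (1=Psychotic, 2=Neurotic, 3=Organic).       */
DATA RATINGS;
   INPUT SUBJECT RATER1 RATER2;
   DATALINES;
1 1 1
2 1 3
3 2 2
4 3 3
;
PROC FREQ DATA=RATINGS;
   TABLES RATER1*RATER2 / AGREE;    * No WEIGHT statement needed;
   TEST KAPPA WTKAP;                * Tests for simple and weighted Kappa;
RUN;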

 

End of tutorial