Correlation Analysis for SAS

PROC CORR

See www.stattutorials.com/SASDATA for files mentioned in this tutorial
© TexaSoft, 2007

These SAS statistics tutorials briefly explain the use and interpretation of standard statistical analysis techniques for Medical, Pharmaceutical, Clinical Trials, Marketing or Scientific Research. The examples include how-to instructions for SAS Software.

  

Correlation Analysis using PROC CORR

 

The correlation coefficient allows researchers to determine if there is a possible linear relationship between two variables measured on the same subject (or entity). When these two variables are of a continuous nature (they are measurements such as weight, height, length, etc.) the measure of association most often used is Pearson’s correlation coefficient.

 

This association may be expressed as a number (the correlation coefficient) that ranges from –1 to +1. The population correlation is usually denoted by the Greek letter rho (ρ), and the sample statistic (the correlation coefficient) by r.
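For reference, the sample coefficient for n paired observations is conventionally computed as shown below. The notation (x_i and y_i for the paired measurements, x-bar and y-bar for their means) is standard textbook notation and is an assumption here, not something taken from this tutorial.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator is positive when the two variables tend to be above (or below) their means together, and the denominator rescales the result so that it always falls between –1 and +1.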

 

The correlation measures how well a straight line fits through a scatter of points when plotted on an x-y axis. If the correlation is positive, it means that when one variable increases, the other tends to increase. If the correlation is negative, it means that when one variable increases, the other tends to decrease. When a correlation coefficient is close to +1 (or –1), there is a strong linear relationship: the points lie close to a straight line. For example, a correlation r = 0.7 may be considered strong. The closer a correlation coefficient gets to 0, the weaker the relationship, and the cloud (scatter) of points is not close to a straight line. For example, a correlation r = 0.1 might be considered weak. For scientific purposes, a t-test is used to determine whether the correlation coefficient is statistically “significant”, that is, whether it differs from zero by more than chance alone would explain. Significance is not the same as strength: with a large sample, even a weak correlation can be statistically significant. This will be discussed later.

 

Assumptions: Before using the Pearson correlation coefficient as a measure of association, you should be aware of its assumptions and limitations. As mentioned earlier, this correlation coefficient measures a linear relationship. That is, it measures how closely the two measurements fall along a straight line when plotted on an x-y chart. Therefore, it is important that the data be graphed before the correlation is interpreted. For example, it is possible that data, when plotted, may show a curved relationship instead of a straight line. When this is the case, a Pearson correlation may not be the best measure of association. There are other conditions in which a correlation coefficient may appear important but, when considered in light of a graph, is not a good measure of the relationship. All of the following graphs have a correlation coefficient of about 0.72, yet most do not fit the assumption of a linear relationship. To avoid misinterpreting a correlation, always accompany the calculation with a graph.
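One way to graph a pair of variables before interpreting their correlation is sketched below. It assumes the MYDATA library and the SOMEDATA data set used in the example later in this tutorial, and it assumes SAS 9.2 or later, where PROC SGPLOT is available. The choice of AGE and TIME1 is only for illustration.

```sas
* Sketch: scatterplot of one pair of variables;
* Assumes the library MYDATA contains SOMEDATA.SAS7BDAT;
* Assumes SAS 9.2 or later (PROC SGPLOT);
proc sgplot data=mydata.somedata;
   scatter x=age y=time1;
   title 'AGE versus TIME1';
run;
```

If the points curve, fan out, or cluster around a few outliers, the Pearson coefficient should be interpreted with caution.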

 

 

Another assumption of correlation is that both variables (the measurements) are continuous data measured on an interval/ratio scale. Data that are not continuous, such as categorical (e.g., hair color) or binary (e.g., gender) data, would not be acceptable. Also, each variable should be approximately normally distributed.
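A rough check of the normality assumption can be sketched with PROC UNIVARIATE, as below. This is an illustration rather than part of the original tutorial programs; it assumes the MYDATA library and the SOMEDATA variables AGE, TIME1 and TIME2 used in the example later in this tutorial.

```sas
* Sketch: check each variable for approximate normality;
* Assumes the library MYDATA contains SOMEDATA.SAS7BDAT;
proc univariate data=mydata.somedata normal;   * NORMAL requests normality tests;
   var age time1 time2;
   histogram age time1 time2 / normal;         * overlay a fitted normal curve;
run;
```

The tests and histograms are a guide only; with small samples, the plots are often more informative than the test p-values.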

 

The SAS procedure most often used to calculate correlations is PROC CORR. The syntax for this procedure is:

 

      PROC CORR <options>; <statements>;

 

The most commonly used option is

 

      PROC CORR DATA=datasetname;

 

The most commonly used information statements are:

 

      VAR variablelist;

      BY varlist;
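As a sketch of how these statements fit together, the step below requests correlations separately for each level of a grouping variable. GROUPVAR is a hypothetical variable name used only for illustration, and WORK.SORTED is an arbitrary output name; the data must be sorted by the BY variable before PROC CORR is run.

```sas
* Sketch: correlations computed separately within each BY group;
* GROUPVAR is a hypothetical grouping variable, not one named in this tutorial;
proc sort data=mydata.somedata out=work.sorted;
   by groupvar;
run;

proc corr data=work.sorted;
   var age time1 time2;    * variables to correlate;
   by groupvar;            * one correlation table per group;
run;
```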

 

As an example, to find the correlations between variables in the SOMEDATA data set, use the following program (PROCCORR1.SAS), which also requires the file SOMEDATA.SAS7BDAT:

 

*   ASSUMES YOU HAVE A SAS LIBRARY NAMED MYDATA

*   THAT INCLUDES THE FILE SOMEDATA.SAS7BDAT;

ods rtf;

PROC CORR data=mydata.somedata;

      VAR AGE TIME1-TIME2;

TITLE 'Example correlation calculations using PROC CORR';

run;

ods rtf close;

 

The (partial) output from this program is:

 

 

 

Pearson Correlation Coefficients, N = 50
Prob > |r| under H0: Rho=0

                          AGE        TIME1      TIME2

AGE                       1.00000    0.50088    0.38082
Age on Jan 1, 2000                   0.0002     0.0064

TIME1                     0.50088    1.00000    0.76396
Baseline                  0.0002                <.0001

TIME2                     0.38082    0.76396    1.00000
6 Months                  0.0064     <.0001

 

 

 

The output includes descriptive statistics on each variable and a table of Pearson Correlation Coefficients (r). For example, the correlation between AGE and TIME1 is 0.50088, or r=0.50088. The number under each correlation is a p-value. It tests whether r differs significantly from zero. In statistical terminology, this is a test of the following hypotheses:

 

H0: rho = 0 (the null hypothesis)

Ha: rho <> 0 (the alternative hypothesis)

 

If the p-value for the test is small (usually less than 0.05), the conclusion is that rho is not 0, and thus the relationship is statistically significant. A researcher will then have to make a professional judgment to determine whether the association is important in the context of the experiment performed.
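The p-value for a Pearson correlation comes from a t-test with n – 2 degrees of freedom, using the statistic t = r·sqrt(n – 2) / sqrt(1 – r²). The DATA step below is a minimal sketch of that calculation, using r = 0.50088 and N = 50 from the AGE and TIME1 entry in the output. The variable names are arbitrary. The resulting t is about 4.01, and the p-value should agree with the 0.0002 that PROC CORR reports.

```sas
* Sketch: reproduce the test of H0: rho = 0 for one correlation;
* r and n are copied from the AGE and TIME1 entry of the PROC CORR output;
data _null_;
   r  = 0.50088;                         * sample correlation;
   n  = 50;                              * number of observations;
   df = n - 2;                           * degrees of freedom;
   t  = r * sqrt(df) / sqrt(1 - r**2);   * t statistic;
   p  = 2 * (1 - probt(abs(t), df));     * two-sided p-value;
   put t= p=;                            * write the results to the log;
run;
```

Because t grows with the square root of n – 2, the same r can be non-significant in a small sample and highly significant in a large one.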

 

Care must be taken when interpreting a statistically significant correlation. If your sample size is small or not representative of the population from which you sampled, you may not be able to generalize the correlation to your intended population. Also, a cause and effect relationship cannot be inferred except under special conditions when you have designed the study specifically to detect those phenomena.

 

Note – to have the program output both PEARSON and SPEARMAN (non-parametric) correlations, use the statement:

 

PROC CORR data=mydata.somedata PEARSON SPEARMAN;

 

To observe a scatterplot for each correlation, use this slight variation on the program (PROCCORR2.SAS). Notice the addition of the ODS GRAPHICS statements and PLOTS=MATRIX.

 

ODS RTF;

ODS GRAPHICS ON;

PROC CORR DATA=MYDATA.SOMEDATA PLOTS=MATRIX;

      VAR AGE TIME1-TIME2;

TITLE 'Example correlation calculations using PROC CORR';

RUN;

ODS RTF CLOSE;

ODS GRAPHICS OFF;

 

This produces the following matrix of scatterplots:

 

 

 

 

 

 

Note that in this plot the upper and lower half are identical – the plot is symmetric, so you really only have to look at half of it.

 

 

End of tutorial

See http://www.stattutorials.com/SAS

 

 


© Copyright TexaSoft, 1996-2007

This site is not affiliated with SAS(r) or SAS Institute