SAS Tutorials

One-Way ANOVA using SAS

PROC ANOVA & PROC GLM

See www.stattutorials.com/SASDATA for files mentioned in this tutorial

 

These SAS statistics tutorials briefly explain the use and interpretation of standard statistical analysis techniques for Medical, Pharmaceutical, Clinical Trials, Marketing or Scientific Research. The examples include how-to instructions for SAS Software.

 

 

Using PROC ANOVA – One-Way Analysis

 

A one-way analysis of variance is an extension of the independent group t‑test where there are more than two groups.

 

Assumptions: It is assumed that subjects are randomly assigned to one of 3 or more groups and that the data within each group are normally distributed with equal variances across groups. Sample sizes between groups do not have to be equal, but large differences in sample sizes for the groups may affect the outcome of some multiple comparisons tests.
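The assumptions above can be checked informally before running the ANOVA. The following is a minimal sketch, not part of the original example files. It assumes a data set named ANOVA with the variables GROUP and WEIGHT (the names used in the example later in this tutorial); ANOVA_SORTED is a hypothetical name introduced only for illustration.

```sas
* Sketch only. Assumes data set ANOVA with variables GROUP and WEIGHT;
* ANOVA_SORTED is a hypothetical name for the sorted copy;
PROC SORT DATA=ANOVA OUT=ANOVA_SORTED;
  BY GROUP;
RUN;

* Normality tests and summary statistics within each group;
PROC UNIVARIATE DATA=ANOVA_SORTED NORMAL;
  BY GROUP;
  VAR WEIGHT;
RUN;

* Levene test for equal variances across groups, requested in PROC GLM;
PROC GLM DATA=ANOVA_SORTED;
  CLASS GROUP;
  MODEL WEIGHT=GROUP;
  MEANS GROUP / HOVTEST=LEVENE;
RUN;
QUIT;
```

With small groups the normality tests have little power, so these checks are best read alongside plots of the data rather than as a pass/fail rule.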

 

Test: The hypotheses for the comparison of independent groups are: (k is the number of groups)

 

H0: μ1 = μ2 = ... = μk   (the means of all groups are equal)

Ha: μi ≠ μj for at least one pair i, j   (at least two of the means are not equal)

 

The test statistic reported is an F test with k‑1 and N‑k degrees of freedom, where N is the total number of subjects. A low p‑value for the F‑test is evidence to reject the null hypothesis; in other words, there is evidence that at least one pair of means is not equal. For example, suppose you are interested in comparing WEIGHT (gain) across the 4 levels of a GROUP variable to determine whether the mean weight gain differs significantly across groups.

 

The following SAS code can perform the test:

 

PROC ANOVA DATA=ANOVA;

CLASS GROUP;

MODEL WEIGHT=GROUP;

TITLE 'Compare WEIGHT across GROUPS';

RUN;

 

GROUP is the "CLASS" or grouping variable (containing four levels), and WEIGHT is the continuous variable, whose means across groups are to be compared. The MODEL statement can be thought of as

 

DEPENDENT VARIABLE = INDEPENDENT VARIABLE(S);

 

where the DEPENDENT variable is the "response" variable (the one you measured) and the independent variable(s) are the grouping or predictor variable(s). The MODEL statement generally indicates that, given the information on the right side of the equal sign, you can predict something about the value of the information on the left side of the equal sign. (Under the null hypothesis there is no relationship.)

 

Since the rejection of the null hypothesis does not specifically tell you which means are different, a multiple comparison test is often performed following a significant finding in the One‑Way ANOVA. To request multiple comparisons in PROC ANOVA, include a MEANS statement with a multiple comparison option. The syntax for this statement is

 

MEANS GROUP /testname;

 

where testname is a multiple comparison test. Some of the tests available in SAS include:

 

BON               - Performs Bonferroni t-tests of differences

DUNCAN            - Duncan’s multiple range test

SCHEFFE           - Scheffe multiple comparison procedure

SNK               - Student Newman Keuls multiple range test

LSD               - Fisher’s Least Significant Difference test

TUKEY             - Tukey’s studentized range test

DUNNETT ('x')     - Dunnett’s test – compare to a single control (x is the control level)

 

You may also specify

 

ALPHA = p   - selects level of significance for comparisons    (default is 0.05)

 

For example, to select the TUKEY test, you would use the statement

 

MEANS GROUP /TUKEY;
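As an illustration of how these options combine, the sketch below requests Tukey comparisons at the 0.01 level and Dunnett comparisons against a control group in the same run. It assumes the ANOVA data set used above, and it assumes that one of the GROUP levels is 1 and serves as the control; substitute your own control level.

```sas
* Sketch only. Assumes data set ANOVA with variables GROUP and WEIGHT;
PROC ANOVA DATA=ANOVA;
  CLASS GROUP;
  MODEL WEIGHT=GROUP;
  * Tukey comparisons at the 0.01 level instead of the default 0.05;
  MEANS GROUP / TUKEY ALPHA=0.01;
  * Dunnett comparisons against the control level, assumed here to be 1;
  MEANS GROUP / DUNNETT('1');
RUN;
QUIT;
```

More than one MEANS statement may appear in the same procedure step, so several tests can be compared side by side.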

 

Graphical comparison: A graphical comparison lets you see the distribution of each group. If the p‑value is low, chances are there will be little overlap between at least two of the groups. If the p‑value is not low, there will be a fair amount of overlap between all of the groups. A simple graph for this analysis can be created using the PROC PLOT or PROC GPLOT procedure. For example:

 

PROC GPLOT; PLOT WEIGHT*GROUP; RUN;

 

will produce a plot showing WEIGHT by group.
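If your SAS release includes the ODS statistical graphics procedures (an assumption; they are not used elsewhere in this tutorial), side-by-side box plots are another way to compare the groups. A minimal sketch, again assuming the ANOVA data set with the variables GROUP and WEIGHT:

```sas
* Sketch only. One box of WEIGHT values for each level of GROUP;
PROC SGPLOT DATA=ANOVA;
  VBOX WEIGHT / CATEGORY=GROUP;
RUN;
```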

 

Thus, the code for the complete analysis becomes:

 

PROC ANOVA;

CLASS GROUP;

MODEL WEIGHT=GROUP;

MEANS GROUP /TUKEY;

TITLE 'Compare WEIGHT across GROUPS';

PROC GPLOT; PLOT WEIGHT*GROUP;

RUN;

QUIT;

 

Following is a SAS job that performs a one-way ANOVA and produces a plot.


 

One-Way ANOVA Example

 

Suppose you are comparing the time to relief of three headache medicines -- brands 1, 2, and 3. The time to relief data is reported in minutes. For this experiment, 15 subjects were randomly placed on one of the three medicines. Which medicine (if any) is the most effective? The data for this example are as follows:

 

Brand 1     Brand 2    Brand 3

24.5        28.4        26.1

23.5        34.2        28.3

26.4        29.5        24.3

27.1        32.2        26.2

29.9        30.1        27.8

 

Notice that SAS expects the data to be entered as two variables, a group and an observation.
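If your data are already stored in the side-by-side layout shown in the table, they can be restructured rather than retyped. The following is a minimal sketch of one way to do this; the data set names WIDE and LONG and the variables BRAND1-BRAND3 are hypothetical names chosen for illustration.

```sas
* Sketch only. Read the table as shown, one column per brand;
DATA WIDE;
  INPUT BRAND1 BRAND2 BRAND3;
  DATALINES;
24.5 28.4 26.1
23.5 34.2 28.3
26.4 29.5 24.3
27.1 32.2 26.2
29.9 30.1 27.8
;
RUN;

* Write one observation per measurement with BRAND and RELIEF;
DATA LONG;
  SET WIDE;
  ARRAY B{3} BRAND1-BRAND3;
  DO BRAND = 1 TO 3;
    RELIEF = B{BRAND};
    OUTPUT;
  END;
  KEEP BRAND RELIEF;
RUN;

* Sort by BRAND so that group-wise procedures see each brand together;
PROC SORT DATA=LONG;
  BY BRAND;
RUN;
```

The resulting LONG data set has the same two-variable layout (a group and an observation) that the analysis procedures expect.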

 

Here is the SAS code to analyze these data. (AANOVA EXAMPLE2.SAS)

 

DATA ACHE;

INPUT BRAND RELIEF;

CARDS;

1 24.5

1 23.5

1 26.4

1 27.1

1 29.9

2 28.4

2 34.2

2 29.5

2 32.2

2 30.1

3 26.1

3 28.3

3 24.3

3 26.2

3 27.8

;

ODS RTF;ODS LISTING CLOSE;

PROC ANOVA DATA=ACHE;

    CLASS BRAND;

    MODEL RELIEF=BRAND;

    MEANS BRAND/TUKEY CLDIFF;

TITLE 'COMPARE RELIEF ACROSS MEDICINES  - ANOVA EXAMPLE';

PROC GPLOT;

       PLOT RELIEF*BRAND;

PROC BOXPLOT;

    PLOT RELIEF*BRAND;

       TITLE 'ANOVA RESULTS';

RUN;

QUIT;

ODS RTF close;

ODS LISTING;

 

Following is the (partial) output for the headache relief study: 

 

ANOVA Procedure

Dependent Variable: Relief

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F

Model               2        66.7720000     33.3860000       7.14    0.0091

Error              12        56.1280000      4.6773333

Corrected Total    14       122.9000000

 

R-Square    Coeff Var    Root MSE    RELIEF Mean

0.543303     7.751664    2.162714       27.90000

 

Source    DF       Anova SS    Mean Square    F Value    Pr > F

BRAND      2    66.77200000    33.38600000       7.14    0.0091

 

 

 

The initial table in this listing is the Analysis of Variance table. The most important line in this table is the “Model” line. At the right of this line is the p-value for the overall ANOVA test, listed as “Pr > F” (p = 0.0091). This tests the overall model to determine whether the mean relief times differ between BRANDs. Since the p-value is small (below 0.05), there is evidence that at least two of the brand means differ.
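To see where the F value and p-value come from, the calculation can be repeated by hand from the sums of squares and degrees of freedom in the table: F is the Model mean square divided by the Error mean square. The DATA step below is a sketch for illustration only; the variable names are arbitrary, and the p-value is taken as the upper tail of the F distribution using the PROBF function.

```sas
* Sketch only. Recompute F and its p-value from the ANOVA table;
DATA _NULL_;
  MS_MODEL = 66.772 / 2;         * Model sum of squares over its DF;
  MS_ERROR = 56.128 / 12;        * Error sum of squares over its DF;
  F = MS_MODEL / MS_ERROR;       * ratio of the two mean squares;
  P = 1 - PROBF(F, 2, 12);       * upper tail area of F with 2 and 12 DF;
  PUT F= 8.2 P= 8.4;
RUN;
```

The values written to the SAS log can be compared against the F Value and Pr > F columns of the table.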

 

Now that you know that there are differences between BRANDs, you need to determine where the differences lie. In this case, that comparison is performed by the Tukey Studentized Range comparison (at the alpha = 0.05 level). See the tables below.

 

The Tukey Grouping table displays those differences. Notice the grouping labels “A” and “B” in this table.  There is only one mean associated with the “A” group, and that is brand 2. This indicates that the mean for brand 2 is significantly larger than the means of all other groups. There are two means associated with the “B” group – brands 1 and 3.  Since these two means are grouped, it tells you that they were not found to be significantly different.

 

Tukey's Studentized Range (HSD) Test for RELIEF

Alpha                                      0.05

Error Degrees of Freedom                     12

Error Mean Square                      4.677333

Critical Value of Studentized Range     3.77278

Minimum Significant Difference            3.649

 

Means with the same letter are not significantly different.

Tukey Grouping      Mean    N    BRAND

A                 30.880    5    2

B                 26.540    5    3

B

B                 26.280    5    1

 

                                    

 

Thus, the Tukey comparison concludes that the mean for brand 2 is significantly higher than the means of brands 1 and 3, and that there is no significant difference between brands 1 and 3. Another way to express the differences is to use the CLDIFF option with TUKEY (same results, different presentation). For example:

 

MEANS BRAND/TUKEY CLDIFF;

 

Using this option produces this version of the comparison table:

Comparisons significant at the 0.05 level are indicated by ***.

BRAND         Difference        Simultaneous 95%
Comparison    Between Means     Confidence Limits

2 - 3              4.340         0.691     7.989    ***

2 - 1              4.600         0.951     8.249    ***

3 - 2             -4.340        -7.989    -0.691    ***

3 - 1              0.260        -3.389     3.909

1 - 2             -4.600        -8.249    -0.951    ***

1 - 3             -0.260        -3.909     3.389
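The confidence limits in this table are each difference in means plus or minus the Minimum Significant Difference reported in the Tukey output (for example, 4.340 ± 3.649 gives 0.691 and 7.989). The DATA step below is a sketch of that arithmetic for equal group sizes. It uses the PROBMC function to obtain the studentized range critical value, which is my choice for illustration and not something taken from the original example files; the variable names are arbitrary.

```sas
* Sketch only. Rebuild the Tukey limits for the 2 - 3 comparison;
DATA _NULL_;
  MSE = 4.677333;                        * Error Mean Square from the output;
  N   = 5;                               * observations per brand;
  DFE = 12;                              * Error Degrees of Freedom;
  K   = 3;                               * number of brands;
  Q   = PROBMC('RANGE', ., 0.95, DFE, K); * studentized range critical value;
  MSD = Q * SQRT(MSE / N);               * Minimum Significant Difference;
  DIFF  = 30.880 - 26.540;               * mean of brand 2 minus brand 3;
  LOWER = DIFF - MSD;
  UPPER = DIFF + MSD;
  PUT Q= MSD= LOWER= UPPER=;
RUN;
```

An interval that excludes zero corresponds to a comparison flagged with *** in the table.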

 

Visual Comparisons: Two graphs of RELIEF by BRAND show the distribution of relief times across brands and visually confirm the ANOVA results. The first is a “dot” plot produced by PROC GPLOT that shows each data point by group. The second is a box and whiskers plot created with PROC BOXPLOT. Note that Brand 2 relief times tend to be longer (higher values) than those for brands 1 and 3.

[Figure: SAS Statistics Dot plot]

[Figure: SAS Statistics Box Plot]

 

Hands-on exercise:

Modify the PROC ANOVA program to perform the Scheffe, LSD and Dunnett’s tests using the following code, and compare the results.

 

      MEANS BRAND/SCHEFFE;

      MEANS BRAND/LSD;

      MEANS BRAND/DUNNETT ('1');

 

 

One-Way ANOVA using GLM

 

PROC GLM will produce essentially the same results as PROC ANOVA, with the addition of a few more options. For example, you can include an OUTPUT statement to save the residuals, which can then be examined. (PROCGLM1.SAS)

 

ODS RTF; ODS GRAPHICS ON;

PROC GLM DATA=ACHE;

    CLASS BRAND;

    MODEL RELIEF=BRAND;

    MEANS BRAND/TUKEY CLDIFF;

    OUTPUT OUT=FITDATA P=YHAT R=RESID;

* Now plot the residuals;

 PROC GPLOT;

   plot resid*BRAND;

   plot resid*yhat;

run;

ODS RTF CLOSE;

ODS GRAPHICS OFF;

 

Notice also the statements ODS GRAPHICS ON and ODS GRAPHICS OFF. With ODS Graphics enabled, PROC GLM produces better looking plots than we were able to get using PROC GPLOT in conjunction with PROC ANOVA, including the more detailed box and whiskers plot shown here:

 

SAS Statistics Box Plot

 

 

However, there are still a couple of other plots that might be of interest. These are requested using the code

 

PROC GPLOT;

   plot resid*BRAND;

   plot resid*yhat;

run;

 

 

The resulting plots (below) are an analysis of the residuals. The first plot shows the residuals by brand. Typically, you want the residuals to be randomly scattered within each group, which looks okay in this plot.

 

SAS Statistics Residual Plot

 

 

 

The second plot shows the residuals by YHAT (the estimated RELIEF). You can see three estimates, one for each of the three brands. For each estimate the residuals are randomly distributed.
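The saved residuals can also be used to examine the normality assumption more directly. The following is a minimal sketch, assuming the FITDATA data set and the RESID variable created by the OUTPUT statement above; the choice of PROC UNIVARIATE for this check is mine and is not part of the original example files.

```sas
* Sketch only. Normality tests and a normal Q-Q plot of the residuals;
PROC UNIVARIATE DATA=FITDATA NORMAL;
  VAR RESID;
  QQPLOT RESID / NORMAL(MU=EST SIGMA=EST);
RUN;
```

With only 15 observations, the Q-Q plot is usually more informative than the formal tests.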

 

SAS Statistics Residual Plots

End of tutorial

See http://www.stattutorials.com/SAS

 

 

 


© Copyright TexaSoft, 1996-2007

This site is not affiliated with SAS(r) or SAS Institute