Descriptive Statistics using SAS
PROC MEANS
See www.stattutorials.com/SASDATA for files mentioned in this tutorial © TexaSoft, 2007-10
These SAS statistics tutorials briefly explain the use and interpretation of standard statistical analysis techniques for Medical, Pharmaceutical, Clinical Trials, Marketing or Scientific Research. The examples include how-to instructions for SAS Software.
Preliminary information about PROC MEANS
PROC MEANS produces descriptive statistics (mean, standard deviation, minimum, maximum, etc.) for the numeric variables in a SAS data set. PROC MEANS can be used for:
- Describing continuous data where the average has meaning
- Describing the means across groups
- Searching for possible outliers or incorrectly coded values
- Performing a single sample t-test
The syntax of the PROC MEANS statement is:
PROC MEANS <options>; <statements>;
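Statistical keywords that may be requested include N, MEAN, STD, STDERR, MIN, MAX, MEDIAN, T and PRT (all of which are used in this tutorial). If no statistics are requested, PROC MEANS reports the defaults N, MEAN, STD, MIN and MAX.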
Other commonly used options available in PROC MEANS include:
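- DATA= specifies the data set to use (by default, the most recently created data set)
- NOPRINT suppresses the printed output (typically combined with an OUTPUT statement)
- MAXDEC=n prints the statistics with n decimal places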
Commonly used statements with PROC MEANS include:
- BY variable list: statistics are reported for each group in a separate table (the data must first be sorted by these variables with PROC SORT)
- CLASS variable list: statistics are reported for each group in a single table (no sorting is required)
- VAR variable list: specifies which numeric variables to analyze
- OUTPUT OUT=datasetname: writes the statistics to a SAS data set
- FREQ variable: specifies a numeric variable whose value is the number of times each observation occurs
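The FREQ, NOPRINT and OUTPUT OUT= items listed above are not demonstrated elsewhere in this tutorial, so here is a minimal sketch of how they might be combined. The data set SCORES and the names COUNT, SCORESTATS, N_SCORE, MEAN_SCORE and SD_SCORE are hypothetical, made up for illustration.

```sas
* Hypothetical summarized data: each row holds a SCORE value and *;
* COUNT, the number of subjects who had that score               *;
DATA SCORES;
INPUT SCORE COUNT;
DATALINES;
10 3
12 2
15 1
;
* FREQ makes each row count as COUNT observations.               *;
* NOPRINT suppresses the printed table and OUTPUT OUT= saves the *;
* requested statistics to the new data set SCORESTATS instead.   *;
PROC MEANS DATA=SCORES NOPRINT;
VAR SCORE;
FREQ COUNT;
OUTPUT OUT=SCORESTATS N=N_SCORE MEAN=MEAN_SCORE STD=SD_SCORE;
RUN;
PROC PRINT DATA=SCORESTATS;
TITLE 'Statistics saved with OUTPUT OUT=';
RUN;
```

Because each row is counted COUNT times, N_SCORE should be 6 (3 + 2 + 1) and MEAN_SCORE should be (3×10 + 2×12 + 1×15)/6 = 11.5.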
A few quick examples of PROC MEANS
* Simplest invocation - on all numeric variables *;
PROC MEANS;
RUN;
* Specified statistics and variables *;
PROC MEANS N MEAN STD; VAR SODIUM CARBO;
RUN;
* Subgroup descriptive statistics using BY statement (data must be sorted first) *;
PROC SORT; BY SEX;
PROC MEANS; BY SEX;
VAR FAT PROTEIN SODIUM;
RUN;
* Subgroup descriptive statistics using CLASS statement *;
PROC MEANS; CLASS SEX;
VAR FAT PROTEIN SODIUM;
RUN;
Example 1: A simple use of PROC MEANS
This example calculates descriptive statistics for the CHILDREN data three ways: with all defaults, with the output limited to two decimal places for selected variables, and with a chosen set of statistics. (PROCMEANS1.SAS)
******************************************
* Data on weight, height, and age of a *
* random sample of 12 *
* nutritionally deficient children. *
******************************************;
DATA CHILDREN;
INPUT WEIGHT HEIGHT AGE;
DATALINES;
64 57 8
71 59 10
53 49 6
67 62 11
55 51 8
58 50 8
77 55 10
57 48 9
56 42 10
51 42 6
76 61 12
68 57 9
;
ODS RTF;
proc means;
Title 'Example 1a - PROC MEANS, simplest use';
run;
proc means maxdec=2; var WEIGHT HEIGHT;
Title 'Example 1b - PROC MEANS, limit decimals, specify variables';
run;
proc means maxdec=2 n mean stderr median; var WEIGHT HEIGHT;
Title 'Example 1c – PROC MEANS, specify statistics to report';
run;
ODS RTF CLOSE;
Output for Example 1:
Example 1a - PROC MEANS, simplest use
Variable    N          Mean       Std Dev       Minimum       Maximum
WEIGHT     12    62.7500000     8.9861004    51.0000000    77.0000000

Example 1b - PROC MEANS, limit decimals, specify variables
Variable    N     Mean    Std Dev    Minimum    Maximum
WEIGHT     12    62.75       8.99      51.00      77.00

Example 1c – PROC MEANS, specify statistics to report
Variable    N     Mean    Std Error    Median
WEIGHT     12    62.75         2.59     61.00
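As a check on the Example 1c output, the standard error and median of WEIGHT can be reproduced by hand from the 12 weights. Std Error is the standard deviation divided by the square root of N, and with an even number of values the median is the average of the two middle sorted values (sorted weights: 51, 53, 55, 56, 57, 58, 64, 67, 68, 71, 76, 77):

$$\text{Std Error} = \frac{s}{\sqrt{n}} = \frac{8.9861004}{\sqrt{12}} \approx \frac{8.9861}{3.4641} \approx 2.59$$

$$\text{Median} = \frac{x_{(6)} + x_{(7)}}{2} = \frac{58 + 64}{2} = 61$$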
Example 2: Using PROC MEANS with BY and CLASS statements
This example uses PROC MEANS to calculate means for an entire data set or by grouping variables. (PROCMEANS2.SAS)
***************************************************
* Example 2 for PROC MEANS *
***************************************************;
DATA FERTILIZER;
INPUT FEEDTYPE WEIGHTGAIN;
DATALINES;
1 46.20
1 55.60
1 53.30
1 44.80
1 55.40
1 56.00
1 48.90
2 51.30
2 52.40
2 54.60
2 52.20
2 64.30
2 55.00
;
ODS RTF;
PROC SORT DATA=FERTILIZER;BY FEEDTYPE;
PROC MEANS; VAR WEIGHTGAIN; BY FEEDTYPE;
TITLE 'Summary statistics by group';
RUN;
PROC MEANS; VAR WEIGHTGAIN; CLASS FEEDTYPE;
TITLE 'Summary statistics USING CLASS';
RUN;
ODS RTF CLOSE;
Output for this SAS code is:
Summary statistics by group
FEEDTYPE=1
Analysis Variable : WEIGHTGAIN
N          Mean       Std Dev       Minimum       Maximum
7    51.4571429     4.7475808    44.8000000    56.0000000

FEEDTYPE=2
N          Mean       Std Dev       Minimum       Maximum
6    54.9666667     4.7944412    51.3000000    64.3000000
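In the first version of the output, the BY statement (which requires the data to be sorted first with PROC SORT) produces a separate table for each value of FEEDTYPE. In the second version, the CLASS statement produces a single table broken down by FEEDTYPE and does not require the data to be sorted. The CLASS table also adds an N Obs column (the number of observations in each group) alongside N (the number of nonmissing values of the analysis variable).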
Summary statistics USING CLASS
Analysis Variable : WEIGHTGAIN
FEEDTYPE    N Obs    N          Mean       Std Dev       Minimum       Maximum
1               7    7    51.4571429     4.7475808    44.8000000    56.0000000
2               6    6    54.9666667     4.7944412    51.3000000    64.3000000
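To reuse the group statistics in later steps, the CLASS results can be saved to a data set instead of printed. This sketch assumes the FERTILIZER data set from the Example 2 program already exists in the same SAS session; the names FEEDSTATS, N_GAIN, MEANGAIN and SDGAIN are hypothetical.

```sas
* Assumes FERTILIZER from the Example 2 program exists in this session. *;
* NOPRINT suppresses the table, OUTPUT OUT= saves the statistics, and   *;
* NWAY keeps only the one-row-per-FEEDTYPE results, dropping the        *;
* overall row (_TYPE_=0) that CLASS would otherwise add.                *;
PROC MEANS DATA=FERTILIZER NWAY NOPRINT;
CLASS FEEDTYPE;
VAR WEIGHTGAIN;
OUTPUT OUT=FEEDSTATS N=N_GAIN MEAN=MEANGAIN STD=SDGAIN;
RUN;
PROC PRINT DATA=FEEDSTATS;
TITLE 'Group statistics saved with OUTPUT OUT= and NWAY';
RUN;
```

FEEDSTATS should contain one row per FEEDTYPE, with MEANGAIN matching the Mean column of the CLASS table (51.4571429 and 54.9666667). Without NWAY, an extra row with _TYPE_=0 holds the statistics for all 13 observations combined.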
Hands-on exercise:
1. Modify the above program to output the following statistics:
N MEAN MEDIAN MIN MAX
2. Use MAXDEC=2 to limit the number of decimal places in the output.
Example 3: Using PROC MEANS to find outliers
PROC MEANS is a quick way to find unusually large or small values in your data set that may be outliers or coding errors (see also PROC UNIVARIATE). In this example, the MINIMUM and MAXIMUM reported by PROC MEANS identify unusual values in the data set. (PROCMEANS3.SAS)
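DATA WEIGHT;
* The trailing @@ holds each data line so that several TREATMENT and LOSS pairs can be read from it *;
INPUT TREATMENT LOSS @@;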
DATALINES;
2 1.0 1 3.0 1 -1.0 1 1.5 1 0.5 1 3.5 1 -99
2 4.5 3 6.0 2 3.5 2 7.5 2 7.0 2 6.0 2 5.5
1 1.5 3 -2.5 3 -0.5 3 1.0 3 .5 3 78 1 .6 2 3 2 4 3 9 1 7 2 2
;
ODS RTF;
PROC MEANS; VAR LOSS;
TITLE 'Find largest and smallest values';
RUN;
ODS RTF CLOSE;
Notice that in this output, PROC MEANS reports a minimum of -99 (possibly a missing-value code) and a maximum of 78 (possibly a miscoded number). These two values also inflate the standard deviation. This is a quick way to find suspicious values in your data set.
Analysis Variable : LOSS
 N          Mean       Std Dev        Minimum       Maximum
26     2.0423077    25.4650062    -99.0000000    78.0000000
See also PROC UNIVARIATE for more detailed outlier detection.
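Once suspicious values are found, a common next step is to look at the rows they came from and summarize again with the bad codes set to missing. This sketch assumes the WEIGHT data set from Example 3 already exists in the session and that -99 is a missing-value code (the output suggests this but does not confirm it). The listing cutoffs of -10 and 10 are arbitrary choices for these data, and the name WEIGHTCLEAN is hypothetical.

```sas
* Assumes WEIGHT from the Example 3 program exists in this session. *;
* Assumption: -99 is a missing-value code, so recode it to missing.  *;
DATA WEIGHTCLEAN;
SET WEIGHT;
IF LOSS = -99 THEN LOSS = .;
RUN;
* List the rows behind the extreme values. The cutoffs -10 and 10   *;
* are arbitrary choices for these data, not a general rule.          *;
PROC PRINT DATA=WEIGHT;
WHERE LOSS < -10 OR LOSS > 10;
TITLE 'Observations with unusual LOSS values';
RUN;
* Summarize again by treatment. NMISS counts the recoded values.     *;
PROC MEANS DATA=WEIGHTCLEAN N NMISS MEAN MIN MAX;
CLASS TREATMENT;
VAR LOSS;
TITLE 'LOSS after recoding -99 as missing';
RUN;
```

After recoding, the TREATMENT 1 row should show NMISS = 1 (the former -99) and the N values should total 25, while the value 78 in TREATMENT 3 remains and still needs to be checked against the original records.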
Example 4: Using PROC MEANS to perform a single-sample t-test (or paired t-test)
To compare two paired groups (such as in a before-after situation), where both observations are taken from the same or matched subjects, you can perform a paired t-test using PROC MEANS. To do this, convert the paired data into a difference variable and perform a single-sample t-test on that difference. For example, suppose your data contain the variables WBEFORE and WAFTER (weight before and after a diet) for 8 subjects. To perform a paired t-test using PROC MEANS, follow these steps:
- Read in your data.
- Calculate the difference between the two measurements, WLOSS = WAFTER - WBEFORE (so a negative value means weight was lost), and
- Report the mean loss, t-statistic and p-value using PROC MEANS.
The hypotheses for this test are:
$H_0\colon \mu_{\text{Loss}} = 0$ (the average weight change is 0)
$H_a\colon \mu_{\text{Loss}} \neq 0$ (the average weight change is different from 0)
For example, the following code performs a paired t-test for the weight loss data (PROCMEANS4.SAS):
DATA WEIGHT;
INPUT WBEFORE WAFTER;
* Calculate WLOSS = WAFTER - WBEFORE in the DATA step (negative values mean weight was lost) *;
WLOSS=WAFTER-WBEFORE;
DATALINES;
200 190
175 154
188 176
198 193
197 198
310 240
245 204
202 178
;
ODS RTF;
PROC MEANS N MEAN T PRT; VAR WLOSS;
TITLE 'Paired t-test example using PROC MEANS';
RUN;
ODS RTF CLOSE;
Notice that the test is performed on the new variable WLOSS, which is why it is the only variable listed in the VAR statement. This is essentially a one-sample t-test of whether the mean of WLOSS is 0. The T keyword requests the t-statistic for that null hypothesis, and PRT requests its two-sided p-value. The SAS output is as follows:
Paired t-test example using PROC MEANS
Analysis Variable : WLOSS
N           Mean    t Value    Pr > |t|
8    -22.7500000      -2.79      0.0270
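The t Value in this table can be reproduced by hand, which shows what PROC MEANS is computing. With $d_i$ = WAFTER - WBEFORE for each of the $n = 8$ subjects (the WLOSS values -10, -21, -12, -5, 1, -70, -41, -24):

$$\bar d = \frac{1}{n}\sum_{i=1}^{n} d_i = \frac{-182}{8} = -22.75$$

$$s_d = \sqrt{\frac{\sum_{i=1}^{n}(d_i - \bar d)^2}{n-1}} = \sqrt{\frac{3727.5}{7}} = \sqrt{532.5} \approx 23.08$$

$$t = \frac{\bar d}{s_d/\sqrt{n}} = \frac{-22.75}{23.08/\sqrt{8}} \approx \frac{-22.75}{8.16} \approx -2.79, \qquad \text{df} = n - 1 = 7$$

Pr > |t| is the two-sided p-value of this t-statistic under a t distribution with 7 degrees of freedom.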
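The mean of the variable WLOSS is -22.75, that is, the subjects lost an average of 22.75 units of weight. The t-statistic for the null hypothesis is -2.79, and the p-value for this paired t-test is p = 0.027. Because p < 0.05, the null hypothesis is rejected at the 0.05 significance level: the mean weight change is significantly different from 0.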
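The T and PRT statistics in PROC MEANS always test whether the mean is 0. To test a different hypothesized mean with the same approach, one option is to subtract that value from the variable first. In this sketch the hypothesized mean change of -10 is a hypothetical value chosen for illustration, the WEIGHT data set is assumed to be the one created by the Example 4 program, and the names MU0, WSHIFT and WEIGHT10 are made up.

```sas
* Assumes WEIGHT (with WLOSS) from the Example 4 program exists.       *;
* T and PRT test whether a mean is 0, so subtract the hypothesized     *;
* mean first. MU0 = -10 is a hypothetical value chosen for the sketch. *;
DATA WEIGHT10;
SET WEIGHT;
MU0 = -10;
WSHIFT = WLOSS - MU0;
DROP MU0;
RUN;
PROC MEANS DATA=WEIGHT10 N MEAN T PRT;
VAR WSHIFT;
TITLE 'One-sample t-test of H0: mean change = -10';
RUN;
```

The mean of WSHIFT is -22.75 - (-10) = -12.75 and its standard deviation is unchanged, so its t Value should be about -1.56, and the PRT p-value then tests whether the mean change equals -10.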
End of tutorial
For more information... we recommend:
SAS Essentials: Mastering SAS for Research
- provides an introduction to SAS statistical software, the premier statistical data analysis tool for scientific research. Through its straightforward approach, the text presents SAS with step-by-step examples. SAS Essentials introduces a step-by-step approach to mastering SAS software for statistical data analysis. It's also a valuable reference tool for any researcher currently using SAS. Designed for those new to SAS and filled with illustrative examples, the book shows how to read, write and import data; prepare data for analysis; use SAS procedures; evaluate quantitative data; analyze counts and crosstabulation tables; and compare means using the t-test. The book also provides instruction and examples on analysis of variance, correlation and regression, nonparametric analysis, logistic regression, creating graphs, controlling outputs using ODS, as well as advanced topics in SAS programming. Publisher: Jossey-Bass/Wiley. ISBN: 0470461292.
- WINKS -- a simple-to-use and affordable statistical software program that will help you analyze, interpret and write up your results.
- Against All Odds VIDEOS - Now in DVD format -- Teaching Videos from Annenberg/PBS -- "This highly engaging primer on statistical methods and inference introduces the practical applications of statistics. Produced by the Consortium for Mathematics and Its Applications and Chedd-Angier."
- BeSmartNotes reference sheets for SAS, SAS ODS, SAS Functions, SPSS and WINKS (www.besmartnotes.com)
- Statistical Analysis Quick Reference Guidebook: With SPSS Examples is a practical "cut to the chase" handbook that quickly explains the when, where, and how of statistical data analysis as it is used for real-world decision-making in a wide variety of disciplines. It contains examples using SPSS Statistics software. In this one-stop reference, the authors provide succinct guidelines for performing an analysis, avoiding pitfalls, interpreting results, and reporting outcomes. Paperback. Sage Publishers. ISBN: 1412925606.

