Descriptive Statistics using SAS
Advanced PROC UNIVARIATE
See www.stattutorials.com/SASDATA for files mentioned in this tutorial. © TexaSoft, 2006
These SAS statistics tutorials briefly explain the use and
interpretation of standard statistical analysis techniques for Medical,
Pharmaceutical, Clinical Trials, Marketing or Scientific Research. The examples
include how-to instructions for SAS Software.
Evaluating more than one category of a variable
Suppose you have several groups that you are comparing and you want to examine
the distribution of the variable by group. The following example (PROCUNI2.SAS)
shows how you could create histograms by RACE_CATEGORY using PROC UNIVARIATE.
PROC UNIVARIATE DATA=SASDATA2.SBPDATA NOPRINT;
   CLASS RACE_CATEGORY;
   VAR SBP;
   HISTOGRAM / NORMAL (COLOR=RED W=5) NROWS=3;
RUN;
In this example, the data come from a trauma data set (SBPDATA, extracted from
the National Trauma Data Set, 2004). The new statements and options used in this
example are:
- NOPRINT – since we’re only interested in producing the graph, this option
suppresses the other (tabular) output.
- CLASS RACE_CATEGORY – this statement indicates that the data are to be
examined separately for each category (classification) of the RACE_CATEGORY
variable.
- NROWS=3 – since we know that there are three categories (BLACK, WHITE and
OTHER), we add the option “NROWS=3” to the HISTOGRAM statement so that all three
histograms are stacked in three rows on a single page.
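If you are not sure how many categories a grouping variable has, you can check before choosing NROWS=. A minimal sketch, assuming the SASDATA2 libref used above is already assigned: the NLEVELS option of PROC FREQ reports how many distinct values RACE_CATEGORY has, and the TABLES statement lists each value with its count.

```sas
/* Count the distinct values of RACE_CATEGORY before choosing NROWS= */
PROC FREQ DATA=SASDATA2.SBPDATA NLEVELS;
   TABLES RACE_CATEGORY / MISSING;  /* MISSING lists missing values as their own row */
RUN;
```

By default, PROC UNIVARIATE leaves observations with a missing CLASS value out of the comparative histogram, so only the nonmissing levels count toward NROWS=.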
The following
plot is created:

Notice that the three histograms are for the three values of RACE_CATEGORY:
“BLACK,” “OTHER,” and “WHITE.” A graph like this is helpful for comparing the
distribution of data in two or more groups. In this case, the histograms suggest
visually that SBP is similarly distributed across all three race categories.
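The histograms give a visual comparison only. One way to put numbers alongside them is to rerun the same step without NOPRINT and use an ODS SELECT statement to keep just the BasicMeasures table (mean, median, mode, standard deviation, variance, range, interquartile range) for each category. This is a sketch assuming the same data set and libref as above.

```sas
/* Keep only the Basic Statistical Measures table for the next step */
ODS SELECT BasicMeasures;

PROC UNIVARIATE DATA=SASDATA2.SBPDATA;   /* no NOPRINT, so tables are produced */
   CLASS RACE_CATEGORY;                  /* one table per race category */
   VAR SBP;
RUN;
```

Without the PERSIST option, an ODS SELECT list applies only to the next procedure step and then resets, so later procedures print their full output again.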
Graph by two factors
Suppose you have
two grouping variables and you want to produce a series of histograms to
compare distributions.
The following program (PROCUNI3.SAS) produces a series of histograms by GENDER
and WOUND type. Since this is a more detailed program, its parts are described
one at a time below:
PROC FORMAT;
   VALUE FMTWOUND 0="NONPENETRATE"
                  1="PENETRATE";
RUN;

TITLE 'HISTOGRAMS of SBP by GENDER and WOUND TYPE';

PROC UNIVARIATE DATA=SASDATA2.SBPDATA NOPRINT;
   CLASS WOUND GENDER;
   VAR SBP;
   HISTOGRAM / NROWS=2 NCOLS=2 CFILL=BLUE PFILL=M3N45;
   INSET N='N:' (4.0) MIN='MIN:' (4.1) MAX='MAX:' (4.1)
         / NOFRAME POSITION=NE HEIGHT=2;
   FORMAT WOUND FMTWOUND.;
RUN;
PROC FORMAT – this procedure creates a format, FMTWOUND, that labels the 0/1
codes of the WOUND variable. The statement FORMAT WOUND FMTWOUND.; inside PROC
UNIVARIATE applies the format, so the graph labels the groups with the clearer
names PENETRATE and NONPENETRATE rather than the cryptic 0 and 1. (See Chapter 3
for more information on PROC FORMAT.)
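A format created with PROC FORMAT is not tied to PROC UNIVARIATE; a FORMAT statement applies it in other procedures as well. As a quick check that FMTWOUND labels the codes as intended, you could run a frequency table. This sketch assumes the PROC FORMAT step above has already been run in the same SAS session (by default the format is stored in the temporary WORK library).

```sas
/* Display WOUND with the FMTWOUND. labels instead of 0 and 1 */
PROC FREQ DATA=SASDATA2.SBPDATA;
   TABLES WOUND;
   FORMAT WOUND FMTWOUND.;
RUN;
```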
TITLE statement – this places a title at the top of the graph. If you use
additional title statements such as TITLE2, the subsequent titles will be
smaller than the first title by default (unless you change that in code). (See
Chapter 3 for more information on titles.)
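For example, the following sketch adds a second title line and sets its size explicitly with the HEIGHT= option of the TITLE2 statement. The subtitle text and the height value are illustrative assumptions, not part of the original program. A TITLE statement with no text cancels TITLE1 and all higher-numbered titles, so they do not carry over to later graphs.

```sas
TITLE  'HISTOGRAMS of SBP by GENDER and WOUND TYPE';
TITLE2 HEIGHT=2 'Illustrative subtitle';   /* hypothetical text and size */

PROC UNIVARIATE DATA=SASDATA2.SBPDATA NOPRINT;
   CLASS WOUND GENDER;
   VAR SBP;
   HISTOGRAM;
RUN;

TITLE;   /* clears TITLE and TITLE2 */
```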
CLASS statement – in this example, two grouping variables are listed in the
CLASS statement:
   CLASS WOUND GENDER;
HISTOGRAM statement – the options in the HISTOGRAM statement define how the
graph will appear.
Rows and columns: the options
   NROWS=2 NCOLS=2
arrange the histograms in a grid of 2 rows and 2 columns. Each row corresponds
to one level of WOUND (the first variable in the CLASS statement) and each
column to one level of GENDER (the second variable in the CLASS statement).
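Because the row and column layout follows the order of the CLASS variables, reversing that order should swap the layout. In this sketch, listing GENDER first puts the gender levels in the rows and the wound types in the columns. It assumes, as the original program implies, that both GENDER and WOUND have two levels, and that the FMTWOUND format has already been created.

```sas
PROC UNIVARIATE DATA=SASDATA2.SBPDATA NOPRINT;
   CLASS GENDER WOUND;          /* GENDER now defines the rows, WOUND the columns */
   VAR SBP;
   HISTOGRAM / NROWS=2 NCOLS=2;
   FORMAT WOUND FMTWOUND.;      /* assumes the PROC FORMAT step above was run */
RUN;
```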
The histogram bar color is specified by the CFILL= (color fill) option:
   CFILL=BLUE
In this case, the
bars will be blue. Some of the colors available in SAS (there are thousands to
choose from) include
BLACK WHITE RED GREEN BLUE PURPLE
VIOLET ORANGE YELLOW PINK CYAN MAGENTA
BROWN GOLD LIME GRAY LILAC MAROON
SALMON TAN ROSE CREAM
The default color
is black.
The pattern for the bars is specified by the PFILL= (pattern fill) option:
   PFILL=M3N45
Here M3N45 requests parallel lines (N) of density 3 drawn at a 45-degree angle.
You can select
from a number of available patterns. The default pattern is solid. Here are
some of the other patterns you can select:

INSET statement – this adds an inset (a key), a small box of summary statistics
drawn on the graph. This example illustrates several of its options:
   INSET N='N:' (4.0) MIN='MIN:' (4.1) MAX='MAX:' (4.1)
         / NOFRAME POSITION=NE HEIGHT=2;
The keyword list
   N='N:' (4.0) MIN='MIN:' (4.1) MAX='MAX:' (4.1)
defines which statistics are included in the inset. In this case N (the sample
size) is labeled “N:” and displayed with the SAS format 4.0 (a field 4
characters wide with no decimal places). MIN and MAX (the smallest and largest
values) are labeled the same way and displayed with the format 4.1, which shows
one decimal place.
The remaining options
   / NOFRAME POSITION=NE HEIGHT=2;
specify that
- there is no frame around the inset,
- the inset is placed in the NE (north-east, upper right) corner of the graph,
and
- the height of the inset text is set to 2 units.
When this SAS
code is run, it produces the following graphs:

Exercise: Experiment with the colors, patterns and inset to see how they affect
the graph.
- Make the histogram color green.
- Add the option MEAN='MEAN:' (4.1) to the INSET statement.
- Add the option NORMAL (COLOR=BROWN W=3) to the HISTOGRAM statement to
superimpose a fitted normal curve.
- How does this change the plot?
Exercise: Using the SBPDATA data set, create the following histograms:
- Create a matrix of histograms by RACE_CATEGORY (3 categories) using the
pattern M3X0 and CFILL=RED.
- Place the key (inset) in the upper left corner (NW).
- Add MEAN='MEAN:' (4.1) to the list of statistics reported.
- Put your name in a TITLE2 statement.
- Redo the plot using solid blue bars.
- Capture the output using ODS PDF and print the results.
The
resulting graphs should look like this:


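The last exercise asks you to capture the output with ODS PDF. A minimal sketch of the general pattern: open a PDF destination, run the procedure, then close the destination so the file is written. The file path is a hypothetical placeholder; replace it with a folder you can write to. The histogram shown is the first example's, not the exercise solution.

```sas
ODS PDF FILE='C:\temp\sbp_histograms.pdf';   /* hypothetical output path */

PROC UNIVARIATE DATA=SASDATA2.SBPDATA NOPRINT;
   CLASS RACE_CATEGORY;
   VAR SBP;
   HISTOGRAM / NROWS=3;
RUN;

ODS PDF CLOSE;   /* finishes and closes the PDF file */
```

NOPRINT suppresses only the tables, so the histograms are still sent to the PDF destination.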