Hypothetical data from several branch banks in Southern California record how many IRAs (Individual Retirement Accounts) were set up at 21 locations during a three-month period. The variable is called IRA Setup. These data are counts and are appropriately classified as quantitative, since, for example, it makes sense to calculate the mean number of accounts per bank. Before calculating and reporting the mean or other parametric measures of these values, you may want to assess whether the data are approximately normal. One way to do that is to perform a statistical test. This example illustrates how you can assess the normality of the IRA Setup variable.
Click Continue and OK. The following output is displayed (abbreviated here).
Case Processing Summary

| | Valid N | Valid Percent | Missing N | Missing Percent | Total N | Total Percent |
|---|---|---|---|---|---|---|
| IRA Setup | 19 | 100.0% | 0 | .0% | 19 | 100.0% |
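The Valid, Missing, and Total columns are simple counts and percentages of the cases. A minimal Python sketch of that bookkeeping, assuming missing values are stored as `None` (the function name `case_processing_summary` is hypothetical, and SPSS's user-defined missing-value codes are ignored here):

```python
def case_processing_summary(values):
    """Count valid (non-missing) and missing cases with their percentages.

    Assumption: a missing value is represented by None.
    """
    total = len(values)
    valid = sum(1 for v in values if v is not None)
    missing = total - valid

    def pct(k):
        # Percentage of all cases; guard against an empty list.
        return 100.0 * k / total if total else 0.0

    return {
        "valid": (valid, pct(valid)),
        "missing": (missing, pct(missing)),
        "total": (total, pct(total)),
    }

print(case_processing_summary([3, None, 7, 8]))
```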
The Kolmogorov-Smirnov and Shapiro-Wilk tests can be used to test the hypothesis that the distribution is normal. (SPSS recommends these tests only when your sample size is less than 50.) The hypotheses used in testing data normality are:
Ho: The distribution of the data is normal
Ha: The distribution of the data is not normal
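The Kolmogorov-Smirnov statistic D is the largest vertical gap between the empirical cumulative distribution of the data and a normal CDF. The Python sketch below (hypothetical function name `ks_normal_statistic`) estimates the normal mean and standard deviation from the sample itself. Because the parameters are estimated, a p-value for D should come from Lilliefors-corrected critical values or simulation rather than the ordinary K-S table; this sketch computes only D and does not attempt the Shapiro-Wilk test.

```python
from statistics import NormalDist, mean, stdev

def ks_normal_statistic(data):
    """Kolmogorov-Smirnov D against a normal fitted to the sample."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    fitted = NormalDist(mean(xs), stdev(xs))
    # D+ : how far the empirical CDF rises above the normal CDF.
    d_plus = max((i + 1) / n - fitted.cdf(x) for i, x in enumerate(xs))
    # D- : how far the normal CDF rises above the empirical CDF just before each point.
    d_minus = max(fitted.cdf(x) - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)

print(round(ks_normal_statistic([1, 2, 3]), 3))
```

For the three values 1, 2, 3 this gives D ≈ 0.175.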
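If a test does not reject normality, this suggests that a parametric procedure that assumes normality (e.g., a t-test) is reasonable to use. Failing to reject does not prove that the data are normal, however, and with small samples these tests have little power to detect departures from normality. We emphasize again that it is always a good idea to examine the data graphically in addition to using the formal tests for normality.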
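The decision logic can be written out explicitly. This sketch assumes the conventional significance level α = 0.05 (an assumption; the tutorial does not state a level), and the helper name `normality_decision` is hypothetical. Note that the outcome is phrased as "do not reject" rather than "accept":

```python
def normality_decision(p_value, alpha=0.05):
    """Translate a normality-test p-value into a hedged conclusion.

    A small p-value is evidence against H0 (normality). A large p-value
    only means the test did not detect non-normality.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if p_value < alpha:
        return "reject H0: evidence of non-normality"
    return "do not reject H0: no evidence of non-normality"

print(normality_decision(0.20))
```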
The plots in the output provide a visual description of the distribution of the data. They include:
- Histogram: When a histogram's shape approximates a bell curve, it suggests that the data may have come from a normal population.
- Boxplot: A symmetric boxplot, with the median line near the center of the box and whiskers of roughly equal length that are somewhat longer than each half of the box, suggests that the data may have come from a normal population.
- Q-Q plot: A quantile-quantile (Q-Q) plot displays the degree to which the quantiles of a reference (known) distribution, in this case the normal distribution, differ from the sample quantiles of the data. When the data fit the reference distribution, the points lie in a tight, random scatter around the reference line. For the IRA data, the curvature of the points indicates a possible departure from normality, and the point lying outside the overall pattern indicates an outlier.
- Stem-and-leaf plot: A stem-and-leaf plot displays the data in a histogram-like pattern but retains the actual data values. Each observation is split into a stem and a leaf; typically the stem consists of all but the last digit and the leaf is the last digit (for example, 23 has stem 2 and leaf 3).
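To see how two of these displays are built from the raw numbers, here is a stdlib-only Python sketch. `normal_qq_points` pairs each sorted observation with a standard normal quantile using Blom's plotting positions (i - 3/8)/(n + 1/4), one common choice; SPSS offers several such formulas, so its plotted positions may differ slightly. `stem_and_leaf` assumes non-negative integer counts such as IRA setups. Both function names are hypothetical.

```python
from statistics import NormalDist

def normal_qq_points(data):
    """Pair each sorted observation with a standard normal quantile.

    Plotting positions follow Blom's formula (i - 3/8) / (n + 1/4).
    """
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(std_normal.inv_cdf((i - 0.375) / (n + 0.25)), x)
            for i, x in enumerate(xs, start=1)]

def stem_and_leaf(data):
    """Return text rows 'stem | leaves' for non-negative integers.

    The stem is every digit but the last (v // 10); the leaf is the
    last digit (v % 10). Empty stems are kept so gaps stay visible.
    """
    if not data:
        return []
    rows = {}
    for v in sorted(data):
        stem, leaf = divmod(v, 10)
        rows.setdefault(stem, []).append(str(leaf))
    return [f"{stem:>3} | {''.join(rows.get(stem, []))}"
            for stem in range(min(rows), max(rows) + 1)]

print("\n".join(stem_and_leaf([12, 15, 7, 31])))
```

Plotting the pairs from `normal_qq_points` (normal quantile on one axis, observed value on the other) gives the roughly straight line described above when the data are close to normal.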
Here are two of those plots:


In both plots, a single value appears considerably different from the rest. Such a point is often called an “outlier.” This happens to be observation number 5 in the data set.
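One common rule for flagging such a point is Tukey's fence: values more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. SPSS boxplots use a similar "box length" rule (points beyond 1.5 box lengths are shown as outliers, beyond 3 as extremes), but this sketch uses Python's `statistics.quantiles` with `method="inclusive"`, which may place the quartiles slightly differently from SPSS. The function name `boxplot_outliers` is hypothetical.

```python
from statistics import quantiles

def boxplot_outliers(data, k=1.5):
    """Return values lying outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

print(boxplot_outliers([10, 11, 1, 12, 13, 14, 15]))
```

Raising `k` to 3 gives the stricter cutoff corresponding to SPSS's "extreme" values.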
Step 3: To exclude the outlying value, keep only the cases with IRASetup >= 5. Return to the data editor, select Data/Select Cases…, and choose the option “If condition is satisfied.” Click the “If…” button and, in the formula text box, enter the expression
IRASetup >= 5
Click Continue and OK. A slash appears in the IRA data file next to record 16, indicating that the record will not be included in subsequent analyses (as shown here). (This is not to imply that you can arbitrarily exclude data from an analysis.)
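Selecting cases with a condition acts as a filter: every case stays in the file, but only those satisfying the condition are analyzed. A small Python sketch of that behavior, using a hypothetical `select_cases` helper and made-up illustrative values (not the tutorial's data). Following what we understand SPSS to do, a case whose value is missing is treated as not selected.

```python
def select_cases(values, condition):
    """Filter cases without deleting them.

    Returns (kept_values, excluded_case_numbers); case numbers are
    1-based, like the row numbers in a data editor. A missing value
    (None) is treated as not satisfying the condition.
    """
    kept, excluded = [], []
    for case_number, value in enumerate(values, start=1):
        if value is not None and condition(value):
            kept.append(value)
        else:
            excluded.append(case_number)
    return kept, excluded

kept, excluded = select_cases([7, 9, 2, None, 6], lambda v: v >= 5)
print(kept, excluded)
```

Because the original list is untouched, "All cases" can be restored simply by analyzing `values` again.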

Step 4: To display the revised histogram, select Graphs/Histogram and select IRA Setup as the analysis variable; the filter from Step 3 (listed as IRASetup >= 5 (FILTER) in the variable list) stays active, so the excluded case is left out. Select the “Display normal curve” checkbox and click OK. A histogram with a normal (bell-shaped, Gaussian) curve is displayed. (If the curve is not shown, double-click the graph, select Elements/Show Distribution Curve in the Chart Editor, and then exit the editor.) The following graph is displayed.
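The normal curve on the histogram is based on the sample mean and standard deviation. The sketch below compares observed bin counts with the counts a fitted normal distribution would predict, n × P(bin). SPSS draws a continuous scaled density rather than per-bin expectations, so treat this as an approximation of what the overlay conveys; the function name, bin width, and starting point are assumptions you choose.

```python
from statistics import NormalDist, mean, stdev

def histogram_with_normal_curve(data, bin_width, start=None):
    """Return (low, high, observed, expected) for each bin [low, high).

    Expected counts come from a normal distribution fitted with the
    sample mean and standard deviation: n * P(low <= X < high).
    """
    if start is None:
        start = min(data)  # first bin begins at the smallest value
    if start > min(data):
        raise ValueError("start must not exceed the smallest value")
    fitted = NormalDist(mean(data), stdev(data))
    n = len(data)
    counts = {}
    for x in data:
        b = int((x - start) // bin_width)  # index of the bin holding x
        counts[b] = counts.get(b, 0) + 1
    rows = []
    for b in range(max(counts) + 1):
        low = start + b * bin_width
        high = low + bin_width
        expected = n * (fitted.cdf(high) - fitted.cdf(low))
        rows.append((low, high, counts.get(b, 0), expected))
    return rows

for low, high, observed, expected in histogram_with_normal_curve(
        [1, 2, 2, 3, 3, 3, 4, 4, 5], bin_width=1):
    print(f"[{low}, {high}): observed {observed}, expected {expected:.2f}")
```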

Note that the plot no longer shows the “outlier.” Also check the normality tests: both are now non-significant, which means the hypothesis of normality is not rejected (this does not prove that the data are normal).
Step 6: To remove the Select Cases criterion, return to the data editor, select Data/Select Cases…, choose the option “All cases,” and click OK.
See www.stattutorials.com/SPSSDATA for the files mentioned in this tutorial. © TexaSoft, 2008