SPSS Statistics Tutorials
 
 

 

Logistic Regression
Using SPSS

 


See www.stattutorials.com/SPSSDATA for files mentioned in this tutorial © TexaSoft, 2008

 

These SPSS statistics tutorials briefly explain the use and interpretation of standard statistical analysis techniques for Medical, Pharmaceutical, Clinical Trials, Marketing or Scientific Research. The examples include how-to instructions for SPSS Software.

 

Logistic Regression in SPSS

            This example is adapted from information in Statistical Analysis Quick Reference Guidebook (2007).

 

    

A sales director for a chain of appliance stores wants to find out what circumstances encourage customers to purchase extended warranties after a major appliance purchase. The response variable is an indicator of whether or not a warranty is purchased. The predictor variables to consider are:

  • Customer gender

  • Age of the customer

  • Whether a gift is offered with the warranty

  • Price of the appliance

  • Race of customer

There are several strategies you can take to develop the “best” model for the data. It is recommended that you examine several models before deciding which one is best for your analysis. (In this example we allow the computer to help identify important variables, but it is inadvisable to accept a computer-designated model without examining alternatives.) Begin by examining the significance of each variable in a full model containing all candidate predictors.
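Before turning to the SPSS menus, it may help to see what a logistic procedure actually estimates. The following pure-Python sketch (hypothetical toy data, not the WARRANTY file) fits a one-predictor logistic regression by gradient ascent on the log-likelihood; SPSS uses a more efficient iterative algorithm, but on this simple data the estimates agree with the closed-form answer:

```python
import math

# Hypothetical toy data (NOT the WARRANTY file): x = 1 if a gift was
# offered, y = 1 if a warranty was bought.
xs = [0, 0, 0, 0, 1, 1, 1, 1]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

# Fit logit(p) = b0 + b1*x by gradient ascent on the log-likelihood.
b0, b1 = 0.0, 0.0
lr = 0.5
for _ in range(20000):
    g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        p = 1 / (1 + math.exp(-(b0 + b1 * x)))
        g0 += y - p          # derivative wrt the intercept
        g1 += (y - p) * x    # derivative wrt the slope
    b0 += lr * g0 / len(xs)
    b1 += lr * g1 / len(xs)

odds_ratio = math.exp(b1)  # for this 2x2 data the MLE is (3/1)/(1/3) = 9
```

The fitted odds ratio, exp(b1), is exactly the cross-product ratio of the underlying 2x2 table, which is why the Exp(B) column in the SPSS output is interpreted as an odds ratio.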

1. Open the data set named WARRANTY.SAV (downloadable from the data section) and choose Analyze/Regression/Binary Logistic.

2. Select Bought as the dependent variable and Gender, Gift, Age, Price, and Race as the covariates (i.e., the independent or predictor variables).

3. Click on Categorical (a checkbox in older versions, a button in SPSS version 16) and specify Race as a categorical variable. Click Continue and then OK. This produces the following SPSS output table.

Variables in the Equation

                        B      S.E.    Wald   df   Sig.   Exp(B)
Step 1   Gender    -3.772     2.568   2.158    1   .142     .023
         Price       .001      .000   3.363    1   .067    1.001
         Age         .091      .056   2.638    1   .104    1.096
         Gift       2.715     1.567   3.003    1   .083   15.112
         Race                         2.827    3   .419
         Race(1)    3.773    13.863    .074    1   .785   43.518
         Race(2)    1.163    13.739    .007    1   .933    3.199
         Race(3)    6.347    14.070    .203    1   .652  570.898
         Constant -12.018    14.921    .649    1   .421     .000

 

The “Variables in the Equation” table shows the output resulting from including all of the candidate predictor variables in the equation. Notice that the Race variable, which was originally coded as 1=White, 2=African American, 3=Hispanic, and 4=Other, has been changed (by the SPSS logistic procedure) into three (4 - 1) indicator variables called Race(1), Race(2), and Race(3). These three variables each enter the equation with their own coefficient and p-value, and an overall p-value is given for Race.
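The indicator coding that SPSS applies to Race can be sketched in a few lines of Python (the function name here is illustrative, not part of SPSS):

```python
# Sketch of the indicator (dummy) coding SPSS applies to a four-level
# categorical predictor: three 0/1 variables, with the last category
# (4 = Other) as the reference level that scores 0 on all three.
def race_dummies(race):
    # race is coded 1=White, 2=African American, 3=Hispanic, 4=Other
    # returns [Race(1), Race(2), Race(3)]
    return [1 if race == k else 0 for k in (1, 2, 3)]
```

Each coefficient for Race(1), Race(2), and Race(3) is therefore a contrast against the reference category, Other.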

The significance of each variable is measured with a Wald statistic. Using p=0.10 as the cutoff criterion for excluding variables from the equation, Gender (p=0.142) and Race (p=0.419) do not appear to be important predictor variables. Age is marginal (p=0.104), but we will leave it in for the time being. Rerunning the analysis without Gender and Race yields the following output:
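For a single coefficient, the Wald statistic is simply (B / S.E.)², referred to a chi-square distribution with 1 df. A short standard-library check reproduces the Gift row of the table above (the erfc expression is the standard chi-square tail formula for 1 df):

```python
import math

def wald(b, se):
    # Wald chi-square statistic for one coefficient: (B / S.E.)^2
    return (b / se) ** 2

def p_value(w):
    # Upper tail of a chi-square with 1 df: p = erfc(sqrt(w / 2))
    return math.erfc(math.sqrt(w / 2))

w_gift = wald(2.715, 1.567)  # Gift row: Wald about 3.0, Sig. about .083
```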

 

Variables in the Equation

                       B      S.E.    Wald   df   Sig.   Exp(B)
Step 1   Price      .000      .000   6.165    1   .013    1.000
         Age        .064      .032   4.132    1   .042    1.066
         Gift      2.339     1.131   4.273    1   .039   10.368
         Constant -6.096     2.142   8.096    1   .004     .002

 

            This reduced model indicates significant predictive power for the variables Gift (p=0.039), Age (p=0.042), and Price (p=0.013). Although the p-value for Price is small, notice that its odds ratio (OR) is 1.000 and its coefficient is zero to three decimal places. These seemingly contradictory results (a small p-value but an OR of 1.0) suggest that the scale of the Price values is masking the actual odds ratio relationship. If the same model is run with the variable Price100, which is Price divided by 100, the odds ratio for Price100 is 1.041 and the estimated coefficient for Price100 is 0.040, as shown below.

 

Variables in the Equation

                       B      S.E.    Wald   df   Sig.   Exp(B)
Step 1   Age        .064      .032   4.132    1   .042    1.066
         Gift      2.339     1.131   4.273    1   .039   10.368
         Price100   .040      .016   6.165    1   .013    1.041
         Constant -6.096     2.142   8.096    1   .004     .002

 

All of the other values in the table remain the same; all we have done is recode Price into a more usable scale. Another common tactic is to standardize values such as Price by subtracting the mean and dividing by the standard deviation. Using standardized scores eliminates the problem observed with the Price variable and also simplifies comparing odds ratios across variables.
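A quick check of the rescaling arithmetic, plus a sketch of the standardization alternative. (The per-dollar coefficient 0.00040 below is inferred from the Price100 row, since the first table only displays .000 at three decimals; it is an approximation, not an exact SPSS value.)

```python
import math

# Rescaling: if b is the per-dollar Price coefficient, the per-$100
# coefficient is 100*b, and the per-$100 odds ratio is exp(100*b).
b_price = 0.00040                   # inferred from the Price100 row
b_price100 = 100 * b_price          # coefficient per $100 of price
or_per_100 = math.exp(b_price100)   # odds ratio per $100, about 1.041

# The standardization alternative: z = (x - mean) / sd.
def standardize(xs):
    m = sum(xs) / len(xs)
    sd = (sum((x - m) ** 2 for x in xs) / (len(xs) - 1)) ** 0.5
    return [(x - m) / sd for x in xs]
```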

 

            The result is that we can now see that the odds that a customer who is offered a gift will purchase a warranty are about 10 times (see Exp(B) for Gift) the corresponding odds for a customer not offered a gift. We also observe that for each additional $100 in Price, the odds that a customer will purchase a warranty increase by about 4%. This tells us that people tend to be more likely to purchase warranties for more expensive appliances. Finally, the OR for Age, 1.066, tells us that older buyers are more likely to purchase a warranty.
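These interpretations come straight from Exp(B) = e^B; a short check using the coefficients from the table above:

```python
import math

# Exp(B) = e^B, using the B column from the Price100 model.
or_gift = math.exp(2.339)   # about 10.4: a gift multiplies the odds ~10x
or_age = math.exp(0.064)    # about 1.066: ~6.6% higher odds per year
pct_per_100 = (math.exp(0.040) - 1) * 100  # ~4.1% higher odds per $100
```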

            One way to assess the model is with the Hosmer-Lemeshow goodness-of-fit test. To produce this information:

 

4. Rerun the analysis, click the Options button, and select the Hosmer-Lemeshow goodness-of-fit option. Click Continue and then OK.

 

Hosmer and Lemeshow Test

Step   Chi-square   df   Sig.
1           1.792    8   .987

 

This test divides the data into several groups based on estimated probabilities, then computes a chi-square statistic from the observed and expected frequencies of subjects falling into the two categories of the binary response variable within these groups. Large chi-square values (and correspondingly small p-values) indicate a lack of fit for the model. In the table above we see that the Hosmer-Lemeshow chi-square test for the final warranty model yields a p-value of 0.987, suggesting a model that fits well. Note that the Hosmer-Lemeshow chi-square test is not a test of the importance of specific model parameters (which may also appear in your output). It is a separate post hoc test performed to evaluate a specific model.
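A simplified sketch of the Hosmer-Lemeshow computation is shown below. This toy version groups cases by sorted predicted probability; SPSS's exact grouping and tie-handling details may differ, so treat it as an illustration of the idea rather than a replica of the procedure:

```python
def hosmer_lemeshow(probs, ys, g=10):
    # Simplified sketch: sort cases by predicted probability, cut into g
    # roughly equal groups, and accumulate (observed - expected)^2
    # scaled by each group's binomial variance.  The result is compared
    # to a chi-square distribution with g - 2 degrees of freedom.
    pairs = sorted(zip(probs, ys))
    n = len(pairs)
    chi2 = 0.0
    for i in range(g):
        grp = pairs[i * n // g:(i + 1) * n // g]
        if not grp:
            continue
        observed = sum(y for _, y in grp)   # observed events in the group
        expected = sum(p for p, _ in grp)   # expected events in the group
        m = len(grp)
        pbar = expected / m
        if 0 < pbar < 1:
            chi2 += (observed - expected) ** 2 / (m * pbar * (1 - pbar))
    return chi2
```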

 

Interpretation of the multiple logistic regression model

 

            Once we are satisfied with the model, it can be used for prediction just as in the simple logistic example above. For this model, using the coefficients in the table above, the prediction equation is

log(odds) = -6.096 + 0.064(Age) + 2.339(Gift) + 0.040(Price100)

and the predicted probability of purchasing a warranty is p = e^(log odds) / (1 + e^(log odds)).

            (For more details on prediction, see Statistical Analysis Quick Reference Guidebook, Elliott and Woodward, 2007.)
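The prediction arithmetic can be sketched in Python using the coefficients from the final model's table (rounded to the three displayed decimals, so the probabilities are approximate; the function name is illustrative):

```python
import math

# Coefficients from the Constant, Age, Gift, and Price100 rows of the
# final model (as displayed, to three decimals).
def predict_warranty(age, price, gift):
    logit = -6.096 + 0.064 * age + 2.339 * gift + 0.040 * (price / 100)
    return 1 / (1 + math.exp(-logit))

p_gift = predict_warranty(54, 3850, 1)     # about 0.78: predict "buy"
p_no_gift = predict_warranty(54, 3850, 0)  # about 0.25: predict "not buy"
```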

Using this equation, it would be reasonable to predict that a person with the characteristics Age = 54, Price = $3,850, and Gift = 1 would purchase a warranty, because the predicted probability is above the cutoff, while the same person with no gift offered would not be predicted to purchase a warranty, because the predicted probability falls below it. The typical cutoff for this decision is 0.5 (or 50%): anyone whose predicted probability is higher than 0.5 is predicted to buy the warranty, and anyone with a lower probability is predicted not to buy it. However, there may be times when you want to adjust this cutoff value. Neter et al. (1996) suggest three ways to select a cutoff value for prediction:

  • Use the standard 0.5 cutoff value.

  • Determine a cutoff value that will give you the best predictive fit for your sample data. This is usually determined through trial and error.

  • Select a cutoff value that will separate your sample data into a specific proportion of your two states based on a prior known proportion split in your population.

For example, to use the second option for deciding on a cutoff value, examine the model classification table that is part of the SPSS logistic output.

 

Classification Table(a)

                                     Predicted
                                  Bought          Percentage
 Observed                       No       Yes        Correct
 Step 1   Bought      No        12         2          85.7
                      Yes        1        35          97.2
          Overall Percentage                          94.0

a. The cut value is .500

 

This table indicates that the final model correctly classifies 94% of the cases. The model used the default 0.5 cutoff value to classify each subject’s outcome (notice the table footnote “The cut value is .500”). You can rerun the analysis with a series of cutoff values such as 0.4, 0.45, 0.55, and 0.65 to see whether an adjusted cutoff gives a better fit. For this particular model, these alternate cutoff values do not lead to better predictions, so the default 0.5 cutoff is deemed sufficient. (For more information about classification, see Statistical Analysis Quick Reference Guidebook, 2007.)
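The classification table, and the cutoff sweep just described, can be sketched as follows (toy probabilities, not the warranty data; the function name is illustrative):

```python
def classification_table(probs, ys, cut=0.5):
    # Cross-tabulate predicted outcome (prob >= cut) against observed 0/1.
    tn = sum(1 for p, y in zip(probs, ys) if p < cut and y == 0)
    fp = sum(1 for p, y in zip(probs, ys) if p >= cut and y == 0)
    fn = sum(1 for p, y in zip(probs, ys) if p < cut and y == 1)
    tp = sum(1 for p, y in zip(probs, ys) if p >= cut and y == 1)
    overall = 100 * (tn + tp) / len(ys)
    return tn, fp, fn, tp, overall

# Sweep alternate cutoffs as described above:
# for cut in (0.40, 0.45, 0.50, 0.55, 0.65):
#     print(cut, classification_table(probs, ys, cut))
```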

References

  • Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2002). Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences, 3rd edition. Lawrence Erlbaum Associates.

  • Elliott, A., and Woodward, W. (2007). Statistical Analysis Quick Reference Guidebook: With SPSS Examples. Thousand Oaks, CA: Sage.

  • Hosmer, D. W., and Lemeshow, S. (2000). Applied Logistic Regression, 2nd edition. New York: John Wiley and Sons.

  • Neter, J., Wasserman, W., Nachtsheim, C. J., and Kutner, M. H. (1996). Applied Linear Regression Models, 3rd edition. Chicago: Irwin.


End of tutorial
