THIS EXERCISE IS DUE BY 5 PM MONDAY, APRIL 20, in my mailbox (3210 Stone Building).
(I will take these in class Thursday April 16 as well.)

We will go over it in class April 16. I'll do my best to return corrected assignments to you by April 23, but that may not be possible.

 
 

EDF 6937-01       SPRING 2009
THE MULTIVARIATE ANALYSIS OF CATEGORICAL DATA
EXERCISE 4: LOGITS AND MORE LOGITS
20 points
Susan Carol Losh
Department of Educational Psychology and Learning Systems
Florida State University

In this exercise, you will investigate a possible two-stage causal model in which GENDER, highest level of high school math (the recoded HIHSMATH) and two categories of education (REDUC2) predict owning a home computer (HOMEPC, coded 1 = yes and 0 = no; HOMEPC is already in your SPSS database in its original coding).

We'll use the General program (use HOMECOMP for this one) and the binary logistic regression program (use HOMEPC for this one).

At the end of the exercise you'll find a (non-graded) Multinomial Regression exercise too.

 This exercise moves us to logistic regression.

For the binomial (binary) analyses, you will predict first REDUC2 and then HOMEPC.

In the multinomial analysis, the dependent variable DEGLEV4 has four categories, ranging from high school or less to an advanced college degree. (Please use DEGLEV4 ONLY.)

The possible binomial model is:

Gender and high school math influence degree level (reduc2).
Gender, high school math and degree level influence owning a home computer.

This model is testable and can be falsified if the data fail to support it. If the data are consistent with the model, this does NOT mean the model is "true" but that it is suggestive and was not falsified by the analytic results.

The binomial and multinomial regression packages offer you choices about how you want to create the contrasts on your dependent variable.

Please use SPSS VERSION 15 if possible.
 
 

 

REVIEW: PROGRAM AND OTHER NOTES FOR THIS EXERCISE 

Keep your SATURATED program runs from MODEL SELECTION and GENERAL from Exercise 3.

If you haven't already, run the GENERAL program for all main effects and all two-way effects for these four variables. Request the estimates and keep the output! (You can use your Exercise 3 output here too, if that's the "Best Model" you went with.)

Recently discovered program glitch in GENERAL: be sure to enter the main effects in the model FIRST. Very strange things happen to the effect estimate parameters if you don't.

You will use the Loglinear > General... program to test your chosen model.

Examine the causal order here: Gender and High School Math --> degree attainment --> owning a home computer

In addition, the model could postulate a possible direct causal effect of either gender or high school math (or both) on owning a home computer.

Can either owning a home computer or having a college degree change someone's gender?
(I guess one never knows...but see me for a brief biology lesson if you think either one is likely to occur.)

Is it more likely that gender influenced level of educational attainment or is it more likely that the highest degree level attained caused one's gender? Which PROBABLY came first in time (remember these are adults): high school math or owning a home computer?

(NOTE: this is probably the last year I'll be able to postulate computer ownership as a dependent variable in this causal series since most children now grow up with a home PC.)

You will need your CD with the BIGDIGITALC19832006.sav data file on it for Exercise 4.


To say that gender has a DIRECT causal effect on owning a home PC means that the gender by homepc PARTIAL ASSOCIATION is nonzero, controlling for the other variables in the equation (here, degree level and high school math).

To say that gender has an INDIRECT causal effect on owning a home PC statistically means that both the gender by degree level PARTIAL ASSOCIATION is nonzero AND the degree level by homepc PARTIAL ASSOCIATION is nonzero.
  


TABLES

For this exercise, you will keep the four-way table from Exercise 3 (reproduced below):
 
 

AT LEAST A BA DEGREE

MATH LEVEL        2 YEARS H.S. ALGEBRA OR MORE   1 YEAR H.S. ALGEBRA OR LESS
GENDER                 MALE   FEMALE  BOTH (n)       MALE   FEMALE  BOTH (n)
HAS HOME COMPUTER     73.9%    69.1%      1256      60.7%    59.2%       112
EVERYONE ELSE         26.1     30.9        497      39.3     40.8         75
TOTAL                100.0%   100.0%      1753     100.0%   100.0%       187
(n)                   (942)    (811)                 (84)    (103)

JUNIOR COLLEGE OR LESS

MATH LEVEL        2 YEARS H.S. ALGEBRA OR MORE   1 YEAR H.S. ALGEBRA OR LESS
GENDER                 MALE   FEMALE  BOTH (n)       MALE   FEMALE  BOTH (n)
HAS HOME COMPUTER     49.4%    46.0%      1829      31.0%    30.7%       922
EVERYONE ELSE         50.6     54.0       2014      69.0     69.3       2067
TOTAL                100.0%   100.0%      3843     100.0%   100.0%      2989
(n)                  (1827)   (2016)               (1273)   (1716)

Source: NSF Surveys of Public Understanding of Science and Technology, 1990, 1995, 1997, 1999 and 2006, Directors: Jon D. Miller and Linda Kimmel; Opinion Research Corporation/MACRO, General Social Survey; available n = 8772 (weighted data).
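One informal way to see what a partial association means is to compute conditional odds ratios within each combination of the control variables. The minimal Python sketch below does this from the rounded percentages in the four-way table above: the degree-by-HOMEPC odds ratio within each gender x math group, and the gender-by-HOMEPC odds ratio within each degree x math group. Odds ratios near 1 in every group point toward a zero partial association. This is only a descriptive check under the assumption that the rounded percentages are close enough; the significance tests still come from GENERAL or MODEL SELECTION. The dictionary keys (such as "ba_or_more" and "jc_or_less") are hypothetical labels, not SPSS variable values.

```python
# Percent owning a home computer, copied from the four-way table (rounded, weighted data).
# Keys are hypothetical labels: (degree, math level, gender).
PCT_HOMEPC = {
    ("ba_or_more", "high", "male"): 73.9,
    ("ba_or_more", "high", "female"): 69.1,
    ("ba_or_more", "low", "male"): 60.7,
    ("ba_or_more", "low", "female"): 59.2,
    ("jc_or_less", "high", "male"): 49.4,
    ("jc_or_less", "high", "female"): 46.0,
    ("jc_or_less", "low", "male"): 31.0,
    ("jc_or_less", "low", "female"): 30.7,
}


def odds(pct):
    """Odds of owning a home PC, given the percent who own one."""
    return pct / (100.0 - pct)


def degree_odds_ratio(math_level, gender):
    """Degree-by-HOMEPC odds ratio (BA or more vs. junior college or less) in one gender x math group."""
    return (odds(PCT_HOMEPC[("ba_or_more", math_level, gender)])
            / odds(PCT_HOMEPC[("jc_or_less", math_level, gender)]))


def gender_odds_ratio(degree, math_level):
    """Gender-by-HOMEPC odds ratio (male vs. female) in one degree x math group."""
    return (odds(PCT_HOMEPC[(degree, math_level, "male")])
            / odds(PCT_HOMEPC[(degree, math_level, "female")]))


if __name__ == "__main__":
    for m in ("high", "low"):
        for g in ("male", "female"):
            print(f"degree x HOMEPC OR | math={m}, gender={g}: {degree_odds_ratio(m, g):.2f}")
    for d in ("ba_or_more", "jc_or_less"):
        for m in ("high", "low"):
            print(f"gender x HOMEPC OR | degree={d}, math={m}: {gender_odds_ratio(d, m):.2f}")
```

If the odds ratios for one variable stay roughly constant across the groups, that is also informal evidence against an interaction (moderator) effect involving that variable.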



For the multinomial exercise, degree level with four categories will be your final dependent variable and gender and high school math attainment will be the independent variables. Below is the percentage table that corresponds to this model.

MALES ONLY

HIGH SCHOOL MATH ELECTIONS        HIGH      LOW   TOTALS
HIGH SCHOOL OR LESS              48.8%    82.0%     2466 (59.8%)
TWO YEAR DEGREE                  17.2     11.9       637 (15.4)
FOUR YEAR DEGREE                 21.3      4.1       645 (15.6)
GRADUATE DEGREE                  12.7      2.0       378 (9.2)
TOTAL                           100.0%   100.0%     4126 (100.0%)
(n)                             (2768)   (1358)

FEMALES ONLY

HIGH SCHOOL MATH ELECTIONS        HIGH      LOW   TOTALS
HIGH SCHOOL OR LESS              53.5%    83.0%     3020 (65.0%)
TWO YEAR DEGREE                  17.9     11.5       714 (15.4)
FOUR YEAR DEGREE                 18.9      4.0       605 (13.0)
GRADUATE DEGREE                   9.8      1.6       306 (6.6)
TOTAL                           100.0%   100.1%     4645 (100.0%)
(n)                             (2826)   (1819)

Source: NSF Surveys of Public Understanding of Science and Technology, 1990, 1995, 1997, 1999 and 2006, Directors: Jon D. Miller and Linda Kimmel; Opinion Research Corporation/MACRO, General Social Survey; available n = 8771 (weighted data; the fractional respondent that sometimes occurs is due to incorporation of the weight variable in analysis.)
 



PRELIMINARIES AND THE SATURATED MODEL: YOUR SPSS GENERAL PROGRAM RUN

Open the SPSS 15 program and load the BIGDIGITALC19832006.SAV file into the Data Editor.


This is the four-variable model: gender, hihsmath, reduc2 and HOMECOMP. Either reuse your output from Exercise 3, or regenerate the GENERAL (1) saturated model and (2) the model incorporating all marginals and all two-way effects. Make sure to include the parameter estimates and that the constant term box is checked.

TIP (SPSS recommends): under Options, check Estimates and leave the checks on Frequencies and Residuals. Uncheck any options under Plots. The adjusted residuals will come out in your Frequencies/Residuals table, and you will save paper when you print your results (plots take A LOT of paper).

[Test for your direct and indirect effects by dropping the terms that correspond to those particular partial associations. If the G2 goes up significantly when you drop terms, you must return those terms to the model. Use the partitioning of nested models with the G2s and their associated degrees of freedom: the difference between the two G2s is itself a chi-square, with degrees of freedom equal to the difference in the two models' degrees of freedom. Use a X2 table (in the back of most texts if needed) to see if the difference in the G2s is statistically significant. Alternatively, you can run the MODEL SELECTION program and use its partial association tests (you should have these from Exercise 3 also); they should deliver equivalent substantive results.]
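The nested-model comparison above is simple arithmetic: subtract the G2 values and the degrees of freedom, then look up the difference in a chi-square table. The minimal Python sketch below does that arithmetic with the standard library only, and also shows the G2 formula itself, G2 = 2 * sum(O * ln(O / E)). The function names are hypothetical helpers, not part of SPSS; the chi-square tail probability uses the closed forms that exist for integer degrees of freedom.

```python
import math


def g_squared(observed, expected):
    """Likelihood-ratio chi-square G2 = 2 * sum(O * ln(O / E)); empty cells add 0."""
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def chi2_sf(x, df):
    """Upper-tail probability P(X2 > x) for a chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_i (x/2)^(i + 1/2) / Gamma(i + 3/2)
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5)
    for i in range((df - 1) // 2):
        total += math.exp(-half) * term
        term *= half / (i + 1.5)
    return total


def nested_model_test(g2_reduced, df_reduced, g2_full, df_full):
    """Compare a reduced model (terms dropped) with the fuller model it is nested in."""
    delta_g2 = g2_reduced - g2_full
    delta_df = df_reduced - df_full
    if delta_df <= 0:
        raise ValueError("the reduced model must have more degrees of freedom")
    return delta_g2, delta_df, chi2_sf(delta_g2, delta_df)
```

To use it, pass the G2 and df that GENERAL prints for the model with the terms dropped, then the G2 and df for the model that keeps them (the saturated model has G2 = 0 and df = 0). A small p-value (for example, below .05) says the dropped terms belong back in the model.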

Turn in your output with your exercise answers.



THE BINARY LOGISTIC REGRESSION PROGRAM



Under Analyze and Regression, go to the Binary Logistic... program.

In phase 1, put reduc2 in the Dependent box.

Enter these variables into the Covariates box: gender and hihsmath.
 
 
We'll double-check to see if there's a three-way interaction among gender, hihsmath and reduc2 also:

Hold down the Control key and select both gender and hihsmath.

This will activate the >a*b> (interaction) button.
Load the gender*hihsmath (implicitly *reduc2) interaction into the Covariates box.

Click on the Categorical... box and load gender and hihsmath to the right hand box.

Make sure the Indicator contrast is used for both these variables (click the change contrast box if you need to).
Leave the reference category as last and click Continue.

Under Options, leave the defaults in place and click on OK
 
 

 
Notice that your dependent variable is no longer listed among the variables in the Covariates: box. 

If you list interaction effects (e.g., gender by hihsmath) as we are doing initially, this means you are really looking at the three-way interaction among gender, hihsmath and reduc2. In general, don't include interactions unless they really are part of your best model. The program will automatically fit the interactions and associations among the independent variables, but that is in the background, out of sight. It will also fit all the univariate marginals for the independent variables and you won't see those on the logit output either.
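The point that a gender*hihsmath term in the reduc2 equation is really the gender x hihsmath x reduc2 three-way interaction can be checked by hand. With both predictors and their interaction in the equation, the logit model is saturated for the gender x hihsmath x reduc2 table, so its coefficients are simple functions of the four observed logits. The Python sketch below uses the rounded, weighted (n) bases from the four-way table above (HOMEPC summed out) and ASSUMES indicator coding with female and low math as the reference (last) categories; check your own codebook, since the actual SPSS codes may order the categories differently. Because the counts are rounded, SPSS's estimates will differ slightly.

```python
import math

# Weighted cell counts (rounded) from the (n) bases of the four-way table,
# i.e., gender x math x degree counts with HOMEPC summed out.
# Values: (n with at least a BA degree, n with junior college or less).
COUNTS = {
    ("male", "high"): (942, 1827),
    ("female", "high"): (811, 2016),
    ("male", "low"): (84, 1273),
    ("female", "low"): (103, 1716),
}


def observed_logit(gender, math_level):
    """Log-odds of having at least a BA degree in one gender x math cell."""
    ba, jc = COUNTS[(gender, math_level)]
    return math.log(ba / jc)


# ASSUMPTION: indicator coding, reference categories female and low math.
b0 = observed_logit("female", "low")                    # constant
b_gender = observed_logit("male", "low") - b0           # gender(1): male vs. female at low math
b_math = observed_logit("female", "high") - b0          # hihsmath(1): high vs. low math for females
# gender(1) by hihsmath(1): log of (male/female odds ratio at high math)
# divided by (male/female odds ratio at low math), the three-way association.
b_interaction = (observed_logit("male", "high") - observed_logit("female", "high")
                 - observed_logit("male", "low") + b0)

if __name__ == "__main__":
    for name, b in [("Constant", b0), ("gender(1)", b_gender),
                    ("hihsmath(1)", b_math), ("gender(1) by hihsmath(1)", b_interaction)]:
        print(f"{name:26s} B = {b:+.3f}   Exp(B) = {math.exp(b):.3f}")
```

If b_interaction is close to 0 (Exp(B) near 1), the gender-by-degree odds ratio is about the same at both math levels, which is what "no three-way interaction" means.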

You have just completed the program run for the predictive equation for reduc2.

However, since I have placed reduc2 as a mediator variable in my causal model, we now need to estimate the

gender and hihsmath (and also reduc2) --> homepc parameter estimates through a second logistic regression run.

So, go ahead and modify your logistic regression run: make the dependent variable "homepc" (be sure to use homepc with its 1 and 0 coding) and the covariates "gender", "hihsmath" and "reduc2".

Add the interactions GENDER*HIHSMATH, GENDER*REDUC2 and HIHSMATH*REDUC2
(these correspond to the three-way interactions with HOMEPC).

Make sure that reduc2 in this run is ALSO classified as a categorical variable with indicator coding.

Print the logistic program results for your two runs and also turn them in with Exercise 4.
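When reading the Variables in the Equation table, each row's Wald statistic is (B / S.E.) squared, tested as a chi-square with 1 df for a single indicator coefficient, and Exp(B) is the odds ratio for that term. The short Python sketch below recomputes these from B and S.E., which can help when deciding which coefficients to star in questions 8 and 9. The function name wald_summary is hypothetical, and the 95% interval shown is the usual Wald interval exp(B +/- 1.96 S.E.).

```python
import math


def wald_summary(b, se):
    """Wald chi-square (1 df), its p-value, Exp(B), and a 95% Wald interval for Exp(B)."""
    wald = (b / se) ** 2
    p = math.erfc(math.sqrt(wald / 2.0))  # upper tail of a chi-square with 1 df
    z = 1.959964                          # two-sided 95% normal critical value
    return {
        "wald": wald,
        "p": p,
        "exp_b": math.exp(b),
        "ci_low": math.exp(b - z * se),
        "ci_high": math.exp(b + z * se),
    }
```

Plug in the B and S.E. from one row of your output; the Wald and Sig. values should match what SPSS prints, up to rounding.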

THOUGHT: What happens to your postulated causal model if gender has NO direct effect on reduc2?
ANOTHER THOUGHT: Remember any interaction effects from your first logistic regression run. What did these mean?


 
AND NOW, JUST FOR FUN, THE MULTINOMIAL REGRESSION RUN: FIRST TRY

Let's do a multinomial regression run. We won't examine all the pieces in this analysis.

Try to get SPSS version 15 if you can. SPSS is changing things around again. For example, if you saved output from an earlier version, such as version 15, SPSS 16 won't read it! It will tell you it's not an output viewer file. Ironically, you CAN probably pull up 6.1.3 output as a syntax file, but since 6.1.3 doesn't have a multinomial program, this won't help you.

I had some other problems with version 16 in our LRC. For example, after entering an interaction term, I wanted to rerun the program without it, but no matter what I did, the program wouldn't let me remove the interaction. If you don't have any problems, that's terrific!

Under Analyze, select Regression, then Multinomial logistic...

Make DEGLEV4 your Dependent variable.

Make gender and hihsmath the Factor(s)

Keep the "ref" (referent) category as the last category.

Check Custom/stepwise for the model choice under Model.
Use the TOP Build Terms box to make Gender and Hihsmath Forced Entry Terms:
(we aren't doing a stepwise in this program run.)
Keep the check on Include intercept in the model box.
Click on Continue.

Under Statistics..., see that these boxes are checked:

We're not going to use all these statistics this time, but it's good to get familiar with them.

Click Continue.

Leave the Criteria and Options... sections alone.

Click on OK.

Remember, this one is optional. If you would like to, you can also do a second Multinomial Logistic Regression run with homepc as the dependent variable and gender, hihsmath and deglev4 as the independent variables.
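With the last category as the reference, the multinomial program fits one logit equation for each of the other three DEGLEV4 categories, each comparing that category with the reference. The Python sketch below computes the corresponding observed baseline-category logits from the rounded percentages in the MALES ONLY and FEMALES ONLY tables, ASSUMING the last DEGLEV4 category is the graduate degree, and reports the high-minus-low math difference in each logit within each gender. Because the forced-entry model has gender and hihsmath main effects only, the B values SPSS prints for hihsmath will not match these observed differences exactly, and their sign depends on which math category SPSS treats as the reference; treat this as a rough cross-check of direction and size.

```python
import math

# Column percentages from the MALES ONLY and FEMALES ONLY tables (rounded).
# Category order: high school or less, two-year, four-year, graduate (last = reference).
PCTS = {
    ("male", "high"): [48.8, 17.2, 21.3, 12.7],
    ("male", "low"): [82.0, 11.9, 4.1, 2.0],
    ("female", "high"): [53.5, 17.9, 18.9, 9.8],
    ("female", "low"): [83.0, 11.5, 4.0, 1.6],
}
LABELS = ["high school or less", "two-year degree", "four-year degree"]


def baseline_logits(column):
    """ln(P(category j) / P(reference category)) for each non-reference category j."""
    ref = column[-1]
    return [math.log(p / ref) for p in column[:-1]]


def math_contrast(gender):
    """Observed high-minus-low math difference in each baseline logit, within one gender."""
    high = baseline_logits(PCTS[(gender, "high")])
    low = baseline_logits(PCTS[(gender, "low")])
    return [h - l for h, l in zip(high, low)]


if __name__ == "__main__":
    for gender in ("male", "female"):
        for label, diff in zip(LABELS, math_contrast(gender)):
            print(f"{gender}: {label} vs. graduate, high minus low math = {diff:+.2f}")
```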


ASSIGNMENT QUESTIONS 

1. Your SPSS GENERAL and BINARY LOGISTIC REGRESSION (2 runs) output (2 points)
(Multinomial regression output is optional)

Although your output does not have a large weight, you must turn it in. That way, if needed, I can compare your output and your exercise answers. (I'm assuming you had the frequencies runs from Exercise 3.)

PLUS YOUR ANSWERS TO QUESTIONS 2-10 BELOW:

Questions 2-10 use the results of the GENERAL and LOGISTIC REGRESSION runs (consult them together).

2. (1 point) Did gender have a causal effect on reduc2? How did you know?

3. (1 point) Did hihsmath have a causal effect on reduc2? How did you know?

4. (2 points) Did you have any moderator or interaction effects of gender and hihsmath on reduc2?
How did you know?

5. (3 points) Now, for HOMEPC (both logistic regression runs; HOMECOMP in the GENERAL program run):

Did gender have any causal effect on HOMEPC?
Was this effect direct or indirect or moderated? (NOTE: or, of course, nonexistent!)
Briefly, how did you know?

6. (3 points)

Did high school math have any causal effect on HOMEPC?
Was this effect direct or indirect or moderated? (NOTE: or, of course, nonexistent!)
Briefly, how did you know?

7. (2 points) What kind of effect did reduc2 have on HOMEPC? Was it direct or indirect (or nonexistent)?

8. (2 points) Write out the entire numeric logistic regression estimate for REDUC2 (i.e., use all the appropriate coefficients you were given). (Remember the constant term!)

Star (*) or bold (or otherwise indicate) the coefficients that were statistically significant.

9. (2 points) Write out the entire (i.e., use all the appropriate coefficients you were given) numeric logistic regression estimate for HOMEPC.
(Remember the constant term!)

Star (*) or bold (or otherwise indicate) the coefficients that were statistically significant.

(NOTE: How do you want to handle the terms that were statistically zero or no effect? Good idea to mention in questions 8 and 9.)
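The general shape of the two equations in questions 8 and 9 can be sketched as follows, assuming indicator coding with the last category as reference. The b's and c's are placeholders for the B column of your output (not results); GENDER(1), HIHSMATH(1) and REDUC2(1) stand for the 0/1 indicator variables SPSS creates; and the left side is the log-odds of the category SPSS codes as 1 in its Dependent Variable Encoding table.

```latex
\begin{aligned}
\ln\frac{\hat P(\text{REDUC2}=1)}{1-\hat P(\text{REDUC2}=1)}
  &= b_0 + b_1\,\text{GENDER}(1) + b_2\,\text{HIHSMATH}(1)
     + b_3\,\text{GENDER}(1)\,\text{HIHSMATH}(1) \\[4pt]
\ln\frac{\hat P(\text{HOMEPC}=1)}{1-\hat P(\text{HOMEPC}=1)}
  &= c_0 + c_1\,\text{GENDER}(1) + c_2\,\text{HIHSMATH}(1) + c_3\,\text{REDUC2}(1) \\
  &\quad + c_4\,\text{GENDER}(1)\,\text{HIHSMATH}(1)
     + c_5\,\text{GENDER}(1)\,\text{REDUC2}(1)
     + c_6\,\text{HIHSMATH}(1)\,\text{REDUC2}(1)
\end{aligned}
```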

10. (2 points) Using all your output together, what do you think is the best causal model of how gender, hihsmath and reduc2 affect having a home computer? This means talking about the associations and possible interactions among the variables, not presenting numeric loglinear results or symbols. Imagine that you are describing the results in a non-technical fashion to a colleague at a conference who is not familiar with loglinear or logit analysis. (You are, of course, allowed to allude to raising and lowering effects and relative magnitude...)

Use BOTH words and a diagram to describe this model.

DUE BY 5 PM MONDAY, APRIL 20, 2009, in my mailbox (3210 Stone Building).
(I will take these in class Thursday April 16 as well.)
 
 

Susan Carol Losh
April 8 2009