We will go over it in class April 16. I'll do my best to get corrected assignments back to you by the 23rd if I can. It may not be possible.
THE MULTIVARIATE ANALYSIS OF CATEGORICAL DATA
EXERCISE 4: LOGITS AND MORE LOGITS
20 points
Susan Carol Losh
Department of Educational Psychology and Learning Systems
Florida State University
In this exercise, you will investigate a possible two-stage causal model for owning a home computer (HOMEPC, coded so that 1 = yes and 0 = no; this is the original coding of the variable and it is already in your SPSS database). The predictors are GENDER, highest level of high school math (the recoded HIHSMATH) and two categories of education (REDUC2).
We'll use the General program (use HOMECOMP for this one) and the binary logistic regression program (use HOMEPC for this one).
At the end of the exercise you'll find a (non graded) Multinomial Regression exercise too.
For the binomial situation, you will first predict REDUC2 and then HOMEPC.
For the multinomial situation, the dependent variable DEGLEV4 has four categories, from high school or less to an advanced college degree. (Please use DEGLEV4 ONLY.)
The possible binomial model is:
Gender and high school math influence degree level (reduc2).
Gender, high school math and degree level influence owning a home computer.
This model is testable and can be falsified if the data fail to support it. If the data are consistent with the model, this does NOT mean the model is "true" but that it is suggestive and was not falsified by the analytic results.
The binomial and multinomial regression packages offer you choices about how you want to create the contrasts on your dependent variable.
Please use SPSS VERSION 15 if possible.
Keep your SATURATED program runs from MODEL SELECTION and GENERAL from Exercise 3.
If you haven't already, run the GENERAL program for all main effects and all two way effects for these four variables. Request the estimates and keep the output! (You can use your exercise 3 output here too, if that's the "Best Model" that you went with.)
Recently discovered program glitch in GENERAL: be sure to enter the main effects in the model FIRST. Very strange things happen to the effect estimate parameters if you don't.
You will use the Loglinear General... program to test your chosen model.
Examine the causal order here:
Gender and High School Math --> degree attainment --> owning a home computer
In addition, the model could postulate a possible direct causal effect of either gender or high school math (or both) on owning a home computer.
Can either owning a home computer or having a college degree change someone's gender? (I guess one never knows... but see me for a brief biology lesson if you think either one is likely to occur.)
Is it more likely that gender influenced level of educational attainment or is it more likely that the highest degree level attained caused one's gender? Which PROBABLY came first in time (remember these are adults): high school math or owning a home computer?
(NOTE: this is probably the last year I'll be able to postulate computer ownership as a dependent variable in this causal series since most children now grow up with a home PC.)
You will need your CD with the BIGDIGITALC19832006.sav data file on it for Exercise 4.
To say that gender has a DIRECT causal effect on owning a home PC means that the gender by homepc PARTIAL ASSOCIATION is nonzero, controlling for other variables in the equation (i.e., degree level or high school math in this example).
To say that gender has an INDIRECT causal effect on owning a home PC statistically means that both the gender by degree level PARTIAL ASSOCIATION is nonzero AND the degree level by homepc PARTIAL ASSOCIATION is nonzero.
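The direct/indirect definitions above can be sketched as a small decision rule. This is only an illustration: the function name and its three true/false inputs are hypothetical, and each input stands for whether you judged the corresponding partial association to be nonzero in your own output.

```python
def classify_effect(direct_nonzero, x_to_mediator_nonzero, mediator_to_y_nonzero):
    """Label the effect of a predictor (e.g., gender) on an outcome (e.g., homepc).

    direct_nonzero:        predictor-by-outcome partial association is nonzero
    x_to_mediator_nonzero: predictor-by-mediator partial association is nonzero
    mediator_to_y_nonzero: mediator-by-outcome partial association is nonzero
    """
    # An indirect effect needs BOTH links of the chain to be nonzero.
    indirect = x_to_mediator_nonzero and mediator_to_y_nonzero
    if direct_nonzero and indirect:
        return "direct and indirect"
    if direct_nonzero:
        return "direct only"
    if indirect:
        return "indirect only"
    return "no effect"


# Hypothetical pattern: no gender-by-homepc partial association, but both
# gender-by-degree and degree-by-homepc partial associations are nonzero.
example = classify_effect(False, True, True)
print(example)
```

Moderated (interaction) effects are not covered by this sketch; those show up as interaction terms rather than as a chain of two partial associations.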
For this exercise, you will keep the four-way table from Exercise 3 (reproduced below):
AT LEAST A BA DEGREE
| MATH LEVEL | 2 YEARS H.S. ALGEBRA OR MORE | | | 1 YEAR H.S. ALGEBRA OR LESS | | |
|---|---|---|---|---|---|---|
| GENDER | MALE | FEMALE | TOTAL (n) | MALE | FEMALE | TOTAL (n) |
| HAS HOME COMPUTER | 73.9% | 69.1% | 1256 | 60.7% | 59.2% | 112 |
| EVERYONE ELSE | 26.1% | 30.9% | 497 | 39.3% | 40.8% | 75 |
| TOTAL | 100.0% (n = 942) | 100.0% (n = 811) | 1753 | 100.0% (n = 84) | 100.0% (n = 103) | 187 |
JUNIOR COLLEGE OR LESS
| MATH LEVEL | 2 YEARS H.S. ALGEBRA OR MORE | | | 1 YEAR H.S. ALGEBRA OR LESS | | |
|---|---|---|---|---|---|---|
| GENDER | MALE | FEMALE | TOTAL (n) | MALE | FEMALE | TOTAL (n) |
| HAS HOME COMPUTER | 49.4% | 46.0% | 1829 | 31.0% | 30.7% | 922 |
| EVERYONE ELSE | 50.6% | 54.0% | 2014 | 69.0% | 69.3% | 2067 |
| TOTAL | 100.0% (n = 1827) | 100.0% (n = 2016) | 3843 | 100.0% (n = 1273) | 100.0% (n = 1716) | 2989 |
Source: NSF Surveys of Public Understanding of Science and Technology, 1990, 1995, 1997, 1999 and 2006, Directors: Jon D. Miller and Linda Kimmel; Opinion Research Corporation/MACRO, General Social Survey; available n = 8772 (weighted data).
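As a rough illustration of what the logit analyses summarize, the column percentages in the four-way table can be turned into odds and odds ratios by hand. The sketch below is not part of the SPSS runs; the dictionary keys and function names are made up for this example, and the percentages are copied from the table.

```python
import math


def odds(percent_yes):
    """Convert a percentage (0-100) owning a home computer into odds of yes vs. no."""
    p = percent_yes / 100.0
    return p / (1.0 - p)


def odds_ratio(percent_a, percent_b):
    """Odds for group A divided by odds for group B."""
    return odds(percent_a) / odds(percent_b)


# Percent with a home computer, copied from the four-way table.
# Keys are (degree level, math level, gender); the labels are illustrative.
pct_home_pc = {
    ("BA or more", "high math", "male"): 73.9,
    ("BA or more", "high math", "female"): 69.1,
    ("BA or more", "low math", "male"): 60.7,
    ("BA or more", "low math", "female"): 59.2,
    ("JC or less", "high math", "male"): 49.4,
    ("JC or less", "high math", "female"): 46.0,
    ("JC or less", "low math", "male"): 31.0,
    ("JC or less", "low math", "female"): 30.7,
}

# Male-to-female odds ratio within each degree-by-math subtable.
gender_or = {}
for degree in ("BA or more", "JC or less"):
    for math_level in ("high math", "low math"):
        gender_or[(degree, math_level)] = odds_ratio(
            pct_home_pc[(degree, math_level, "male")],
            pct_home_pc[(degree, math_level, "female")],
        )

for cell, ratio in gender_or.items():
    # The log of the odds ratio is on the same scale as a logit coefficient.
    print(cell, round(ratio, 3), round(math.log(ratio), 3))
```

An odds ratio near 1 (log odds ratio near 0) in every subtable would suggest little gender difference in home computer ownership once degree and math level are held constant; whether any difference is statistically significant is what the GENERAL and logistic regression runs test.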
MALES ONLY
| HIGH SCHOOL MATH ELECTIONS | HIGH | LOW | TOTAL n (%) |
|---|---|---|---|
| HIGH SCHOOL OR LESS | 48.8% | 82.0% | 2466 (59.8%) |
| TWO YEAR DEGREE | 17.2% | 11.9% | 637 (15.4%) |
| FOUR YEAR DEGREE | 21.3% | 4.1% | 645 (15.6%) |
| GRADUATE DEGREE | 12.7% | 2.0% | 378 (9.2%) |
| TOTAL | 100.0% (n = 2768) | 100.0% (n = 1358) | 100.0% (n = 4126) |
FEMALES ONLY
| HIGH SCHOOL MATH ELECTIONS | HIGH | LOW | TOTAL n (%) |
|---|---|---|---|
| HIGH SCHOOL OR LESS | 53.5% | 83.0% | 3020 (65.0%) |
| TWO YEAR DEGREE | 17.9% | 11.5% | 714 (15.4%) |
| FOUR YEAR DEGREE | 18.9% | 4.0% | 605 (13.0%) |
| GRADUATE DEGREE | 9.8% | 1.6% | 306 (6.6%) |
| TOTAL | 100.0% (n = 2826) | 100.1% (n = 1819) | 100.0% (n = 4645) |
Source: NSF Surveys of Public Understanding of Science and Technology, 1990, 1995, 1997, 1999 and 2006, Directors: Jon D. Miller and Linda Kimmel; Opinion Research Corporation/MACRO, General Social Survey; available n = 8771 (weighted data; the fractional respondent that sometimes occurs is due to incorporation of the weight variable in analysis).
Open the SPSS 15 program and load the BIGDIGITALC19832006.SAV file into the Data Editor.
This is the four variable model: gender, hihsmath, reduc2 and HOMECOMP. Either reuse your output from Exercise 3, or regenerate the GENERAL (1) saturated model and (2) the model incorporating all marginals and all two way effects. Make sure to include the parameter estimates and that the constant term box is checked.
TIP (SPSS recommends): under the Options portion, check Estimates and leave the checks on Frequencies and Residuals. Uncheck any options under Plots. The adjusted residuals will come out in your Frequencies/Residuals table and you will save some paper when you print your results (plots take A LOT of paper).
Test for your direct and indirect effects by dropping the terms that correspond to those particular partial associations. If the G2 goes up significantly when you drop terms, you must return those terms to the model. Partition the nested models using their G2s and associated degrees of freedom, and use a chi-square (X2) table (in the back of most texts, if needed) to see whether the difference in the G2s is statistically significant. Alternatively, you can use the partial association tests from the MODEL SELECTION program (you should have these from Exercise 3 also). They should deliver equivalent substantive results.
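If you would rather compute the significance of a G2 difference than look it up in a table, the nested-model comparison can be sketched as follows. The G2 and df values in the example are hypothetical placeholders (they are not results from this data set), and the helper names are made up for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for df = 1 and df = 2, then step df up by 2 at a time.
    if df % 2 == 1:
        q, nu = math.erfc(math.sqrt(half)), 1
    else:
        q, nu = math.exp(-half), 2
    while nu < df:
        q += half ** (nu / 2.0) * math.exp(-half) / math.gamma(nu / 2.0 + 1.0)
        nu += 2
    return q


def g2_difference_test(g2_reduced, df_reduced, g2_full, df_full, alpha=0.05):
    """Compare a reduced model (terms dropped) against the fuller nested model."""
    delta_g2 = g2_reduced - g2_full
    delta_df = df_reduced - df_full
    p_value = chi2_sf(delta_g2, delta_df)
    # A significant increase in G2 means the dropped terms must go back in.
    return delta_g2, delta_df, p_value, p_value < alpha


# HYPOTHETICAL numbers: dropping one term raises G2 from 3.1 (df = 5) to 12.4 (df = 6).
delta_g2, delta_df, p_value, must_restore = g2_difference_test(12.4, 6, 3.1, 5)
print(round(delta_g2, 2), delta_df, round(p_value, 4), must_restore)
```

The familiar table values serve as a check on the helper: a chi-square of 3.841 with 1 df, or 5.991 with 2 df, should give an upper-tail probability of about .05.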
Turn in your output with your exercise answers.
In phase 1, put reduc2 in the Dependent box.
Enter these variables into the Covariates box: gender and hihsmath.
Hold down the control key and select both gender and hihsmath.
This will light up the >a*b> (interaction) box.
Load the gender*hihsmath (implicitly the *reduc2) interaction into the Covariates box.
Click on the Categorical... box and load gender and hihsmath into the right-hand box.
Make sure the Indicator contrast is used for both of these variables (click the Change contrast box if you need to).
Leave the reference category as Last and click Continue.
Under Options, leave the defaults in place and click on OK.
You have just completed the program run for the predictive equation for reduc2.
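To see how the coefficients from this run fit together into one predictive equation, here is a minimal sketch of a logit equation with indicator coding and the last category as the reference. Everything numeric in it is a placeholder: the category codes (1 and 2) and the coefficient values are hypothetical and are NOT the estimates you will get from SPSS.

```python
import math


def indicator(value, reference):
    """Indicator (dummy) coding: 0 for the reference category, 1 otherwise."""
    return 0 if value == reference else 1


def predicted_probability(coefs, gender, hihsmath, ref_gender, ref_math):
    """Predicted probability that the dependent variable equals 1."""
    g = indicator(gender, ref_gender)
    m = indicator(hihsmath, ref_math)
    logit = (
        coefs["constant"]
        + coefs["gender"] * g
        + coefs["hihsmath"] * m
        + coefs["gender*hihsmath"] * g * m  # interaction (moderator) term
    )
    # Convert the log odds back to a probability.
    return 1.0 / (1.0 + math.exp(-logit))


# HYPOTHETICAL coefficients, for illustration only.
hypothetical_coefs = {
    "constant": -2.0,
    "gender": 0.1,
    "hihsmath": 1.5,
    "gender*hihsmath": 0.0,
}

# Assume each variable is coded 1 or 2, with the last code (2) as the reference.
p_reference_cell = predicted_probability(hypothetical_coefs, 2, 2, 2, 2)
p_other_cell = predicted_probability(hypothetical_coefs, 1, 1, 2, 2)
print(round(p_reference_cell, 3), round(p_other_cell, 3))
```

In the reference cell every indicator is 0, so the prediction rests on the constant term alone; that is one reason to remember the constant when you write out your equations.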
However, since I have placed reduc2 as a mediator variable in my causal model, we now need to estimate the parameters for gender and hihsmath (and also reduc2) --> homepc through a second logistic regression program run.
So, go ahead and modify your logistic regression run: make the dependent variable "homepc" (be sure to use homepc with its 1 and 0 coding) and the covariates "gender" "hihsmath" and "reduc2".
Add the interactions GENDER*HIHSMATH, GENDER*REDUC2 and HIHSMATH*REDUC2 (these correspond to the three way interactions with HOMEPC).
Make sure that reduc2 in this run is ALSO classified as a categorical variable with indicator coding.
Print the logistic program results for your two runs and also turn them in with Exercise 4.
THOUGHT: What happens to your postulated causal model if gender has NO direct effect on reduc2?
ANOTHER THOUGHT: Remember any interaction effects from your first logistic regression run. What did these mean?
Let's do a multinomial regression run. We won't examine all the pieces in this analysis.
Try to get SPSS version 15 if you can. SPSS is changing things around again. For example, SPSS 16 won't read output that you saved in earlier versions, such as version 15! It will tell you it's not an output viewer file. Ironically, you CAN probably pull up 6.1.3 output as a syntax file, but since 6.1.3 doesn't have a multinomial program this won't help you.
I had some other problems with 16 in our LRC. For example, having entered an interaction term, then wanting to rerun the program without it, I tried to get rid of the interaction--and no matter what I did, the program wouldn't let me do it. If you don't have any problems, that's terrific!
Under Analyze, select Regression, then Multinomial logistic...
Make DEGLEV4 your Dependent variable.
Make gender and hihsmath the Factor(s)
Keep the "ref" (referent) category as the last category.
Check Custom/stepwise for the model choice under Model.
Use the TOP Build Terms box to make Gender and Hihsmath Forced Entry Terms (we aren't doing a stepwise in this program run).
Keep the check on the Include intercept in the model box.
Click on Continue.
Under Statistics..., see that these boxes are checked:
Click Continue.
Leave the Criteria and Options... sections alone.
Click on OK.
Remember this one is optional. You can do a second Multinomial Logistic Regression run with homepc as the dependent variable and gender hihsmath and deglev4 as the independent variables too if you would like to.
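For the optional multinomial run, it may help to see how the output is organized: with four DEGLEV4 categories and the last one as the reference, there is one logit equation for each of the other three categories. The sketch below converts three such logits into four predicted probabilities. The logit values are hypothetical placeholders, not results from this data set.

```python
import math


def multinomial_probabilities(logits_vs_reference):
    """Turn logits (each category vs. the LAST, reference category) into probabilities.

    The reference category's logit is fixed at 0, so its exp() term is 1.
    The returned list ends with the reference category's probability.
    """
    exp_terms = [math.exp(v) for v in logits_vs_reference]
    denominator = 1.0 + sum(exp_terms)
    probabilities = [e / denominator for e in exp_terms]
    probabilities.append(1.0 / denominator)  # reference category
    return probabilities


# HYPOTHETICAL logits for the first three DEGLEV4 categories vs. the last.
hypothetical_logits = [2.0, 0.5, 0.4]
probs = multinomial_probabilities(hypothetical_logits)
print([round(p, 3) for p in probs], round(sum(probs), 3))
```

Each of the three logits would itself come from an equation with a constant and gender and hihsmath coefficients, just as in the binary runs.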
1. Your SPSS GENERAL and BINARY LOGISTIC REGRESSION (2 runs) output (2 points)
(Multinomial regression output is optional)
Although your output does not have a large weight, you must turn it in. That way, if needed, I can compare your output and your exercise answers. (I'm assuming you had the frequencies runs from Exercise 3.)
PLUS YOUR ANSWERS TO QUESTIONS 2 - 10 BELOW:
2. (1 point) Did gender have a causal effect on reduc2? How did you know?
3. (1 point) Did hihsmath have a causal effect on reduc2? How did you know?
4. (2 points) Did you have any moderator or interaction effects of gender and hihsmath on reduc2? How did you know?
5. (3 points) Now, for HOMEPC (both logistic regression runs; HOMECOMP in the GENERAL program run):
Did gender have any causal effect on HOMEPC?
Was this effect direct or indirect or moderated? (NOTE: or, of course, nonexistent!)
Briefly, how did you know?
6. (3 points) Did high school math have any causal effect on HOMEPC?
Was this effect direct or indirect or moderated? (NOTE: or, of course, nonexistent!)
Briefly, how did you know?
7. (2 points) What kind of effect did reduc2 have on HOMEPC? Was it direct or indirect (or nonexistent)?
8. (2 points) Write out the entire (i.e., use all the appropriate coefficients you were given) numeric logistic regression estimate for REDUC2. (Remember the constant term!)
Star (*) or bold (or otherwise indicate) the coefficients that were statistically significant.
9. (2 points) Write out the entire (i.e., use all the appropriate coefficients you were given) numeric logistic regression estimate for HOMEPC. (Remember the constant term!)
Star (*) or bold (or otherwise indicate) the coefficients that were statistically significant.
(NOTE: How do you want to handle the terms that were statistically zero or no effect? Good idea to mention in questions 8 and 9.)
10. (2 points) Using all your output together, what do you think is the best causal model to describe how gender, hihsmath and reduc2 affect having a home computer? This means talking about the associations and possible interactions among the variables, not presenting numeric loglinear results or symbols. Imagine that you are describing the results in a non-technical fashion to a colleague at a conference who is not familiar with loglinear or logit analysis. (You are, of course, allowed to allude to raising and lowering effects and relative magnitude...)
Use BOTH words and a diagram to describe this model.
DUE MONDAY APRIL 20 2009 BY 5 PM. My mailbox: 3210 Stone Building. (I will take these in class Thursday April 16 as well.)
Susan Carol Losh
April 8 2009