Greetings all,

I have been undertaking some multiple regressions in SPSS, starting with OLS and moving away from that method where its assumptions are not met. My DV is an average of counts of response options from 4 survey questions, censored between 0 and 14, so the variable has an explicit order. There is a series of IVs, mainly dummy variables: for example, sex is coded 0 (male)/1 (female), and the income quintiles have the bottom quintile as the omitted comparator, with 0/1 codes on the four quintile dummy variables included in the model. There are also 3 ratio-level IVs.

For the ordinary OLS regression, the dummy variables are output correctly – SPSS seems to recognise them as dummy variables automatically, and I get nice parameter estimates. In PLUM, on the other hand, SPSS gives output for both levels of every dummy variable: I get a line for the “0” value and a line for the “1” value in the parameter estimates table, with all the “1” values footnoted “This parameter is set to zero as it is redundant”. Basically, it looks as though the model is over-specified. On a probably related note, I also get the warning: “There are 3381 (75.0%) cells (i.e., dependent variable levels by combinations of predictor variable values) with zero frequencies.” A subset of the parameter values is:
All the dummy variables are specified as factors in the PLUM command, and the three ratio variables are entered as covariates. I have looked at the PLUM information in the help menu and searched online, but cannot find anything on what I am doing wrong. I have also emailed SPSS support, but they don't seem to be familiar with PLUM, and a search of the archives of this listserv did not turn up a similar issue.

I'm running PLUM because my DV is highly non-normal. I tried it with an unbinned and then a binned DV, and the error remains (initially I thought the issue might have been too many DV categories). Could someone please advise me where I am going wrong?

The syntax I have used is:

PLUM BINNEDOVERALLBENEFIT BY CLAIMPRESENT AGE18TO34 AGE35TO54 SEXFEMALE
    COUNTRYAUSTRALIA DEPENDENTS INCOMEQUINTILE2 INCOMEQUINTILE3 INCOMEQUINTILE4
    INCOMEQUINTILE5 NOINCOMEGIVEN EDUCATIONHIGH EDUCATIONGIVEN CONCERNSGENERAL
    CONCERNSSPECIFIC CONCERNSBOTH NUTRITCORMOD NUTRITCORHIGH NUTRITMOTMOD
    NUTRITMOTHIGH WITH MICRONUTKNOW MICRONUTFAML FRUITANDVEG
  /CRITERIA=CIN(95) DELTA(0) LCONVERGE(0) MXITER(100) MXSTEP(5)
    PCONVERGE(1.0E-6) SINGULAR(1.0E-8)
  /LINK=LOGIT
  /PRINT=FIT PARAMETER SUMMARY.

Cheers
Michelle

Michelle Gosse
Consumer and Social Sciences
Food Standards Australia New Zealand
Hi Michelle. The key word BY precedes "factors", which are categorical variables. So for your income quintiles, you would need a SINGLE variable with values from 1 to 5; and for your age groups, a single variable with values from 1 to 3. But given that you've already computed indicator variables for everything, you could also just get rid of the BY, and put all your indicator variables (and continuous variables) after the key word WITH.

BUT... it looks to me like you have a much more serious problem here. I.e., you are fitting 24 parameters (including the constant), and so you would need an ENORMOUS sample size, with enough cases falling into each of the outcome bins, to avoid SERIOUS over-fitting of the model. For ordinary binary logistic regression, for example, one needs in the order of 15-20 'events' per model parameter to avoid over-fitting (where 'event' = the less frequent outcome category). I've not come across a similar guideline geared specifically to ordinal logistic regression; but I suspect you have far more variables in the model than your data can support.

For a nice readable discussion of over-fitting in regression models, see Mike Babyak's article:
http://www.class.uidaho.edu/psy586/Course%20Readings/Babyak_04.pdf

HTH.
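To make the first suggestion concrete, dropping BY and listing every indicator and continuous variable after WITH would look something like this (an untested sketch only, reusing the variable names from the original syntax):

PLUM BINNEDOVERALLBENEFIT WITH CLAIMPRESENT AGE18TO34 AGE35TO54 SEXFEMALE
    COUNTRYAUSTRALIA DEPENDENTS INCOMEQUINTILE2 INCOMEQUINTILE3 INCOMEQUINTILE4
    INCOMEQUINTILE5 NOINCOMEGIVEN EDUCATIONHIGH EDUCATIONGIVEN CONCERNSGENERAL
    CONCERNSSPECIFIC CONCERNSBOTH NUTRITCORMOD NUTRITCORHIGH NUTRITMOTMOD
    NUTRITMOTHIGH MICRONUTKNOW MICRONUTFAML FRUITANDVEG
  /CRITERIA=CIN(95) DELTA(0) LCONVERGE(0) MXITER(100) MXSTEP(5)
    PCONVERGE(1.0E-6) SINGULAR(1.0E-8)
  /LINK=LOGIT
  /PRINT=FIT PARAMETER SUMMARY.

Alternatively, if single multi-category variables were computed instead (INCOMEQUINT coded 1 to 5 and AGEGROUP coded 1 to 3 below are hypothetical names, not variables in the dataset), those would go after BY as factors, with the genuinely continuous variables kept after WITH:

PLUM BINNEDOVERALLBENEFIT BY INCOMEQUINT AGEGROUP
    WITH MICRONUTKNOW MICRONUTFAML FRUITANDVEG
  /LINK=LOGIT
  /PRINT=FIT PARAMETER SUMMARY.

Either way, the over-fitting concern stands: 24 parameters at 15-20 'events' each would imply somewhere around 360-480 events, so the predictor list may still need trimming.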
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."
Hi Bruce,
Thanks for the quick and clear response; this had me tearing my hair out for half of yesterday. It's very handy to know that I don't have to spend another hour or so recoding, and then checking that I did the recoding correctly, so that is fantastic news about rejigging the model line in PLUM.

My sample size is 1127. I had worked on a rule of thumb of 10 observations per IV, so your recommendation sounds a bit different to that. I created the bins so that there are in the order of 50+ observations per dummy category. I'll look into the events-versus-observations methods for deciding on the maximum number of IVs; I hadn't heard of the events-based method until you mentioned it, so thanks for that information. It sounds very useful for working out sample size in the first place.

The variables are in the model because previous research tells us that these factors should be important for the work I am doing, so the model specification is theory-based. I've come back to SPSS after numerous years of using SAS, and SPSS code is very different.

Thanks again for your help. :)

Cheers
Michelle