I have
1) a dependent ordinal variable with 4 groups (low inadequate, high inadequate, adequate and excess), 2) a nominal dependent variable with 2 groups (normal and abnormal), 3) sample size 87, 4) one cell with a count less than 5. I am using SPSS 20. Can anyone tell me which test I should use, parametric or non-parametric? And do I need Fisher's exact test, given the cell count below 5? |
You've identified both variables as dependent (points 1 and 2 of your post). Given that group is usually an explanatory variable, I guess you meant to say that group is the independent variable. If so, you could use the linear-by-linear chi-square test in the output from CROSSTABS. For more info, see this note by Dave Howell:
http://www.uvm.edu/~dhowell/methods7/Supplements/OrdinalChiSq.html HTH.
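For what it's worth, the linear-by-linear statistic that CROSSTABS reports can be computed by hand as M^2 = (N - 1) * r^2, where r is the Pearson correlation between the row and column category scores. A minimal Python sketch; the equally spaced integer scores and the 4 x 2 table below are made-up illustrations, not the poster's data:

```python
import numpy as np
from scipy.stats import chi2

def linear_by_linear(table, row_scores=None, col_scores=None):
    """Linear-by-linear association chi-square for an ordered r x c table:
    M^2 = (N - 1) * r^2 on 1 df, with r the score correlation."""
    t = np.asarray(table, dtype=float)
    n = t.sum()
    rs = np.arange(t.shape[0]) if row_scores is None else np.asarray(row_scores, float)
    cs = np.arange(t.shape[1]) if col_scores is None else np.asarray(col_scores, float)
    x = np.repeat(rs, t.shape[1])   # row score attached to each cell
    y = np.tile(cs, t.shape[0])     # column score attached to each cell
    w = t.ravel()                   # cell counts act as weights
    mx, my = (w * x).sum() / n, (w * y).sum() / n
    r = ((w * (x - mx) * (y - my)).sum()
         / np.sqrt((w * (x - mx) ** 2).sum() * (w * (y - my) ** 2).sum()))
    m2 = (n - 1) * r ** 2
    return m2, chi2.sf(m2, df=1)

# Hypothetical 4 (adequacy groups) x 2 (normal/abnormal) table:
m2, p = linear_by_linear([[20, 5], [15, 8], [10, 12], [5, 12]])
```

By default the scores are the category ranks 0, 1, 2, ...; pass explicit scores if the categories are not equally spaced.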
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
Sorry for my typing mistake. Actually I have an independent variable with 4 categories (low inadequate, high inadequate, adequate and excess).
I have another independent variable with 4 categories (underweight, normal weight, overweight and obese). I used chi-square tests followed by binary logistic regression with these variables. How can I detect multicollinearity between these two categorical independent variables? |
What is the construct behind the first independent variable, i.e., the one with (low inadequate, high inadequate, adequate and excess) as values?

Your situation is still unclear. Are you trying to predict one or more other variables from these? Or are you just interested in the relation between these two ordinal (possibly even interval-level) variables? What are those variables? What values can they take? Multicollinearity is usually a consideration when variables are used as independent variables in a general linear model such as regression.
Art Kendall
Social Research Consultants |
Art asks about the first independent variable. For the second, it appears you are using BMI categories. If you have the actual BMI, why not just analyze it as a continuous variable? After all, the cut-points used to form categories are somewhat arbitrary, and using categories throws away information and uses up degrees of freedom.
But, to address the question about assessing multicollinearity when one has indicator variables for categorical variables, you could take a look at the generalized variance inflation factor (GVIF). E.g., see section 3.1 in the notes found here: http://rbakker.myweb.uga.edu/week10.7014.2008.pdf HTH.
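For the curious, the GVIF of Fox & Monette (1992) can be computed directly from determinants of correlation matrices of the regressor columns, as described in section 3.1 of the linked notes. A rough Python sketch; the simulated data and column indices are hypothetical:

```python
import numpy as np

def gvif(X, focal_cols):
    """Generalized VIF (Fox & Monette, 1992) for the columns `focal_cols`
    of design matrix X (intercept excluded):
    GVIF = det(R11) * det(R22) / det(R), where R is the correlation matrix
    of all regressors, R11 the focal block, R22 the remaining block."""
    R = np.corrcoef(X, rowvar=False)
    idx = np.asarray(focal_cols)
    rest = np.setdiff1d(np.arange(X.shape[1]), idx)
    R11 = R[np.ix_(idx, idx)]
    R22 = R[np.ix_(rest, rest)]
    return np.linalg.det(R11) * np.linalg.det(R22) / np.linalg.det(R)

# For a single focal column, GVIF reduces to the ordinary VIF = 1 / (1 - R^2):
rng = np.random.default_rng(42)
x1 = rng.normal(size=1000)
x2 = x1 + 0.5 * rng.normal(size=1000)   # deliberately collinear pair
X = np.column_stack([x1, x2, rng.normal(size=1000)])
vif_x1 = gvif(X, [0])
```

When a factor contributes d dummy columns, Fox and Monette suggest reporting GVIF^(1/(2d)) so that values are comparable across factors with different degrees of freedom.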
My independent variables are:
1) BMI (Body Mass Index), measured as a continuous variable and also grouped into 4 categories (underweight, normal weight, overweight and obese) as defined by WHO.
2) Gestational weight gain, measured as a continuous variable and grouped into 3 categories (inadequate, recommended and excess) as defined by the Institute of Medicine, 2009.
3) Maternal age, measured as a continuous variable in years.
4) Socioeconomic status on the modified Kuppuswamy scale: lower, upper lower and lower middle.

My dependent variables are:
1) Birth weight (continuous)
2) Birth weight category (low or normal)
3) Birth category (preterm or term)
4) Condition at birth (asphyxiated or normal)
5) Mode of delivery (caesarean or vaginal)
6) Maternal hypertension (present or not)
7) Maternal gestational diabetes (present or not)

I have used binary logistic regression with the above variables. How can I detect multicollinearity among my independent variables, or is this possible at all in binary logistic regression? |
On Sun, Apr 7, 2013 at 3:50 PM, dr_msantu <[hidden email]> wrote: My independent variables are: |
In reply to this post by dr_msantu
Ryan gives a good reference for obtaining the multicollinearity stats.

For this set of IVs, it does not look like multicollinearity should be a problem, so long as you don't put both versions (continuous and categorical) of BMI or weight gain into one analysis.

Another problem might be the sample size, which you stated as N=87 in your first post. A recommended standard for ordinary regression is to have 10 (or 20) cases for each d.f. of predictor. The analogous standard for logistic regression is to have 10 cases for each d.f. in the *smaller* of the two outcome groups. You meet that criterion for whichever outcomes are fairly evenly split; otherwise, not. And if you use your categorical predictors as categories, then not.

You might "confirm" your LR analyses by checking that their coefficients are not too different from those of the corresponding OLS regression. The LR coefficients will become very extreme if the sample happens to provide 100% separation for some prediction equation. They will be distorted to a lesser degree if the separation is near 100% and hinges on just a couple of cases. (This point is somewhat subtle and not well recognized.)

--
Rich Ulrich |
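To make the 10-cases-per-d.f. rule of thumb concrete, here is the arithmetic in Python. The outcome split and predictor d.f. below are invented for illustration; they are not the poster's actual numbers:

```python
def events_per_variable(n_smaller_group, predictor_df):
    """Events per variable (EPV): size of the smaller outcome group
    divided by the degrees of freedom spent on predictors."""
    return n_smaller_group / predictor_df

# Suppose the N = 87 sample split 30 vs 57 on the outcome, with
# 8 predictor d.f. (3 BMI dummies + 2 weight-gain dummies + age + 2 SES dummies):
epv = events_per_variable(30, 8)   # 3.75, well short of the 10-per-d.f. guideline
```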
Here's a nice readable article that speaks to Rich's point about "over-fitting" your models.
http://people.duke.edu/~mababyak/papers/babyakregression.pdf

For logistic regression, the 10 events per variable (EPV) figure is a bare minimum in the simulations by Peduzzi et al. (cited in the article above). IIRC, Babyak (and Frank Harrell) suggest that 15 or 20 EPV is better. Re linear regression, here are some comments from Dave Howell's popular textbook: http://www.angelfire.com/wv/bwhomedir/notes/linreg_rule_of_thumb.txt HTH.
In reply to this post by Ryan
I don't think the approach described there is ideal for categorical variables. This excerpt from John Fox's book (2008, p. 322) explains why.
--- Beginning of excerpt from Fox (2008) ---

The correlations among a set of dummy regressors are affected by the choice of baseline category. Similarly, the correlations among a set of polynomial regressors in an explanatory variable X are affected by adding a constant to the X values. Neither of these changes alters the fit of the model to the data, however, so neither is fundamental. It is indeed always possible to select an orthogonal basis for the dummy-regressor or polynomial-regressor subspace (although such a basis does not employ dummy variables or simple powers of X). What is at issue is the subspace itself and not the arbitrarily chosen basis for it.

We are not concerned, therefore, with the "artificial" collinearity among dummy regressors or polynomial regressors in the same set. We are instead interested in the relationships between the subspaces generated to represent the effects of different explanatory variables.

--- End of excerpt from Fox (2008) ---

He then goes on to discuss the generalized variance inflation factor (GVIF), which does handle dummy and polynomial regressors appropriately. Earlier in the thread, I pointed to section 3.1 in the notes found here for a brief description of GVIF: http://rbakker.myweb.uga.edu/week10.7014.2008.pdf HTH.
Bruce, I don't recall the OP stating that categorical predictors were included. I guess I missed it. Nonetheless, the excerpt is interesting.

Best,
Ryan |
In reply to this post by Ryan
My full sample size is 308.
I am still confused regarding the detection of multicollinearity among the variables of my data set. I could not find the right answer. Is there any method by which I can adjust for age or socioeconomic status (so that I can detect whether BMI or gestational weight gain can affect pregnancy outcome independent of age and socioeconomic status)? |
In reply to this post by dr_msantu
After reviewing the website I sent you (you are welcome, BTW), Rich's and Bruce's responses, the regression book from which you were taught, and an extensive literature search on the matter, what answer(s) did you find?
Ryan |
In reply to this post by dr_msantu
You have correctly identified my problem. I think I should not try to find "multicollinearity" in logistic regression; rather, I should concentrate on "confounding". I am very thankful to all of you for such a thorough discussion of "multicollinearity", which helped me a lot.

I have two more questions:
1) Is my sample size (308) adequate for this kind of binary logistic analysis?
2) I have run the binary logistic regression with BMI category (ordinal), gestational weight gain category (ordinal), age (continuous) and socioeconomic status (ordinal) as independent variables for my dichotomous dependent variable (such as LBW vs NBW), using SPSS 20. I found that age was not significantly associated (p > 0.05) and the change in coefficient was minimal (for underweight BMI it changed to 1.182 from 1.181). But when I ran it with socioeconomic status, the change in coefficient was much larger and it was also statistically significant (p < 0.05). From this, may I conclude that socioeconomic status is correlated with BMI and weight gain? And if a confounding factor is present, how could I adjust for it?

On 10/04/2013, Rich Ulrich <[hidden email]> wrote:
> I think you are concerned more with what is usually discussed as
> "confounding" rather than as "multicollinearity."
>
> Originally - and still, sometimes - multicollinearity refers to
> what you get, say, when you use 3 dummy variables to code
> three categories, or when you include a set of items along with their
> total score. That is, you have redundancy, and the computer
> algorithms will choke and fail when they try to divide by zero
> (or, allowing for round-off, something too near to zero).
>
> The usual extended version of concern with multicollinearity arises
> when there is near-redundancy, and this results in "variance inflation"
> for the predictors. That is: if your best predictor is equal parts of A
> and B, scaled the same, then when A and B are too similar, you will
> find that (0.9*A + 0.1*B) is practically the same as (0.1*A + 0.9*B).
> Then, even if A and B separately have small CIs on their coefficients,
> the combined regression will show large CIs on whatever the coefficients
> come out as. That is the variance inflation. (One simple solution that
> sometimes works great is to replace predictors [A, B] with the orthogonal
> pair [(A+B), (A-B)]. YMMV.)
>
> If you want to know whether Age and SES change the partial associations
> seen for BMI and WGain, the simple answer is the direct one: do it.
> Run an analysis of outcome for BMI and WGain; run another analysis for
> BMI and WGain that controls for the demographics. Do the coefficients
> change?
>
> If the demographic variables are not correlated with BMI and WGain, there
> will be no change. If correlated, there may or may not be change. It is
> possible for the coefficients to stay the same while the tests become more
> significant. It is appropriate, in the discussion of results, to comment on
> whether there was much reason to expect confounding. Is either BMI or
> WGain associated with age or SES? If they are not associated even a little
> bit, then no confounding is possible. But it does not require a huge
> association for some confounding to show its effect, if that effect is merely
> shifting a p-value from 0.051 to 0.049.
>
> --
> Rich Ulrich |
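Rich's "do it directly" advice, comparing crude and adjusted models side by side, can be sketched in Python with simulated data. The effect sizes, sample size and the bare-bones Newton-Raphson fitter below are all made up for illustration; in practice you would simply run the two models in SPSS and compare the B column:

```python
import numpy as np

def logit_fit(X, y, n_iter=25):
    """Bare-bones Newton-Raphson logistic regression; returns the
    coefficient vector (intercept first)."""
    X = np.column_stack([np.ones(len(X)), X])
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        W = p * (1.0 - p)
        b = b + np.linalg.solve((X * W[:, None]).T @ X, X.T @ (y - p))
    return b

rng = np.random.default_rng(0)
n = 5000
ses = rng.normal(size=n)                 # stand-in for socioeconomic status
bmi = 0.6 * ses + rng.normal(size=n)     # BMI made to correlate with SES
eta = 0.5 * bmi + 0.8 * ses              # both truly affect the outcome
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

crude = logit_fit(bmi[:, None], y)[1]                     # BMI alone
adjusted = logit_fit(np.column_stack([bmi, ses]), y)[1]   # BMI adjusting for SES
# crude exceeds adjusted: part of the crude BMI effect is carried by SES
```

If the crude and adjusted coefficients differ materially, as they do here by construction, the covariate is acting as a confounder and the adjusted estimate is the one to report.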