I am performing logistic regression. I understand the assumptions of logistic regression (outliers, multicollinearity). What I don't understand is how to select variables at the beginning of model preparation. Do I need to check the outcome Y (event) against each independent variable and plot a scatter diagram of the two variables? Or should I plot a scatter diagram of both events and non-events against each independent variable? What are the criteria for selecting and eliminating variables? I have seen some researchers take the log or exp of independent variables to assess variable importance. I am aware of the variable selection techniques (backward, forward and stepwise), but these only come into play once the variables have been included in the model.
Next question: if an independent variable is continuous, we group it into deciles and then look at the relationship between the grouped categories and Y. Suppose the relationship is positive for some categories and negative for others. Should we use two variables, one for the positive relationship and another for the negative one? Should we use the numerical values or the categorized values in this case? I'm sorry to post this question in the SPSS forum. The idea is to ask the right people, as there are a lot of researchers and statisticians active in this forum.
In general, the same principles that apply to normal distribution regression also apply to logistic regression.
Some independent variables (IVs) are specified by the hypothesis or research question. Others are control or background covariates and are entered if there is a significant bivariate relationship with the outcome. My experience has been that scatter plots are not informative for dichotomous DVs because there are only two values. It is better to use crosstabs for categorical IVs. Linearity is not an issue there because categorical IVs go in as contrast terms, just as in normal regression. The issue with continuous IVs is linearity. Categorizing their distribution is a good check, but once you do that you have a categorical variable, hence crosstabs. Remember, however, that the linearity is not with the proportion but with the logit. It is better to run a bivariate logistic regression with the categorized continuous variable and see how the slope coefficients change across categories. Alternatively, you could compute quadratic, cubic, etc. terms for the continuous IV and add them one at a time to the regression along with the original IV. You could also compute the logits and plot them against the categorized IV.
Gene Maguin
-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Riya
Sent: Thursday, November 06, 2014 1:04 PM
To: [hidden email]
Subject: Variable Selection for Logistic Regression
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Variable-Selection-for-Logistic-Regression-tp5727829.html
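The suggestion in Gene Maguin's reply to compute the logits against the categorized IV can be sketched roughly as follows. This is only an illustration in plain Python, not SPSS syntax and not necessarily how the reply's author would do it. The function names `empirical_logits` and `simulate`, the simulated U-shaped data, and the 0.5 correction added to the counts (so that a bin with no events or no non-events still has a finite logit) are all assumptions made for the example.

```python
import math
import random


def empirical_logits(x, y, n_bins=10):
    """Split x into equal-count bins and return the empirical logit of y per bin.

    y must be coded 0/1. A 0.5 correction is added to both counts so that
    bins with 0% or 100% events still give a finite logit (an assumption of
    this sketch, not something stated in the thread).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    pairs = sorted(zip(x, y))  # order cases by the continuous IV
    n = len(pairs)
    rows = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # can happen when there are fewer cases than bins
        events = sum(yi for _, yi in chunk)
        nonevents = len(chunk) - events
        rows.append({
            "bin": b + 1,
            "x_mean": sum(xi for xi, _ in chunk) / len(chunk),
            "n": len(chunk),
            "events": events,
            "logit": math.log((events + 0.5) / (nonevents + 0.5)),
        })
    return rows


def simulate(n=5000, seed=1):
    """Hypothetical data whose true logit is U-shaped in x: -1 + x**2."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        xi = rng.uniform(-2.0, 2.0)
        p = 1.0 / (1.0 + math.exp(-(-1.0 + xi * xi)))
        x.append(xi)
        y.append(1 if rng.random() < p else 0)
    return x, y


x, y = simulate()
rows = empirical_logits(x, y, n_bins=10)
for r in rows:
    print(f"decile {r['bin']:2d}  mean x = {r['x_mean']:6.2f}  "
          f"n = {r['n']:4d}  events = {r['events']:4d}  logit = {r['logit']:6.2f}")
```

If the printed logits rise or fall roughly in a straight line across the deciles, the continuous IV can probably go in as is. If they fall and then rise, as in this simulated case, that is the situation described in the question (negative for some categories, positive for others), and adding a quadratic term alongside the original IV is one of the options mentioned in the reply.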