How does SPSS decide on the number of coefficients when fitting best subsets?

The number of coefficients, n, used in the model appears to be the count of all predictors with p < .10, no matter what criterion is used. Thus the corrected model fit uses n, even when the procedure has decided that the number of significant predictors is m; n - m can vary from 0 to as much as 8 if a tough criterion is specified. This seems rather odd to me.

Please do NOT tell me that best subsets, like all forms of 'automatic' variable selection, is evil. I know that this is a widely accepted view. However, in some situations it is better [less biased] than the 'theoretical' opinion, or model, of the researcher.

All help gratefully received,
best
Diana
___________
Professor Diana Kornbrot
Work: University of Hertfordshire, College Lane, Hatfield, Hertfordshire AL10 9AB, UK
+44 (0) 170 728 4626
d.e.kornbrot@...
http://dianakornbrot.wordpress.com/
http://go.herts.ac.uk/Diana_Kornbrot
skype: kornbrotme
Home: 19 Elmhurst Avenue, London N2 0LT, UK
+44 (0) 208 444 2081
With best subsets, individual significance levels for entry and removal do not apply; you can see that they are disabled in the dialog box on the Build Options tab if best subsets is used for model selection. The include/remove settings are in the Stepwise selection group. The confidence level on the Basics tab is not related to model selection (as indicated in the help).
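To make the keyword relationships concrete, here is a minimal syntax sketch with hypothetical field names (y, x1, x2, x3). The MODEL_SELECTION and CRITERIA_BEST_SUBSETS keywords are as documented for LINEAR, but the subcommand layout here is from memory, so check the LINEAR entry in the syntax reference:

    * Stepwise entry/removal p-values apply only under
      MODEL_SELECTION=FORWARDSTEPWISE; under BESTSUBSETS the model is
      chosen by the best subsets criterion alone.
    LINEAR
      /FIELDS TARGET=y INPUTS=x1 x2 x3
      /BUILD_OPTIONS MODEL_SELECTION=BESTSUBSETS
        CRITERIA_BEST_SUBSETS=AICC.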
You can color variables by significance using the sliders in the coefficients pane of the output model view, but that is just a display option. As you can see from that pane, the uncolored coefficients are still there in the model.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621
I assume this discussion is about the "Automatic Linear Modeling" procedure (ALM)--is that right?
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
Now that I've had time to glance at the FM, I see that the procedure for "Automatic Linear Modeling" is actually called LINEAR, not ALM. Under Build Options, one can select FORWARDSTEPWISE, BESTSUBSETS, or NONE. Here is what the FM says about BESTSUBSETS.
BESTSUBSETS. This checks "all possible" models, or at least a larger subset of the possible models than forward stepwise, to choose the best according to the best subsets criterion. The model with the greatest value of the criterion is chosen as the best model. Note that best subsets selection is more computationally intensive than forward stepwise selection. When best subsets is performed in conjunction with boosting, bagging, or very large datasets, it can take considerably longer to build than a standard model built using forward stepwise selection.

CRITERIA_BEST_SUBSETS. This is the statistic used to choose the "best" model when best subsets selection is used. If MODEL_SELECTION = FORWARDSTEPWISE is not specified, this keyword is ignored.

****************
Jon, Rick & other IBM-SPSS folks who may be lurking: That looks like a typo to me in the CRITERIA_BEST_SUBSETS section. I believe it should say, "If MODEL_SELECTION = BESTSUBSETS is not specified, this keyword is ignored."
****************

The options for CRITERIA_BEST_SUBSETS are as follows.

AICC. Information Criterion (AICC) is based on the likelihood of the data given the model, and is adjusted to penalize overly complex models.

ADJUSTEDRSQUARED. Adjusted R-squared is based on the fit of the data, and is adjusted to penalize overly complex models.

ASE. Overfit Prevention Criterion (ASE) is based on the fit of the overfit prevention set. The overfit prevention set is a random subsample of approximately 30% of the original dataset that is not used to train the model.

Q. Why is ASE short for Overfit Prevention Criterion? ASE is more often short for asymptotic standard error!
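As an aside (this is not from the FM), the textbook forms of the first two criteria make the complexity penalty explicit. I have not verified that LINEAR uses exactly these expressions, so take them as the standard definitions rather than as SPSS's implementation:

    AICC = -2 ln(L) + 2k + 2k(k+1) / (n - k - 1)
    adjusted R^2 = 1 - (1 - R^2)(n - 1) / (n - p - 1)

where L is the model likelihood, n the sample size, k the number of estimated parameters, and p the number of predictors. On the ASE question: my guess is that ASE here abbreviates the average squared error computed on the overfit prevention set, which would explain the mismatch between the keyword and its label, but the FM does not spell that out.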
--
Bruce Weaver
bweaver@lakeheadu.ca