I'd like help understanding the relationship between the FIXED subcommand, what the routine does, and the relationship to the underlying model. I've been reading the Peugh and Enders paper, which is very nice because of its explicit focus on SPSS, the Singer and Willett book, and a book oriented to SAS PROC MIXED, which doesn't seem to have a FIXED subcommand based on the few examples I've seen (e.g., Singer's article on PROC MIXED). The two things that FIXED does are (1) control whether significance tests for the named variables and interactions are printed, and (2) allow specification of interactions between variables named following the BY and WITH keywords. But it does something else, because models a) and b) below do not give the same log likelihood, intercept estimate, or variance components.

a) Mixed y with time/fixed=time/random time | subject(id) covtype(id).
b) Mixed y with time/random time | subject(id) covtype(id).

Are the underlying equations for level 1 and 2 the same? If not, how are they different? If yes, where does the difference in log likelihood come from?
Thanks, Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L. For a list of commands to manage subscriptions, send the command INFO REFCARD.
Gene,
The first model includes a time fixed effect whereas the second model does not. You are fitting two different models.
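To spell out the difference, here is a sketch using Gene's own variable names (untested, written from his two commands):

```
* Model a: the fixed part is intercept + time.
* A mean slope for time is estimated along with the random slope variance.
Mixed y with time
 /fixed=time
 /random time | subject(id) covtype(id)
 /print=solution.

* Model b: FIXED is omitted, so the fixed part is the intercept only.
* The mean slope for time is constrained to zero; only the random
* (subject-level) slope variance remains.
* Different fixed parts give different log likelihoods,
* intercepts, and variance components.
Mixed y with time
 /random time | subject(id) covtype(id)
 /print=solution.
```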
Ryan
On Wed, May 19, 2010 at 11:52 AM, Gene Maguin <[hidden email]> wrote: [quoted text snipped]
Ryan,
I'm guessing that you are talking about something other than the fact that one model has the keyword 'fixed' and the other doesn't. I don't know what that is, and this is what I'm trying to understand. It seems that the presence/absence of the FIXED subcommand implies different level 1 and 2 equations. Is this true?
Gene Maguin

>> The first model includes a time fixed effect whereas the second model does not. You are fitting two different models. Ryan

[quoted text snipped]
In reply to this post by Maguin, Eugene
Is this what you're looking for?

"The FIXED subcommand specifies the fixed effects in the mixed model. ... • The default model is generated if the FIXED subcommand is omitted or empty. The default model consists of only the intercept term (if included)." (from the MIXED syntax reference)

Note that this is different from GLM, which produces a full factorial model with covariates as main effects if you omit the DESIGN subcommand.

Alex Reutter
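If that documentation is right, an omitted FIXED and an empty FIXED should behave the same, so the following two commands ought to fit identical models (an untested sketch using Gene's variable names):

```
* These two commands should fit the same model: an omitted or empty
* FIXED subcommand defaults to an intercept-only fixed part.
Mixed y with time
 /random time | subject(id) covtype(id).

Mixed y with time
 /fixed=
 /random time | subject(id) covtype(id).
```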
Alex,
So it seems that the following statement would be true. In the command fragment

Mixed y by x1 x2 with x3 x4/ ...

x1 through x4 are, by definition, fixed effects and are to be listed on the FIXED subcommand variable list, as in

Mixed y by x1 x2 with x3 x4/fixed x1 x2 x3 x4/ ...

Or, stated another way, every variable listed on the MIXED command except the DV is a fixed variable. Is there a counterexample where a variable other than the DV is correctly not listed on the FIXED subcommand list? I apologize for being kind of a stupid pain in the ass, but I'm more used to working with Mplus and an LGM model formulation, and I find, have always found, the MIXED command, both SPSS and SAS, very opaque.
Thanks, Gene Maguin

[quoted text snipped]
Gene,
Your model "b)" is set up such that time is a random effect only, so the variables listed in the first line do not have to be fixed effects.
Is this what you're asking?
Ryan
On Wed, May 19, 2010 at 2:35 PM, Gene Maguin <[hidden email]> wrote:
> Is there a counter example where a variable other than the DV is correctly
> not listed on the fixed subcommand list?

As a simple example that goes for most modeling procedures, say I run

Mixed y by x1 x2 with x3 x4/fixed x1 x2 x3 x4.

and then want to compare it to a model without x4 as a main effect. I run

Mixed y by x1 x2 with x3 x4/fixed x1 x2 x3.

in order to maintain a consistent case basis for comparing the models. MIXED performs listwise deletion, so if there are cases that have missing values on x4 but valid values on the other variables, the results from running

Mixed y by x1 x2 with x3/fixed x1 x2 x3.

are not really comparable to the first model. I could alternatively run SELECT CASES before running the third MIXED command to remove the cases, but leaving x4 on the main variable list is faster. A variable can also be a random effect without being a fixed effect, so

Mixed y by x1 x2 with x3 x4/fixed x1 x3/random x2.

is entirely valid.

Alex
Aha...I didn't think of that, and used the SELECT-CASES method when I ran into this problem. Thanks Alex.
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
In reply to this post by Ryan
I have got this set of data where the goal is to build a glm
model with actual weight and height as dependent variables and
questionnaire data (stated weight and height, gender,...) as predictors. When
the modelling is done, I want to use another dataset with the same variables to
make predictions, using the model from the first step. Is there any neat way to
do such predictions? It could naturally be done by setting up a linear
combination using the coefficients from the modelling step in the new dataset,
but it seems a bit awkward. Any suggestions for a simpler solution? A matrix
approach should work, but it doesn't seem that easy to build the needed matrices
from the SPSS output. Am I missing something obvious here?
Robert
**********************
Robert Lundqvist Norrbotten regional council
Lulea
Sweden
Hi Robert,
The R vignette entitled "bestglm: Best Subset GLM" is very good and answers a lot of questions around this subject, with references and practical examples. FWIW, the route of having a training set to develop the model (and this is where the above reference should be helpful) and then using a different "validation set" of data to test that model is standard practice. But there are certain situations where statisticians have worked to assist this.
Within SAS there is the procedure called GLMSELECT that involves "LASSO". This was introduced in SAS/STAT v9.2 (ie recently). I've lifted the following excerpt from
"Model Selection
The GLMSELECT procedure performs effect selection in the framework of general linear models. A variety of model selection methods are available, including the LASSO method of Tibshirani (1996) and the related LAR method of Efron et al. (2004). The procedure offers extensive capabilities for customizing the selection with a wide variety of selection and stopping criteria, from traditional and computationally efficient significance-level-based criteria to more computationally intensive validation-based criteria. The procedure also provides graphical summaries of the selection search. PROC GLMSELECT focuses on the standard independently and identically distributed general linear model for univariate responses. It offers great flexibility in the model selection algorithm. PROC GLMSELECT provides results (displayed tables, output data sets, and macro variables) that make it convenient to analyze the selected model in a subsequent procedure such as REG or GLM."
GLMSELECT is further discussed here:

"When faced with a predictive modeling problem that has many possible predictor effects—dozens, hundreds, or even thousands—a natural question is 'What subset of the effects provides the best model for the data?' Statistical model selection seeks to answer this question, employing a variety of definitions of the 'best' model as well as a variety of heuristic procedures for approximating the true but computationally infeasible solution. The GLMSELECT procedure implements statistical model selection in the framework of general linear models. Methods include not only extensions to GLM-type models of methods long familiar in the REG procedure (forward, backward, and stepwise) but also the newer LASSO and LAR methods of Tibshirani (1996) and Efron et al. (2004), respectively."
A paper on the GLMSELECT procedure is available here: http://support.sas.com/rnd/app/papers/glmselect.pdf, but it is entitled "Experimental" even though GLMSELECT has been available as far back as Aug 2005, I believe.
The authors of GLMSELECT were Peter Flom and David Cassell, I think. In this thread, Peter says, "You're right that GLMSELECT is not intended as a final step in building a model. It says so right in the documentation. And, certainly, more data allows more models to be considered. Finally, it's also true that GLMSELECT results in biased estimators, but in the work I've done, the bias has been very small. Of course, that doesn't mean it will always be small..... I've run very few experiments." So I'd better give you that link so you can read about the procedure, from the horse's mouth: http://www.mathkb.com/Uwe/Forum.aspx/sas/33508/Rules-for-GLMSELECT (you'll see that I initiated the discussion).

Sigurd, in another thread on comp.soft-sys.sas, the SAS forum, http://groups.google.com/group/comp.soft-sys.sas/msg/abf8635f4910d88e, says, "Martin: I'd supplement GLMSELECT with a classification tree (CART, CHAID, TreeNet)." The rest of the thread is useful in understanding the limitations of GLMSELECT and how to use it, and again, Peter Flom has contributed to the thread.
I'm sure I should be recommending Frank Harrell, Jr's book, "Regression Modeling Strategies", Springer.
When I started writing this, I didn't expect it to be so long. In summary, R: bestglm or SAS: (CART or CHAID or TreeNet) followed by GLMSELECT are the options I know about. I hope this helps,
Martin Holt
From: Robert Lundqvist <[hidden email]>
To: [hidden email]
Sent: Friday, 21 May, 2010 8:45:53
Subject: Predictions from glm modelling?

[quoted text snipped]
In reply to this post by Robert L
1. Merge (via ADD FILES) the original data set with the new (cross-validation) data set.
2. If the outcome variables exist in the new data set, compute copies of the outcome variables, but only for the original data set.
3. Run your model using the copies of the outcome variables, and save the fitted values from the model.

By using the copies of the outcome variables, you ensure that only the original data are used for building the model; but fitted values will be saved for all cases in the file. I should add that some procedures (e.g., REGRESSION) may allow you to choose via a sub-command which cases to use in building the model; but I'm not sure if all procedures have this. The method of setting the outcome variables to missing for non-selected cases will always work, though.
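In syntax, the recipe might look like this (file, variable, and model names are hypothetical; REGRESSION stands in for whatever procedure is actually used):

```
* 1. Merge the original data with the new (cross-validation) data.
*    The IN flag marks cases that came from the new file.
ADD FILES /FILE='original.sav' /FILE='newdata.sav' /IN=newcase.
EXECUTE.

* 2. Copy the outcome, but only for cases from the original file.
DO IF (newcase = 0).
  COMPUTE y_copy = y.
END IF.
EXECUTE.

* 3. Fit on the copy and save fitted values. Cases with missing
*    y_copy are excluded from estimation but still get predictions.
REGRESSION
  /DEPENDENT y_copy
  /METHOD=ENTER x1 x2 x3
  /SAVE PRED.
```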
In reply to this post by Robert L
If you have the server (institutional) version, many procedures provide for XML model output. In CATREG you can include categorical and continuous predictors and use regularization (variable selection). From the Categorical Regression help:
Regularization dialog box
Method. Regularization methods can improve the predictive error of the model by reducing the variability in the estimates of regression coefficients by shrinking the estimates toward 0. The Lasso and Elastic Net will shrink some coefficient estimates to exactly 0, thus providing a form of variable selection. When a regularization method is requested, the regularized model and coefficients for each penalty coefficient value are written to an external PASW Statistics data file or dataset in the current session. See the topic Categorical Regression Save for more information.
• Ridge regression. Ridge regression shrinks coefficients by introducing a penalty term equal to the sum of squared coefficients times a penalty coefficient. This coefficient can range from 0 (no penalty) to 1; the procedure will search for the "best" value of the penalty if you specify a range and increment.
• Lasso. The Lasso's penalty term is based on the sum of absolute coefficients, and the specification of a penalty coefficient is similar to that of Ridge regression; however, the Lasso is more computationally intensive.
• Elastic net. The Elastic Net simply combines the Lasso and Ridge regression penalties, and will search over the grid of values specified to find the "best" Lasso and Ridge regression penalty coefficients. For a given pair of Lasso and Ridge regression penalties, the Elastic Net is not much more computationally expensive than the Lasso.
Display regularization plots. These are plots of the regression coefficients versus the regularization penalty. When searching a range of values for the "best" penalty coefficient, it provides a view of how the regression coefficients change over that range.
Elastic Net Plots. For the Elastic Net method, separate regularization plots are produced by values of the Ridge regression penalty. "All possible plots" uses every value in the range determined by the minimum and maximum Ridge regression penalty values specified. "For some Ridge penalties" allows you to specify a subset of the values in the range determined by the minimum and maximum. Simply type the number of a penalty value (or specify a range of values) and click Add.
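For completeness, a CATREG command along these lines might look like the sketch below. The variable names are hypothetical, and the exact REGULARIZATION and SAVE specifications should be verified against the CATREG syntax reference before use:

```
* Untested sketch - check subcommand details in the CATREG
* syntax reference; variable names are made up.
CATREG VARIABLES=weight stated_weight stated_height gender
  /ANALYSIS=weight WITH stated_weight stated_height gender
  /REGULARIZATION=LASSO
  /SAVE=PRED.
```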
Art Kendall
Social Research Consultants |
In reply to this post by Bruce Weaver
I think that's spot-on, Bruce, although when I read it, a couple of statements confused me. It could be me, but if not they'll likely confuse the OP.
So I hope you don't mind, but I've made a couple of alterations. If I've got it wrong, I'm sure you'll tell me.
1. Merge (via ADD FILES) the original data set with the new (cross-validation) data set for which you want predictions.
2. If the outcome variables exist in this new combined data set, compute a model using data for which there are outcome variables, but only using cases from the very original data set. Save the fitted values for all of the data (the entire merged file): you can now look at the individual fitted cases from the new data set (which was combined with the very original data set), as these are your predictions for that data set.
3. By using the cases where there are outcome variables in the original data set (and not from the other data set), you ensure that only the original data are used for building the model; but fitted values will be saved for *all* cases in the file, from which you can read the predicted values for the cases of interest.
I hope I'm not being a pedant. By writing this out I could check my understanding.
Best Wishes,
Martin Holt
From: Bruce Weaver <[hidden email]>
To: [hidden email]
Sent: Friday, 21 May, 2010 12:31:24
Subject: Re: Predictions from glm modelling?

[quoted text snipped]
In reply to this post by Robert L
Robert,
Are you trying to predict weight and height simultaneously?
Ryan
If the OP is interested in predicting weight and height simultaneously (i.e. two dependent variables), then he might consider fitting a multivariate model. One option would be to use the MIXED procedure. Of course, there are assumptions to fitting such a model. I could provide an example, but I'll hold off until I know if the OP is interested.
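One common way to set that up (a sketch only, with hypothetical variable names; id identifies the respondent) is to restructure to long format and let MIXED estimate a separate equation for each outcome with correlated residuals:

```
* Stack weight and height into one response, indexed by 'outcome'.
VARSTOCASES /MAKE y FROM weight height /INDEX=outcome.

* One fixed-effect set per outcome: NOINT plus the outcome main
* effect and outcome-by-covariate interactions.
* COVTYPE(UN) lets the two residuals correlate within a person.
MIXED y BY outcome WITH stated_weight stated_height
  /FIXED=outcome outcome*stated_weight outcome*stated_height | NOINT
  /REPEATED=outcome | SUBJECT(id) COVTYPE(UN)
  /PRINT=SOLUTION.
```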
Ryan
In reply to this post by Martin Holt
Hi Martin. Yes, I think you've got it. The key point I was trying to get across was to use ONLY the original data set for creation of the model. It might have been clearer if I'd said to DELETE the outcome variables from the cross-validation set (if they exist) before merging with the original data set. This would ensure that the outcome variables were set to SYSMIS for the validation data set, so they would not be used in building the model.
Cheers, Bruce