I'd like help understanding the relationship between the FIXED subcommand, what the routine does, and the relationship to the underlying model. I've been reading the Peugh and Enders paper, which is very nice because of its explicit focus on SPSS, the Singer and Willett book, and a book oriented to SAS PROC MIXED, which doesn't seem to have a FIXED subcommand based on the few examples I've seen (e.g., Singer's article on PROC MIXED). The two things that FIXED does are (1) control whether significance tests for the named variables and interactions are printed, and (2) allow specification of interactions between variables named following the BY and WITH keywords. But it does something else, because models a) and b) below do not give the same log likelihood, intercept estimate, or variance components.

a) Mixed y with time/fixed=time/random time | subject(id) covtype(id).
b) Mixed y with time/random time | subject(id) covtype(id).

Are the underlying equations for level 1 and 2 the same? If not, how are they different? If yes, where does the difference in log likelihood come from?
Thanks, Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L. For a list of commands to manage subscriptions, send the command INFO REFCARD.
Gene,
The first model includes a time fixed effect whereas the second model does not. You are fitting two different models.
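To spell out the difference, here is a sketch using Gene's own variable names (untested, written from his two commands):

```
* Model a: the fixed part is intercept + time.
* A mean slope for time is estimated along with the random slope variance.
Mixed y with time
 /fixed=time
 /random time | subject(id) covtype(id)
 /print=solution.

* Model b: FIXED is omitted, so the fixed part is the intercept only.
* The mean slope for time is constrained to zero; only the random
* (subject-level) slope variance remains.
* Different fixed parts give different log likelihoods,
* intercepts, and variance components.
Mixed y with time
 /random time | subject(id) covtype(id)
 /print=solution.
```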
Ryan
On Wed, May 19, 2010 at 11:52 AM, Gene Maguin <[hidden email]> wrote: [quoted text snipped]
Ryan,
I'm guessing that you are talking about something other than the fact that one model has the keyword 'fixed' and the other doesn't. I don't know what that is, and this is what I'm trying to understand. It seems that the presence/absence of the FIXED subcommand implies different level 1 and 2 equations. Is this true?
Gene Maguin

>> The first model includes a time fixed effect whereas the second model does not. You are fitting two different models. Ryan

[quoted text snipped]
In reply to this post by Maguin, Eugene
Is this what you're looking for?

"The FIXED subcommand specifies the fixed effects in the mixed model. ... • The default model is generated if the FIXED subcommand is omitted or empty. The default model consists of only the intercept term (if included)." (from the MIXED syntax reference)

Note that this is different from GLM, which produces a full factorial model with covariates as main effects if you omit the DESIGN subcommand.

Alex Reutter
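If that documentation is right, an omitted FIXED and an empty FIXED should behave the same, so the following two commands ought to fit identical models (an untested sketch using Gene's variable names):

```
* These two commands should fit the same model: an omitted or empty
* FIXED subcommand defaults to an intercept-only fixed part.
Mixed y with time
 /random time | subject(id) covtype(id).

Mixed y with time
 /fixed=
 /random time | subject(id) covtype(id).
```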
Alex,
So it seems that the following statement would be true. In the command fragment

Mixed y by x1 x2 with x3 x4/ ...

x1 through x4 are, by definition, fixed effects and are to be listed on the FIXED subcommand variable list, as in

Mixed y by x1 x2 with x3 x4/fixed x1 x2 x3 x4/ ...

Or, stated another way, every variable listed on the MIXED command except the DV is a fixed variable. Is there a counterexample where a variable other than the DV is correctly not listed on the FIXED subcommand list? I apologize for being kind of a stupid pain in the ass, but I'm more used to working with Mplus and an LGM model formulation, and I find, have always found, the MIXED command, both SPSS and SAS, very opaque.
Thanks, Gene Maguin

[quoted text snipped]
Gene,
Your model "b)" is set up such that time is a random effect only, so the variables listed in the first line do not have to be fixed effects.
Is this what you're asking?
Ryan
On Wed, May 19, 2010 at 2:35 PM, Gene Maguin <[hidden email]> wrote:
> Is there a counter example where a variable other than the DV is correctly
> not listed on the fixed subcommand list?

As a simple example that goes for most modeling procedures, say I run

Mixed y by x1 x2 with x3 x4/fixed x1 x2 x3 x4.

and then want to compare it to a model without x4 as a main effect. I run

Mixed y by x1 x2 with x3 x4/fixed x1 x2 x3.

in order to maintain a consistent case basis for comparing the models. MIXED performs listwise deletion, so if there are cases that have missing values on x4 but valid values on the other variables, the results from running

Mixed y by x1 x2 with x3/fixed x1 x2 x3.

are not really comparable to the first model. I could alternatively run SELECT CASES before running the third MIXED command to remove the cases, but leaving x4 on the main variable list is faster. A variable can also be a random effect without being a fixed effect, so

Mixed y by x1 x2 with x3 x4/fixed x1 x3/random x2.

is entirely valid.

Alex
Aha...I didn't think of that, and used the SELECT-CASES method when I ran into this problem. Thanks Alex.
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
In reply to this post by Ryan
I have got this set of data where the goal is to build a glm
model with actual weight and height as dependent variables and
questionnaire data (stated weight and height, gender,...) as predictors. When
the modelling is done, I want to use another dataset with the same variables to
make predictions, using the model from the first step. Is there any neat way to
do such predictions? It could naturally be done by setting up a linear
combination using the coefficients from the modelling step in the new dataset,
but it seems a bit awkward. Any suggestions for a simpler solution? A matrix
approach should work, but it doesn't seem that easy to build the needed matrices
from the SPSS output. Am I missing something obvious here?
Robert
**********************
Robert Lundqvist Norrbotten regional council
Lulea
Sweden
Hi Robert,
The R vignette entitled "bestglm: Best Subset GLM" is very good and answers a lot of questions around this subject, with references and practical examples. FWIW, the route of having a training set to develop the model (and this is where the above reference should be helpful) and then using a different "validation set" of data to test that model is standard practice. But there are certain situations where statisticians have worked to assist this.
Within SAS there is the procedure called GLMSELECT that involves "LASSO". This was introduced in SAS/STAT v9.2 (ie recently). I've lifted the following excerpt from
"Model Selection
The GLMSELECT procedure performs effect selection in the framework of general linear models. A variety of model selection methods are available, including the LASSO method of Tibshirani (1996) and the related LAR method of Efron et al. (2004). The procedure offers extensive capabilities for customizing the selection with a wide variety of selection and stopping criteria, from traditional and computationally efficient significance-level-based criteria to more computationally intensive validation-based criteria. The procedure also provides graphical summaries of the selection search. PROC GLMSELECT focuses on the standard independently and identically distributed general linear model for univariate responses. It offers great flexibility in the model selection algorithm. PROC GLMSELECT provides results (displayed tables, output data sets, and macro variables) that make it convenient to analyze the selected model in a subsequent procedure such as REG or GLM."
GLMSELECT is further discussed here:

"When faced with a predictive modeling problem that has many possible predictor effects—dozens, hundreds, or even thousands—a natural question is 'What subset of the effects provides the best model for the data?' Statistical model selection seeks to answer this question, employing a variety of definitions of the 'best' model as well as a variety of heuristic procedures for approximating the true but computationally infeasible solution. The GLMSELECT procedure implements statistical model selection in the framework of general linear models. Methods include not only extensions to GLM-type models of methods long familiar in the REG procedure (forward, backward, and stepwise) but also the newer LASSO and LAR methods of Tibshirani (1996) and Efron et al. (2004), respectively."
A paper on the GLMSELECT procedure is available here: http://support.sas.com/rnd/app/papers/glmselect.pdf, but it is entitled "Experimental" even though GLMSELECT has been available as far back as Aug 2005, I believe.
The authors of GLMSELECT were Peter Flom and David Cassell, I think. In this thread, Peter says, "You're right that GLMSELECT is not intended as a final step in building a model. It says so right in the documentation. And, certainly, more data allows more models to be considered. Finally, it's also true that GLMSELECT results in biased estimators, but in the work I've done, the bias has been very small. Of course, that doesn't mean it will always be small..... I've run very few experiments." So I'd better give you that link so you can read about the procedure, from the horse's mouth: http://www.mathkb.com/Uwe/Forum.aspx/sas/33508/Rules-for-GLMSELECT (you'll see that I initiated the discussion).

Sigurd, in another thread on comp.soft-sys.sas, the SAS forum, http://groups.google.com/group/comp.soft-sys.sas/msg/abf8635f4910d88e, says, "Martin: I'd supplement GLMSELECT with a classification tree (CART, CHAID, TreeNet)." The rest of the thread is useful in understanding the limitations of GLMSELECT and how to use it, and again, Peter Flom has contributed to the thread.
I'm sure I should be recommending Frank Harrell, Jr's book, "Regression Modeling Strategies", Springer.
When I started writing this, I didn't expect it to be so long. In summary, R: bestglm or SAS: (CART or CHAID or TreeNet) followed by GLMSELECT are the options I know about. I hope this helps,
Martin Holt
From: Robert Lundqvist <[hidden email]>
To: [hidden email]
Sent: Friday, 21 May, 2010 8:45:53
Subject: Predictions from glm modelling?

[quoted text snipped]
In reply to this post by Robert L
1. Merge (via ADD FILES) the original data set with the new (cross-validation) data set.
2. If the outcome variables exist in the new data set, compute copies of the outcome variables, but only for the original data set.
3. Run your model using the copies of the outcome variables, and save the fitted values from the model.

By using the copies of the outcome variables, you ensure that only the original data are used for building the model; but fitted values will be saved for all cases in the file. I should add that some procedures (e.g., REGRESSION) may allow you to choose via a sub-command which cases to use in building the model; but I'm not sure if all procedures have this. The method of setting the outcome variables to missing for non-selected cases will always work, though.
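In syntax, the recipe might look like this (file, variable, and model names are hypothetical; REGRESSION stands in for whatever procedure is actually used):

```
* 1. Merge the original data with the new (cross-validation) data.
*    The IN flag marks cases that came from the new file.
ADD FILES /FILE='original.sav' /FILE='newdata.sav' /IN=newcase.
EXECUTE.

* 2. Copy the outcome, but only for cases from the original file.
DO IF (newcase = 0).
  COMPUTE y_copy = y.
END IF.
EXECUTE.

* 3. Fit on the copy and save fitted values. Cases with missing
*    y_copy are excluded from estimation but still get predictions.
REGRESSION
  /DEPENDENT y_copy
  /METHOD=ENTER x1 x2 x3
  /SAVE PRED.
```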
In reply to this post by Robert L
If you have the server (institutional) version, many procedures provide for XML model output. In CATREG you can include categorical and continuous predictors and use regularization (variable selection). From the Categorical Regression help:
Regularization dialog box
Method. Regularization methods can improve the predictive error of the model by reducing the variability in the estimates of regression coefficients by shrinking the estimates toward 0. The Lasso and Elastic Net will shrink some coefficient estimates to exactly 0, thus providing a form of variable selection. When a regularization method is requested, the regularized model and coefficients for each penalty coefficient value are written to an external PASW Statistics data file or dataset in the current session. See the topic Categorical Regression Save for more information.
• Ridge regression. Ridge regression shrinks coefficients by introducing a penalty term equal to the sum of squared coefficients times a penalty coefficient. This coefficient can range from 0 (no penalty) to 1; the procedure will search for the "best" value of the penalty if you specify a range and increment.
• Lasso. The Lasso's penalty term is based on the sum of absolute coefficients, and the specification of a penalty coefficient is similar to that of Ridge regression; however, the Lasso is more computationally intensive.
• Elastic net. The Elastic Net simply combines the Lasso and Ridge regression penalties, and will search over the grid of values specified to find the "best" Lasso and Ridge regression penalty coefficients. For a given pair of Lasso and Ridge regression penalties, the Elastic Net is not much more computationally expensive than the Lasso.
Display regularization plots. These are plots of the regression coefficients versus the regularization penalty. When searching a range of values for the "best" penalty coefficient, it provides a view of how the regression coefficients change over that range.
Elastic Net Plots. For the Elastic Net method, separate regularization plots are produced by values of the Ridge regression penalty. "All possible plots" uses every value in the range determined by the minimum and maximum Ridge regression penalty values specified. "For some Ridge penalties" allows you to specify a subset of the values in the range determined by the minimum and maximum. Simply type the number of a penalty value (or specify a range of values) and click Add.
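For completeness, a CATREG command along these lines might look like the sketch below. The variable names are hypothetical, and the exact REGULARIZATION and SAVE specifications should be verified against the CATREG syntax reference before use:

```
* Untested sketch - check subcommand details in the CATREG
* syntax reference; variable names are made up.
CATREG VARIABLES=weight stated_weight stated_height gender
  /ANALYSIS=weight WITH stated_weight stated_height gender
  /REGULARIZATION=LASSO
  /SAVE=PRED.
```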
Art Kendall
Social Research Consultants |
In reply to this post by Bruce Weaver
I think that's spot-on, Bruce, although when I read it, a couple of statements confused me. It could be me, but if not they'll likely confuse the OP.
So I hope you don't mind, but I've made a couple of alterations. If I've got it wrong, I'm sure you'll tell me.
1. Merge (via ADD FILES) the original data set with the new (cross-validation) data set for which you want predictions.
2. If the outcome variables exist in this new combined data set, compute a model using data for which there are outcome variables, but only using cases from the very original data set. Save the fitted values for all of the data (the entire merged file): you can now look at the individual fitted cases from the new data set (which was combined with the very original data set), as these are your predictions for that data set.
3. By using the cases where there are outcome variables in the original data set (and not from the other data set), you ensure that only the original data are used for building the model; but fitted values will be saved for *all* cases in the file, from which you can read the predicted values for the cases of interest.
I hope I'm not being a pedant. By writing this out I could check my understanding.
Best Wishes,
Martin Holt
From: Bruce Weaver <[hidden email]>
To: [hidden email]
Sent: Friday, 21 May, 2010 12:31:24
Subject: Re: Predictions from glm modelling?

[quoted text snipped]
In reply to this post by Robert L
Robert,
Are you trying to predict weight and height simultaneously?
Ryan
If the OP is interested in predicting weight and height simultaneously (i.e. two dependent variables), then he might consider fitting a multivariate model. One option would be to use the MIXED procedure. Of course, there are assumptions to fitting such a model. I could provide an example, but I'll hold off until I know if the OP is interested.
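One common way to set that up (a sketch only, with hypothetical variable names; id identifies the respondent) is to restructure to long format and let MIXED estimate a separate equation for each outcome with correlated residuals:

```
* Stack weight and height into one response, indexed by 'outcome'.
VARSTOCASES /MAKE y FROM weight height /INDEX=outcome.

* One fixed-effect set per outcome: NOINT plus the outcome main
* effect and outcome-by-covariate interactions.
* COVTYPE(UN) lets the two residuals correlate within a person.
MIXED y BY outcome WITH stated_weight stated_height
  /FIXED=outcome outcome*stated_weight outcome*stated_height | NOINT
  /REPEATED=outcome | SUBJECT(id) COVTYPE(UN)
  /PRINT=SOLUTION.
```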
Ryan
In reply to this post by Martin Holt
Hi Martin. Yes, I think you've got it. The key point I was trying to get across was to use ONLY the original data set for creation of the model. It might have been clearer if I'd said to DELETE the outcome variables from the cross-validation set (if they exist) before merging with the original data set. This would ensure that the outcome variables were set to SYSMIS for the validation data set, so they would not be used in building the model.
Cheers, Bruce