More information about SPSS regression coef & effect size

More information about SPSS regression coef & effect size

Marsha and Mike SZYMCZUK
Thanks to those who responded.

Below is a bit more about the issue and one last question.

From a prior work, the grad student's dissertation chair provided this guidance...

For all individual predictors with statistical significance:

    1. Estimate the effect size for dichotomous variables as the unstandardized regression coefficient divided by the pooled standard deviation of the dependent variable.

    2. Estimate the effect size for continuous variables (for example, pre-test scores) as the standardized regression coefficient (beta).

The work did not specify variable order but used the above process for each significant variable.

The first computation of effect size is straightforward.

But, I cannot find a reference to the second method.
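
For what it's worth, the first computation can be scripted directly. Here is a minimal sketch in SPSS syntax, with hypothetical variable names (posttest is the dependent variable, group the dichotomous predictor, pretest a covariate); the pooled SD and the division are done by hand from the two outputs:

* Unstandardized B for GROUP comes from the Coefficients table.
REGRESSION
  /STATISTICS COEFF R ANOVA
  /DEPENDENT posttest
  /METHOD=ENTER group pretest.

* Per-group Ns and SDs of the dependent variable, for the pooled SD.
MEANS TABLES=posttest BY group
  /CELLS=COUNT STDDEV.

* Pooled SD = SQRT(((n1-1)*sd1**2 + (n2-1)*sd2**2) / (n1 + n2 - 2)).
* Effect size (a d-type index) = B for group / pooled SD, computed from the output.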


Re: More information about SPSS regression coef & effect size

Matthew Pirritano
Hey,

So I'm doing a longitudinal analysis and I've discovered that I have a robust curvilinear relationship. I've transformed my dependent and independent variables using the natural log. In order to do so I had to add 1 to my IVs and DVs, since some of them had zero as a possible value.

My question is, is this kosher, just arbitrarily adding 1 to the scores to make the log transformation possible? And how can I do a similar transformation for my mean-centered covariates, which obviously have lots of less-than-zero values? Can I just do something like add the lowest possible value to all values to shift them all up onto the positive side?
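
For concreteness, the shift-and-log transform I'm describing amounts to something like this (variable names are made up):

* Shift by 1 so zero scores have a defined log, then take natural logs.
COMPUTE ln_dv  = LN(dv + 1).
COMPUTE ln_iv1 = LN(iv1 + 1).
COMPUTE ln_iv2 = LN(iv2 + 1).
EXECUTE.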

Any help, much appreciated.

Thanks
Matt

Matthew Pirritano, Ph.D.

Email: [hidden email]


Re: Log transformation for regression

Richard Ristow
I've changed the subject line, to reflect that this is a new thread,
not a continuation of "More information about SPSS regression coef &
effect size".

At 08:43 PM 9/28/2008, Matthew Pirritano wrote:

>So I'm doing a longitudinal analysis and I've discovered that I have
>a robust curvilinear relationship. I've transformed my dependent and
>independent variables using the natural log. In order to do so I had
>to add 1 to my IVs and DVs, since some of them had zero as a possible value.
>
>My question is, is this kosher?

You have to be very careful with it, at least.

If you log-transform both independents and dependents, you're fitting
a multiplicative model, i.e. one of the form

Y = X1**p1 * X2**p2 * X3**p3 ...
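
Concretely, fitting that form is the same as running an ordinary regression on the logged variables; a rough sketch with made-up names, assuming the values are already strictly positive:

* ln(Y) = b0 + p1*ln(X1) + p2*ln(X2)  is the same fit as  Y = exp(b0) * X1**p1 * X2**p2.
COMPUTE lny  = LN(y).
COMPUTE lnx1 = LN(x1).
COMPUTE lnx2 = LN(x2).
EXECUTE.
REGRESSION
  /DEPENDENT lny
  /METHOD=ENTER lnx1 lnx2.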

>just arbitrarily adding 1 to the scores to make it possible to do
>the log transformation?

Well, what do you think it means for your model?

You do have to watch the log transformation for these models, because
the transformed model gives disproportionate influence to those
observations that are close to 0. (Look at the shape of the log
function.) I'm not sure of best practice, but if your data is subject
to >additive< uncertainty, as is common, you may need to drop the
smallest values from analysis, or down-weight them heavily, because
they have very large >multiplicative< uncertainty, or uncertainty in
the logs of their values.

Adding 1 is probably not the right answer. Dropping the cases with 0
values from analysis is probably more defensible; but the effect of
this depends on your data. (How many cases would thereby be lost?
What is the remaining dynamic range of your data, i.e. the ratio of
largest to smallest values?)

>And how can I do a similar transformation for my mean centered
>covariates, that obviously have lots of less than zero values?

Don't use mean-centered covariates in a multiplicative model.
Mean-centering is appropriate when you're considering additive
effects, but not multiplicative ones.

For multiplicative models, use geometric mean centering: Divide all
observed values by the geometric mean of the data.
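
If it helps, a minimal sketch of geometric mean centering in syntax, assuming a strictly positive variable x (and assuming I have the AGGREGATE form right; with no break variable it adds the grand mean of LN(x) to every case):

* Geometric mean of x = EXP(mean of LN(x)).
COMPUTE lnx = LN(x).
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /m_lnx = MEAN(lnx).
* Dividing by the geometric mean is the same as subtracting m_lnx on the log scale.
COMPUTE x_gmc = x / EXP(m_lnx).
EXECUTE.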

I'm sure that others will have points to add, corrections to make, or both.

-Good luck,
  Richard


Re: Log transformation for regression

Matthew Pirritano
Sorry about that; I always think I've changed the subject line, but sometimes I seem to forget.

So Richard, thanks for the advice. Can you give me any references to back up the use of geometric mean centering, or does it just naturally follow from the logarithmic transformation? I think I misunderstood what I was doing; I thought I was just adding a curvilinear trend to the model without having to lose df by adding the linear and quadratic terms.

I think I have to add one to the variables. I can't see how it would make any difference. The data are from a questionnaire on a 1 to 5 scale. Wouldn't this just mean that you had to subtract one from the unstandardized betas to make a correct interpretation of the data?

Again, this is a multilevel longitudinal model, so unstandardized betas are what we'll be interpreting.

Hope that all makes sense. Please clear the mud in my water if you can.

Thanks
Matt

Matthew Pirritano, Ph.D.

Email: [hidden email]


Re: Log transformation for regression

Richard Ristow
At 10:46 PM 9/28/2008, Matthew Pirritano wrote:

>Can you give me any references to back up the use of geometric mean
>centering or does it just naturally follow from the logarithmic transformation.

I was just taking it as naturally following from the log transform.

>I think I have to add one to the variables. I can't see how it would
>make any difference. The data are from a questionnaire on a 1 to 5
>scale. Wouldn't this just mean that you had to subtract one from the
>unstandardized betas to make a correct interpretation of the data?

No. The coefficients (betas) in the log-transformed model are not
comparable to those in the linear model, because the models are different.

If you estimate a linear model, you have

Y = B1*X1  + B2*X2  + B3*X3  ...

If you estimate after a log transform (and this is what I was
stressing), you have

Y = X1**B1 * X2**B2 * X3**B3 ...

Now: Do you believe this model? It seems an unlikely one, for your data:

>The data are from a questionnaire on a 1 to 5 scale.

So, your dependent variable should be, in some sense, the >product<
of the questionnaire responses? It's hard for me to think of a
theoretical reason for this.

>Again, this is a multilevel longitudinal model, so unstandardized betas
>are what we'll be interpreting.

Still more reason >not< to use the log-transformed model. What you
get out of it, is not the unstandardized betas you're looking for.

>I think I misunderstood what I was doing, I thought I was just
>adding a curvilinear trend to the model without having to lose df
>by adding the linear, and the quadratic terms.

Aye, there's the rub. The tactic I'd think of in your situation is
just that: estimate a quadratic model. And the problem is exactly
what you said: it consumes degrees of freedom copiously.

From here on, everything depends on your problem and your model: How
many cases do you have? How many questions, and what are your
independent variables -- are you using the individual questions? And
what's the evidence of a non-linear response?

-Good luck,
  Richard


Re: Log transformation for regression

mpirritano
Thanks for all of your help Richard.

The model is longitudinal. Baseline to 5 years of data.

We're looking at the use of different coping strategies by men and women who
are undergoing infertility treatments.

The questions have already been combined into composites. There are
about 150 couples that we have data for at 5 years, starting with about
1000. Using the log-transformed variables drastically reduces the
chi-square for the model, from over 3000 to around 1200. And I know that
these types of developmental models often follow such a trend: the
log-transformed relationship is more likely than a quadratic one.

I'm talking to my collaborator tonight. I have prepared all of the
logarithmic models. Maybe it makes sense to just forget about
interpreting the betas and just go with the effects? Does this seem
defensible?

I suppose I could go back to the quadratic model; maybe I just got lured
in by the ln model, and the quadratic wasn't that bad. Oh, now I recall:
the really unpleasant thing about the quadratic model is how to create
quadratic interaction terms! Don't you need to include the linear
interaction (X times covariate) and the quadratic version (X-squared times
covariate) in order to look at the curvilinear effect of the quadratic
interaction term?

Thanks for all of the help.

Thanks
Matt

Matthew Pirritano, Ph.D.
Research Analyst IV
Orange County Health Care Agency
(714) 834-6011


Re: Log transformation for regression

Richard Ristow
At 07:54 PM 9/29/2008, Pirritano, Matthew wrote:

>The model is longitudinal. Baseline to 5 years of data. There are
>about 150 couples that we have data for at 5 years, starting with about 1000.

That's a repeated-measures study, unbalanced since you have varying
numbers of observations per subject. The proper generalization of
regression is a mixed-effects model, for which you need the MIXED
module in SPSS.
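
For what it's worth, a skeletal MIXED specification for that kind of unbalanced repeated-measures setup might look roughly like this; the variable names and the random-effects structure are placeholders, not a recommendation for your particular study:

* Hypothetical names: outcome = DV measured at several waves, time = years since baseline,
* x = a time-varying predictor, subj_id = the couple or subject identifier.
MIXED outcome WITH time x
  /FIXED=time x time*x
  /RANDOM=INTERCEPT time | SUBJECT(subj_id) COVTYPE(UN)
  /METHOD=REML
  /PRINT=SOLUTION TESTCOV.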

>Looking at the use of different coping strategies for men and women
>who are undergoing infertility treatments. The questions have
>already been combined into composites.

Is, then, the questionnaire intended to elicit the coping strategies
used, with composites for each of the coping strategies, as you
understand them?

>Using the log transformed variables drastically reduces the
>chi-square for the model from over 3000 to around 1200.  And I know
>that these types of developmental models often follow such a trend.
>That the log transformed relationship is more likely than a quadratic model.

At this point you, the subject specialist, know more than I do, a
methodologist; especially a methodologist who doesn't know the
study, nor anything about the subject. (I'd never work that blind if
I had more responsibility than being a source on the list.)

You haven't even said what your dependent variable or variables are,
nor what your independents are.

>I'm talking to my collaborator tonight. I have prepared all of the
>logarithmic models. Maybe it makes sense to just forget about
>interpreting the betas and just go with the effects?

To emphasize again: Whatever transformation you carry out, implies a
certain form of model. Make sure you understand what that model is;
can describe it in your publication; and can argue that it is
theoretically reasonable.

>The log transformed relationship is more likely than a quadratic model.

Good; so you do have theoretical support.

Do think about >what< you are transforming. Log-transforming the
dependent variable, only, implies an exponential-growth model
(skipping over independent variables other than time). Such models
are often appropriate.
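
Concretely, that first case just logs the dependent variable and leaves the predictors alone; a sketch with made-up names:

* ln(Y) = b0 + b1*time  is the same fit as  Y = exp(b0) * exp(b1*time): growth or decay over time.
* Only the DV is logged; the predictors stay on their original scale.
COMPUTE ln_y = LN(y).
EXECUTE.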

But, log-transforming independent >and< dependent variables implies a
product-of-powers model. Those are much less common. If you can argue
that it is appropriate in your case, go ahead. See a statistical
consultant, about precautions to estimate such models accurately.
There are pitfalls more subtle than dealing with 0 values.

>I suppose I could go back to the quadratic model, maybe I just got
>lured in by the ln model, and the quadratic wasn't that bad.

Again, if you have something like an exponential-growth model, fine.
If you have a product-of-powers model, fine, >if you have justification<.

>The really unpleasant thing about the quadratic model is how to
>create quadratic interaction terms! Don't you need to include the
>linear interaction (X times Covariate) and the quadratic version
>(X-squared times Covariate) in order to look at the curvilinear effect
>of the quadratic interaction term?

It sounds like you're distinguishing two classes of independent
variables: whatever X is, and the covariates. I've been writing
without that distinction.

If X is an independent variable of the first kind, and C is a covariate,
the quadratic terms are X**2; C**2; and X*C. Not X**2*C, which is a
third-order term. (Add the exponents of all the factors.)

But you don't have to estimate a >saturated< quadratic model, with
all second-order terms. You can include quadratic terms for only
those variables where you expect a curvilinear effect; and
cross-terms ('interaction terms') for only those pairs of variables
where you expect an interaction.
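
A minimal sketch of building only the terms you want, with made-up names (x the focal predictor, c a covariate, and the centering constants standing in for the actual sample means):

* Center first (this is an additive model, so mean-centering is appropriate).
COMPUTE x_c = x - 3.2.
COMPUTE c_c = c - 1.7.
* 3.2 and 1.7 stand in for the sample means of x and c.
COMPUTE x_sq   = x_c**2.
COMPUTE x_by_c = x_c * c_c.
EXECUTE.
* Enter only the second-order terms you have a reason to expect.
REGRESSION
  /DEPENDENT y
  /METHOD=ENTER x_c c_c
  /METHOD=ENTER x_sq x_by_c.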

And here, your knowledge of your study must take over.

-Onward, in peace,
  Richard


Re: Log transformation for regression

mpirritano
Thanks so much Richard! Believe it or not, without even knowing my
variables you've totally clarified things for me. I want to log-transform
only my DV, which is distress (actually there are three types of
distress: marital, personal, and social, each with its own individual
analysis, not a multivariate DV). Yes, the questionnaires are intended
to elicit the coping strategies. That's a whole other can of worms -- an
invariance test for the validity of the measures across years and across
gender! Right now it's just assumed. Believe it or not, this analysis is
not my day job!

And I am doing a mixed analysis; it's actually a dyadic analysis using
male/female couple variables as repeated measures. I've adapted the
model from Kenny, Kashy, and Campbell (2006), as well as Kashy and
Donnellan (2008).

Thanks for all the advice. My collaborator, whose tenure is awaitin', will
be much relieved to know I've settled on a model.

Thanks
Matt

Matthew Pirritano, Ph.D.
Research Analyst IV
Orange County Health Care Agency
(714) 834-6011
