Dummy vs Continuous predictors in regression

Dummy vs Continuous predictors in regression

paul wilson-7
Hi,

I'm running a linear regression model using categorical dummy (0/1 coded) as well as continuous predictors.
My question is about the interpretation of standardized beta coefficients and using them to compare the effects of dummy and continuous predictors. I understand that a standardized beta tells us the number of standard deviations the outcome changes for a one standard deviation change in the predictor. Is this applicable to categorical/dummy variables as well?
My categorical and continuous predictors are measured in different units, so standardized betas would be great for comparing their
impact on the outcome. However, I'm not sure that, say, a .51 standardized beta for a categorical predictor and a .51 standardized beta for a continuous predictor have the same "impact". Can anyone confirm?

Thanks





Re: Dummy vs Continuous predictors in regression

Swank, Paul R
I wouldn't rely on standardized betas to judge differences in impact across variables. Too many things have to be perfect before they give you good answers. Second, the standard deviation of a categorical variable is not of much use, since the difference is in the raw unit, i.e., you're in one group or the other. Assuming equal groups and a large n, the standard deviation of the categorical variable will be about .5. So how much do you care about the change in the outcome associated with being halfway between the two groups? Why not just look at the difference in the adjusted means for each group? That controls for all the other variables in the model.
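For what it's worth, one way to get those adjusted means is UNIANOVA with an EMMEANS request. A minimal sketch, assuming a hypothetical outcome y, a 0/1 group variable grp, and two continuous covariates x1 and x2:

UNIANOVA y BY grp WITH x1 x2
  /EMMEANS=TABLES(grp) WITH(x1=MEAN x2=MEAN)
  /DESIGN=grp x1 x2.

The EMMEANS table gives the mean of y for each group with the covariates held at their means, so the group difference is in the raw units of the outcome.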

Paul R. Swank, Ph.D
Professor and Director of Research
Children's Learning Institute
University of Texas Health Science Center
Houston, TX 77038



Re: Dummy vs Continuous predictors in regression

Kooij, A.J. van der
In reply to this post by paul wilson-7
With CATREG in the SPSS CATEGORIES add-on module you can analyze categorical variables (unordered and ordered) as well as numeric variables. For categorical variables you do not need dummies, and you obtain a beta coefficient for the variable as a whole. (CATREG is in the menus under Regression > Optimal Scaling.)
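For reference, a minimal CATREG sketch, assuming a hypothetical numeric dependent y, a nominal predictor cat1, and a numeric predictor x1 (exact subcommands can vary by version, so check the CATEGORIES syntax reference):

CATREG VARIABLES=y cat1 x1
  /ANALYSIS=y(LEVEL=NUME) WITH cat1(LEVEL=NOMI) x1(LEVEL=NUME)
  /PRINT=R COEFF
  /PLOT=TRANS(cat1).

This reports one beta for cat1 as a whole, plus the category quantifications.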

If you do not have CATEGORIES, you can compute the CATREG results for a numeric dependent variable and nominal (unordered categorical) and numeric independent variables as follows:

Use the standardized dependent variable in the regression with dummies. Then recode each category of the categorical variable to its dummy's B coefficient (the left-out category to 0).

For example, for variable v1 with 5 categories:

compute quantv1 = v1.
recode quantv1 (1=B1) (2=B2) (3=B3) (4=B4) (5=0).

Then run DESCRIPTIVES, saving the standardized recoded variable:

DESCRIPTIVES VARIABLES=quantv1 /STATISTICS=STDDEV /SAVE.

The Std. dev. is the beta for the variable. The saved standardized variable zquantv1 is the quantified variable: the category values are replaced with the optimal nominal quantifications. To create a transformation plot:

GRAPH /LINE(SIMPLE)=MEAN(zquantv1) BY v1.
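The dummy-regression step itself is not spelled out above. A minimal sketch of it, assuming a hypothetical numeric dependent y and the 5-category v1 with category 5 left out (matching the recode):

* Standardize the dependent variable (saves zy).
DESCRIPTIVES VARIABLES=y /SAVE.
* Build 0/1 dummies for categories 1 to 4 of v1.
COMPUTE v1_1 = (v1 = 1).
COMPUTE v1_2 = (v1 = 2).
COMPUTE v1_3 = (v1 = 3).
COMPUTE v1_4 = (v1 = 4).
EXECUTE.
* Any numeric predictors would be entered on the same line as the dummies.
REGRESSION
  /DEPENDENT zy
  /METHOD=ENTER v1_1 v1_2 v1_3 v1_4.

The unstandardized B's for v1_1 to v1_4 from this regression are the B1 to B4 used in the recode above.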

 

NB: for one category you don't need a dummy; the left-out category is recoded to 0. It does not matter which category is left out: you obtain different b's with a different left-out category, but the quantifications (standardized B's) will not differ.

 

Regards,

Anita van der Kooij

Data Theory Group

Leiden University



Import problem (from excel into SPSS)

Valerie Guille-2
Hi all,

I hope that someone will be able to help me here.
I am trying to import a big Excel spreadsheet into SPSS using the
"create a new query using Database Wizard" tool in SPSS.

The following warning appears:
GET DATA /TYPE=ODBC /CONNECT=
  'DSN=Excel Files;DBQ=L:\INF\SOD details\INF_1 ASF monitoring SOD rats\Digi'+
  'gait\SPSS-Cohort 1 and 2.xls;DriverId=790;'
  'MaxBufferSize=2048;PageTimeout=5;'
  /SQL = "SELECT  `Animal ID` AS Animal_ID,  Genotype AS Genotype_,  Sex,  P"+
  "ND_wk,  Limb,  Swing AS `@_Swing`,  Brake AS `@_Brake`, "
  " Propel AS `@_Propel`,  Stance AS `@_Stance`,  Stride AS `@_Stride`,  `#"+
  "Steps` AS `@_#Steps`,  `Gait Symmetry` AS "
  "`@_Gait_Symmetry`,  `Belt Speed` AS `@_Belt_Speed` FROM  `'15cms$'`"
  /ASSUMEDSTRWIDTH=255 .

>Warning.  Command name: GET DATA
>SQLExecDirect failed :[Microsoft][ODBC Excel Driver] Too few parameters. Expected 8.

CACHE.
DATASET NAME DataSet1 WINDOW=FRONT.


I can import other Excel spreadsheets, but this one seems to be a problem.
If anyone could help, that would be greatly appreciated.

Thanks,

Valerie


Re: Import problem (from excel into SPSS)

Peck, Jon
Have you tried to just read the file with GET DATA /TYPE=XLS (or File/Open/Data)?
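Something along these lines, as a sketch only (the path and sheet name are pieced together from the CONNECT string in your posted syntax, so adjust as needed):

GET DATA /TYPE=XLS
  /FILE='L:\INF\SOD details\INF_1 ASF monitoring SOD rats\Digigait\SPSS-Cohort 1 and 2.xls'
  /SHEET=NAME '15cms'
  /CELLRANGE=FULL
  /READNAMES=ON.

That bypasses the ODBC driver entirely, which is what seems to be choking on this particular sheet.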

HTH,
Jon Peck


Re: Dummy vs Continuous predictors in regression

Michael Pearmain-2
In reply to this post by Kooij, A.J. van der
This is just a minor bugbear of mine with regard to dummy variables and people using them as a fix-all solution (I agree they are useful, but only if you understand why you are using them).
Honestly, I don't mean to rant...


"NB: for one category you don't need a dummy. The left out category is
recoded to 0. It does not matter which category is left out. You obtain
different b's when different left out category, but the quantifications
(standardized B's) will not be different."

Mathematically speaking, it is very important to understand which category you are leaving out, even more so with ordinal data.

Let's use a specification with dummies for test scores up to 10 (an ordinal variable). As mentioned above, we need to drop one of the dummies. Let's drop the dummy for a score of 1. Note that the constant then includes both the baseline and the lift due to a score of 1, since we cannot separately identify those. The analysis may still lead to interesting insights; the results may show, say, that only scores of 4 or 7 lead to a significant increase in pay.

Now let's drop the 10th instead, which will keep the model identified and enable us to quantify the effect of a score of 1 (say scores of 1 make up more than half the dataset and scores of 10 account for 0.5%). Excluding the dummy with infrequent 1's brings the covariate matrix close to singularity, which pushes up the standard errors of the parameters and can make the estimates insignificant.
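To make that concrete, here is a sketch of the kind of check I mean, with hypothetical variables pay and score (1 to 10), dropping score 10 as the reference and requesting collinearity diagnostics:

VECTOR d(9F8.0).
LOOP #i = 1 TO 9.
COMPUTE d(#i) = (score = #i).
END LOOP.
EXECUTE.
REGRESSION
  /STATISTICS COEFF OUTS R ANOVA COLLIN TOL
  /DEPENDENT pay
  /METHOD=ENTER d1 TO d9.

Rerunning with a different dummy dropped and comparing the tolerances and condition indices shows whether the chosen parameterization is causing trouble.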

It's just another case of thinking about what you are doing rather than just doing it.

Mike



--
Michael Pearmain
Senior Statistical Analyst


Google UK Ltd
Belgrave House
76 Buckingham Palace Road
London SW1W 9TQ
United Kingdom
t +44 (0) 2032191684
[hidden email]


Re: Dummy vs Continuous predictors in regression

Burleson,Joseph A.
Michael:

I, in turn, beg your pardon if I am not understanding you correctly.

But I believe that Anita meant just what she said, and that she is correct if you only have 2 categories (e.g., 0/1) comprising one variable.

Your example used a 10-category (or 10-level) variable. What you say makes perfect sense there.

Clearly, if one codes the 2-level dummy as 0,1 versus 1,0, then the sign of the beta is reversed.

Hence, a 2-level variable is already dummied. That is my impression of what she meant.

Joe Burleson


Re: Dummy vs Continuous predictors in regression

Peck, Jon
In reply to this post by Michael Pearmain-2
I have to disagree a bit with this post.  Mathematically it makes absolutely no difference which dummy you omit (absent exact singularity in a procedure not using a generalized inverse).  Computationally, this is also almost true, since the estimation procedures use numerically stable algorithms to control computational error that could be introduced by near singularity.

Substantively, of course, it makes a big difference, because it changes the meaning of the coefficients.  But you can calculate any estimable effect from any parameterization, and while it may be a little harder to get the standard errors if the effect isn't directly represented in the equation, calculating the standard errors from the coefficient covariance matrix will give the same result as any other variant.
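For example, with a dummy-coded predictor you can ask REGRESSION for the coefficient covariance matrix and work out any contrast by hand. A sketch, with hypothetical dummies d1 to d4 for a 5-category predictor and a standardized dependent zy:

REGRESSION
  /STATISTICS COEFF BCOV
  /DEPENDENT zy
  /METHOD=ENTER d1 d2 d3 d4.
* The effect of category 1 versus category 2 is B(d1) - B(d2).
* Its standard error is sqrt(var(d1) + var(d2) - 2*cov(d1,d2)), read from the BCOV table.
* Both come out the same whichever category was used as the reference.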

Regards,
Jon Peck


Re: Dummy vs Continuous predictors in regression

Michael Pearmain-2
In reply to this post by Burleson,Joseph A.
Hi Joe,
Yes, you are right w.r.t. Anita's example; I think it was a case of me reading the post too quickly.
Apologies, all. However, the point holds, as Joe suggests, for multi-level variables.

Mike



Re: Dummy vs Continuous predictors in regression

Kooij, A.J. van der
No, Joe was not right; I didn't mean variables with just 2 categories. To illustrate that it does not matter which category is left out, I inserted syntax below for 2 predictors with 3 and 4 categories. In the two regressions with dummies, different categories of both predictors were left out. Both result in the same betas: the Std. dev. of the recoded, standardized variables (called quantified or transformed variables in CATREG). The transformed variables are also the same.

By the way: if you use dummy regression for an ordinal variable, you are treating the variable as nominal. To use dummies and still treat it as ordinal, you need non-negative least squares regression. With CATREG you can choose to treat variables as nominal or ordinal.
 
Regards,
Anita
 

 

data list free/a b y.
begin data.
1 1 6
1 1 7
1 1 5
1 2 7
1 2 6
1 2 8
1 3 4
1 3 5
1 4 5
1 4 3
1 4 2
1 4 4
2 1 4
2 1 5
2 2 4
2 2 3
2 2 5
2 3 9
2 3 8
2 3 7
2 4 7
2 4 8
3 1 10
3 1 8
3 1 9
3 2 5
3 2 3
3 3 7
3 3 6
3 4 10
3 4 8
3 4 9
end data.

 

VECTOR a(3F8.0).
LOOP #i = 1 TO 3.
COMPUTE a(#i) = (a = #i).
END LOOP.
EXECUTE.

VECTOR b(4F8.0).
LOOP #i = 1 TO 4.
COMPUTE b(#i) = (b = #i).
END LOOP.
EXECUTE.

DESCRIPTIVES VARIABLES=y /SAVE.

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT zy
  /METHOD=ENTER a1 a2 b1 b2 b3.
compute quanta3 = a.
recode quanta3 (1=-1.0297) (2=-.6323) (3=0).
compute quantb4 = b.
recode quantb4 (1=.1874) (2=-.4767) (3=.1267) (4=0).
DESCRIPTIVES VARIABLES=quanta3 quantb4 /SAVE /STATISTICS STDDEV.

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT zy
  /METHOD=ENTER a1 a3 b1 b2 b4.
compute quanta2 = a.
recode quanta2 (1=-.3974) (2=0) (3=.6323).
compute quantb3 = b.
recode quantb3 (1=.0607) (2=-.6034) (3=0) (4=-.1267).
DESCRIPTIVES VARIABLES=quanta2 quantb3 /SAVE /STATISTICS STDDEV.

*transformation plots.
GRAPH /LINE(SIMPLE)=MEAN(Zquanta3) BY a.
GRAPH /LINE(SIMPLE)=MEAN(Zquanta2) BY a.
GRAPH /LINE(SIMPLE)=MEAN(Zquantb4) BY b.
GRAPH /LINE(SIMPLE)=MEAN(Zquantb3) BY b.

 
