SPSSX Discussion

Problems with GzLM Category Order for Factors

Gary Rosin

Problems with GzLM Category Order for Factors

I've run into a mystery running a Generalized Linear Model
under SPSS 15.0.1.

I have a grouped binomial dependent trial/response variable
(Event of Trial) and a logit link function. I have two numerical
scale variables, C and L, and two dummy variables, S and P,
coded 1 or 0 based on presence or absence of certain
characteristics. The model I'm checking has both main effects
and interactions. To handle over-dispersion, I'm estimating
parameters using a scale parameter based on the model's
Pearson Chi-Square. See command syntax below.

The problem is that I get slightly *different results* when I run
the model using ascending category order for factors (the default)
and using a descending category order (this makes the output
show the effect of the presence of the factor, rather than of its
absence).

Specifically, the coefficients and the standard errors on the numerical
scale variables are different. As a result, the confidence intervals,
Wald Chi-Squares, and significances also vary.

The coefficients and standard errors for the factors and for the
interaction variables do not change (other than to change sign).
The Intercept also changes (no doubt, in order to account for the
change in sign of the factor variable included as a main effect).

Also, neither the Goodness of Fit table nor the calculated scale
parameter changes.

Oddly enough, for the *ascending* model, the entries in the Test of
Model Effects table (ToME) (Wald Chi-Square and significance)

--are different from the corresponding entries in the Parameter
Estimates table (PE), but

--are identical to ToME and PE tables for the *descending*
model.

So. Why is this happening? Is it me or have I found an "easter-egg"
in SPSS 15.0.1?

Thanks.

Gary Rosin <[hidden email]>

-----------------------------------

Command Syntax: the same in both models, except for
(ORDER=DESCENDING):

* Generalized Linear Models.
GENLIN
  Event OF Trial
  BY S P
  (ORDER=ASCENDING)
  WITH C L
  /MODEL
    S C S*C P*C L P*L
    INTERCEPT=YES
    DISTRIBUTION=BINOMIAL
    LINK=LOGIT
  /CRITERIA METHOD=FISHER(1) SCALE=PEARSON
    COVB=MODEL MAXITERATIONS=100 MAXSTEPHALVING=5
    PCONVERGE=1E-006(ABSOLUTE)
    SINGULAR=1E-012
    ANALYSISTYPE=3 CILEVEL=95 LIKELIHOOD=FULL
  /MISSING CLASSMISSING=EXCLUDE
  /PRINT CPS DESCRIPTIVES MODELINFO FIT
    SUMMARY SOLUTION(EXPONENTIATED) COVB CORB
    HISTORY(1).

Reutter, Alex

Re: Problems with GzLM Category Order for Factors

Hi Gary,

What you've found is an "Easter Egg" of model design. If A is a 0-1 factor and X is a covariate, then the model:

Intercept A X A*X

produces two redundant parameters in the estimates table:

[A=1]
[A=1]*X

The [A=1] is identified with the intercept and [A=1]*X is identified with the X term. When you change the order of A, then the model produces two redundant parameters:

[A=0]
[A=0]*X

The [A=0] is identified with the intercept and [A=0]*X is identified with the X term. The factor and factor-covariate interaction coefficients simply change sign, but the coefficient for the X term in the second model is the sum of the coefficients for the X and [A=0]*X terms in the first model. Likewise, the intercept in the second model is the sum of the intercept and [A=0] coefficients in the first model. For example:

GET FILE='1991 U.S. General Social Survey.sav'.
select if (happy=1 or happy=2).

* Ascending.
GENLIN happy BY sex (ORDER=ASCENDING) WITH life
  /MODEL sex life sex*life
    INTERCEPT=YES DISTRIBUTION=BINOMIAL LINK=LOGIT.
* For SEX=1: .905 + .186 - 1.029*LIFE - .175*LIFE = 1.091 - 1.204*LIFE.
* For SEX=2: .905 - 1.029*LIFE.

* Descending.
GENLIN happy BY sex (ORDER=DESCENDING) WITH life
  /MODEL sex life sex*life
    INTERCEPT=YES DISTRIBUTION=BINOMIAL LINK=LOGIT.
* For SEX=1: 1.091 - 1.204*LIFE.
* For SEX=2: 1.091 - .186 - 1.204*LIFE + .175*LIFE = .905 - 1.029*LIFE.
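
The arithmetic in the comments above can also be checked outside SPSS. What follows is only a minimal sketch in Python, assuming nothing beyond the rounded coefficients quoted in those comments. The dictionary keys and the `fitted_lines` helper are made-up names for illustration, not anything GENLIN produces.

```python
# Coefficients copied from the comments in the GENLIN example above.
# They are rounded to three decimals, so the sums are rounded as well.
ascending = {"intercept": 0.905, "factor": 0.186,
             "life": -1.029, "factor_by_life": -0.175}
descending = {"intercept": 1.091, "factor": -0.186,
              "life": -1.204, "factor_by_life": 0.175}


def fitted_lines(coef):
    """Return (intercept, slope) pairs for the reference level and the other level.

    The reference level is the one whose factor and interaction
    parameters are set to zero (reported as redundant).
    """
    reference = (coef["intercept"], coef["life"])
    other = (
        round(coef["intercept"] + coef["factor"], 3),
        round(coef["life"] + coef["factor_by_life"], 3),
    )
    return reference, other


# Ascending order: SEX=2 is the reference level, SEX=1 is the other level.
asc_sex2, asc_sex1 = fitted_lines(ascending)
# Descending order: SEX=1 is the reference level, SEX=2 is the other level.
desc_sex1, desc_sex2 = fitted_lines(descending)

print("SEX=1 (ascending, descending):", asc_sex1, desc_sex1)
print("SEX=2 (ascending, descending):", asc_sex2, desc_sex2)
```

Both orderings give the same intercept and slope for each sex. Only the reference level differs, and with it what the LIFE coefficient taken by itself describes.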

Note that this will happen in any procedure on any statistical product; we've just made it easier to reproduce this result in Genlin. Best look to the Tests of Model Effects table in any model to help you determine whether the model term is significant.

Cheers,
Alex

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Gary Rosin
Sent: Thursday, January 11, 2007 4:16 PM
To: [hidden email]
Subject: Problems with GzLM Category Order for Factors


Gary Rosin

Re: Problems with GzLM Category Order for Factors

Thanks, Alex. A couple of questions, if you would.

1. If the significances given in the parameter estimates are for
testing the hypothesis that the parameter (coefficient) is 0,
what does the Test of Model Effects table test?

2. I notice that the Wald Chi-squared given for the covariate
effects and parameters are different in both ascending and
descending models. That said, the values in the descending
model are closer to those in the ToME than those in the
ascending model. Is that why descending is the default?
By extension, does that make descending the preferred, or
at least the usual, approach?

Gary

At 01:03 PM 1/12/2007, you wrote:


Reutter, Alex

Re: Problems with GzLM Category Order for Factors

1. Add a /PRINT SUMMARY SOLUTION LMATRIX to the commands below (or add the LMATRIX keyword to your PRINT subcommand) and see the output under "Type III Estimable Functions". These are the contrasts used for the Tests of Model Effects.

2. Both tests use a chi-square statistic, but they're testing different things. Neither descending nor ascending is preferred, AFAIK.

Alex

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Gary Rosin
Sent: Friday, January 12, 2007 3:09 PM
To: [hidden email]
Subject: Re: Problems with GzLM Category Order for Factors


At 01:03 PM 1/12/2007, you wrote: