GENLOG vs GENLIN

xenia
Hello all,
I was wondering whether someone could enlighten me regarding the following problem. I have data on counts of deaths by drug use and gender; there are four drugs, let's say H, M, C, B. Deaths is the count variable; all other variables are binary, i.e. use/no use for the drug variables, male/female for gender. I know that I can use either GENLOG, which I first weight by the "deaths" variable which is the counts, i.e. the number of deaths, for each drug-gender combination, or GENLIN which covers loglinear models for count data. I don't have an offset variable, e.g. population at risk.

I did both GENLIN and GENLOG. I fit the saturated model: all the main effects, all the 2-way interactions between drugs, all the 2-way interactions between each drug and gender, all the 3-way interactions between drugs, all the 3-way interactions between two drugs and gender, the 4-way interaction between all drugs, all the 4-way interactions between three drugs and gender, and the 5-way interaction between all drugs and gender.

I get different results for the interactions and I wonder why this is. For example, if I use the 0 group in each variable as the reference, GENLIN gives me a parameter estimate for the interaction H (group 1)*gender (group 1) and treats all other combinations (01, 10, 00) as redundant, while GENLOG treats all four combinations in this interaction (00, 01, 10, 11) as redundant, i.e. the parameter estimates for this interaction are all zero.
Below is the syntax.

GENLIN deaths BY H M C B sex (ORDER=ASCENDING)
  /MODEL H M C B sex H*sex M*sex C*sex B*sex H*M H*C H*B M*C M*B C*B H*M*C H*M*B M*C*B
H*C*B H*M*sex H*C*sex H*B*sex M*C*sex M*B*sex C*B*sex H*M*C*sex H*M*B*sex H*C*B*sex M*C*B*sex H*M*C*B H*M*C*B*sex INTERCEPT=YES
 DISTRIBUTION=POISSON LINK=LOG
  /CRITERIA METHOD=NEWTON SCALE=1 COVB=MODEL MAXITERATIONS=100 MAXSTEPHALVING=5 PCONVERGE=1E-006(ABSOLUTE) SINGULAR=1E-012 ANALYSISTYPE=3(WALD) CILEVEL=95 CITYPE=WALD LIKELIHOOD=FULL.

and

WEIGHT BY deaths.
GENLOG H M C B sex
  /MODEL=POISSON
  /PRINT=FREQ RESID ADJRESID ZRESID DEV ESTIM CORR COV
  /PLOT=RESID(ADJRESID) NORMPROB(ADJRESID)
  /CRITERIA=CIN(95) ITERATE(20) CONVERGE(0.001) DELTA(.5)
  /DESIGN H M C B sex B*C*H*M*sex B*C B*H B*M B*sex C*H C*M C*sex H*M H*sex M*sex B*C*H B*C*M B*C*sex B*H*M B*H*sex B*M*sex C*H*M C*H*sex C*M*sex H*M*sex B*C*H*M B*C*H*sex B*C*M*sex B*H*M*sex C*H*M*sex.
(Although I entered the drugs in the same order in interactive mode, the program ended up ordering the terms its own way.)

Is there something in the algorithms used for each procedure that causes the differences in results?

Thank you all


Re: GENLOG vs GENLIN

Alex Reutter
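I would guess it's because of the order of the terms: in the GENLIN syntax, H*sex appears before the five-way term (H*M*C*B*sex), whereas in your GENLOG /DESIGN the five-way term B*C*H*M*sex comes straight after the main effects, ahead of H*sex.  What happens when you run GENLOG with the same design as GENLIN?  That is,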

WEIGHT BY deaths.
GENLOG H M C B sex
 /MODEL=POISSON
 /PRINT=FREQ RESID ADJRESID ZRESID DEV ESTIM CORR COV
 /PLOT=RESID(ADJRESID) NORMPROB(ADJRESID)
 /CRITERIA=CIN(95) ITERATE(20) CONVERGE(0.001) DELTA(.5)
 /DESIGN H M C B sex H*sex M*sex C*sex B*sex H*M H*C H*B M*C M*B C*B H*M*C
H*M*B M*C*B
H*C*B H*M*sex H*C*sex H*B*sex M*C*sex M*B*sex C*B*sex H*M*C*sex H*M*B*sex
H*C*B*sex M*C*B*sex H*M*C*B H*M*C*B*sex.


Alex
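
To illustrate the point about term order, here is a small sketch with three binary factors instead of five. The variable names (drugH, drugM, sex, n) and all counts are made up for illustration. Both designs contain the same terms, so I would expect identical fitted counts; what should differ is which parameters GENLOG flags as redundant, because (as far as I understand) redundancy is resolved in the order the terms are listed.

```spss
* Hypothetical counts for three binary factors (illustration only).
DATA LIST LIST / drugH drugM sex n (4F5.0).
BEGIN DATA
0 0 0 35
0 0 1 50
0 1 0 12
0 1 1 20
1 0 0 60
1 0 1 120
1 1 0 9
1 1 1 15
END DATA.

WEIGHT BY n.

* Order 1: the 3-way term is listed last.
GENLOG drugH drugM sex
  /MODEL=POISSON
  /PRINT=ESTIM
  /DESIGN drugH drugM sex drugH*drugM drugH*sex drugM*sex drugH*drugM*sex.

* Order 2: the 3-way term is listed straight after the main effects.
GENLOG drugH drugM sex
  /MODEL=POISSON
  /PRINT=ESTIM
  /DESIGN drugH drugM sex drugH*drugM*sex drugH*drugM drugH*sex drugM*sex.

WEIGHT OFF.
```

In the second run the three-way term already spans every cell once the main effects are in, so the two-way terms that follow have nothing left to estimate. That is the pattern the original GENLOG output showed for H*sex.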

Re: GENLOG vs GENLIN

Bruce Weaver
Administrator
In reply to this post by xenia
Why are you treating Deaths as a count?  A count regression model, as I understand it, would be used when each individual has a count of events.  But death is a binary variable (0 or 1) for each individual -- and when you WEIGHT by DEATHS, you are in essence getting a row for each individual where the variable is either a 0 or a 1, not a count.  I should think you want a logistic regression model, or some other model for a binary outcome (e.g., a model yielding the relative risk or risk difference).  

HTH.


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: GENLOG vs GENLIN

Alex Reutter
In reply to Bruce Weaver's message of 07/03/2013 02:08 PM
DEATHS could be a count of the number of deaths observed for that covariate pattern.

Alex




Re: GENLOG vs GENLIN

Rich Ulrich

In reply to Alex Reutter's message of Wed, 3 Jul 2013 14:53:33 -0500
It looks to me, too, like DEATHS could be the counts for each pattern.

In addition to the counts of DEATHS, the usual epidemiological study
would want to have the counts, for each pattern, of NOT-dying.

You have great difficulty in drawing many inferences if you don't have
the "denominators" for risk.  Or, to put it another way, a whole lot of
those factors and interactions should be treated as "given" proportions
and not interesting for testing. 

- Maybe that makes it feasible to care about some specific 4-way and 5-way
interactions, which are ordinarily too complicated and prone to artifact
to be at all interesting. 

--
Rich Ulrich



Re: GENLOG vs GENLIN

Bruce Weaver
Administrator
In reply to this post by Alex Reutter
Alex Reutter wrote
> DEATHS could be a count of the number of deaths observed for that covariate pattern.

I understand that.  But note what the OP said (emphasis added):

"I know that I can use either GENLOG, *which I first weight by the "deaths" variable* which is the counts, i.e. the number of deaths, for each drug-gender combination..."

When one "WEIGHTS by deaths", a dataset that has one row per covariate pattern in essence becomes a dataset with one case per person, with the total number of rows = the sum of the counts.

Here's an example of what I think is going on.


NEW FILE.
DATASET CLOSE all.

* Generate a summary data set with counts.

DATA LIST LIST / Male Exposed Disease kount (4f5.0) .
BEGIN DATA.
1 1 1 160
1 1 0 80
1 0 1 440
1 0 0 320
0 1 1 240
0 1 0 330
0 0 1 160
0 0 0 270
END DATA.

DATASET Name Summary.

VALUE LABELS
  Male 1 'Male' 0 'Female' /
  Exposed 1 'Yes'  0 'No' /
  Disease 1 'Yes (case)' 0 'No (control)' .

WEIGHT by kount.

LOGISTIC REGRESSION VAR=exposed
  /METHOD=ENTER disease male
  /PRINT=CI(95)
  /CRITERIA PIN(.05) POUT(.10) ITERATE(20) CUT(.5) .

WEIGHT off.

* Now write a file with one row per person
* and run the same model with no WEIGHTING.

LOOP id = 1 to kount.
- XSAVE OUTFILE = "C:\Temp\Junk.sav" / Keep = id Male to Disease.
END LOOP.
EXECUTE.

GET FILE = "C:\Temp\Junk.sav".
DATASET NAME raw.

DATASET ACTIVATE raw.
LOGISTIC REGRESSION VAR=exposed
  /METHOD=ENTER disease male
  /PRINT=CI(95)
  /CRITERIA PIN(.05) POUT(.10) ITERATE(20) CUT(.5) .

* Clean up the junk.

DATASET ACTIVATE summary.
DATASET CLOSE raw.
ERASE FILE "C:\Temp\Junk.sav".

* End of example.

HTH.



Re: GENLOG vs GENLIN

Alex Reutter
In reply to this post by Rich Ulrich (message of 07/03/2013 03:27 PM)
Right.  If the patterns are roughly equally distributed (that is, if the population at risk is about the same size for each pattern), I think this should be okay, in the sense that you'll be able to discern the relative effect of each factor on DEATHS.

If they are unequally distributed, then you'd need an offset variable, like "Aggregate months service" in the ship damage example (p. 204 of McCullagh & Nelder's _Generalized Linear Models_). An example showing the use of GENLIN to fit these data is at http://pic.dhe.ibm.com/infocenter/spssstat/v21r0m0/topic/com.ibm.spss.statistics.cs/genlin_ships_intro.htm

Alex
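
For readers wondering what the offset does, here is a sketch of the Poisson rate model as I understand it. The symbols are my own notation, not taken from the thread: $y_i$ is the death count for covariate pattern $i$, $N_i$ the population at risk, and $\mu_i$ the expected count.

```latex
\begin{align*}
y_i &\sim \mathrm{Poisson}(\mu_i), \\
\log \mu_i &= \log N_i + x_i^\top \beta
  \quad\Longleftrightarrow\quad
  \log\frac{\mu_i}{N_i} = x_i^\top \beta .
\end{align*}
```

With the offset $\log N_i$, the coefficients describe death rates. Without it, the same model is fitted to the counts themselves, which coincides with the rate model (apart from the intercept) only when $N_i$ is the same for every pattern.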





Re: GENLOG vs GENLIN

Alex Reutter
In reply to this post by Bruce Weaver (message of 07/03/2013 03:39 PM)
Based on the OP's statement "I don't have an offset variable, e.g. population at risk." and subsequent analysis, and using your example, I don't think the OP has the equivalent of the cases where Exposed = 0.

Alex





Re: GENLOG vs GENLIN

Bruce Weaver
Administrator
Ah, I see.  Well that makes things more difficult, doesn't it!  ;-)



Re: GENLOG vs GENLIN

xenia
In reply to this post by Bruce Weaver
Hello, thank you for the reply.
I'm treating deaths as a count because I have aggregated data; maybe I should have said that explicitly. In everything I have read about general loglinear regression, and in the SPSS tutorial "Using General Loglinear Analysis to Model Accident Rates", it is stated that "Since the accidents [deaths here] have been aggregated, you first need to weight the cases by Accidents. From the menus choose:

Data > Weight Cases... Select Weight cases by.

Select Accidents as the frequency variable.

► Click OK."

The syntax ends up as:
WEIGHT
  BY accid .
GENLOG
  agecat gender  /CSTRUCTURE = pop
  /MODEL = POISSON etc.

So, since I have aggregated data, I thought I should weight cases by deaths and run GENLOG, as the example shows. In the accidents.sav file used for that tutorial, the accidents variable shows how many accidents there were for individuals who belong to each combination of age and gender categories. My file looks approximately like this, but with more rows as I have more factors, and without the population-at-risk variable.

Re: GENLOG vs GENLIN

xenia
In reply to this post by Alex Reutter
Yes, as I said in my original post, it is the number of deaths for each drug-gender category combination or as you say, for each specific covariate pattern, e.g. 120 deaths of males using heroin, not using any other drug, 15 deaths of females using heroin and cocaine and not using any other drug etc.

Re: GENLOG vs GENLIN

xenia
In reply to this post by Rich Ulrich
Thank you.
However, I don't have a case-control design, which would include the numbers not dying; in that case it would be a matter of creating a dead/not dead binary dependent variable and carrying out logistic regression.
I don't have the denominators or population at risk in the data I've been given, and I don't know whether it would be possible to obtain them from when the data were collected or by some other means.


Re: GENLOG vs GENLIN

Bruce Weaver
Administrator
xenia wrote
> thank you,
> if when one "WEIGHTS by deaths", a dataset that has one row per covariate pattern becomes a dataset with one case per person, with the total number of rows = the sum of the counts, then why is the accidents.sav weighted by accidents and then GENLOG is run in the tutorial?
> http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.cs%2Fgenlin_ships_intro.htm

That link goes to the wrong tutorial, I think (it is the GENLIN ship damage example).  To find the one you want (accident rates by age category and gender), I had to navigate to:

Loglinear Modeling >
  General Loglinear Analysis >
     Using General Loglinear Analysis to Model Accident Rates

The accidents.sav file has only one row per covariate pattern, which is why it uses "WEIGHT by accid".  But if you write the data to a file with one row per accident, you can run the model without using WEIGHT and get exactly the same results, as the example below shows.

Given that the accidents.sav data file is so small, I recreated it here with a DATA LIST command so that folks who don't have ready access to the sample files can play along if they wish.  

HTH.


NEW FILE.
DATASET CLOSE all.

* The following DATA LIST command reproduces
* the data in sample file accidents.sav.

DATA LIST list / agecat gender (2f1) accid pop (2f8.0).
BEGIN DATA
1 1 57997 198522
2 1 57113 203200
3 1 54123 200744
1 0 63936 187791
2 0 64835 195714
3 0 66804 208239
END DATA.

* The tutorial suggests the following analysis.

WEIGHT BY accid .
GENLOG
  agecat gender  /CSTRUCTURE = pop
  /MODEL = POISSON
  /PRINT = FREQ RESID DEV ADJRESID ESTIM CORR COV
  /CRITERIA = CIN(95) ITERATE(20) CONVERGE(.001) DELTA(.5)
  /DESIGN .

WEIGHT off.

* Use XSAVE to write the data to a file that has one row per accident,
* which is presumably one row per person.


LOOP id = 1 to accid.
- XSAVE OUTFILE = "C:\Temp\Junk.sav" / Keep = agecat gender pop.
END LOOP.
EXECUTE.

* Open the file with one row per accident.

GET FILE = "C:\Temp\Junk.sav".

CROSSTABS agecat by gender.

* Notice that the cell counts match the values
* of variable accid in the original file.

* No WEIGHT command this time.

GENLOG
  agecat gender  /CSTRUCTURE = pop
  /MODEL = POISSON
  /PRINT = FREQ RESID DEV ADJRESID ESTIM CORR COV
  /CRITERIA = CIN(95) ITERATE(20) CONVERGE(.001) DELTA(.5)
  /DESIGN .

* Results match those from the tutorial using WEIGHTED data.

* Clean up the junk.

NEW FILE.
DATASET CLOSE all.
ERASE FILE "C:\Temp\Junk.sav".
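
A possible GENLIN counterpart to the GENLOG run above, as a sketch only. It assumes that /CSTRUCTURE = pop in GENLOG plays the role of a log(pop) offset and that a main-effects design is wanted, and it introduces a hypothetical variable name, log_pop. No WEIGHT is in effect because GENLIN takes the count as its dependent variable.

```spss
* Same data as accidents.sav, re-entered as in the example above.
DATA LIST LIST / agecat gender (2F1) accid pop (2F8.0).
BEGIN DATA
1 1 57997 198522
2 1 57113 203200
3 1 54123 200744
1 0 63936 187791
2 0 64835 195714
3 0 66804 208239
END DATA.

* log_pop is a hypothetical name for the offset, the log of the population at risk.
COMPUTE log_pop = LN(pop).
EXECUTE.

* Make sure no frequency weight is active, since the counts enter as the dependent variable.
WEIGHT OFF.

GENLIN accid BY agecat gender (ORDER=ASCENDING)
  /MODEL agecat gender INTERCEPT=YES OFFSET=log_pop
    DISTRIBUTION=POISSON LINK=LOG
  /PRINT MODELINFO FIT SUMMARY SOLUTION.
```

If the assumption holds, the parameter estimates should be comparable to those from the weighted GENLOG run when it is given /DESIGN agecat gender.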



xenia wrote
More importantly, is it wrong to run GENLIN or GENLOG without an offset variable?

My file is as in the attached file, this is for males and it's similar for females. The dependent is the deaths variable, there is no "disease" variable, and the independents are gender and exposure or not to drugs.

I don't think I can use any of this as an offset variable, as the only numbers I have are those of deaths.

Thanks again
death_count.docx

Re: GENLOG vs GENLIN

xenia
In reply to this post by Alex Reutter
Thank you, I have checked and this was the problem.


Re: GENLOG vs GENLIN

xenia
In reply to this post by Bruce Weaver
Thank you.
So, let's suppose that my file also has one row per covariate pattern. I should be able to weight by deaths and run a GENLOG. I do not wish to make a file with one row per person or per accident, I want to be able to run the analysis with the aggregated data as it is. So, if my aggregate data file has one row per covariate pattern I can weight by deaths and run the GENLOG.
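
To make the set-up concrete, here is a minimal sketch with an aggregated file and a main-effects model. The variable names (drugH, sex, deaths) and all counts are hypothetical. The point it tries to show is that GENLIN takes the count as the dependent variable, so it should be run with weighting off, while GENLOG receives the same count through WEIGHT.

```spss
* Hypothetical aggregated file with one row per covariate pattern.
DATA LIST LIST / drugH sex deaths (3F5.0).
BEGIN DATA
0 0 40
0 1 95
1 0 15
1 1 120
END DATA.

* GENLIN uses the count as the dependent variable, so no weighting is needed.
WEIGHT OFF.
GENLIN deaths BY drugH sex (ORDER=ASCENDING)
  /MODEL drugH sex INTERCEPT=YES
    DISTRIBUTION=POISSON LINK=LOG
  /PRINT SOLUTION.

* GENLOG receives the count as a frequency weight.
WEIGHT BY deaths.
GENLOG drugH sex
  /MODEL=POISSON
  /PRINT=ESTIM
  /DESIGN drugH sex.
WEIGHT OFF.
```

As far as I know, both procedures treat the last category of each factor as the reference, so I would expect the two sets of estimates to agree for this design.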

Re: GENLOG vs GENLIN

xenia
In reply to this post by xenia
Hello all, and thank you for all the contributions.
After making sure that the file had one row per covariate pattern, running WEIGHT BY deaths before GENLOG, and listing the model terms in GENLOG in the same order as in GENLIN (as Alex suggested), I got the same results for the interactions from the two procedures, so my initial query has been answered.
However, after reading the whole discussion, I have one more question:
is it wrong to run GENLOG or GENLIN for count data if I don't have an offset variable?

Many thanks to all

Re: GENLOG vs GENLIN

Bruce Weaver
Administrator
I don't claim great expertise in count regression models, and I can't give a nice concise answer to your question.  But here's how I would approach trying to understand what the OFFSET value is doing.  This is based on the tutorial at:

http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.cs%2Fgenlin_ships_intro.htm


NEW FILE.
DATASET CLOSE all.
GET FILE = "C:\SPSSdata\ships.sav". /* Change path if necessary.

* Run model with OFFSET = log_months_service, as in the tutorial.

GENLIN damage_incidents BY type construction operation (ORDER=DESCENDING)
  /MODEL type construction operation  INTERCEPT=YES  OFFSET=log_months_service
    DISTRIBUTION=POISSON LINK=LOG
  /CRITERIA METHOD=FISHER(1) SCALE=PEARSON COVB=MODEL MAXITERATIONS=100
    MAXSTEPHALVING=5 PCONVERGE=1E-006(ABSOLUTE) SINGULAR=1E-012
    ANALYSISTYPE=3(WALD) CILEVEL=95 CITYPE=WALD LIKELIHOOD=FULL
  /MISSING CLASSMISSING=EXCLUDE
  /PRINT MODELINFO FIT SUMMARY SOLUTION .

* Now see what happens when different OFFSET values are used.

* Run model with OFFSET  option removed.

GENLIN damage_incidents BY type construction operation (ORDER=DESCENDING)
  /MODEL type construction operation  INTERCEPT=YES  
    DISTRIBUTION=POISSON LINK=LOG
  /CRITERIA METHOD=FISHER(1) SCALE=PEARSON COVB=MODEL MAXITERATIONS=100
    MAXSTEPHALVING=5 PCONVERGE=1E-006(ABSOLUTE) SINGULAR=1E-012
    ANALYSISTYPE=3(WALD) CILEVEL=95 CITYPE=WALD LIKELIHOOD=FULL
  /MISSING CLASSMISSING=EXCLUDE
  /PRINT MODELINFO FIT SUMMARY SOLUTION .


* Now try various fixed OFFSET values.

descriptives log_months_service .

* OFFSET = MIN(log_months_service), or about 4.

GENLIN damage_incidents BY type construction operation (ORDER=DESCENDING)
  /MODEL type construction operation  INTERCEPT=YES  OFFSET=4
    DISTRIBUTION=POISSON LINK=LOG
  /CRITERIA METHOD=FISHER(1) SCALE=PEARSON COVB=MODEL MAXITERATIONS=100
    MAXSTEPHALVING=5 PCONVERGE=1E-006(ABSOLUTE) SINGULAR=1E-012
    ANALYSISTYPE=3(WALD) CILEVEL=95 CITYPE=WALD LIKELIHOOD=FULL
  /MISSING CLASSMISSING=EXCLUDE
  /PRINT MODELINFO FIT SUMMARY SOLUTION .

* OFFSET = MEAN(log_months_service), or about 7.

GENLIN damage_incidents BY type construction operation (ORDER=DESCENDING)
  /MODEL type construction operation  INTERCEPT=YES  OFFSET=7
    DISTRIBUTION=POISSON LINK=LOG
  /CRITERIA METHOD=FISHER(1) SCALE=PEARSON COVB=MODEL MAXITERATIONS=100
    MAXSTEPHALVING=5 PCONVERGE=1E-006(ABSOLUTE) SINGULAR=1E-012
    ANALYSISTYPE=3(WALD) CILEVEL=95 CITYPE=WALD LIKELIHOOD=FULL
  /MISSING CLASSMISSING=EXCLUDE
  /PRINT MODELINFO FIT SUMMARY SOLUTION.

* OFFSET = MAX(log_months_service), or about 11.

GENLIN damage_incidents BY type construction operation (ORDER=DESCENDING)
  /MODEL type construction operation  INTERCEPT=YES  OFFSET=11
    DISTRIBUTION=POISSON LINK=LOG
  /CRITERIA METHOD=FISHER(1) SCALE=PEARSON COVB=MODEL MAXITERATIONS=100
    MAXSTEPHALVING=5 PCONVERGE=1E-006(ABSOLUTE) SINGULAR=1E-012
    ANALYSISTYPE=3(WALD) CILEVEL=95 CITYPE=WALD LIKELIHOOD=FULL
  /MISSING CLASSMISSING=EXCLUDE
  /PRINT MODELINFO FIT SUMMARY SOLUTION.
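
What to expect from the three fixed-OFFSET runs can be worked out ahead of time. The following is only a sketch in notation introduced here ($c$ for the constant offset, $\beta_0$ for the intercept, $\boldsymbol{\beta}$ for the remaining coefficients); it is not taken from the tutorial. With a log link and a constant offset $c$:

```latex
\begin{align*}
\log \mu_i &= c + \beta_0 + \mathbf{x}_i^\top \boldsymbol{\beta}
            = \beta_0^{*} + \mathbf{x}_i^\top \boldsymbol{\beta},
  \qquad \beta_0^{*} = \beta_0 + c, \\
\hat{\beta}_0(c) &= \hat{\beta}_0(0) - c,
  \qquad \hat{\boldsymbol{\beta}}(c) = \hat{\boldsymbol{\beta}}(0),
  \qquad \hat{\mu}_i(c) = \hat{\mu}_i(0).
\end{align*}
```

So OFFSET=4, 7 and 11 should reproduce the no-OFFSET fit with the intercept lowered by 4, 7 and 11, while the other estimates, their standard errors and the fit statistics stay the same. Only an offset that varies from row to row, such as log_months_service, should change the remaining coefficients, because the model then describes a rate per month of service rather than a raw count.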


HTH.


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."


Re: GENLOG vs GENLIN

Rich Ulrich
In reply to this post by xenia
[one more try on posting - through Nabble this time.]  
[My four attempts directly to the List on July 5 failed.]

You can run the analysis without an "offset variable" of exposure
or counts, but you can't draw many conclusions from "significant"
tests.  Does a high (or low) count reflect the population size, or
does it reflect "risk"?   If most of the population is White and Female,
you could see main effects of high Deaths for W and F without any
valid implication of higher Risk.  Or the excess in these population
totals could weaken or mask the evidence that a group has low risk.
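
The way population size and risk get mixed together can be written out in two lines. This is a sketch in notation assumed here ($N_g$ for the unobserved population at risk in group $g$, $r_g$ for its death rate, $\mu_g$ for the expected count); none of it is estimated from the data in this thread.

```latex
\begin{align*}
\mu_g &= N_g \, r_g
  \quad\Longrightarrow\quad
  \log \mu_g = \log N_g + \log r_g, \\
\log\frac{\mu_1}{\mu_0} &= \log\frac{N_1}{N_0} + \log\frac{r_1}{r_0}.
\end{align*}
```

With $\log N_g$ supplied as an offset, a group coefficient estimates $\log(r_1/r_0)$. Without it, the coefficient estimates the whole left-hand side, and the two terms on the right cannot be separated.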

I presume that Risk is more interesting, but you only have data for
"frequency."    I think that the most useful report from these data
might be the univariate counts:  what characterizes the deaths?

Beyond that, if there are *no*  interactions that show up as significant,
you might take that as evidence that these Main effect, univariate results are
sufficient; that is the easiest conclusion to defend.  Any narrative that you
want to create about some interaction has to take into account the chance
that the at-risk population shows the same disproportion in exposure, and
so there is nothing special about this interaction.

--
Rich Ulrich


