SPSSX Discussion

GENLOG vs GENLIN

xenia

GENLOG vs GENLIN

Hello all,
I was wondering whether someone could enlighten me regarding the following problem. I have data on counts of deaths by drug use and gender. There are four drugs, let's say H, M, C, B. Deaths is the count variable; all other variables are binary, i.e. use/no use for the drug variables and male/female for gender. I know that I can use either GENLOG, which I first weight by the "deaths" variable which is the counts, i.e. the number of deaths, for each drug-gender combination, or GENLIN which covers loglinear models for count data. I don't have an offset variable, e.g. population at risk.
I ran both GENLIN and GENLOG. I fit all the main effects, all the 2-way interactions between drugs, all the 2-way interactions between each drug and gender, all the 3-way interactions between drugs, all the 3-way interactions between two drugs and gender, the 4-way interaction between all drugs, all the 4-way interactions between three drugs and gender, and the 5-way interaction between all drugs and gender.
I get different results for the interactions and I wonder why this is. For example, if I use the 0 group in each variable as the reference, GENLIN gives me a parameter estimate for the interaction H (group 1)*gender (group 1) and treats all other combinations (01, 10, 00) as redundant, while GENLOG treats all four combinations in this interaction (00, 01, 10, 11) as redundant, i.e. the parameter estimates for this interaction are all zero.
Below is the syntax.

GENLIN deaths BY H M C B sex (ORDER=ASCENDING)
/MODEL H M C B sex H*sex M*sex C*sex B*sex H*M H*C H*B M*C M*B C*B H*M*C H*M*B M*C*B
H*C*B H*M*sex H*C*sex H*B*sex M*C*sex M*B*sex C*B*sex H*M*C*sex H*M*B*sex H*C*B*sex M*C*B*sex H*M*C*B H*M*C*B*sex INTERCEPT=YES
DISTRIBUTION=POISSON LINK=LOG
/CRITERIA METHOD=NEWTON SCALE=1 COVB=MODEL MAXITERATIONS=100 MAXSTEPHALVING=5 PCONVERGE=1E-006(ABSOLUTE) SINGULAR=1E-012 ANALYSISTYPE=3(WALD) CILEVEL=95 CITYPE=WALD LIKELIHOOD=FULL.

and

WEIGHT BY deaths.
GENLOG H M C B sex
/MODEL=POISSON
/PRINT=FREQ RESID ADJRESID ZRESID DEV ESTIM CORR COV
/PLOT=RESID(ADJRESID) NORMPROB(ADJRESID)
/CRITERIA=CIN(95) ITERATE(20) CONVERGE(0.001) DELTA(.5)
/DESIGN H M C B sex B*C*H*M*sex B*C B*H B*M B*sex C*H C*M C*sex H*M H*sex M*sex B*C*H B*C*M B*C*sex B*H*M B*H*sex B*M*sex C*H*M C*H*sex C*M*sex H*M*sex B*C*H*M B*C*H*sex B*C*M*sex B*H*M*sex C*H*M*sex.
(despite the fact that I entered the drugs in the same order through the interactive mode, the program ordered them its own way eventually).

Is there something in the algorithms used for each procedure that causes the differences in results?

Thank you all

Alex Reutter

Re: GENLOG vs GENLIN

I would guess it's because in the GENLIN syntax, H*sex appears before B*C*H*M*sex. What happens when you run GENLOG with the same design as GENLIN? That is,

WEIGHT BY deaths.
GENLOG H M C B sex
/MODEL=POISSON
/PRINT=FREQ RESID ADJRESID ZRESID DEV ESTIM CORR COV
/PLOT=RESID(ADJRESID) NORMPROB(ADJRESID)
/CRITERIA=CIN(95) ITERATE(20) CONVERGE(0.001) DELTA(.5)
/DESIGN H M C B sex H*sex M*sex C*sex B*sex H*M H*C H*B M*C M*B C*B H*M*C H*M*B M*C*B
H*C*B H*M*sex H*C*sex H*B*sex M*C*sex M*B*sex C*B*sex H*M*C*sex H*M*B*sex H*C*B*sex M*C*B*sex H*M*C*B H*M*C*B*sex.
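
A smaller version of the same check may make the ordering issue easier to see. This is only a sketch: the counts are invented for two factors (H and sex) and are not data from this thread. Both runs fit the same saturated model, so the fitted cell counts should agree. What may differ is which parameters each run reports as redundant.

```spss
* Invented counts for two binary factors, for illustration only.
DATA LIST LIST / H sex deaths (3F5.0).
BEGIN DATA
0 0 40
0 1 25
1 0 120
1 1 15
END DATA.

WEIGHT BY deaths.

* Highest-order term listed last, so lower-order terms enter first.
GENLOG H sex
  /MODEL=POISSON
  /PRINT=ESTIM
  /DESIGN H sex H*sex.

* Highest-order term listed first, so its columns enter first.
GENLOG H sex
  /MODEL=POISSON
  /PRINT=ESTIM
  /DESIGN H*sex H sex.

WEIGHT OFF.
```

Comparing the two Parameter Estimates tables should show whether the redundancy flags move when the terms are reordered.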

Alex

Bruce Weaver

Re: GENLOG vs GENLIN

Administrator

In reply to this post by xenia

Why are you treating Deaths as a count? A count regression model, as I understand it, would be used when each individual has a count of events. But death is a binary variable (0 or 1) for each individual -- and when you WEIGHT by DEATHS, you are in essence getting a row for each individual where the variable is either a 0 or a 1, not a count. I should think you want a logistic regression model, or some other model for a binary outcome (e.g., a model yielding the relative risk or risk difference).

HTH.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Alex Reutter

Re: GENLOG vs GENLIN

DEATHS could be a count of the number of deaths observed for that covariate pattern.

Alex

Rich Ulrich

Re: GENLOG vs GENLIN

It looks to me, too, like DEATHS could be the counts for each pattern.

In addition to the counts of DEATHS, the usual epidemiological study
would want to have the counts, for each pattern, of NOT-dying.

You have great difficulty in drawing many inferences if you don't have
the "denominators" for risk. Or, to put it another way, a whole lot of
those factors and interactions should be treated as "given" proportions
and not interesting for testing.

- Maybe that makes it feasible to care about some specific 4-way and 5-way
interactions, which are ordinarily too complicated and prone to artifact
to be at all interesting.

--
Rich Ulrich

Bruce Weaver

Re: GENLOG vs GENLIN

Administrator

In reply to this post by Alex Reutter

Alex Reutter wrote

DEATHS could be a count of the number of deaths observed for that covariate pattern.

I understand that. But note what the OP said (emphasis added):

"I know that I can use either GENLOG, *which I first weight by the "deaths" variable* which is the counts, i.e. the number of deaths, for each drug-gender combination..."

When one "WEIGHTS by deaths", a dataset that has one row per covariate pattern in essence becomes a dataset with one case per person, with the total number of rows = the sum of the counts.

Here's an example of what I think is going on.

NEW FILE.
DATASET CLOSE all.

* Generate a summary data set with counts.

DATA LIST LIST / Male Exposed Disease kount (4f5.0) .
BEGIN DATA.
1 1 1 160
1 1 0 80
1 0 1 440
1 0 0 320
0 1 1 240
0 1 0 330
0 0 1 160
0 0 0 270
END DATA.

DATASET Name Summary.

VALUE LABELS
Male 1 'Male' 0 'Female' /
Exposed 1 'Yes' 0 'No' /
Disease 1 'Yes (case)' 0 'No (control)' .

WEIGHT by kount.

LOGISTIC REGRESSION VAR=exposed
/METHOD=ENTER disease male
/PRINT=CI(95)
/CRITERIA PIN(.05) POUT(.10) ITERATE(20) CUT(.5) .

WEIGHT off.

* Now write a file with one row per person
* and run the same model with no WEIGHTING.

LOOP id = 1 to kount.
- XSAVE OUTFILE = "C:\Temp\Junk.sav" / Keep = id Male to Disease.
END LOOP.
EXECUTE.

GET FILE = "C:\Temp\Junk.sav".
DATASET NAME raw.

DATASET ACTIVATE raw.
LOGISTIC REGRESSION VAR=exposed
/METHOD=ENTER disease male
/PRINT=CI(95)
/CRITERIA PIN(.05) POUT(.10) ITERATE(20) CUT(.5) .

* Clean up the junk.

DATASET ACTIVATE summary.
DATASET CLOSE raw.
ERASE FILE "C:\Temp\Junk.sav".

* End of example.

HTH.

Alex Reutter

Re: GENLOG vs GENLIN

In reply to this post by Rich Ulrich

Right. If the patterns are roughly equally distributed, I think this should be okay, in the sense that you'll be able to discern the relative effect of each factor on DEATHS.

If they are unequally distributed, then you'd need an offset variable (like "Aggregate months service" in the ship damage example; pg 204 of McCullagh & Nelder's _Generalized Linear Models_; example showing the use of GENLIN to fit this data at http://pic.dhe.ibm.com/infocenter/spssstat/v21r0m0/topic/com.ibm.spss.statistics.cs/genlin_ships_intro.htm)
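
As a rough sketch of what that would look like here, suppose the file also held a population-at-risk variable. The names pop_at_risk and log_pop are hypothetical; the OP has said no such variable is available. A main-effects model is used only to keep the example short.

```spss
* Hypothetical: pop_at_risk is an assumed variable holding the
* population at risk for each covariate pattern.
COMPUTE log_pop = LN(pop_at_risk).
EXECUTE.

* GENLIN takes the log of the exposure as the offset.
GENLIN deaths BY H M C B sex (ORDER=ASCENDING)
  /MODEL H M C B sex INTERCEPT=YES OFFSET=log_pop
   DISTRIBUTION=POISSON LINK=LOG
  /PRINT SOLUTION.

* GENLOG takes the exposure itself as the cell structure variable.
WEIGHT BY deaths.
GENLOG H M C B sex
  /CSTRUCTURE=pop_at_risk
  /MODEL=POISSON
  /PRINT=ESTIM
  /DESIGN H M C B sex.
WEIGHT OFF.
```

With an offset in place the coefficients describe death rates per unit of population at risk rather than raw counts.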

Alex

Alex Reutter

Re: GENLOG vs GENLIN

In reply to this post by Bruce Weaver

Based on the OP's statement "I don't have an offset variable, e.g. population at risk. " and subsequent analysis, and using your example below, I don't think the OP has the equivalent of cases where Exposed = 0.

Alex

Bruce Weaver

Re: GENLOG vs GENLIN

Administrator

Ah, I see. Well that makes things more difficult, doesn't it! ;-)

xenia

Re: GENLOG vs GENLIN

In reply to this post by Bruce Weaver

Hello, thank you for the reply.
I'm treating deaths as a count because I have aggregated data; maybe I should have said that explicitly. In everything I have read about general loglinear regression, and in the SPSS tutorial "Using General Loglinear Analysis to Model Accident Rates", it is mentioned that "Since the accidents (deaths here) have been aggregated, you first need to weight the cases by Accidents. From the menus choose:

Data > Weight Cases...

Select Weight cases by.

Select Accidents as the frequency variable.

Click OK."

The syntax ends up as:
WEIGHT
BY accid .
GENLOG
agecat gender /CSTRUCTURE = pop
/MODEL = POISSON etc.

So, since I have aggregated data I thought I should weight cases by deaths and run GENLOG, as the example shows. In the accidents.sav used for that tutorial the accidents variable shows how many accidents there were for individuals who belong in each combination of age and gender categories. This is what my file looks like approximately, but with more rows as I have more factors, and without the population at risk variable.

xenia

Re: GENLOG vs GENLIN

In reply to this post by Alex Reutter

Yes, as I said in my original post, it is the number of deaths for each drug-gender category combination or as you say, for each specific covariate pattern, e.g. 120 deaths of males using heroin, not using any other drug, 15 deaths of females using heroin and cocaine and not using any other drug etc.

xenia

Re: GENLOG vs GENLIN

In reply to this post by Rich Ulrich

Thank you.
However, I don't have a case-control design, which would include the numbers not dying; in that case it would be a matter of making a dead/not dead binary dependent variable and carrying out logistic regression.
I don't have the denominators or population at risk in the data I've been given, and I don't know whether it would be possible to get them from when the data were collected or by some other means.

Bruce Weaver

Re: GENLOG vs GENLIN

Administrator

xenia wrote

thank you,
if when one "WEIGHTS by deaths", a dataset that has one row per covariate pattern becomes a dataset with one case per person, with the total number of rows = the sum of the counts, then why is the accidents.sav weighted by accidents and then GENLOG is run in the tutorial?
http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.cs%2Fgenlin_ships_intro.htm

That link goes to the wrong tutorial, I think: it points to the GENLIN ship damage example. To find the one you want (the accident rates tutorial that uses accidents.sav), I had to navigate to:

Loglinear Modeling >
General Loglinear Analysis >
Using General Loglinear Analysis to Model Accident Rates

The accidents.sav file has only one row per covariate pattern, which is why the tutorial uses "WEIGHT BY accid". But if you write the data to a file with one row per accident, you can run the model without WEIGHT and get exactly the same results, as the example below shows.

Given that the accidents.sav data file is so small, I recreated it here with a DATA LIST command so that folks who don't have ready access to the sample files can play along if they wish.

HTH.

NEW FILE.
DATASET CLOSE all.

* The following DATA LIST command reproduces
* the data in sample file accidents.sav.

DATA LIST list / agecat gender (2f1) accid pop (2f8.0).
BEGIN DATA
1 1 57997 198522
2 1 57113 203200
3 1 54123 200744
1 0 63936 187791
2 0 64835 195714
3 0 66804 208239
END DATA.

* The tutorial suggests the following analysis.

WEIGHT BY accid .
GENLOG
agecat gender /CSTRUCTURE = pop
/MODEL = POISSON
/PRINT = FREQ RESID DEV ADJRESID ESTIM CORR COV
/CRITERIA = CIN(95) ITERATE(20) CONVERGE(.001) DELTA(.5)
/DESIGN .

WEIGHT off.

* Use XSAVE to write the data to a file that has one row per accident,
* which is presumably one row per person.

LOOP id = 1 to accid.
- XSAVE OUTFILE = "C:\Temp\Junk.sav" / Keep = agecat gender pop.
END LOOP.
EXECUTE.

* Open the file with one row per accident.

GET FILE = "C:\Temp\Junk.sav".

CROSSTABS agecat by gender.

* Notice that the cell counts match the values
* of variable accid in the original file.

* No WEIGHT command this time.

GENLOG
agecat gender /CSTRUCTURE = pop
/MODEL = POISSON
/PRINT = FREQ RESID DEV ADJRESID ESTIM CORR COV
/CRITERIA = CIN(95) ITERATE(20) CONVERGE(.001) DELTA(.5)
/DESIGN .

* Results match those from the tutorial using WEIGHTED data.

* Clean up the junk.

NEW FILE.
DATASET CLOSE all.
ERASE FILE "C:\Temp\Junk.sav".

xenia wrote

thank you,
if when one "WEIGHTS by deaths", a dataset that has one row per covariate pattern becomes a dataset with one case per person, with the total number of rows = the sum of the counts, then why is the accidents.sav weighted by accidents and then GENLOG is run in the tutorial?
http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.cs%2Fgenlin_ships_intro.htm

More importantly, is it wrong to run GENLIN or GENLOG without an offset variable?

My file is as in the attached file, this is for males and it's similar for females. The dependent is the deaths variable, there is no "disease" variable, and the independents are gender and exposure or not to drugs.

I don't think I can use any of this as an offset variable, as the only numbers I have are those of deaths.

Thanks again
death_count.docx

xenia

Re: GENLOG vs GENLIN

In reply to this post by Alex Reutter

Thank you, I have checked and this was the problem.

xenia

Re: GENLOG vs GENLIN

In reply to this post by Bruce Weaver

Thank you.
So, let's suppose that my file also has one row per covariate pattern. I should be able to weight by deaths and run a GENLOG. I do not wish to make a file with one row per person or per accident, I want to be able to run the analysis with the aggregated data as it is. So, if my aggregate data file has one row per covariate pattern I can weight by deaths and run the GENLOG.
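
One way to confirm that the file really has one row per covariate pattern is sketched below. The names n_rows and collapsed are hypothetical, and the variable names H, M, C, B, sex and deaths are taken from the syntax posted earlier in the thread.

```spss
* Count the rows in each drug-gender combination, ignoring any weights.
WEIGHT OFF.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=H M C B sex
  /n_rows=NU.
FREQUENCIES VARIABLES=n_rows.

* If any n_rows exceeds 1, collapse to one row per pattern.
DATASET DECLARE collapsed.
AGGREGATE
  /OUTFILE='collapsed'
  /BREAK=H M C B sex
  /deaths=SUM(deaths).
```

If every value of n_rows is 1, the second AGGREGATE is unnecessary and the file can be weighted by deaths as it stands.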

xenia

Re: GENLOG vs GENLIN

In reply to this post by xenia

Hello all and thank you for all the contributions,
after making sure that the file had one row per covariate pattern, and doing WEIGHT by deaths and GENLOG, making sure that the model was ordered in GENLOG in the same way as it was in GENLIN (as Alex suggested), I did get the same results for the interactions from the two procedures, so my initial query was answered.
However, after reading all the posts and the discussion by everyone, I have one more question:
is it wrong to run GENLOG or GENLIN for count data if I don't have an offset variable?

Many thanks to all

Bruce Weaver

Re: GENLOG vs GENLIN

Administrator

I don't claim great expertise in count regression models, and I can't give a nice concise answer to your question. But here's how I would approach trying to understand what the OFFSET value is doing. This is based on the tutorial at:

http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.cs%2Fgenlin_ships_intro.htm

NEW FILE.
DATASET CLOSE all.
GET FILE = "C:\SPSSdata\ships.sav". /* Change path if necessary.

* Run model with OFFSET = log_months_service, as in the tutorial.

GENLIN damage_incidents BY type construction operation (ORDER=DESCENDING)
/MODEL type construction operation INTERCEPT=YES OFFSET=log_months_service
DISTRIBUTION=POISSON LINK=LOG
/CRITERIA METHOD=FISHER(1) SCALE=PEARSON COVB=MODEL MAXITERATIONS=100
MAXSTEPHALVING=5 PCONVERGE=1E-006(ABSOLUTE) SINGULAR=1E-012
ANALYSISTYPE=3(WALD) CILEVEL=95 CITYPE=WALD LIKELIHOOD=FULL
/MISSING CLASSMISSING=EXCLUDE
/PRINT MODELINFO FIT SUMMARY SOLUTION .

* Now see what happens when different OFFSET values are used.

* Run model with OFFSET option removed.

GENLIN damage_incidents BY type construction operation (ORDER=DESCENDING)
/MODEL type construction operation INTERCEPT=YES
DISTRIBUTION=POISSON LINK=LOG
/CRITERIA METHOD=FISHER(1) SCALE=PEARSON COVB=MODEL MAXITERATIONS=100
MAXSTEPHALVING=5 PCONVERGE=1E-006(ABSOLUTE) SINGULAR=1E-012
ANALYSISTYPE=3(WALD) CILEVEL=95 CITYPE=WALD LIKELIHOOD=FULL
/MISSING CLASSMISSING=EXCLUDE
/PRINT MODELINFO FIT SUMMARY SOLUTION .

* Now try various fixed OFFSET values.

descriptives log_months_service .

* OFFSET = MIN(log_months_service), or about 4.

GENLIN damage_incidents BY type construction operation (ORDER=DESCENDING)
/MODEL type construction operation INTERCEPT=YES OFFSET=4
DISTRIBUTION=POISSON LINK=LOG
/CRITERIA METHOD=FISHER(1) SCALE=PEARSON COVB=MODEL MAXITERATIONS=100
MAXSTEPHALVING=5 PCONVERGE=1E-006(ABSOLUTE) SINGULAR=1E-012
ANALYSISTYPE=3(WALD) CILEVEL=95 CITYPE=WALD LIKELIHOOD=FULL
/MISSING CLASSMISSING=EXCLUDE
/PRINT MODELINFO FIT SUMMARY SOLUTION .

* OFFSET = MEAN(log_months_service), or about 7.

GENLIN damage_incidents BY type construction operation (ORDER=DESCENDING)
/MODEL type construction operation INTERCEPT=YES OFFSET=7
DISTRIBUTION=POISSON LINK=LOG
/CRITERIA METHOD=FISHER(1) SCALE=PEARSON COVB=MODEL MAXITERATIONS=100
MAXSTEPHALVING=5 PCONVERGE=1E-006(ABSOLUTE) SINGULAR=1E-012
ANALYSISTYPE=3(WALD) CILEVEL=95 CITYPE=WALD LIKELIHOOD=FULL
/MISSING CLASSMISSING=EXCLUDE
/PRINT MODELINFO FIT SUMMARY SOLUTION.

* OFFSET = MAX(log_months_service), or about 11.

GENLIN damage_incidents BY type construction operation (ORDER=DESCENDING)
/MODEL type construction operation INTERCEPT=YES OFFSET=11
DISTRIBUTION=POISSON LINK=LOG
/CRITERIA METHOD=FISHER(1) SCALE=PEARSON COVB=MODEL MAXITERATIONS=100
MAXSTEPHALVING=5 PCONVERGE=1E-006(ABSOLUTE) SINGULAR=1E-012
ANALYSISTYPE=3(WALD) CILEVEL=95 CITYPE=WALD LIKELIHOOD=FULL
/MISSING CLASSMISSING=EXCLUDE
/PRINT MODELINFO FIT SUMMARY SOLUTION.
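
For what it is worth, here is a short derivation of what the fixed-offset runs should show, assuming the Poisson model with log link used above. The symbols are introduced here for illustration: c is the constant offset, \beta_0 the intercept, and x_i the dummy-coded predictors for row i.

```latex
\log \mu_i = c + \beta_0 + x_i^{\top}\beta
           = (\beta_0 + c) + x_i^{\top}\beta
```

A constant offset is therefore absorbed into the intercept: compared with the run that has no offset, the estimated intercept shifts by -c, while the other coefficients and the fitted values stay the same. Only an offset that varies across rows, such as log_months_service, can change the other coefficients.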

HTH.

Rich Ulrich

Re: GENLOG vs GENLIN

In reply to this post by xenia

[one more try on posting - through Nabble this time.]
[My four attempts directly to the List on July 5 failed.]

You can run the analysis without an "offset variable" of exposure
or counts, but you can't draw many conclusions about "significant"
tests. Does the high (or low) count reflect the population size, or
does it reflect "risk"? If most of the population is White and Female,
you could see main effects for high Deaths for W and F without any
valid implication about higher Risk. Or, the excess in these population
totals could weaken or mask the evidence that a group has low risk.

I presume that Risk is more interesting, but you only have data for
"frequency." I think that the most useful report from these data
might be the univariate counts: What does characterize deaths?

Beyond that, if there are *no* interactions that show up as significant,
you might take that as evidence that these Main effect, univariate results are
sufficient; that is the easiest conclusion to defend. Any narrative that you
want to create about some interaction has to take into account the chance
that the at-risk population shows the same disproportion in exposure, and
so there is nothing special about this interaction.
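
A minimal sketch of the univariate report suggested above, assuming the variable names from the original post:

```spss
* Weighted frequencies give the number of deaths at each level of each factor.
WEIGHT BY deaths.
FREQUENCIES VARIABLES=H M C B sex.
WEIGHT OFF.
```

These tables describe who is among the deaths. Without denominators they say nothing about who is at higher risk.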

--
Rich Ulrich
