I'm trying to use nonlinear regression to fit a model
based on the logit transformation (0 < y < 1):

   y = 1/(1+exp(-(b0+b1*x1+b2*x2)))

I'm using SPSS 14.0.2, and need macro/syntax that will let me use the
log-likelihood as the loss function. The data is in the form of cases
with y(i), x1(i), and x2(i). Any suggestions?

---
Prof. Gary S. Rosin            [hidden email]
South Texas College of Law
1303 San Jacinto               Voice: (713) 646-1854
Houston, TX 77002-7000         Fax: 646-1766
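(For concreteness: coded for CNLR with its default least-squares loss, the model would look roughly like the sketch below. The start values and the name pred_ are placeholders; the open question is how to replace the default loss with the log-likelihood.)

* Sketch only: the logistic curve fit by CNLR with the default
* sum-of-squared-residuals loss. Start values are placeholders.
MODEL PROGRAM b0=0 b1=0 b2=0.
COMPUTE pred_ = 1/(1 + EXP(-(b0 + b1*x1 + b2*x2))).
CNLR y
 /PRED = pred_ .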
Hi Gary,
If I am not mistaken, this is just what the LOGISTIC REGRESSION command is for. Why not try it instead of torturing nonlinear regression?

LOGISTIC REGRESSION y
 /METHOD = ENTER x1 x2 .

Greetings

Jan
So it is, if you are using individual data. I have grouped data,
where

   y(i)  = the proportion of group i that "passed"
   x1(i) = the mean of predictor x1 for group i
   x2(i) = predictor x2 for group i

I used probit/logit to get a model, but the statistics supplied with that are skimpy. I want to use the parameters from the probit/logit model as the initial parameters for a weighted (C)NLR. I tried the default least-squares regressions (all 4 of them), but the resulting parameters varied somewhat from those of the probit/logit model. I wondered what would happen if, instead, I used the log-likelihood as the loss function.

I could disaggregate the data into individual cases--I think I recently saw a macro for that--but I want to stretch, and to get familiar with implementing maximum-likelihood estimation in (C)NLR.

Gary
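(As an aside, the disaggregation route could be sketched as below. This is not the macro mentioned above, just an illustration; it assumes each group record also carries a group-size variable n, and 'disagg.sav' is a made-up file name.)

* Sketch only: expand each grouped record (proportion y, size n)
* into one case per subject; pass = 1 for the "passed" cases.
LOOP #i = 1 TO n.
 COMPUTE pass = (#i LE RND(y*n)).
 XSAVE OUTFILE='disagg.sav' /KEEP=pass x1 x2.
END LOOP.
EXECUTE.
GET FILE='disagg.sav'.
LOGISTIC REGRESSION pass /METHOD = ENTER x1 x2 .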
I could easily do a logit transformation of the y proportions
and then do a (C)NLR regression using the MLE formulas from spssbase.pdf. I have some questions/concerns, though:

1. Regressing on logit(y) = ln(y/(1-y)) minimizes the residuals of the
   transformed variable, rather than the residuals of the original
   variable.

2. Does anyone have the macro/syntax for the log-likelihood function,
   or for the partial derivatives?

Gary

Marta García-Granero <[hidden email]> wrote:
>The formula for the log-likelihood function and MLE estimates can be
>found here:
>
>http://support.spss.com/Tech/Products/SPSS/Documentation/Statistics/algorithms/14.0/logistic_regression.pdf
>
>Alternatively, look in the installation CD, folder "Algorithms",
>file "logistic_regression.pdf".
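(For reference, the transformation in point 1 might be coded along the lines below; a sketch only, with logity as a made-up name. It illustrates the concern exactly: REGRESSION then minimizes squared error on the logit scale, not on the proportions.)

* Sketch only: OLS on the logit-transformed proportions.
* This minimizes residuals of ln(y/(1-y)), not of y itself.
COMPUTE logity = LN(y/(1 - y)).
REGRESSION
 /DEPENDENT = logity
 /METHOD = ENTER x1 x2 .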
Hi again Gary
Some afterthoughts:

1) Take a look at this document:

http://support.spss.com/Tech/Products/SPSS/Documentation/SPSSforWindows/13.0/SPSS%20Regression%20Models%2013.0.pdf

There's a chapter dedicated to NLR. (If asked to log in, use "guest" as both user name and password.)

It's on a QUITE well-hidden page (just positive criticism, spss-dot-com people) of SPSS's extensive website (I found it by chance) that deserves a visit:

http://support.spss.com/Tech/Products/SPSS/Documentation/SPSSforWindows/index.html

2) The fact that the data are grouped doesn't mean you can't use logistic regression, provided you have the sample sizes for each group "i". I can give you more details tomorrow, if you are interested.

Regards

Marta
I do not have a worked example and don't
know if I have the time to work one up.

You need a MODEL PROGRAM block before CNLR; you can optionally specify the derivatives in a DERIVATIVES block, though SPSS can use numerical derivatives if you don't have them or are unable to derive them; and you must specify the loss function on the /LOSS subcommand of CNLR. The need for these is hinted at in the syntax discussion for NLR and CNLR.

The question seems to be: what is the loss function for aggregate logistic regression, and what are the associated derivatives? Here's a thought: use SPSS to disaggregate the data, and then find the loss function and derivatives in textbook sources such as Greene's Econometric Analysis.

It would be nice to have a general ML estimation engine in SPSS, but I don't know if there's another place to look besides CNLR.
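(To make that structure concrete, a bare skeleton might look like the sketch below. This is only one reading of the CNLR syntax, not tested code: pred_ and loss_ are arbitrary names, the start values would be replaced by the probit/logit estimates, and the loss shown is just the default squared residual, marking where a custom per-case loss would go.)

* Skeleton only: how MODEL PROGRAM, the loss variable and the /LOSS
* subcommand fit together. pred_ and loss_ are arbitrary names.
MODEL PROGRAM b0=0 b1=0 b2=0.
COMPUTE pred_ = 1/(1 + EXP(-(b0 + b1*x1 + b2*x2))).
COMPUTE loss_ = (y - pred_)**2.
CNLR y
 /PRED = pred_
 /LOSS = loss_ .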
Anthony Babinec <[hidden email]> wrote:
>The need for these is hinted at in the syntax discussion for NLR and
>CNLR. The question seems to be: what is the loss function for
>aggregate logistic regression, and what are the associated
>derivatives?

You can get the algorithm from the "algorithm" link at the bottom of the logistic regression help page. That gives both the log-likelihood function,

   Sum(i=1 to n) [w(i)*y(i)*ln(Prob(i)) + w(i)*(1-y(i))*ln(1-Prob(i))]

and the partial derivatives for the B(j) parameters:

   Sum(i=1 to n) [w(i)*(y(i)-Prob(i))*x(i,j)]

In both, the w(i)'s are case weights, the y(i)'s are observed proportions, the Prob(i)'s are the fitted proportions, and the x(i,j)'s are the values of the predictors for the cases.

The question is how to work the macro/syntax.

Gary
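(Putting those formulas into the CNLR framework might look like the sketch below. This is untested and only one reading of the syntax reference: the per-case loss is minus the weighted log-likelihood term, since CNLR minimizes the sum of the loss over cases; w is an assumed case-weight/group-size variable; pred_ and loss_ are arbitrary names; and the DERIVATIVES block gives d(pred_)/d(parameter), which CNLR may or may not use when a custom /LOSS is given; it can fall back on numerical derivatives in any case.)

* Sketch only (untested): CNLR minimizing the negative log-likelihood.
* y = observed proportion, w = case weight, pred_ = fitted proportion.
* Replace the zero start values with the probit/logit estimates.
MODEL PROGRAM b0=0 b1=0 b2=0.
COMPUTE pred_ = 1/(1 + EXP(-(b0 + b1*x1 + b2*x2))).
COMPUTE loss_ = -w*(y*LN(pred_) + (1 - y)*LN(1 - pred_)).

DERIVATIVES.
COMPUTE d.b0 = pred_*(1 - pred_).
COMPUTE d.b1 = pred_*(1 - pred_)*x1.
COMPUTE d.b2 = pred_*(1 - pred_)*x2.

CNLR y
 /PRED = pred_
 /LOSS = loss_ .

With start values from the probit/logit fit and w equal to the group sizes, the summed loss_ is exactly minus the log-likelihood quoted above.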
Hi Gary
>>2) The fact that the data are grouped doesn't mean you can't use
>>logistic regression, provided you have the sample sizes for each
>>group "i".
>>
>>I can give you more details tomorrow, if you are interested.

GR> Thanks; Please.

I have adapted Shapiro's dataset on oral contraceptive (OCU) use and infarction to reflect a situation similar to the one you describe (aggregated data with p(i) instead of counts):

DATA LIST LIST/N tobacco ocu (3 F8) meanage p_mi(2 F8.3).
BEGIN DATA
788 1 0 37.86 .043
 56 1 1 32.71 .071
645 2 0 37.95 .122
 54 2 1 31.72 .056
379 3 0 38.36 .243
 54 3 1 34.04 .407
END DATA.
VALUE LABEL tobacco 1 'Non smoker' 2 '1-24 cig/day' 3 ' >=25 cig/day'.
VALUE LABEL ocu 0 'No' 1 'Yes'.
VAR LABEL meanage 'Mean Age group (years)'.
LIST.

* Get number of Yes & No from N & fraction *.
COMPUTE n_yes=RND(p_mi*N).
COMPUTE n_No=N-n_Yes.

* Restructure dataset *.
VARSTOCASES /MAKE weights FROM n_yes n_No
 /INDEX = mi "Myocardial Infarction"(2)
 /KEEP = TOBACCO OCU MEANAGE
 /NULL = KEEP.
FORMAT weights (F8).
RECODE mi (2=0).
VALUE LABEL mi 0 'Control' 1 'Case'.
LIST.

* Logistic regression *.
WEIGHT BY weights.
LOGISTIC REGRESSION mi
 /METHOD = ENTER TOBACCO OCU MEANAGE
 /CONTRAST (TOBACCO)=Indicator(1)
 /CONTRAST (OCU)=Indicator(1)
 /PRINT = GOODFIT CI(95)
 /CRITERIA = PIN(.05) POUT(.10) ITERATE(20) CUT(.5).

--
Regards,
Dr. Marta García-Granero, PhD    mailto:[hidden email]
Statistician

---
"It is unwise to use a statistical procedure whose use one does not understand. The SPSS syntax guide cannot supply this knowledge, and it is certainly no substitute for the basic understanding of statistics and statistical thinking that is essential for the wise choice of methods and the correct interpretation of their results."
(Adapted from the WinPepi manual - I'm sure Joe Abramson will not mind)
Hi again Gary:
Fiddling with the tweaked dataset I sent to you, I saw that the OR for OCU use was terribly distorted. After a bit of thought, I realized that there is a flaw in the logic of using as a covariate the mean of a quantitative variable for the whole group (mean age in the Shapiro example I presented). What you need is the mean age for cases and controls separately, inside every group.

It's Saturday, and my family is frowning a bit at me because I'm "working" at the computer. I'll explain myself on Monday, if you don't mind.

Happy weekend,

Marta
Hi Gary:
My message (Saturday morning):

MGG> After a bit of thought, I realized that there is a flaw in the
MGG> logic of using as a covariate the mean of a quantitative variable
MGG> for the whole group (mean age in the Shapiro example I presented).
MGG> What you need is the mean age for cases and controls separately,
MGG> inside every group.

It's Monday (at least in Spain):

* Data from Shapiro S et al "Oral contraceptive use in relation to
  myocardial infarction" Lancet 1979; 1: 743-7.
DATA LIST FREE /agegroup(f8.0) tobacco(F8.0) ocu(F4.0) mi(F4.0) n(F8.0).
BEGIN DATA
1 1 0 0 106
1 1 0 1 1
1 1 1 0 25
1 2 0 0 79
1 2 1 0 25
1 2 1 1 1
1 3 0 0 39
1 3 0 1 1
1 3 1 0 12
1 3 1 1 3
2 1 0 0 175
2 1 1 0 13
2 2 0 0 142
2 2 0 1 5
2 2 1 0 10
2 2 1 1 1
2 3 0 0 73
2 3 0 1 7
2 3 1 0 10
2 3 1 1 8
3 1 0 0 153
3 1 0 1 3
3 1 1 0 8
3 2 0 0 119
3 2 0 1 11
3 2 1 0 11
3 2 1 1 1
3 3 0 0 58
3 3 0 1 19
3 3 1 0 7
3 3 1 1 3
4 1 0 0 165
4 1 0 1 10
4 1 1 0 4
4 1 1 1 1
4 2 0 0 130
4 2 0 1 21
4 2 1 0 4
4 3 0 0 67
4 3 0 1 34
4 3 1 0 1
4 3 1 1 5
5 1 0 0 155
5 1 0 1 20
5 1 1 0 2
5 1 1 1 3
5 2 0 0 96
5 2 0 1 42
5 2 1 0 1
5 3 0 0 50
5 3 0 1 31
5 3 1 0 2
5 3 1 1 3
END DATA.
VAR LABEL ocu 'Oral contraceptive use' /mi 'Myocardial infarction'.
VALUE LABEL agegroup 1 '25-29 years' 2 '30-34 years' 3 '35-39 years'
 4 '40-44 years' 5 '45-49 years'.
VALUE LABEL tobacco 1 'Non smoker' 2 '1-24 cig/day' 3 ' >=25 cig/day'.
VALUE LABEL ocu 0 'No' 1 'Yes'.
VALUE LABEL mi 0 'Control' 1 'Case'.
NUMERIC meanage(F8).
COMPUTE meanage=22+agegroup*5.
MATCH FILES /FILE=* /KEEP=tobacco ocu meanage mi n.
LIST.
WEIGHT BY n.
LOGISTIC REGRESSION mi
 /METHOD = ENTER tobacco ocu meanage
 /CONTRAST (tobacco)=Indicator(1)
 /CONTRAST (ocu)=Indicator(1)
 /PRINT = GOODFIT CI(95).

* Results: tobacco(1): OR=3.079  tobacco(2): OR=8.475
*          ocu: OR=3.281  meanage: OR=1.164 .
* Our goal: to get OR that are close enough *.

* Using Saturday's dataset layout: tobacco(1): OR= 2.768
*  tobacco(2): OR= 4.579  ocu: OR=70.540  meanage: OR= 2.216 .
* All values are distorted (ocu & meanage the most) *.

* Let's make the Shapiro original dataset look like yours (aggregated
  with mean values for the quantitative predictor) but with more
  information about meanage retained *.
AGGREGATE /OUTFILE=*
 /BREAK=tobacco ocu mi
 /meanage = MEAN(meanage)
 /N=N.
FORMAT N(F8).
WEIGHT BY n.
LIST.

* Now it looks like the dataset I sent on Saturday, once "re-expanded"
  to use as input for logistic regression, but with different mean age
  values for cases and controls *.
LOGISTIC REGRESSION mi
 /METHOD = ENTER tobacco ocu meanage
 /CONTRAST (tobacco)=Indicator(1)
 /CONTRAST (ocu)=Indicator(1)
 /PRINT = GOODFIT CI(95).

Results: Ouch! EVEN WORSE: a perfect fit is detected and no model is obtained. Using GENLOG (adding 0.5 to each cell) I get a ludicrous model, with ORs even more distorted than the ones obtained from Saturday's dataset.

Now the question: does that mean that the approach you wanted can't be done in general (you can't work with mean values of quantitative predictors), or just that it can't be done with the Shapiro dataset? I don't know...
Marta