I'm trying to use nonlinear regression to fit a model
based on the logit transformation (0 < y < 1):

   y = 1/(1+exp(-(b0+b1*x1+b2*x2)))

I'm using SPSS 14.0.2, and need macro/syntax that will let me use the
log-likelihood as the loss function. The data is in the form of cases
with y(i), x1(i), and x2(i). Any suggestions?

---
Prof. Gary S. Rosin            [hidden email]
South Texas College of Law
1303 San Jacinto               Voice: (713) 646-1854
Houston, TX 77002-7000         Fax: 646-1766
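(For concreteness: coded for CNLR with its default least-squares loss, the model would look roughly like the sketch below. The start values and the name pred_ are placeholders; the open question is how to replace the default loss with the log-likelihood.)

* Sketch only: the logistic curve fit by CNLR with the default
* sum-of-squared-residuals loss. Start values are placeholders.
MODEL PROGRAM b0=0 b1=0 b2=0.
COMPUTE pred_ = 1/(1 + EXP(-(b0 + b1*x1 + b2*x2))).
CNLR y
 /PRED = pred_ .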
Hi Gary,
If I am not mistaken, this is just what the LOGISTIC REGRESSION command is for. Why not try it instead of torturing nonlinear regression?

LOGISTIC REGRESSION y
 /METHOD = ENTER x1 x2 .

Greetings

Jan
So it is, if you are using individual data. I have grouped data,
where

   y(i)  = the proportion of group i that "passed"
   x1(i) = the mean of predictor x1 for group i
   x2(i) = predictor x2 for group i

I used probit/logit to get a model, but the statistics supplied with that are skimpy. I want to use the parameters from the probit/logit model as the initial parameters for a weighted (C)NLR. I tried the default least-squares regressions (all 4 of them), but the resulting parameters varied somewhat from those of the probit/logit model. I wondered what would happen if, instead, I used the log-likelihood as the loss function.

I could disaggregate the data into individual cases--I think I recently saw a macro for that--but I want to stretch, and to get familiar with implementing maximum-likelihood estimation in (C)NLR.

Gary
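(As an aside, the disaggregation route could be sketched as below. This is not the macro mentioned above, just an illustration; it assumes each group record also carries a group-size variable n, and 'disagg.sav' is a made-up file name.)

* Sketch only: expand each grouped record (proportion y, size n)
* into one case per subject; pass = 1 for the "passed" cases.
LOOP #i = 1 TO n.
 COMPUTE pass = (#i LE RND(y*n)).
 XSAVE OUTFILE='disagg.sav' /KEEP=pass x1 x2.
END LOOP.
EXECUTE.
GET FILE='disagg.sav'.
LOGISTIC REGRESSION pass /METHOD = ENTER x1 x2 .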
I could easily do a logit transformation of the y proportions
and then do a (C)NLR regression using the MLE formulas from spssbase.pdf. I have some questions/concerns, though:

1. Regressing on logit(y) = ln(y/(1-y)) minimizes the residuals of the
   transformed variable, rather than the residuals of the original
   variable.

2. Does anyone have the macro/syntax for the log-likelihood function,
   or for the partial derivatives?

Gary

Marta García-Granero <[hidden email]> wrote:
>The formula for the log-likelihood function and MLE estimates can be
>found here:
>
>http://support.spss.com/Tech/Products/SPSS/Documentation/Statistics/algorithms/14.0/logistic_regression.pdf
>
>Alternatively, look in the installation CD, folder "Algorithms",
>file "logistic_regression.pdf".
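(For reference, the transformation in point 1 might be coded along the lines below; a sketch only, with logity as a made-up name. It illustrates the concern exactly: REGRESSION then minimizes squared error on the logit scale, not on the proportions.)

* Sketch only: OLS on the logit-transformed proportions.
* This minimizes residuals of ln(y/(1-y)), not of y itself.
COMPUTE logity = LN(y/(1 - y)).
REGRESSION
 /DEPENDENT = logity
 /METHOD = ENTER x1 x2 .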
Hi again Gary
Some afterthoughts:

1) Take a look at this document:

http://support.spss.com/Tech/Products/SPSS/Documentation/SPSSforWindows/13.0/SPSS%20Regression%20Models%2013.0.pdf

There's a chapter dedicated to NLR. (If asked to log in, use "guest" as both user name and password.)

It's on a QUITE well-hidden page (just positive criticism, spss-dot-com people) of SPSS's extensive website (I found it by chance) that deserves a visit:

http://support.spss.com/Tech/Products/SPSS/Documentation/SPSSforWindows/index.html

2) The fact that the data are grouped doesn't mean you can't use logistic regression, provided you have the sample sizes for each group "i". I can give you more details tomorrow, if you are interested.

Regards

Marta
I do not have a worked example and don't
know if I have the time to work one up.

You need a MODEL PROGRAM block before CNLR; you can optionally specify the derivatives in a DERIVATIVES block, though SPSS can use numerical derivatives if you don't have them or are unable to derive them; and you must specify the loss function on the /LOSS subcommand of CNLR. The need for these is hinted at in the syntax discussion for NLR and CNLR.

The question seems to be: what is the loss function for aggregate logistic regression, and what are the associated derivatives? Here's a thought: use SPSS to disaggregate the data, and then find the loss function and derivatives in textbook sources such as Greene's Econometric Analysis.

It would be nice to have a general ML estimation engine in SPSS, but I don't know if there's another place to look besides CNLR.
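(To make that structure concrete, a bare skeleton might look like the sketch below. This is only one reading of the CNLR syntax, not tested code: pred_ and loss_ are arbitrary names, the start values would be replaced by the probit/logit estimates, and the loss shown is just the default squared residual, marking where a custom per-case loss would go.)

* Skeleton only: how MODEL PROGRAM, the loss variable and the /LOSS
* subcommand fit together. pred_ and loss_ are arbitrary names.
MODEL PROGRAM b0=0 b1=0 b2=0.
COMPUTE pred_ = 1/(1 + EXP(-(b0 + b1*x1 + b2*x2))).
COMPUTE loss_ = (y - pred_)**2.
CNLR y
 /PRED = pred_
 /LOSS = loss_ .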
Anthony Babinec <[hidden email]> wrote:
>The need for these is hinted at in the syntax discussion for NLR and
>CNLR. The question seems to be: what is the loss function for
>aggregate logistic regression, and what are the associated
>derivatives?

You can get the algorithm from the "algorithm" link at the bottom of the logistic regression help page. That gives both the log-likelihood function,

   Sum(i=1 to n) [w(i)*y(i)*ln(Prob(i)) + w(i)*(1-y(i))*ln(1-Prob(i))]

and the partial derivatives for the B(j) parameters:

   Sum(i=1 to n) [w(i)*(y(i)-Prob(i))*x(i,j)]

In both, the w(i)'s are case weights, the y(i)'s are observed proportions, the Prob(i)'s are the fitted proportions, and the x(i,j)'s are the values of the predictors for the cases.

The question is how to work the macro/syntax.

Gary
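(Putting those formulas into the CNLR framework might look like the sketch below. This is untested and only one reading of the syntax reference: the per-case loss is minus the weighted log-likelihood term, since CNLR minimizes the sum of the loss over cases; w is an assumed case-weight/group-size variable; pred_ and loss_ are arbitrary names; and the DERIVATIVES block gives d(pred_)/d(parameter), which CNLR may or may not use when a custom /LOSS is given; it can fall back on numerical derivatives in any case.)

* Sketch only (untested): CNLR minimizing the negative log-likelihood.
* y = observed proportion, w = case weight, pred_ = fitted proportion.
* Replace the zero start values with the probit/logit estimates.
MODEL PROGRAM b0=0 b1=0 b2=0.
COMPUTE pred_ = 1/(1 + EXP(-(b0 + b1*x1 + b2*x2))).
COMPUTE loss_ = -w*(y*LN(pred_) + (1 - y)*LN(1 - pred_)).

DERIVATIVES.
COMPUTE d.b0 = pred_*(1 - pred_).
COMPUTE d.b1 = pred_*(1 - pred_)*x1.
COMPUTE d.b2 = pred_*(1 - pred_)*x2.

CNLR y
 /PRED = pred_
 /LOSS = loss_ .

With start values from the probit/logit fit and w equal to the group sizes, the summed loss_ is exactly minus the log-likelihood quoted above.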
Hi Gary
>>2) The fact that the data are grouped doesn't mean you can't use
>>logistic regression, provided you have the sample sizes for each
>>group "i".
>>
>>I can give you more details tomorrow, if you are interested.

GR> Thanks; Please.

I have adapted Shapiro's dataset on oral contraceptive (OCU) use and infarction to reflect a situation similar to the one you describe (aggregated data with p(i) instead of counts):

DATA LIST LIST/N tobacco ocu (3 F8) meanage p_mi(2 F8.3).
BEGIN DATA
788 1 0 37.86 .043
 56 1 1 32.71 .071
645 2 0 37.95 .122
 54 2 1 31.72 .056
379 3 0 38.36 .243
 54 3 1 34.04 .407
END DATA.
VALUE LABEL tobacco 1 'Non smoker' 2 '1-24 cig/day' 3 ' >=25 cig/day'.
VALUE LABEL ocu 0 'No' 1 'Yes'.
VAR LABEL meanage 'Mean Age group (years)'.
LIST.

* Get number of Yes & No from N & fraction *.
COMPUTE n_yes=RND(p_mi*N).
COMPUTE n_No=N-n_Yes.

* Restructure dataset *.
VARSTOCASES /MAKE weights FROM n_yes n_No
 /INDEX = mi "Myocardial Infarction"(2)
 /KEEP = TOBACCO OCU MEANAGE
 /NULL = KEEP.
FORMAT weights (F8).
RECODE mi (2=0).
VALUE LABEL mi 0 'Control' 1 'Case'.
LIST.

* Logistic regression *.
WEIGHT BY weights.
LOGISTIC REGRESSION mi
 /METHOD = ENTER TOBACCO OCU MEANAGE
 /CONTRAST (TOBACCO)=Indicator(1)
 /CONTRAST (OCU)=Indicator(1)
 /PRINT = GOODFIT CI(95)
 /CRITERIA = PIN(.05) POUT(.10) ITERATE(20) CUT(.5).

--
Regards,
Dr. Marta García-Granero, PhD    mailto:[hidden email]
Statistician

---
"It is unwise to use a statistical procedure whose use one does not understand. The SPSS syntax guide cannot supply this knowledge, and it is certainly no substitute for the basic understanding of statistics and statistical thinking that is essential for the wise choice of methods and the correct interpretation of their results."
(Adapted from the WinPepi manual - I'm sure Joe Abramson will not mind)
Hi again Gary:
Fiddling with the tweaked dataset I sent to you, I saw that the OR for OCU use was terribly distorted. After a bit of thought, I realized that there is a flaw in the logic of using as a covariate the mean of a quantitative variable for the whole group (mean age in the Shapiro example I presented). What you need is the mean age for cases and controls separately, inside every group.

It's Saturday, and my family is frowning a bit at me because I'm "working" at the computer. I'll explain myself on Monday, if you don't mind.

Happy weekend,

Marta
Hi Gary:
My message (Saturday morning):

MGG> After a bit of thought, I realized that there is a flaw in the
MGG> logic of using as a covariate the mean of a quantitative variable
MGG> for the whole group (mean age in the Shapiro example I presented).
MGG> What you need is the mean age for cases and controls separately,
MGG> inside every group.

It's Monday (at least in Spain):

* Data from Shapiro S et al "Oral contraceptive use in relation to
  myocardial infarction" Lancet 1979; 1: 743-7.
DATA LIST FREE /agegroup(f8.0) tobacco(F8.0) ocu(F4.0) mi(F4.0) n(F8.0).
BEGIN DATA
1 1 0 0 106
1 1 0 1 1
1 1 1 0 25
1 2 0 0 79
1 2 1 0 25
1 2 1 1 1
1 3 0 0 39
1 3 0 1 1
1 3 1 0 12
1 3 1 1 3
2 1 0 0 175
2 1 1 0 13
2 2 0 0 142
2 2 0 1 5
2 2 1 0 10
2 2 1 1 1
2 3 0 0 73
2 3 0 1 7
2 3 1 0 10
2 3 1 1 8
3 1 0 0 153
3 1 0 1 3
3 1 1 0 8
3 2 0 0 119
3 2 0 1 11
3 2 1 0 11
3 2 1 1 1
3 3 0 0 58
3 3 0 1 19
3 3 1 0 7
3 3 1 1 3
4 1 0 0 165
4 1 0 1 10
4 1 1 0 4
4 1 1 1 1
4 2 0 0 130
4 2 0 1 21
4 2 1 0 4
4 3 0 0 67
4 3 0 1 34
4 3 1 0 1
4 3 1 1 5
5 1 0 0 155
5 1 0 1 20
5 1 1 0 2
5 1 1 1 3
5 2 0 0 96
5 2 0 1 42
5 2 1 0 1
5 3 0 0 50
5 3 0 1 31
5 3 1 0 2
5 3 1 1 3
END DATA.
VAR LABEL ocu 'Oral contraceptive use' /mi 'Myocardial infarction'.
VALUE LABEL agegroup 1 '25-29 years' 2 '30-34 years' 3 '35-39 years'
 4 '40-44 years' 5 '45-49 years'.
VALUE LABEL tobacco 1 'Non smoker' 2 '1-24 cig/day' 3 ' >=25 cig/day'.
VALUE LABEL ocu 0 'No' 1 'Yes'.
VALUE LABEL mi 0 'Control' 1 'Case'.
NUMERIC meanage(F8).
COMPUTE meanage=22+agegroup*5.
MATCH FILES /FILE=* /KEEP=tobacco ocu meanage mi n.
LIST.
WEIGHT BY n.
LOGISTIC REGRESSION mi
 /METHOD = ENTER tobacco ocu meanage
 /CONTRAST (tobacco)=Indicator(1)
 /CONTRAST (ocu)=Indicator(1)
 /PRINT = GOODFIT CI(95).

* Results: tobacco(1): OR=3.079  tobacco(2): OR=8.475
*          ocu: OR=3.281  meanage: OR=1.164 .
* Our goal: to get OR that are close enough *.

* Using Saturday's dataset layout: tobacco(1): OR= 2.768
*  tobacco(2): OR= 4.579  ocu: OR=70.540  meanage: OR= 2.216 .
* All values are distorted (ocu & meanage the most) *.

* Let's make the Shapiro original dataset look like yours (aggregated
  with mean values for the quantitative predictor) but with more
  information about meanage retained *.
AGGREGATE /OUTFILE=*
 /BREAK=tobacco ocu mi
 /meanage = MEAN(meanage)
 /N=N.
FORMAT N(F8).
WEIGHT BY n.
LIST.

* Now it looks like the dataset I sent on Saturday, once "re-expanded"
  to use as input for logistic regression, but with different mean age
  values for cases and controls *.
LOGISTIC REGRESSION mi
 /METHOD = ENTER tobacco ocu meanage
 /CONTRAST (tobacco)=Indicator(1)
 /CONTRAST (ocu)=Indicator(1)
 /PRINT = GOODFIT CI(95).

Results: Ouch! EVEN WORSE: a perfect fit is detected and no model is obtained. Using GENLOG (adding 0.5 to each cell) I get a ludicrous model, with ORs even more distorted than the ones obtained from Saturday's dataset.

Now the question: does that mean that the approach you wanted can't be done in general (you can't work with mean values of quantitative predictors), or just that it can't be done with the Shapiro dataset? I don't know...
Marta