Genlin question

Genlin question

Maguin, Eugene
I'm running a very simple probit regression model with simulation data using genlin.

GENLIN Y (REFERENCE=FIRST) WITH X/MODEL X DISTRIBUTION=BINOMIAL LINK=PROBIT.

And for this dataset: N=1000, y=50/50, mean(x)=1.75, sd(x)=~10, I get the following message.

The maximum number of step-halvings was reached but the log-likelihood value cannot be further improved. Output for the last iteration is displayed for split file PREP = 1,ESREP = 7,SDREP = 3.
The GENLIN procedure continues despite the above warning(s). Subsequent results shown are based on the last iteration. Validity of the model fit is uncertain.

I'm confused by this message. It sounds as though the likelihood has reached an optimum, since it cannot be further improved. Is this a correct assumption? What are the specific implications of the message, and would increasing the number of step-halvings necessarily fix the problem (and why)?

Thanks, Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Genlin question

Ryan
Mind showing your simulation code? Let's inspect it, if that's okay with you.

Ryan

On Jul 24, 2012, at 2:28 PM, "Maguin, Eugene" <[hidden email]> wrote:

> I'm running a very simple probit regression model with simulation data using genlin.
>
> GENLIN Y (REFERENCE=FIRST) WITH X/MODEL X DISTRIBUTION=BINOMIAL LINK=PROBIT.
>
> And for this dataset: N=1000, y=50/50, mean(x)=1.75, sd(x)=~10, I get the following message.
>
> The maximum number of step-halvings was reached but the log-likelihood value cannot be further improved. Output for the last iteration is displayed for split file PREP = 1,ESREP = 7,SDREP = 3.
> The GENLIN procedure continues despite the above warning(s). Subsequent results shown are based on the last iteration. Validity of the model fit is uncertain.
>
> I'm confused by this message. It sounds as though the likelihood has reached an optimum, since it cannot be further improved. Is this a correct assumption? What are the specific implications of the message, and would increasing the number of step-halvings necessarily fix the problem (and why)?
>
> Thanks, Gene Maguin

Re: Genlin question

Maguin, Eugene
Here it is. Actually here is the whole file. Let me add that I don't get the message on every simulation dataset.

*  POINT BISERIAL--ODDS RATIO COMPUTATION.
*  VARY PROPORTIONS FROM 10/90 TO 90/10 BY 10.
*  VARY ES (D) FROM -1.5 TO 1.5 BY 0.1.
*  VARY SD(PREDICTOR) AT 1.0, 5.0, 10.0.
SET SEED=7162012.
INPUT PROGRAM.
LOOP PREP=1 TO 9.
+  LOOP ESREP=1 TO 31.
+     LOOP SDREP=1 TO 3.
+        LOOP I=1 TO 1000.
+           COMPUTE X=RV.NORMAL(0.0,1).
+           COMPUTE Y=0.
*  PROPORTIONS OF 10/90 TO 90/10 BY 10.
+           IF (I GT 100*PREP) Y=1.
*  ADD EFFECT SIZES OF -1.5 TO 1.5 BY 0.1.
+           IF (Y EQ 1) X=X+(ESREP-16)/10.
+           END CASE.
+        END LOOP.
+     END LOOP.
+  END LOOP.
END LOOP.
END FILE.
END INPUT PROGRAM.
LEAVE PREP ESREP SDREP.
EXECUTE.
FORMAT PREP ESREP SDREP(F2.0) I(F4.0) X(F10.6) Y(F1.0).


AGGREGATE OUTFILE=* MODE=ADDVARIABLES/BREAK=PREP ESREP SDREP Y/
   GM=MEAN(X)/GSD=SD(X).

*  ADJUST DISTRIBUTION TO MEAN=0.0, SD=1.0.
COMPUTE X=(X*1/GSD)-(GM*1/GSD).

*  ADD EFFECT SIZES OF -1.5 TO 1.5 BY 0.1.
DO REPEAT E=1 TO 31.
+  DO IF (ESREP EQ E).
+     IF (Y EQ 1) X=X+(E-16)/10.
+  END IF.
END REPEAT.

*  ADJUST SD TO 1.0, 5.0, 10.0.
IF (SDREP EQ 2) X=X*5.0.
IF (SDREP EQ 3) X=X*10.0.
EXECUTE.

SAVE OUTFILE='U:\MOLLY\PBS072412.sav'.

*  COMPUTATION CHECK.
AGGREGATE OUTFILE=*/BREAK=PREP ESREP SDREP Y/CELLN=NU/CELLMEAN=MEAN(X)/CELLSD=SD(X).
FORMAT CELLMEAN CELLSD(F12.8).
*  SEEMS TO BE OK.


GET FILE='U:\MOLLY\PBS072412.sav'.

OMS /SELECT TABLES/IF COMMANDS=['Means'] SUBTYPES=['Report']/
   DESTINATION FORMAT=SAV OUTFILE='U:\MOLLY\Means.sav'.
MEANS X BY PREP BY ESREP BY SDREP BY Y/CELLS=COUNT MEAN STDDEV.
OMSEND.


SPLIT FILE BY PREP ESREP SDREP.

OMS /SELECT TABLES/IF COMMANDS=['Correlations'] SUBTYPES=['Correlations']/
   DESTINATION FORMAT=SAV OUTFILE='U:\MOLLY\Corrs.sav'.
CORRELATION X WITH Y.
OMSEND.

OMS /SELECT TABLES/IF COMMANDS=['Logistic Regression']
   SUBTYPES=['Variables in the Equation']/
   DESTINATION FORMAT=SAV OUTFILE='U:\MOLLY\LogReg.sav'.
LOGISTIC REGRESSION VARIABLES=Y WITH X/ENTER X.
OMSEND.

OMS /SELECT TABLES/IF COMMANDS=['Generalized Linear Models']
   SUBTYPES=['Parameter Estimates']/
   DESTINATION FORMAT=SAV OUTFILE='U:\MOLLY\ProbitReg.sav'.
GENLIN Y (REFERENCE=FIRST) WITH X/MODEL X DISTRIBUTION=BINOMIAL LINK=PROBIT/
   CRITERIA MAXSTEPHALVING=10.
OMSEND.
SPLIT FILE OFF.




-----Original Message-----
From: [hidden email] [mailto:[hidden email]]
Sent: Tuesday, July 24, 2012 2:33 PM
To: Maguin, Eugene
Cc: [hidden email]
Subject: Re: Genlin question

Mind showing your simulation code? Let's inspect it, if that's okay with you.

Ryan


Re: Genlin question

Jon K Peck
In reply to this post by Ryan
The message is telling you that convergence criteria have not been met, but the numerical algorithm is stuck.  The first thing to try is to increase the MAXITERATIONS value from its default value of 100 and the MAXSTEPHALVING from its default value of 5.  But there may be some data problems that are making what might be a simple problem numerically hard.
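
A minimal sketch of that adjustment, applied to the model from the original post. The CRITERIA values shown (500 iterations, 20 step-halvings) are arbitrary illustrations, not recommendations. Raising them gives the algorithm more room, but it does not guarantee convergence if the data themselves are the problem.

```spss
* Illustrative limits only; the defaults are MAXITERATIONS=100 and MAXSTEPHALVING=5.
GENLIN Y (REFERENCE=FIRST) WITH X
  /MODEL X DISTRIBUTION=BINOMIAL LINK=PROBIT
  /CRITERIA MAXITERATIONS=500 MAXSTEPHALVING=20.
```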

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621




From:        Ryan Black <[hidden email]>
To:        [hidden email]
Date:        07/24/2012 12:37 PM
Subject:        Re: [SPSSX-L] Genlin question
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




Mind showing your simulation code? Let's inspect it, if that's okay with you.

Ryan


Re: Genlin question

Maguin, Eugene

Jon,

Thanks for your explanation. I increased the iterations to 500 and the step-halvings to 10. Although the same message recurred, I can understand that the condition might well have had numerical problems. I incorrectly described it in my reply to Ryan. Perhaps most importantly, the split was 10/90, not 50/50. The sample mean would have been M=2.25, SD~10. Other conditions generated the same message; some of them were ‘extreme’ splits, and while others were not, perhaps they were extreme on other dimensions.
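
One way to see where the algorithm stalls is to print the iteration history for the affected split. The sketch below assumes the same model and the limits mentioned above. HISTORY(1) requests a row for every iteration; the remaining PRINT keywords are listed on the assumption that an explicit PRINT subcommand replaces the default tables, so treat that list as a starting point rather than a definitive specification.

```spss
* Request the iteration history so the log-likelihood can be followed step by step.
GENLIN Y (REFERENCE=FIRST) WITH X
  /MODEL X DISTRIBUTION=BINOMIAL LINK=PROBIT
  /CRITERIA MAXITERATIONS=500 MAXSTEPHALVING=10
  /PRINT CPS DESCRIPTIVES MODELINFO FIT SUMMARY SOLUTION HISTORY(1).
```

If the log-likelihood stops changing while the parameter estimates keep moving, that would suggest a data problem rather than a limit that is merely set too low.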

 

Gene Maguin

 

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Jon K Peck
Sent: Tuesday, July 24, 2012 2:44 PM
To: [hidden email]
Subject: Re: Genlin question

 

The message is telling you that convergence criteria have not been met, but the numerical algorithm is stuck.  The first thing to try is to increase the MAXITERATIONS value from its default value of 100 and the MAXSTEPHALVING from its default value of 5.  But there may be some data problems that are making what might be a simple problem numerically hard.


Re: Genlin question

Ryan
Gene,
 
Sorry for the delayed response. Life is hectic these days (When isn't it?!). Here is simulation code which I think meets your specifications. That is, the simulation code specifies a 10/90 split for y and x~N(2.25,10).
 
The simulation code generates 10 samples of N=1000.
 
After the simulation code, I run DESCRIPTIVES for x, FREQUENCIES for y, and GENLIN to fit the probit regression model on each of 10 samples.
 
All seems fine to me. I have a feeling, however, that this may not be what you're after. Anyway, here you go:
 
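set seed 98734538.

new file.
inp pro.

* Initialize the sample index and retain its value across cases.
comp iteration=-99.
leave iteration.

* Outer loop indexes the sample; inner loop generates one case per ID.
loop iteration = 1 to 10.
loop ID = 1 to 10000.

* Predictor x is normal with mean 2.25 and SD 10; the latent response adds standard normal error.
  compute x = rv.normal(2.25,10).
  compute #error = rv.normal(0,1).
  compute #yl = x + #error.

* Set y to 1 when the latent response exceeds the 10th percentile of that normal distribution.
  if  (#yl>(IDF.NORMAL(.10,2.25,10))) y=1.
  if  (#yl<=(IDF.NORMAL(.10,2.25,10))) y=0.

  end case.

end loop.
end loop.

end file.
end inp pro.
exe.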
 
SPLIT FILE SEPARATE BY iteration.
 
DESCRIPTIVES VARIABLES=x
  /STATISTICS=MEAN STDDEV MIN MAX.
 
FREQUENCIES VARIABLES=y
  /ORDER=ANALYSIS.
 
GENLIN y (REFERENCE=FIRST) WITH  x
 /MODEL x DISTRIBUTION=BINOMIAL LINK=PROBIT.
 
Ryan
 
On Tue, Jul 24, 2012 at 3:45 PM, Maguin, Eugene <[hidden email]> wrote:
>
> Jon,
>
> Thanks for your explanation. I increased the iterations to 500 and the step-halvings to 10. Although the same message recurred, I can understand that the condition might well have had numerical problems. I incorrectly described it in my reply to Ryan. Perhaps most importantly, the split was 10/90, not 50/50. The sample mean would have been M=2.25, SD~10. Other conditions generated the same message; some of them were ‘extreme’ splits, and while others were not, perhaps they were extreme on other dimensions.
>
>  
>
> Gene Maguin

Re: Genlin question

Ryan
Gene,
 
Immediately after posting the message I realized that the code was generating sample sizes of N=10000, not N=1000. After changing the sample size to N=1000, I received the same GENLIN warning message for a few of the samples. However, as suggested by Jon, I increased the MAXITERATIONS and MAXSTEPHALVING [which I have done in the past when encountering the same warning message] and then I achieved convergence for all 10 models without any warnings.
 
Ryan
On Sat, Jul 28, 2012 at 12:09 AM, R B <[hidden email]> wrote:
Gene,
 
Sorry for the delayed response. Life is hectic these days (When isn't it?!). Here is simulation code which I think meets your specifications. That is, the simulation code specifies a 10/90 split for y and x~N(2.25,10).
 
The simulation code generates 10 samples of N=1000.
 