SPSSX Discussion

Binary Logistic Regression for Rare Events

John Amora-2

Binary Logistic Regression for Rare Events

Hi, SPSS Users:

Is there a specialized algorithm in SPSS for binary logistic regression with rare events? Rare events occur when, for example, you have a sample of 1,000 but only 20 events.

How are rare events handled in SPSS?

Thank you.

JOHNNY T. AMORA, MAS

Director, Office of Institutional Effectiveness and Research

De La Salle-College of Saint Benilde
Taft Avenue, Manila, Philippines

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Alejandro González Heras

Re: Binary Logistic Regression for Rare Events

Hi Johnny and all,

Let me add myself to the same question. I can also offer two possibilities:

I had a very similar situation (70/930), and I was told that I could use ordinal logistic regression if ‘ordinal’ makes sense for the data (although my outcome had two values, the original variable was a Likert scale). So, Johnny, is your variable originally dichotomous, or is it a recoding of a non-dichotomous variable?

If ordinal doesn’t make sense, I was told that I could use ‘count models’, apparently designed for unequal outcome distributions, such as zero-inflated Poisson regression or zero-truncated Poisson regression. I haven’t done my homework yet and don’t know these models, but I believe they are not available in SPSS: https://stats.idre.ucla.edu/other/dae/

Kind regards,

From: SPSSX(r) Discussion <[hidden email]> On behalf of John Amora
Sent: Tuesday, March 16, 2021 9:48
To: [hidden email]
Subject: Binary Logistic Regression for Rare Events

spss.giesel@yahoo.de

Re: Binary Logistic Regression for Rare Events

In reply to this post by John Amora-2

Hi John,

Maybe check out the "statistical oversampling" strategy to equalize group sizes.
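A minimal Python sketch of one simple form of oversampling, assuming each case is a dict with a hypothetical 0/1 field named `event`: minority-class cases are duplicated at random (with replacement) until both groups are the same size. This is an illustration only, not necessarily the variant Mario has in mind.

```python
import random


def oversample_minority(rows, label_key="event", seed=0):
    """Duplicate randomly chosen minority-class rows until both classes are equal in size.

    `rows` is a list of dicts; `label_key` is a hypothetical name for the 0/1 outcome.
    """
    rng = random.Random(seed)
    events = [r for r in rows if r[label_key] == 1]
    non_events = [r for r in rows if r[label_key] == 0]
    minority, majority = sorted([events, non_events], key=len)
    if not minority:
        raise ValueError("cannot oversample a class with no cases")
    # Sample with replacement from the minority class to close the size gap
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra


# 20 events out of 1,000 cases, as in the original question
data = [{"event": 1} for _ in range(20)] + [{"event": 0} for _ in range(980)]
balanced = oversample_minority(data)
print(len(balanced), sum(r["event"] for r in balanced))  # 1960 980
```

Because the balanced file no longer reflects the true event rate, the fitted intercept (and hence the predicted probabilities) describes the artificial 50/50 split, and the duplicated cases make standard errors look smaller than they really are.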

Mario Giesel

Munich, Germany

Jon Peck

Re: Binary Logistic Regression for Rare Events

In reply to this post by Alejandro González Heras

Several zero-inflated models are available via the STATS ZEROINFL extension command, which can be installed from the Extension Hub. It requires the appropriate version of R and the R Essentials.
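For readers unfamiliar with the zero-inflated Poisson (ZIP) model mentioned above, here is a minimal Python sketch of its probability mass function: with probability `pi` a case is a "structural" zero; otherwise its count follows a Poisson distribution with mean `lam`. The function and parameter names are illustrative and are not the interface of STATS ZEROINFL. Note that these models are designed for count outcomes with excess zeros.

```python
import math


def zip_pmf(k, lam, pi):
    """P(Y = k) under a zero-inflated Poisson model.

    pi  -- probability of a structural (always-zero) case
    lam -- mean of the Poisson part
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros come from both the structural process and the Poisson process
        return pi + (1 - pi) * poisson
    return (1 - pi) * poisson


probs = [zip_pmf(k, lam=2.0, pi=0.3) for k in range(60)]
print(round(probs[0], 4), round(sum(probs), 6))  # 0.3947 1.0
```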

On Tue, Mar 16, 2021 at 3:53 AM Alejandro González Heras <[hidden email]> wrote:

Jon K Peck
[hidden email]

Art Kendall

Re: Binary Logistic Regression for Rare Events

In reply to this post by Alejandro González Heras

Likert item response variables are usually part of a summative scale.

Is your DV a scale or a single variable?

What were the values on the pre-coarsened DV?

Why did you coarsen it to a dichotomy?

-----
Art Kendall
Social Research Consultants
--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

Bruce Weaver

Re: Binary Logistic Regression for Rare Events

In reply to this post by John Amora-2

Firth logistic regression?

https://statisticalhorizons.com/logistic-regression-for-rare-events

https://www.ibm.com/support/knowledgecenter/en/SSLVMB_26.0.0/statistics_r_package_project_ddita/spss/programmability_option/r_package_installed_extensions.html

The name of the R extension command is STATS FIRTHLOG.
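To show what Firth's method does, here is a minimal pure-Python sketch for a single predictor: each case's contribution to the score gets an extra bias-reducing term h_i(0.5 - p_i), where h_i is the i-th diagonal of the hat matrix, and the modified score equations are solved by a Newton-type iteration. This is an illustration under those assumptions, not the code STATS FIRTHLOG runs; the function name and the step cap are hypothetical choices.

```python
import math


def firth_logistic(x, y, max_iter=100, tol=1e-10, max_step=5.0):
    """Firth-penalized logistic regression for logit(p) = b0 + b1 * x.

    Solves the modified score equations
        sum_i (y_i - p_i + h_i * (0.5 - p_i)) * [1, x_i] = 0,
    where h_i is the i-th diagonal element of the hat matrix.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Fisher information X'WX = [[a, b], [b, c]]
        a = sum(w)
        b = sum(wi * xi for wi, xi in zip(w, x))
        c = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * c - b * b
        # Hat diagonal: h_i = w_i * [1, x_i] (X'WX)^-1 [1, x_i]'
        h = [wi * (c - 2.0 * b * xi + a * xi * xi) / det for wi, xi in zip(w, x)]
        # Firth-modified residuals and score vector
        r = [yi - pi + hi * (0.5 - pi) for yi, pi, hi in zip(y, p, h)]
        u0 = sum(r)
        u1 = sum(ri * xi for ri, xi in zip(r, x))
        # Newton-type step: (X'WX)^-1 U*
        s0 = (c * u0 - b * u1) / det
        s1 = (-b * u0 + a * u1) / det
        largest = max(abs(s0), abs(s1))
        if largest > max_step:  # cap the step to avoid overshooting
            s0, s1 = s0 * max_step / largest, s1 * max_step / largest
        b0, b1 = b0 + s0, b1 + s1
        if largest < tol:
            break
    return b0, b1


# Complete separation: ordinary ML estimates would diverge, Firth's stay finite
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 0, 1, 1, 1, 1]
b0, b1 = firth_logistic(x, y)
print(round(b0, 3), round(b1, 3))  # -2.197 4.394
```

For a single binary predictor, the Firth estimates match the log odds computed after adding 0.5 to each cell of the 2x2 table, which gives a quick check: b0 = ln(0.5/4.5) and b1 = 2 ln 9.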

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Alejandro González Heras

Re: Binary Logistic Regression for Rare Events

In reply to this post by Art Kendall

Hi Art,

Indeed, it could be understood as a scale, but it does not have a normal distribution:

Dependent variable:
"How much do you believe that fake news influence you?"

Response             n
1 (nothing at all)   22
2                    7
3                    19
4                    37
5                    117
6                    260
7 (a lot)            537
Total                999

Now, could the following variable explain what is called the "third-person effect" (*)?
(IV) "How much do you think that fake news influence public opinion?" (also a Likert variable with a non-normal distribution)

Multiple linear regression (including other IVs) would be great if a normal distribution could be assumed. That's why we tried dichotomizing the DV into two groups: [1-4] vs [5-7]. Logistic regression would then be a good fit, but we face the unequal group sizes: it is uncommon to think that fake news has little influence on oneself. That's the reason for considering ordinal regression or the 'count models', which I haven't explored yet, or the options Jon and Bruce describe, which I understand rely on R.
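The group sizes produced by the [1-4] vs [5-7] split can be checked directly from the frequency table; a small Python sketch (the dictionary simply re-types those counts):

```python
# Response frequencies for "How much do you believe that fake news influence you?"
freq = {1: 22, 2: 7, 3: 19, 4: 37, 5: 117, 6: 260, 7: 537}

low = sum(n for score, n in freq.items() if score <= 4)   # [1-4]
high = sum(n for score, n in freq.items() if score >= 5)  # [5-7]
total = low + high
print(low, high, total, f"{100 * low / total:.1f}%")  # 85 914 999 8.5%
```

So the smaller group after dichotomizing has 85 cases, about 8.5% of the sample.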

(*) To put it simply, here is the Wikipedia definition: "The third-person effect hypothesis predicts that people tend to perceive that mass media messages have a greater effect on others than on themselves, based on personal biases."

Kind regards,
A

-----Original Message-----
From: SPSSX(r) Discussion <[hidden email]> On behalf of Art Kendall
Sent: Tuesday, March 16, 2021 14:35
To: [hidden email]
Subject: Re: Binary Logistic Regression for Rare Events

Art Kendall

Re: Binary Logistic Regression for Rare Events

Why do you believe the *residuals* are severely discrepant from a normal
distribution?

I suggest you try CATREG and see if it makes a meaningful substantive
difference in fit with nominal vs ordinal measurement level assumptions.
You might even try continuous vs ordinal assumptions.

-----
Art Kendall
Social Research Consultants

Alejandro González Heras

Re: Binary Logistic Regression for Rare Events

I just opened my eyes twice, if you get my meaning. Thank you very much, Art.

Apologies for being redundant: when you say CATREG, you refer to the following R package, right?

https://CRAN.R-project.org/package=CatReg

It is true, though, that categorizing the variable is also interesting, so we can compare those who consider themselves somewhat 'immune' (to fake news) with those who don't. I guess it will then depend on the interpretation of the different models; I will run them and compare them. Again, much appreciated, Art.

All the best,
A

-----Original Message-----
From: SPSSX(r) Discussion <[hidden email]> On behalf of Art Kendall
Sent: Tuesday, March 16, 2021 16:56
To: [hidden email]
Subject: Re: Binary Logistic Regression for Rare Events

Art Kendall

Re: Binary Logistic Regression for Rare Events

No.
In SPSS, go to <Help> <Topics> and type "CATREG".
CATREG (categorical regression with optimal scaling using alternating least
squares) quantifies categorical variables using optimal scaling, resulting
in an optimal linear regression equation for the transformed variables. The
variables can be given mixed optimal scaling levels, and no distributional
assumptions about the variables are made.

Options

Transformation Type. You can specify the transformation type (spline
ordinal, spline nominal, ordinal, nominal, or numerical) at which you want
to analyze each variable.

Discretization. You can use the DISCRETIZATION subcommand to discretize
fractional-value variables or to recode categorical variables.

Initial Configuration. You can specify the kind of initial configuration
through the INITIAL subcommand. Also, multiple systematic starts or fixed
signs for the regression coefficients can be specified through this
subcommand.

Tuning the Algorithm. You can control the values of algorithm-tuning
parameters with the MAXITER and CRITITER subcommands.

Regularized regression. You can specify one of three methods for
regularized regression: Ridge regression, the Lasso, or the Elastic Net.

Resampling. You can specify cross validation or the .632 bootstrap for
estimation of prediction error.

Missing Data. You can specify the treatment of missing data with the MISSING
subcommand.

Optional Output. You can request optional output through the PRINT
subcommand.

Transformation Plot per Variable. You can request a plot per variable of its
quantification against the category numbers.

Residual Plot per Variable. You can request an overlay plot per variable of
the residuals and the weighted quantification against the category numbers.

Ridge, Lasso, or Elastic Net plot. You can request a plot of the regularized
coefficients paths. For the Elastic Net, the plots for all values of the
Ridge penalty can be requested, or plots for specified values of the Ridge
penalty.

Writing External Data. You can write the transformed data (category numbers
replaced with optimal quantifications) to an outfile for use in further
analyses. You can also write the discretized data to an outfile.

Saving Variables. You can save the transformed variables, the predicted
values, and/or the residuals in the working data file.

Basic Specification

-----
Art Kendall
Social Research Consultants

Bruce Weaver

Re: Binary Logistic Regression for Rare Events

In reply to this post by Alejandro González Heras

Alejandro, if your license includes CATREG, you will find it in the GUI under
Analyze > Regression > Optimal Scaling (CATREG). (The GUI is useful for
generating a first draft of the command syntax, as Art is wont to say!)

Rich Ulrich

Re: Binary Logistic Regression for Rare Events

In reply to this post by John Amora-2

As the article cited by Bruce points out, there is nothing wrong with the logistic /model/ for rare events. ML /methods/ do not like small Ns if you hope for robust results, and N=20 (the smaller group) is somewhat small.

However, even LS regression (here, it would be discriminant function analysis) wants the N for the smaller group to be at least 5 or 10 times the number of predictors.
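The rule of thumb can be turned around to ask how many predictors a given number of cases in the smaller group supports. A minimal Python sketch (the function name is hypothetical), using the 20 events from the original question:

```python
def max_predictors(n_smaller_group: int, cases_per_predictor: int) -> int:
    """Largest number of predictors p with cases_per_predictor * p <= n_smaller_group."""
    return n_smaller_group // cases_per_predictor


for ratio in (5, 10):
    print(f"{ratio} cases per predictor -> at most {max_predictors(20, ratio)} predictors")
# 5 cases per predictor -> at most 4 predictors
# 10 cases per predictor -> at most 2 predictors
```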

As to those other comments and the alternate problem: "Likert" items have a neutral midpoint, and good Likert items have generally symmetrical responses. The example is not Likert.

Dichotomizing a 7-point scale is, presumably, a waste of good information. If those category labels are not (subjectively) equal enough as intervals, consider transformations. Someone cited a procedure that finds optimum intervals; I think that it works well, but it /might/ be off-putting to your potential audience.

For the 1-7 frequencies given, if I did not want to use that metric, my first alternative to improve the "equal intervals" would be to subtract from 7 and take the square root. My second alternative would be to subtract from 8 and take the log. My third guess, if I disbelieved the OP's impulse to combine the low scores, would be to take average ranks and score them as logits. That assigns wider spacing to the 1, 2, 3 scores (in search of subjective "equal intervals").
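The three rescorings above can be computed from the frequency table Alejandro posted. A minimal Python sketch; how "average ranks" are turned into proportions (here, mid-rank divided by N + 1) is an assumption, since the post does not specify it. The first two transformations reverse the direction of the scale, so 7 ("a lot") becomes the lowest score.

```python
import math

# Frequencies for responses 1 (nothing at all) to 7 (a lot), from the thread
freq = {1: 22, 2: 7, 3: 19, 4: 37, 5: 117, 6: 260, 7: 537}
N = sum(freq.values())  # 999

# First alternative: subtract from 7 and take the square root (7 maps to 0)
sqrt_scores = {k: math.sqrt(7 - k) for k in freq}

# Second alternative: subtract from 8 and take the log (7 maps to log(1) = 0)
log_scores = {k: math.log(8 - k) for k in freq}

# Third alternative: average (mid) rank of each category as a proportion, then logit.
# Dividing by N + 1 keeps every proportion strictly between 0 and 1 (an assumption).
logit_scores = {}
cases_below = 0
for k in sorted(freq):
    mid_rank = cases_below + (freq[k] + 1) / 2
    p = mid_rank / (N + 1)
    logit_scores[k] = math.log(p / (1 - p))
    cases_below += freq[k]

for k in sorted(freq):
    print(k, round(sqrt_scores[k], 3), round(log_scores[k], 3), round(logit_scores[k], 3))
```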

Rich Ulrich

Alejandro González Heras

Re: Binary Logistic Regression for Rare Events

In reply to this post by Bruce Weaver

Much appreciated, Bruce and Art :)

-----Original Message-----
From: SPSSX(r) Discussion <[hidden email]> On behalf of Bruce Weaver
Sent: Tuesday, March 16, 2021 19:07
To: [hidden email]
Subject: Re: Binary Logistic Regression for Rare Events

John Amora-2

Re: Binary Logistic Regression for Rare Events

In reply to this post by Alejandro González Heras

Hi, Alejandro and All.

My variable is genuinely dichotomous, with original 0/1 values.

Johnny
