Hi, SPSS Users:
=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Is there a specialized algorithm in SPSS that handles binary logistic regression for rare events? Rare events occur when, for example, you have a sample size of 1000 but only 20 events.

How are rare events handled in SPSS?

Thank you.

JOHNNY T. AMORA, MAS
Director, Office of Institutional Effectiveness and Research
De La Salle-College of Saint Benilde
Taft Avenue, Manila, Philippines |
Hi Johnny and all,

Let me add myself to the same question. I can also offer two possibilities. I had a very similar situation (70/930), and I was told that I could use ordinal logistic regression if 'ordinal' makes sense for the data (although I had two values, the original variable was a Likert scale). So, Johnny, is your variable originally dichotomous, or is it a recoding of a non-dichotomous variable?

If ordinal doesn't make sense, I was told that I could use 'count models', apparently designed for unequal variable distributions, such as zero-inflated Poisson regression or zero-truncated Poisson regression. I haven't done my homework yet and I don't know these models, but I believe they are not available in SPSS:

https://stats.idre.ucla.edu/other/dae/

Kind regards,
A
|
In reply to this post by John Amora-2
Hi John,

maybe check out the "statistical oversampling" strategy to equalize group sizes.

Mario Giesel
Munich, Germany
|
In reply to this post by Alejandro González Heras
Several zero-inflated models are available via the STATS ZEROINFL extension command, which can be installed from the Extension Hub. It requires the appropriate version of R and the R Essentials.
|
In reply to this post by Alejandro González Heras
Likert item response variables are usually part of a summative scale.
Is your DV a scale or a single variable? What were the values on the pre-coarsened DV? Why did you coarsen it to a dichotomy?
Art Kendall
Social Research Consultants |
In reply to this post by John Amora-2
Firth logistic regression?
https://statisticalhorizons.com/logistic-regression-for-rare-events
https://www.ibm.com/support/knowledgecenter/en/SSLVMB_26.0.0/statistics_r_package_project_ddita/spss/programmability_option/r_package_installed_extensions.html

The name of the R extension command is STATS FIRTHLOG.
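For readers who want to see what Firth's penalty does, here is a minimal from-scratch sketch in Python. It is not the code behind STATS FIRTHLOG; the function name `firth_logit`, the toy data (2 events in 50 cases vs 8 events in 50 cases), and the restriction to an intercept plus one predictor are all assumptions made for illustration. It uses the usual modified score, in which each residual `y - p` is adjusted by `h * (0.5 - p)`, where `h` is the hat-matrix diagonal.

```python
import math

def firth_logit(x, y, iters=100, tol=1e-10):
    """Firth-penalized logistic regression, intercept plus one predictor (illustrative)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Fitted probabilities and binomial weights p(1 - p).
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Fisher information X'WX for the design [1, x], and its 2x2 inverse.
        a = sum(w)
        b = sum(wi * xi for wi, xi in zip(w, x))
        d = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * d - b * b
        i00, i01, i11 = d / det, -b / det, a / det
        # Hat-matrix diagonals: h_i = w_i * [1, x_i] I^{-1} [1, x_i]'.
        h = [wi * (i00 + 2.0 * i01 * xi + i11 * xi * xi) for wi, xi in zip(w, x)]
        # Firth-modified score contributions.
        r = [yi - pi + hi * (0.5 - pi) for yi, pi, hi in zip(y, p, h)]
        u0 = sum(r)
        u1 = sum(ri * xi for ri, xi in zip(r, x))
        # Scoring step: beta <- beta + I^{-1} U*.
        s0 = i00 * u0 + i01 * u1
        s1 = i01 * u0 + i11 * u1
        b0 += s0
        b1 += s1
        if abs(s0) + abs(s1) < tol:
            break
    return b0, b1

# Hypothetical toy data: 2 events among 50 cases with x = 0, 8 events among 50 with x = 1.
x = [0] * 50 + [1] * 50
y = [1] * 2 + [0] * 48 + [1] * 8 + [0] * 42
b0, b1 = firth_logit(x, y)
print(f"intercept={b0:.4f}, slope={b1:.4f}")
```

With a single binary predictor the model is saturated, and the penalized estimates coincide with the log-odds obtained after adding 0.5 to every cell of the 2x2 table, which is a convenient sanity check for the sketch.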
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
In reply to this post by Art Kendall
Hi Art,
Indeed, it could be understood as a scale, but it does not have a normal distribution:

Dependent variable: "How much do you believe that fake news influence you?"

1       22  (nothing at all)
2        7
3       19
4       37
5      117
6      260
7      537  (a lot)
Total  999

Now, could this variable explain what is called the "third-person effect" (*)?

(IV) "How much do you think that fake news influence public opinion?" (also a Likert variable with a non-normal distribution)

Multiple linear regression (including other IVs) would be great if a normal distribution could be assumed. That's why we tried to dichotomize the DV into two groups: [1-4] vs [5-7]. Logistic regression would then be nice, but we have to face the unequal variable distribution: it is uncommon to think that fake news has little influence on oneself. That's the reason for ordinal regression or the 'count models', which I haven't explored yet, or the options that Jon and Bruce told us about, which I understand are linked to R.

(*) To put it simply, let me give you the Wikipedia definition: "The third-person effect hypothesis predicts that people tend to perceive that mass media messages have a greater effect on others than on themselves, based on personal biases."

Kind regards,
A |
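To make the imbalance of the [1-4] vs [5-7] split concrete, the group sizes can be tallied from the frequencies reported in the post. This is only an arithmetic sketch in Python; the variable names are made up for illustration.

```python
# Frequencies of the 7-point DV as reported in the post (1 = nothing at all, 7 = a lot).
freq = {1: 22, 2: 7, 3: 19, 4: 37, 5: 117, 6: 260, 7: 537}

total = sum(freq.values())
low = sum(n for score, n in freq.items() if score <= 4)   # group [1-4]
high = sum(n for score, n in freq.items() if score >= 5)  # group [5-7]

print(f"total={total}, low={low} ({low / total:.1%}), high={high} ({high / total:.1%})")
```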
Why do you believe the *residuals* are severely discrepant from a normal distribution?
I suggest you try CATREG and see if it makes a meaningful substantive difference in fit with nominal vs ordinal measurement level assumptions. You might even try continuous vs ordinal assumptions.
Art Kendall
Social Research Consultants |
I just opened my eyes twice, if you get my meaning. Thank you very much, Art.

Apologies for being redundant: when you say CATREG, you mean the following R package, right?

https://CRAN.R-project.org/package=CatReg

It is true, though, that categorizing the variable is also interesting, so that we can compare those who consider themselves somehow 'immune' (to fake news) with those who don't. I guess it will then depend on the interpretation of the different models; I will run them and compare them. Again, much appreciated, Art.

All the best,
A |
No.
In SPSS, go to <help> <topics> and type "CATREG":

CATREG (categorical regression with optimal scaling using alternating least squares) quantifies categorical variables using optimal scaling, resulting in an optimal linear regression equation for the transformed variables. The variables can be given mixed optimal scaling levels, and no distributional assumptions about the variables are made.

Options

Transformation Type. You can specify the transformation type (spline ordinal, spline nominal, ordinal, nominal, or numerical) at which you want to analyze each variable.
Discretization. You can use the DISCRETIZATION subcommand to discretize fractional-value variables or to recode categorical variables.
Initial Configuration. You can specify the kind of initial configuration through the INITIAL subcommand. Also, multiple systematic starts or fixed signs for the regression coefficients can be specified through this subcommand.
Tuning the Algorithm. You can control the values of algorithm-tuning parameters with the MAXITER and CRITITER subcommands.
Regularized regression. You can specify one of three methods for regularized regression: Ridge regression, the Lasso, or the Elastic Net.
Resampling. You can specify cross validation or the .632 bootstrap for estimation of prediction error.
Missing Data. You can specify the treatment of missing data with the MISSING subcommand.
Optional Output. You can request optional output through the PRINT subcommand.
Transformation Plot per Variable. You can request a plot per variable of its quantification against the category numbers.
Residual Plot per Variable. You can request an overlay plot per variable of the residuals and the weighted quantification against the category numbers.
Ridge, Lasso, or Elastic Net plot. You can request a plot of the regularized coefficients paths. For the Elastic Net, the plots for all values of the Ridge penalty can be requested, or plots for specified values of the Ridge penalty.
Writing External Data. You can write the transformed data (category numbers replaced with optimal quantifications) to an outfile for use in further analyses. You can also write the discretized data to an outfile.
Saving Variables. You can save the transformed variables, the predicted values, and/or the residuals in the working data file.
Art Kendall
Social Research Consultants |
In reply to this post by Alejandro González Heras
Alejandro, if your license includes CATREG, you will find it in the GUI under
Analyze > Regression > Optimal Scaling (CATREG). (The GUI is useful for generating a first draft of the command syntax, as Art is wont to say!)
--
Bruce Weaver |
In reply to this post by John Amora-2
As the article cited by Bruce points out, there is nothing wrong with the
logistic /model/ for rare events. ML /methods/ do not like small Ns if you
hope for robust results. N=20 (smaller group) is somewhat small.
However, even LS regression (here, it would be discriminant function)
wants the N for the smaller group to be 5 or 10 times the number of
predictors.
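As a rough illustration of that rule of thumb applied to the original question (20 events in the smaller group), the arithmetic can be sketched in Python. The helper name `max_predictors` is made up here, and the 5-to-10 ratio is only the heuristic quoted above, not a hard limit.

```python
def max_predictors(smaller_group_n, cases_per_predictor):
    """Largest number of predictors the rule of thumb allows (whole predictors only)."""
    return smaller_group_n // cases_per_predictor

events = 20  # size of the smaller group in the original question
limits = {ratio: max_predictors(events, ratio) for ratio in (5, 10)}

for ratio, k in limits.items():
    print(f"{ratio} cases per predictor -> at most {k} predictors")
```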
As to those other comments and the alternate problem: "Likert" items have
a neutral midpoint; good Likert items generally have symmetrical responses.
The example is not Likert.
Dichotomizing a 7-point scale is a waste of good information, presumably.
If those category labels are not (subjectively) equal enough as intervals, consider
transformations. Someone cited a procedure that finds optimum intervals --
I think that it works well, but it /might/ be off-putting to your potential audience.
For the 1-7 frequencies given -- if I did not want to use that metric, my
first alternative to improve the "equal intervals" would be to subtract from 7
and take the square root. My second alternative would be to subtract from
8 and take the log. My third guess, if I disbelieved the OP's impulse to combine
the low scores, would be to take average ranks and score as logits. That assigns
wider spacing to the 1,2,3 scores (in search of subjective "equal intervals").
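The three rescorings can be computed directly from the 1-7 frequencies given earlier in the thread. The Python sketch below is one possible reading of them, under stated assumptions: the names are made up, and the "average ranks scored as logits" step converts each category's average rank to a proportion with `(rank - 0.5) / N` before taking the logit, which is one common convention and may not be the exact one intended.

```python
import math

# Frequencies of the 7-point DV reported earlier in the thread.
freq = {1: 22, 2: 7, 3: 19, 4: 37, 5: 117, 6: 260, 7: 537}
total = sum(freq.values())

rows = {}
below = 0  # number of cases with a lower score
for score in sorted(freq):
    n = freq[score]
    avg_rank = below + (n + 1) / 2   # mean rank of the n tied cases
    p = (avg_rank - 0.5) / total     # assumed convention; keeps p strictly inside (0, 1)
    rows[score] = {
        "sqrt": math.sqrt(7 - score),         # first alternative: sqrt(7 - x)
        "log": math.log(8 - score),           # second alternative: log(8 - x)
        "rank_logit": math.log(p / (1 - p)),  # third alternative: logit of average rank
    }
    below += n

for score, r in rows.items():
    print(f"{score}: sqrt={r['sqrt']:.3f}  log={r['log']:.3f}  rank_logit={r['rank_logit']:.3f}")
```

Note that the first two rescorings reflect the scale (higher original scores get lower transformed values), so coefficient signs flip relative to the raw 1-7 metric.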
--
Rich Ulrich
|
In reply to this post by Bruce Weaver
Much appreciated Bruce and Art :)
|
In reply to this post by Alejandro González Heras
Hi, Alejandro and all.

My variable is originally dichotomous, with 0/1 values.

Johnny
|