Effect-size measure for simple effects in Binary Logistic Regression?


Effect-size measure for simple effects in Binary Logistic Regression?

Mlkman
I am trying to get an effect-size measure for simple effects in a binary logistic regression. I am using the GENLIN procedure in SPSS; here is the syntax:

GENLIN Bar_exact_score (REFERENCE=FIRST) BY Skill Bar_cut_point (ORDER=ASCENDING)
/MODEL Skill Bar_cut_point Skill*Bar_cut_point INTERCEPT=YES DISTRIBUTION=BINOMIAL  
LINK=LOGIT
/CRITERIA METHOD=FISHER(1) SCALE=1 COVB=MODEL MAXITERATIONS=100 MAXSTEPHALVING=5
PCONVERGE=1E-006(ABSOLUTE) SINGULAR=1E-012 ANALYSISTYPE=3(WALD) CILEVEL=95 CITYPE=WALD
LIKELIHOOD=FULL
/EMMEANS TABLES=Skill*Bar_cut_point SCALE=ORIGINAL
COMPARE=Skill CONTRAST=SIMPLE(2) PADJUST=LSD
/MISSING CLASSMISSING=EXCLUDE
/PRINT CPS DESCRIPTIVES MODELINFO FIT SUMMARY SOLUTION.

The CONTRAST subcommand within EMMEANS provides Wald Chi-square and p values, but I would like to get an effect size measure - for example, an odds ratio. I cannot work out how to do this.

Extra background details:

    * In my study, I show expert and novice police officers edited video clips of law enforcement incidents. The clips are edited to stop at a certain point; when the video clip stops, the officers are asked to predict what happens next in the clip (i.e., if it were to continue playing). Officers' responses are scored as correct (1)/incorrect (0).
    * There are 3-5 versions of each clip - each one with a different stop point (e.g., early, mid, late stop points).
    * This is a between-subjects study, so each officer sees only one version of each clip.
    * Each officer watches 23 different video incidents, and each officer sees a mix of cut points across those incidents.
    * The number of officers who viewed each version of a clip differs. For example, 20 novices and 12 experts may have viewed cut point #1 of Clip A, while 15 experts and 14 novices viewed cut point #2.
    * My goal is to identify the stop point in each clip that maximizes the difference between experts' and novices' ability to predict what will happen next. So, I want a way of comparing the simple effects of skill at each cut point.

Thank you for your time.

Re: Effect-size measure for simple effects in Binary Logistic Regression?

Bruce Weaver
Administrator
If you change SCALE=ORIGINAL to SCALE=TRANSFORMED on your EMMEANS line, the EM Means will give you the log odds of the outcome being equal to whichever value is not the referent.  Also, you can add COMPARE to get pair-wise comparisons.  So, you want something like:

 /EMMEANS TABLES=Skill*Bar_cut_point SCALE=TRANSFORMED COMPARE=Skill

OR

 /EMMEANS TABLES=Skill*Bar_cut_point SCALE=TRANSFORMED COMPARE=Bar_cut_point

Which one you want depends on which is the variable of main interest, and which one is seen as the effect modifier (as epidemiologists might say), or moderator (as psychologists tend to say).  

The DIFFERENCES in the pair-wise comparisons table can be exponentiated to give odds ratios.  You might find it convenient to slap an OMS command or two in front of your GENLIN to write the desired tables out to a new data set.
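To make that concrete, here is a small Python sketch; the numbers are invented for illustration, not taken from any GENLIN output:

```python
import math

# Hypothetical values, as they might appear in the pairwise comparisons
# table: a difference in log odds and its 95% CI limits.
diff_log_odds = 0.85
ci_lower, ci_upper = 0.20, 1.50

# Exponentiate the difference (and its CI limits) to get an odds ratio.
odds_ratio = math.exp(diff_log_odds)              # ~2.34
or_ci = (math.exp(ci_lower), math.exp(ci_upper))  # ~(1.22, 4.48)
```

The same exponentiation applies to any contrast reported on the log-odds (TRANSFORMED) scale.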

HTH.




--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Effect-size measure for simple effects in Binary Logistic Regression?

parisec
In reply to this post by Mlkman
Out of curiosity, why did you choose GENLIN instead of LOGISTIC? Since you say it is between subjects, it doesn't look like you have any random effects that need to be accounted for.

I'm not that familiar with GENLIN, but in LOGISTIC you can use /PRINT CI(95) and you will get your ORs and 95% CIs.




Re: Effect-size measure for simple effects in Binary Logistic Regression?

Mlkman
The reason I chose GENLIN instead of LOGISTIC REGRESSION is that I could not find a way to get the SKILL contrasts (i.e., skilled vs. less-skilled) at each level of CUT POINT in LOGISTIC REGRESSION.

In the "Variables in the Equation" tables in LOGISTIC REGRESSION and GENLIN, I get parameters such as: Skill(1) x Bar_cut_point(1).

As far as I understand, "Bar_cut_point(1)" is not just one cut point - it is a contrast between two different cut points. So, my understanding is that the term "Skill(1) x Bar_cut_point(1)" really represents a 2 (skill level) x 2 (cut point) interaction contrast - which is not what I want.

I came to this conclusion because the "Variables in the Equation" table contains one less "cut point" parameter than the actual number of cut points, which would make sense if SPSS is constructing contrasts between the different levels of "cut point."
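As a sanity check on that reasoning, here is a minimal sketch in plain Python (not SPSS) of reference-category indicator coding for a 5-level factor - 5 cut points yield only 4 indicator variables, which matches what the coefficient table shows:

```python
# Expand a k-level factor into k-1 indicator variables (reference coding).
def indicator_code(levels, reference):
    non_ref = [lv for lv in levels if lv != reference]
    return {lv: [1 if lv == nr else 0 for nr in non_ref] for lv in levels}

codes = indicator_code(levels=[1, 2, 3, 4, 5], reference=1)
# codes[1] == [0, 0, 0, 0]  (reference level: all zeros)
# codes[3] == [0, 1, 0, 0]  (one indicator "on" per non-reference level)
```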

I found that in GENLIN, I could specify the contrasts that I actually wanted (i.e., skill at each cut point). Just having trouble with getting a measure of effect size.

Does that make sense?

Re: Effect-size measure for simple effects in Binary Logistic Regression?

Mlkman
In reply to this post by Bruce Weaver
Thanks for your message. It was helpful, but left me with some more questions. I would appreciate some additional guidance.

When I used SCALE=TRANSFORMED instead of SCALE=ORIGINAL on the EMMEANS line, I got different p-values for my comparisons. The contrasts and p-values that I got when I used SCALE=ORIGINAL seemed to make sense to me; the ones I get with SCALE=TRANSFORMED do not. Would you expect the p-values to change when using TRANSFORMED vs ORIGINAL scale?

The Wald chi-square values that I get when running SCALE=ORIGINAL have to come from somewhere, right? There has to be a beta coefficient and an SE in order to calculate the Wald chi-squares. Is there any way to get SPSS to report those "hidden" values?

I was trying to think of why I might be getting different p-values using TRANSFORMED vs ORIGINAL scale. Could this have anything to do with the fact that some cell counts are small? Also, on some of the analyses I am running, I get this warning:

Warnings
A quasi-complete separation may exist in the data. The maximum likelihood estimates do not exist.
The GENLIN procedure continues despite the above warning(s). Subsequent results shown are based on the last iteration. Validity of the model fit is uncertain.

As always, I am incredibly grateful for any help that can be provided.

Re: Effect-size measure for simple effects in Binary Logistic Regression?

Bruce Weaver
Administrator
From the FM entry for EMMEANS under GENLIN:

The SCALE keyword specifies whether to compute estimated marginal means based on the original scale of the dependent variable or based on the link function transformation.

ORIGINAL Estimated marginal means are based on the original scale of the dependent variable. Estimated marginal means are computed for the response. This is the default. Note that when the dependent variable is specified using the events/trials option, ORIGINAL gives the estimated marginal means for the events/trials proportion rather than for the number of events.

TRANSFORMED Estimated marginal means are based on the link function transformation. Estimated marginal means are computed for the linear predictor.


If I had a model like yours (i.e., two categorical predictors and their interaction), I would want the EM Mean for the cell that corresponds to the reference categories of both variables to be equal to the Constant in the table of coefficients.  And to get that, I need to use the SCALE=TRANSFORMED option.  Personally, I can't imagine why I would ever want SCALE=ORIGINAL.  But maybe someone else can chip in on that.
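The relationship between the two scales is just the link function. A small Python sketch, with an illustrative log-odds value rather than anything from your data:

```python
import math

# SCALE=TRANSFORMED reports the EM mean on the linear-predictor (log-odds)
# scale; SCALE=ORIGINAL back-transforms it through the inverse link.
def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

log_odds = math.log(0.5)  # what TRANSFORMED would show (odds of 0.5)
p = inv_logit(log_odds)   # what ORIGINAL would show: probability 1/3
```

Note that comparisons on the two scales test different quantities (differences in log odds vs. differences in proportions), with standard errors obtained differently, so it would not be surprising for the p-values to disagree.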

That warning about quasi-complete separation is concerning.  As it says, "Validity of the model fit is uncertain."  What is your sample size, and how many of the cases have "events" (where event is the less frequent category for the binary outcome variable)?
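For what it's worth, separation boils down to having some predictor level (or combination of levels) in which the outcome never varies; the ML estimate for that term then drifts off toward infinity. In miniature, with hypothetical data:

```python
# Hypothetical outcomes (1 = correct, 0 = incorrect) grouped by cut point.
outcomes_by_level = {
    "cut_point_1": [1, 1, 0, 1, 0],  # mixed outcomes: estimable
    "cut_point_2": [1, 1, 1, 1, 1],  # all correct -> zero cell -> separation
}

# Flag levels whose outcome is constant (a zero cell in the crosstab).
separated = [lvl for lvl, ys in outcomes_by_level.items()
             if len(set(ys)) == 1]
# separated == ["cut_point_2"]
```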

HTH.



Re: Effect-size measure for simple effects in Binary Logistic Regression?

Mlkman
OK - that makes sense now. I looked at the output for SCALE=TRANSFORMED again, and I can see that I had my reference category for the SKILL variable back-to-front.

Is it correct/acceptable to calculate an odds ratio for each comparison by taking the exponent of the contrast estimate? If so, then I think I can use that as a measure of effect size.

I have 23 scenarios that I am analyzing individually. For each scenario, my sample size is around 160; that's across two skill levels and five cut points. Across all of the scenarios, the smallest number of cases that have events is 12 (for one particular scenario), and the highest is 155. The reason that some scenarios have only a small number of cases with events is that those scenarios would be considered "difficult" (i.e., it is very hard to anticipate what is going to happen next).

Any suggestions for how to handle this in the analyses?

Re: Effect-size measure for simple effects in Binary Logistic Regression?

Bruce Weaver
Administrator
See below.

Mlkman wrote
OK - that makes sense now. I looked at the output for SCALE=TRANSFORMED again, and I can see that I had my reference category for the SKILL variable back-to-front.

Is it correct/acceptable to calculate an odds ratio for each comparison by taking the exponent of the contrast estimate? If so, then I think I can use that as a measure of effect size.

BW: Yes, the contrast gives you a difference on the log-odds scale.  So Exp(difference) = an odds ratio.  I don't have SPSS on this PC, and can't remember if the contrast table also gives you a CI on the difference.  If so, you can exponentiate the limits of the CI too to get a CI for the OR.  


I have 23 scenarios that I am analyzing individually. For each scenario, my sample size is around 160; that's across two skill levels and five cut points. Across all of the scenarios, the smallest number of cases that have events is 12 (for one particular scenario), and the highest is 155. The reason that some scenarios have only a small number of cases with events is that those scenarios would be considered "difficult" (i.e., it is very hard to anticipate what is going to happen next).

Any suggestions for how to handle this in the analyses?
BW: A rule of thumb for logistic regression is that you should have 15 or 20 events per variable, where "event" = the outcome variable category with the lower frequency.  You have 2 levels of skill and 5 levels of cut point, which makes 5 indicator variables for the first-order terms (1 for skill, 4 for cut point) plus 4 product terms for the interaction.  I'd say you're over-fitting (some of) your models.  E.g., when you say you have 155 "events", there are only 5 cases in the less frequent outcome category, so you're definitely over-fitting that one.  (If you need a reference for the events-per-variable rule of thumb, look up Frank Harrell's book on regression models.)
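Putting rough numbers on that, using the approximate counts mentioned above as assumptions:

```python
# Events-per-variable (EPV) check for one scenario with n ~= 160 officers,
# 2 skill levels, 5 cut points (counts are the thread's approximations).
def epv(n_events, n_nonevents, n_model_terms):
    return min(n_events, n_nonevents) / n_model_terms

# 1 indicator for skill + 4 for cut point + 4 interaction terms = 9 terms.
n_model_terms = 1 + 4 + 4

# A scenario with 155 correct out of ~160: only 5 cases in the rarer
# category, so EPV = 5/9 ~= 0.56, far below the 15-20 rule of thumb.
ratio = epv(155, 160 - 155, n_model_terms)
```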

To solve the problem of over-fitting, you need to either have more cases (and more to the point, more events), or you need to reduce the number of variables in the model.

HTH.