SPSSX Discussion

Multinomial Logistic Regression - Category Size

s-volk

Multinomial Logistic Regression - Category Size

Dear all,

I ran a multinomial logistic regression analysis with one continuous independent variable. I have a sample size of 68 subjects (psychological experiment), who are split across the 5 categories of the dependent variable; the categories range in size from 5 to 24. The MLR model has a Nagelkerke R2 of 0.27 and Model Fit χ2=19.71, p<0.01.

Now here is the problem: A reviewer complains that my results may be sample specific because one of the 5 categories of the dependent variable consists of only 5 observations (subjects), i.e., s/he argues that very few participants (five) are responsible for the observed effects. Is this a valid argument? I thought that if the overall model is significant, I can conclude that there is a significant relationship between the dependent and independent variable for all categories of the dependent variable? That is, the calculations for the overall model are based on all observations (68) and not only on the observations in specific categories (e.g., 5)?

I was wondering if someone could provide me with or point me to some arguments for reviewers (ideally including some references)?

Many thanks in advance,
Stefan

Bruce Weaver

Re: Multinomial Logistic Regression - Category Size

I'll have a kick at this one, more to get some discussion going than to provide any definitive answers. ;-)

I suppose the comment about it being "sample specific" translates to "will not generalize well to other samples".

Just thinking out loud here, so forgive me if it ends up being twaddle. What if you ran the model again, but without the 5 potentially problematic cases? If the predicted probabilities from the two models were very similar for the other N-5 cases, this might reassure the reviewer that the omitted 5 are not overly influential. On the other hand, if the predicted probabilities differ a fair bit, that would confirm the reviewer's fears.

Another possibility: could the outcome category that the 5 problem cases are in reasonably be merged with one of the other categories? Again, if the predicted probabilities from this model didn't differ substantially from those obtained with the original model, you could argue that the 5 cases are not very influential.
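One way to act on the first suggestion is sketched here in Python rather than SPSS syntax. The idea is to refit the model without every case in the small category and then compare predicted probabilities for the remaining cases. The reduced model has one fewer outcome category, so the full model's probabilities are renormalised over the kept categories before the comparison. Under the multinomial logit's independence-of-irrelevant-alternatives property, those conditional probabilities follow a multinomial logit with the same coefficients, so the two sets should agree closely if the dropped cases are not influential. Several pieces are assumptions made for illustration: the function names, the 0..k-1 coding with the reference category last, and the fixed-step gradient-ascent fitter (a simple stand-in for the iterative algorithm NOMREG uses). Standardise x first. Also expect no finite estimates if a category is perfectly separated by x.

```python
import math

def probs(params, x, k):
    """Generalized-logit probabilities; category k-1 is the reference."""
    etas = [a + b * x for a, b in params] + [0.0]
    top = max(etas)  # subtract the maximum to avoid overflow in exp()
    expd = [math.exp(e - top) for e in etas]
    total = sum(expd)
    return [e / total for e in expd]

def fit(xs, ys, k, iters=3000, lr=1.0):
    """Maximum-likelihood fit of intercept + slope per non-reference
    category by plain gradient ascent. ys must be coded 0..k-1."""
    params = [[0.0, 0.0] for _ in range(k - 1)]
    n = len(xs)
    for _ in range(iters):
        grad = [[0.0, 0.0] for _ in range(k - 1)]
        for x, y in zip(xs, ys):
            p = probs(params, x, k)
            for j in range(k - 1):
                resid = (1.0 if y == j else 0.0) - p[j]
                grad[j][0] += resid
                grad[j][1] += resid * x
        for j in range(k - 1):
            params[j][0] += lr * grad[j][0] / n
            params[j][1] += lr * grad[j][1] / n
    return params

def max_prob_shift(xs, ys, k, drop):
    """Refit without every case in category `drop`; return the largest
    absolute change in predicted probabilities for the remaining cases."""
    if drop == k - 1:
        raise ValueError("pick a non-reference category to drop")
    full = fit(xs, ys, k)
    # Recode the kept categories so they stay 0..k-2 with the reference last.
    kept = [(x, y - (y > drop)) for x, y in zip(xs, ys) if y != drop]
    kx = [x for x, _ in kept]
    ky = [y for _, y in kept]
    reduced = fit(kx, ky, k - 1)
    worst = 0.0
    for x in kx:
        # Full-model probabilities, renormalised over the kept categories.
        p_full = [p for j, p in enumerate(probs(full, x, k)) if j != drop]
        total = sum(p_full)
        p_full = [p / total for p in p_full]
        p_red = probs(reduced, x, k - 1)
        worst = max(worst, max(abs(a - b) for a, b in zip(p_full, p_red)))
    return worst
```

With the original data coded 0 to 4 (reference last), a call such as `max_prob_shift(x_std, y_codes, k=5, drop=small_cat)` would report the largest shift. Here `x_std`, `y_codes` and `small_cat` are hypothetical names. What counts as "very similar" remains a judgement call.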

Perhaps someone else will have a better idea--remember, I was just trying to prime the pump here!

HTH.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Ryan

Re: Multinomial Logistic Regression - Category Size

In reply to this post by s-volk

Stefan,

What do you mean by the following statement?: "...if the overall model
is significant, I can conclude that there is a significant
relationship between the dependent and independent variable for all
categories of the dependent variable?"

In the typical multinomial logistic regression assuming a single
continuous predictor, X, the parameter estimates are interpreted as
the change in the log(risk relative to the reference category), given
a one-point increase in X. The parameter estimates do NOT reflect a
change in the log(risk) of observing each category, given a one-point
increase in X. Moreover, it's certainly possible to observe a
non-significant log(relative risk) in the presence of an overall
significant model effect.
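This interpretation can be checked numerically. The Python sketch below is not SPSS, and its coefficients are made up purely for illustration. It computes generalized-logit probabilities at x and x + 1 and shows that exp(B) equals the ratio of the two relative risks P(j)/P(ref), not the ratio of the unconditional risks P(j). It also shows that P(j)/P(ref) equals the odds of j versus the reference among cases in just those two categories. That is one way to see why the same Exp(B) is sometimes described as a relative risk ratio and sometimes as an odds ratio.

```python
import math

def category_probs(intercepts, slopes, x):
    """Generalized-logit probabilities for one predictor x.

    intercepts and slopes hold one entry per non-reference category; the
    reference category (listed last) has its linear predictor fixed at 0.
    """
    etas = [a + b * x for a, b in zip(intercepts, slopes)] + [0.0]
    top = max(etas)  # subtract the maximum to avoid overflow in exp()
    expd = [math.exp(e - top) for e in etas]
    total = sum(expd)
    return [e / total for e in expd]

# Hypothetical coefficients for a 3-category outcome; category 3 is the reference.
intercepts = [-1.0, 0.5]
slopes = [0.4, -0.7]

x = 2.0
p_x = category_probs(intercepts, slopes, x)
p_x1 = category_probs(intercepts, slopes, x + 1)

rows = []
for j, b in enumerate(slopes):
    rel_risk_x = p_x[j] / p_x[-1]     # P(j | x) / P(ref | x)
    rel_risk_x1 = p_x1[j] / p_x1[-1]  # the same ratio one unit higher
    rrr = rel_risk_x1 / rel_risk_x    # relative risk ratio
    plain_rr = p_x1[j] / p_x[j]       # change in P(j) itself
    # Odds of j versus ref among cases that fall in {j, ref} only.
    pair_total = p_x[j] + p_x[-1]
    cond_odds = (p_x[j] / pair_total) / (p_x[-1] / pair_total)
    rows.append((math.exp(b), rrr, plain_rr, rel_risk_x, cond_odds))
    print(f"category {j + 1}: exp(B) = {math.exp(b):.4f}, "
          f"RRR = {rrr:.4f}, P(j|x+1)/P(j|x) = {plain_rr:.4f}")
```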

Ryan

On Tue, Nov 30, 2010 at 7:49 AM, s-volk <[hidden email]> wrote:

> --- snip ---

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

s-volk

Re: Multinomial Logistic Regression - Category Size

In reply to this post by Bruce Weaver

Please excuse my confusing question.

What I was trying to say was: The multinomial logistic regression (MLR) model provides a likelihood-ratio test that evaluates the overall relationship between the independent variable and the dependent variable across all categories of the dependent variable. More specifically, it tests whether the population values of the logistic regression coefficients for the independent variable are all zero (i.e., whether there is no relationship between the dependent and independent variable in the population).

This leads me to draw the following conclusions, which I hope are not completely wrong:

1. Since the likelihood-ratio test is based on all observations of the dependent variable, we can assume that the relationship between the dependent and independent variable exists for all categories of the dependent variable (i.e., not just one category is responsible for the observed effect)?

2. The likelihood-ratio test is comparable to the overall F test in OLS regression: it tests whether there is a relationship between the dependent and independent variable in the population and therefore provides evidence that the results will generalize to other samples?
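The reported statistic can be cross-checked. With 5 outcome categories and one continuous predictor, the likelihood-ratio test compares the final model with the intercept-only model. It therefore jointly tests (5 - 1) * 1 = 4 slope coefficients. The Python sketch below is an illustration, not SPSS output. It computes the upper-tail p-value for χ2 = 19.71 on 4 df, using a closed form that is valid only for even degrees of freedom. A small joint p-value says the four coefficients are not all zero. It does not by itself say that each category contrast differs from zero.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square with even df.

    Uses exp(-x/2) * sum_{i < df/2} (x/2)**i / i!, which holds only
    when the degrees of freedom are even.
    """
    if df <= 0 or df % 2:
        raise ValueError("closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Design from the original post: 5 outcome categories, 1 continuous predictor,
# one reference category -> (5 - 1) * 1 = 4 slope coefficients tested jointly.
n_categories, n_predictors = 5, 1
df = (n_categories - 1) * n_predictors
lr_chi2 = 19.71
p_value = chi2_sf_even_df(lr_chi2, df)
print(f"df = {df}, p = {p_value:.5f}")
```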

@Bruce: Thanks for the suggestions, I thought about this before as well… but can I just drop some “inconvenient cases”?

Many thanks for the help and best wishes,
Stefan

Bruce Weaver

Re: Multinomial Logistic Regression - Category Size

s-volk wrote

--- snip ---

@Bruce: Thanks for the suggestions, I thought about this before as well… but can I just drop some “inconvenient cases”?

Many thanks for the help and best wishes,
Stefan

Hi Stefan. You'd only be dropping them in order to compare that model to one that includes them. That's not the same thing as ignoring them completely. This is in essence what measures like Cook's Distance do, although in that case, it leaves out one observation at a time.

HTH.

Bruce Weaver

Re: Multinomial Logistic Regression - Category Size

In reply to this post by Ryan

I responded to Ryan off-list to ask if he meant to say that the parameter estimates are interpreted as the change in the log(odds) relative to a reference category. He responded that he did indeed mean log(risk), not log(odds), and we have been having a vigorous back-and-forth discussion since, exchanging examples and links. Here is one link I sent to Ryan:

http://faculty.chass.ncsu.edu/garson/PA765/logistic.htm#estimates

And here's one he just sent to me (which I've not read yet; it's bedtime here).

http://www.columbia.edu/~so33/SusDev/Lecture_10.pdf

Just thought I'd post this, in case anyone else was interested. I may have some more to say after reading that last document.

Cheers,
Bruce

Ryan

Re: Multinomial Logistic Regression - Category Size

Bruce et al.,

I have little doubt that the parameter estimates obtained from a
generalized logits multinomial regression without any predictors are
log(relative risks) and that, when predictors are in the model, the
coefficients are log(relative risk ratios). Allow me to provide a couple
of simple examples (without a predictor and with a dichotomous predictor)
as evidence in support of what I've stated. But before I do, let's make
sure we all agree on some basic definitions within a logistic
regression framework:

Risk_A = probability of event A
Risk_B = probability of event B
Risk_C = probability of event C

Relative Risk_A_B = Risk_A / Risk_B
Relative Risk_A_C = Risk_A / Risk_C
Relative Risk_B_C = Risk_B / Risk_C

Odds_A = probability of event A / probability of not event A
Odds_B = probability of event B / probability of not event B
Odds_C = probability of event C / probability of not event C

Odds Ratio_A_B = Odds_A / Odds_B
Odds Ratio_A_C = Odds_A / Odds_C
Odds Ratio_B_C = Odds_B / Odds_C
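These definitions can be made concrete with the first data set below. It has three cases with Y = 1, six with Y = 2 and nine with Y = 3, read here as events A, B and C. This short Python sketch uses exact fractions. It shows that, for these counts, relative risks and odds ratios are different numbers, which is why it matters which label is attached to Exp(B).

```python
from fractions import Fraction

# Counts from the first example: Y = 1, 2, 3 read as events A, B, C.
counts = {"A": 3, "B": 6, "C": 9}
n = sum(counts.values())

risk = {e: Fraction(k, n) for e, k in counts.items()}  # Risk_e = P(e)
odds = {e: r / (1 - r) for e, r in risk.items()}     # Odds_e = P(e) / P(not e)

def relative_risk(a, b):
    """Relative Risk_a_b = Risk_a / Risk_b."""
    return risk[a] / risk[b]

def odds_ratio(a, b):
    """Odds Ratio_a_b = Odds_a / Odds_b."""
    return odds[a] / odds[b]

for a, b in [("A", "B"), ("A", "C"), ("B", "C")]:
    print(f"{a} vs {b}: relative risk = {relative_risk(a, b)}, "
          f"odds ratio = {odds_ratio(a, b)}")
```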

Now, the first example below shows that the parameter estimates
obtained from a generalized logits multinomial regression model with
no predictors are equal to log(Relative Risks).

data list list / Y.
begin data
1
1
1
2
2
2
2
2
2
3
3
3
3
3
3
3
3
3
end data.

FREQUENCIES VARIABLES=Y
/ORDER=ANALYSIS.

* Relative risks from the FREQUENCIES percentages: P(Y=1)/P(Y=3) and P(Y=2)/P(Y=3).
COMPUTE RR_1_3_raw = 16.666666666666664 / 50.
COMPUTE RR_2_3_raw = 33.33333333333333 / 50.
EXECUTE.

NOMREG Y (BASE = LAST ORDER=ASCENDING)
/MODEL
/INTERCEPT=INCLUDE
/PRINT=PARAMETER .

* Exponentiated intercepts, copied from the NOMREG Parameter Estimates table.
COMPUTE RR_1_3 = exp(-1.0986122886681096).
COMPUTE RR_2_3 = exp(-0.4054651081081645).
EXECUTE.

It should be clear from the example above that the parameter estimates
can certainly be interpreted as log(Relative Risks). Those calculated
from the FREQUENCIES percentages using the definitional formulas are
exactly the same as those output from NOMREG. Now, let's add a
dichotomous predictor to
the model to see what happens. I provide further comments after this
code.

data list list / Y X.
begin data
1 1
1 1
1 0
2 1
2 1
2 1
2 0
2 1
2 1
3 1
3 1
3 0
3 1
3 1
3 1
3 0
3 1
3 1
end data.

CROSSTABS
/TABLES=Y BY X
/FORMAT=AVALUE TABLES
/CELLS=COUNT ROW COLUMN
/COUNT ROUND CELL.

* Risk of Y=1 relative to Y=3 within X=0 and X=1 (column percentages from CROSSTABS), and their ratio.
COMPUTE RR_1_3_X0_Raw = (25 / 50) .
COMPUTE RR_1_3_X1_Raw = (14.285714285714285 / 50).
COMPUTE RRR_1_3_Raw = RR_1_3_X1_Raw / RR_1_3_X0_Raw.
EXECUTE.

* Same calculation for Y=2 relative to Y=3.
COMPUTE RR_2_3_X0_Raw = (25 / 50) .
COMPUTE RR_2_3_X1_Raw = (35.714285714285715 / 50).
COMPUTE RRR_2_3_Raw = RR_2_3_X1_Raw / RR_2_3_X0_Raw.
EXECUTE.

NOMREG Y (BASE = LAST ORDER=ASCENDING) WITH X
/MODEL X
/INTERCEPT=INCLUDE
/PRINT=PARAMETER .

* Exponentiated slopes for X, copied from the NOMREG Parameter Estimates table.
COMPUTE RRR_1_3 = exp(-0.5596157879354635).
COMPUTE RRR_2_3 = exp(0.35667494393875165).
EXECUTE.

Again, I calculated the estimates using the probability estimates from
CROSSTABS. Then I compared those estimates to exponentiated estimates
from NOMREG. As expected, they [relative risk ratios] are identical.
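Both checks can also be reproduced from the raw counts. Each model above is saturated: the intercept-only model for the marginal distribution of Y, and the binary-X model for the 3 x 2 table. Their maximum-likelihood fitted probabilities therefore equal the observed proportions, so the NOMREG coefficients have a closed form. The Python sketch below is an illustration, with counts tallied from the two data lists. It recovers the intercepts and slopes used in the COMPUTE statements. It also shows that each exp(slope) is the cross-product odds ratio of the 2 x 2 table formed by Y = j versus Y = 3 against X.

```python
import math

REF = 3  # BASE = LAST: category 3 is the reference

# Intercept-only example: Y = 1, 2, 3 occur 3, 6 and 9 times.
y_counts = {1: 3, 2: 6, 3: 9}

# Dichotomous-predictor example, tallied as (Y, X) -> number of cases.
yx_counts = {(1, 0): 1, (1, 1): 2,
             (2, 0): 1, (2, 1): 5,
             (3, 0): 2, (3, 1): 7}

# Intercept-only model: B_j = log(n_j / n_ref), a log relative risk.
intercepts = {j: math.log(y_counts[j] / y_counts[REF]) for j in (1, 2)}

def rel_risk(j, x):
    """Risk of Y = j relative to the reference category, within X = x."""
    return yx_counts[(j, x)] / yx_counts[(REF, x)]

# Binary-X model: slope_j = log(RR_j at X=1 / RR_j at X=0).
slopes = {j: math.log(rel_risk(j, 1) / rel_risk(j, 0)) for j in (1, 2)}

def cross_product_or(j):
    """Odds ratio of the 2 x 2 table {Y = j, Y = REF} by {X = 1, X = 0}."""
    a, b = yx_counts[(j, 1)], yx_counts[(REF, 1)]
    c, d = yx_counts[(j, 0)], yx_counts[(REF, 0)]
    return (a * d) / (b * c)

for j in (1, 2):
    print(f"Y={j} vs Y=3: intercept-only B = {intercepts[j]:.6f}, "
          f"slope B = {slopes[j]:.6f}, exp(B) = {math.exp(slopes[j]):.6f}, "
          f"2x2 OR = {cross_product_or(j):.6f}")
```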

Ryan

On Wed, Dec 1, 2010 at 10:07 PM, Bruce Weaver <[hidden email]> wrote:

> --- snip ---

Ryan

Re: Multinomial Logistic Regression - Category Size

Also, for those interested, the UCLA website that describes how to
interpret output from a multinomial logistic regression with
predictors in Stata refers to the exponentiated parameter estimates as
relative risk ratios. Go to the bottom of the page for details.

http://www.ats.ucla.edu/stat/stata/output/stata_mlogit_output.htm

Ryan

On Wed, Dec 1, 2010 at 11:33 PM, R B <[hidden email]> wrote:

> --- snip ---

Bruce Weaver

Re: Multinomial Logistic Regression - Category Size

In reply to this post by Ryan

Here is an example demonstrating the equivalence of Exp(B) from NOMREG and odds ratios computed via CROSSTABS.

Cheers,
Bruce

* Multinomial logistic regression on a 4x3 table.
* The data are from http://www.angelfire.com/wv/bwhomedir/notes/multinomial_log_reg.pdf .

data list list / X Y kount (3f5.0).
begin data
1 1 6
1 2 8
1 3 38
2 1 13
2 2 29
2 3 55
3 1 20
3 2 33
3 3 160
4 1 51
4 2 42
4 3 518
end data.

var lab
x 'Functional status'
y 'ICU Code Status'
.
val lab
x 1 'Unknown'
2 'Severely limited'
3 'Somewhat limited'
4 'Totally independent' /
y 1 'Explicit: Resuscitate'
2 'Explicit: DNR'
3 'Implicit: Resuscitate'
.

weight by kount.
crosstabs x by y .

NOMREG Y (BASE=LAST ORDER=ASCENDING) BY X
/MODEL = X
/INTERCEPT=INCLUDE
/PRINT=PARAMETER SUMMARY LRT STEP MFI.

* Notice that the Exp(B) values from this model match exactly
* the odds ratios computed in the document.

* Now compute the same odds ratios via CROSSTABS.

* First OR.
temporary.
select if any(X,1,4) and any(Y,1,3).
crosstabs x by y / stat = risk.

* Second OR.
temporary.
select if any(X,2,4) and any(Y,1,3).
crosstabs x by y / stat = risk.

* Third OR.
temporary.
select if any(X,3,4) and any(Y,1,3).
crosstabs x by y / stat = risk.

* Fourth OR.
temporary.
select if any(X,1,4) and any(Y,2,3).
crosstabs x by y / stat = risk.

* Fifth OR.
temporary.
select if any(X,2,4) and any(Y,2,3).
crosstabs x by y / stat = risk.

* Sixth OR.
temporary.
select if any(X,3,4) and any(Y,2,3).
crosstabs x by y / stat = risk.

* Notice that the odds ratios & 95% confidence intervals
* obtained via CROSSTABS match exactly the values of
* Exp(B) from NOMREG and their 95% confidence intervals.
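The agreement above can be traced to the table itself. With X entered as a factor (BY X), this model is saturated, so its fitted probabilities are the observed row proportions. Exp(B) for category j and level x then reduces to [n(x,j)/n(x,3)] / [n(4,j)/n(4,3)]. Read that way, it is a ratio of relative risks. Algebraically, it is also the cross-product odds ratio of the 2 x 2 subtable that each TEMPORARY / SELECT IF step isolates. The Python sketch below is an illustration, not SPSS output. It computes all six values from the counts, together with a log-scale (Woolf) 95% interval. That this is the interval form behind the matching confidence intervals is an assumption here.

```python
import math
from statistics import NormalDist

# Cell counts from the 4 x 3 table above: COUNTS[x][y], X = 1..4, Y = 1..3.
COUNTS = {
    1: {1: 6, 2: 8, 3: 38},
    2: {1: 13, 2: 29, 3: 55},
    3: {1: 20, 2: 33, 3: 160},
    4: {1: 51, 2: 42, 3: 518},
}
Y_REF, X_REF = 3, 4  # last category of Y (BASE=LAST) and last level of factor X

z = NormalDist().inv_cdf(0.975)  # about 1.96
results = {}
for y in (1, 2):
    for x in (1, 2, 3):
        # Relative-risk-ratio form; row totals cancel, so raw counts suffice.
        rrr = (COUNTS[x][y] / COUNTS[x][Y_REF]) / (COUNTS[X_REF][y] / COUNTS[X_REF][Y_REF])
        # Cross-product odds ratio of the 2 x 2 subtable {x, 4} by {y, 3}.
        a, b = COUNTS[x][y], COUNTS[x][Y_REF]
        c, d = COUNTS[X_REF][y], COUNTS[X_REF][Y_REF]
        odds_ratio = (a * d) / (b * c)
        # Woolf standard error of log(OR) and the matching 95% interval.
        se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
        lo = math.exp(math.log(odds_ratio) - z * se)
        hi = math.exp(math.log(odds_ratio) + z * se)
        results[(y, x)] = (rrr, odds_ratio, lo, hi)
        print(f"Y={y} vs 3, X={x} vs 4: OR = {odds_ratio:.4f} "
              f"(95% CI {lo:.4f} to {hi:.4f})")
```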

R B wrote

Bruce et al.,

I have little doubt that the parameter estimates obtained from a
generalized logits multinomial regression without any predictors yield
log(relative risks), and assuming predictors are in the model,
relative risk ratios. Allow me to provide a couple simple examples
(without a predictor and with a dichotomous predictor) here to provide
evidence in support of what I've stated. But before I do, let's make
sure we all agree on some basic definitions within a logistic
regression framework:

Risk_A = probability of event A
Risk_B = probability of event B
Risk_C = probability of event C

Relative Risk_A_B = Risk A / Risk B
Relative Risk_A_C = Risk A / Risk C
Relative Risk_B_C = Risk B / Risk C

Odds_A = probability of event A / probability of not event A
Odds_B = probability of event B / probability of not event B
Odds_C = probability of event C / probability of not event C

Odds Ratio_A_B = Odds_A / Odds_B
Odds Ratio_A_C = Odds_A / Odds_C
Odds Ratio_B_C = Odds_B / Odds_C

Now, the first example I provide below shows that the parameter
estimates obtained from the generalized logits multinomial regression
model with no predictors below are equivalent to log(Relative Risks).

data list list / Y.
begin data
1
1
1
2
2
2
2
2
2
3
3
3
3
3
3
3
3
3
end data.

FREQUENCIES VARIABLES=Y
/ORDER=ANALYSIS.

COMPUTE RR_1_3_raw = 16.666666666666664 / 50.
COMPUTE RR_2_3_raw = 33.33333333333333 / 50.
EXECUTE.

NOMREG Y (BASE = LAST ORDER=ASCENDING)
/MODEL
/INTERCEPT=INCLUDE
/PRINT=PARAMETER .

COMPUTE RR_1_3 = exp(-1.0986122886681096).
COMPUTE RR_2_3 = exp(-0.4054651081081645).
EXECUTE.

It should be clear from the example above that the parameter estimates
can certainly be interpreted as log(Relative Risks). The relative risks
calculated from the FREQUENCIES percentages using the definitional
formulas are exactly the same as the exponentiated estimates from
NOMREG. Now, let's add a dichotomous predictor to the model to see what
happens. I provide further comments after this code.

data list list / Y X.
begin data
1 1
1 1
1 0
2 1
2 1
2 1
2 0
2 1
2 1
3 1
3 1
3 0
3 1
3 1
3 1
3 0
3 1
3 1
end data.

CROSSTABS
/TABLES=Y BY X
/FORMAT=AVALUE TABLES
/CELLS=COUNT ROW COLUMN
/COUNT ROUND CELL.

* Column percentages (within each value of X) from the CROSSTABS table; Y=3 is the reference.
COMPUTE RR_1_3_X0_Raw = (25 / 50) .
COMPUTE RR_1_3_X1_Raw = (14.285714285714285 / 50).
COMPUTE RRR_1_3_Raw = RR_1_3_X1_Raw / RR_1_3_X0_Raw.
EXECUTE.

COMPUTE RR_2_3_X0_Raw = (25 / 50) .
COMPUTE RR_2_3_X1_Raw = (35.714285714285715 / 50).
COMPUTE RRR_2_3_Raw = RR_2_3_X1_Raw / RR_2_3_X0_Raw.
EXECUTE.

NOMREG Y (BASE = LAST ORDER=ASCENDING) WITH X
/MODEL X
/INTERCEPT=INCLUDE
/PRINT=PARAMETER .

* B values for X copied from the NOMREG Parameter Estimates table.
COMPUTE RRR_1_3 = exp(-0.5596157879354635).
COMPUTE RRR_2_3 = exp(0.35667494393875165).
EXECUTE.
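Within each value of X, the column total cancels from Risk_j / Risk_3. The relative risk ratio computed above therefore equals the cross-product ratio of the 2x2 sub-table formed by Y in {j, 3} and X in {0, 1}. Below is a minimal Python sketch. It assumes the 18 (Y, X) pairs listed above, and the helper names are hypothetical:

```python
import math
from collections import Counter

# (Y, X) pairs from the second DATA LIST.
data = [(1, 1), (1, 1), (1, 0),
        (2, 1), (2, 1), (2, 1), (2, 0), (2, 1), (2, 1),
        (3, 1), (3, 1), (3, 0), (3, 1), (3, 1), (3, 1), (3, 0), (3, 1), (3, 1)]
cell = Counter(data)                      # cell[(y, x)] = count
col_total = Counter(x for _, x in data)   # number of cases with X = x


def risk(y, x):
    """Column proportion P(Y = y | X = x), as in the CROSSTABS column percentages."""
    return cell[(y, x)] / col_total[x]


def rrr(y, ref=3):
    """Relative risk (Y=y vs Y=ref) at X=1 divided by the same relative risk at X=0."""
    return (risk(y, 1) / risk(ref, 1)) / (risk(y, 0) / risk(ref, 0))


def cross_product_or(y, ref=3):
    """Odds ratio of the 2x2 sub-table Y in {y, ref} by X in {1, 0}."""
    return (cell[(y, 1)] * cell[(ref, 0)]) / (cell[(ref, 1)] * cell[(y, 0)])


for y in (1, 2):
    print(f"Y={y}: log(RRR) = {math.log(rrr(y)):.6f}, "
          f"log(cross-product OR) = {math.log(cross_product_or(y)):.6f}")
```

Both columns agree with the NOMREG coefficients for X: log(4/7) for Y=1 and log(10/7) for Y=2.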

Again, I calculated the estimates from the column percentages in the
CROSSTABS table and then compared them with the exponentiated estimates
from NOMREG. As expected, they [relative risk ratios] are identical.

Ryan

On Wed, Dec 1, 2010 at 10:07 PM, Bruce Weaver <bruce.weaver@hotmail.com> wrote:
> I responded to Ryan off-list to ask if he meant to say that the parameter
> estimates are interpreted as the change in the log(odds) relative to a
> reference category. He responded that he did indeed mean log(risk), not
> log(odds), and we have been having a vigorous back-and-forth discussion
> since, exchanging examples and links. Here is one link I sent to Ryan:
>
>   http://faculty.chass.ncsu.edu/garson/PA765/logistic.htm#estimates
>
> And here's one he just sent to me (which I've not read yet; it's bedtime
> here).
>
>   http://www.columbia.edu/~so33/SusDev/Lecture_10.pdf
>
> Just thought I'd post this in case anyone else was interested. I may have
> some more to say after reading that last document.
>
> Cheers,
> Bruce
>
>
> R B wrote:
>>
>> Stefan,
>>
>> What do you mean by the following statement: "...if the overall model
>> is significant, I can conclude that there is a significant
>> relationship between the dependent and independent variable for all
>> categories of the dependent variable?"
>>
>> In the typical multinomial logistic regression assuming a single
>> continuous predictor, X, the parameter estimates are interpreted as
>> the change in the log(risk relative to the reference category), given
>> a one-point increase in X. The parameter estimates do NOT reflect a
>> change in the log(risk) of observing each category, given a one-point
>> increase in X. Moreover, it's certainly possible to observe a
>> non-significant log(relative risk) in the presence of an overall
>> significant model effect.
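The distinction above can be illustrated numerically. Below is a minimal Python sketch of a generalized logits model with three outcome categories, using category 3 as the reference. The intercepts and slopes are made-up values, not estimates from Stefan's data. Per one-unit increase in X, log(P(Y=j) / P(Y=3)) changes by exactly the slope for category j. By contrast, log(P(Y=j)) changes by an amount that depends on X:

```python
import math

# Hypothetical coefficients for categories 1 and 2 (category 3 is the reference).
intercept = {1: -0.5, 2: 0.3}
slope = {1: 0.8, 2: -0.4}


def probs(x):
    """P(Y = j | X = x) for j = 1, 2, 3 under the generalized logits model."""
    linear = {j: math.exp(intercept[j] + slope[j] * x) for j in (1, 2)}
    denom = 1.0 + sum(linear.values())  # the reference category contributes exp(0) = 1
    p = {j: linear[j] / denom for j in (1, 2)}
    p[3] = 1.0 / denom
    return p


def change_in_log_relative_risk(j, x):
    """Change in log(P(Y=j) / P(Y=3)) when X goes from x to x + 1."""
    before, after = probs(x), probs(x + 1)
    return math.log(after[j] / after[3]) - math.log(before[j] / before[3])


def change_in_log_risk(j, x):
    """Change in log(P(Y=j)) when X goes from x to x + 1."""
    return math.log(probs(x + 1)[j]) - math.log(probs(x)[j])


for x in (0.0, 1.0, 2.0):
    print(x,
          [round(change_in_log_relative_risk(j, x), 4) for j in (1, 2)],
          [round(change_in_log_risk(j, x), 4) for j in (1, 2)])
```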
>>
>> Ryan
>>
>> On Tue, Nov 30, 2010 at 7:49 AM, s-volk <stefan.volk@uni-tuebingen.de>
>> wrote:
> -----
> --
> Bruce Weaver
> bweaver@lakeheadu.ca
> http://sites.google.com/a/lakeheadu.ca/bweaver/
>
> "When all else fails, RTFM."
>
> NOTE: My Hotmail account is not monitored regularly.
> To send me an e-mail, please use the address shown above.
>

Ryan

Re: Multinomial Logistic Regression - Category Size

Bruce,

I decided to calculate what you call "First OR" using the RRR formula
I presented previously. Lo and behold, our estimates are the same.

RRR1 = ((6 / (6 + 8 + 38)) / (38 / (6 + 8 + 38))) / ((51 / (51 + 42 + 518)) / (518 / (51 + 42 + 518)))
     = 1.6037151703

You can see in the formula above that I am calculating two relative
risk estimates and then dividing those two estimates. I am comfortable
interpreting this estimate as a relative risk ratio because of the way
I set up the equation. Interpretation of this estimate as a RRR is
common IMO. Having said that, I have also seen places where this
estimate is interpreted as an odds ratio.
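The formula above can be written out with the row totals 6 + 8 + 38 = 52 and 51 + 42 + 518 = 611. This shows why the two calculations agree: each total cancels within its own relative risk. What remains is the cross-product ratio of the 2x2 sub-table (X = 1 vs 4, Y = 1 vs 3) used in the "First OR" CROSSTABS run:

```latex
\mathrm{RRR}_1
  = \frac{(6/52)\,/\,(38/52)}{(51/611)\,/\,(518/611)}
  = \frac{6/38}{51/518}
  = \frac{6 \times 518}{38 \times 51}
  = \frac{3108}{1938}
  \approx 1.6037
```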

Best wishes,

Ryan

On Thu, Dec 2, 2010 at 11:37 AM, Bruce Weaver <[hidden email]> wrote:

Bruce Weaver

Re: Multinomial Logistic Regression - Category Size

Administrator

Fair enough, Ryan. I've still not had time to look at those lecture notes you gave a link for, but will try to get to them sometime soon. Meanwhile, I'll keep calling it an odds ratio. ;-)

Cheers,
Bruce
