Logistic Regression Help Needed: Conf. Intervals for Predicted Prob. for the average individual.

jmdpulido
Dear all,

I use SPSS and have fitted a logistic regression model with the LOGISTIC
REGRESSION command. As my "readers" do not understand odds ratios, I use Excel
to compute a "predicted probability" for the average individual, using the
sample mean for continuous regressors and the relative frequency of each
category for the categorical regressors. Using the logistic formula, I can easily
calculate in Excel the predicted probability for the average individual
belonging to each category, and the marginal increase in probability for a unit
increase in a continuous predictor.

However, I need to present "confidence intervals" for the "predicted
probabilities" of the average individual (average in all other variables)
belonging to each category of my categorical predictors. In OLS this is easy to
do with the standard formula, but how are these "confidence intervals" for
predicted probabilities calculated in logistic regression? What is the right
formula? I have checked many books on logistic regression but have not found the
answer.

I know Stata and other packages do this automatically. However, I only have a
licence for SPSS. I would therefore very much appreciate it if someone could
explain the formula for calculating "confidence intervals" for the "predicted
probabilities" of the average person, or point me to a paper or book where I
can find it.

Thanks in advance

J. Pulido
PhD Student in Economics

Re: Logistic Regression Help Needed: Conf. Intervals for Predicted Prob. for the average individual.

Ryan
I think a similar question was asked recently. The generalized linear
model (GENLIN) procedure is capable of estimating confidence limits
about predicted probabilities. Take a look at the documentation, and
write back if you still have questions.

Ryan

On Mon, Mar 7, 2011 at 10:45 AM, jmdpulido <[hidden email]> wrote:

> Dear all,
> [...]
>
> --
> View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Logistic-Regression-Help-Needed-Conf-Intervals-for-Predicted-Prob-for-the-average-individual-tp3412616p3412616.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Logistic Regression Help Needed: Conf. Intervals for Predicted Prob. for the average individual.

Bruce Weaver
Administrator
In reply to this post by jmdpulido
I would use GENLIN rather than LOGISTIC REGRESSION, and use /EMMEANS to get fitted values with confidence intervals.  If you use the SCALE=TRANSFORMED option (rather than the default SCALE=ORIGINAL), the "means" will be log-odds.  Exponentiate the values you see in the table to convert to odds; then if you really want predicted probabilities, apply the usual transformation from odds to probabilities.

   http://www.graphpad.com/faq/viewfaq.cfm?faq=1466
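
As a rough sketch of that last step, the log-odds estimate and its limits can be typed in from the EMMEANS table and converted with COMPUTE. The three numbers and the variable names (lo_est, lo_lower, lo_upper and so on) are made up for illustration; they are not output from any real model.

```spss
* Hypothetical log-odds estimate and 95% limits copied by hand from an EMMEANS table.
DATA LIST FREE / lo_est lo_lower lo_upper.
BEGIN DATA
-1.10 -1.45 -0.75
END DATA.

* Log-odds to odds.
COMPUTE odds_est = EXP(lo_est).
COMPUTE odds_lower = EXP(lo_lower).
COMPUTE odds_upper = EXP(lo_upper).

* Odds to probability: p = odds / (1 + odds).
COMPUTE p_est = odds_est / (1 + odds_est).
COMPUTE p_lower = odds_lower / (1 + odds_lower).
COMPUTE p_upper = odds_upper / (1 + odds_upper).
EXECUTE.

FORMATS p_est p_lower p_upper (F6.4).
LIST VARIABLES = p_est p_lower p_upper.
```

Because both steps are monotone, the converted limits keep their order and always stay between 0 and 1.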

By the way, for GENLIN you need to specify DISTRIBUTION=BINOMIAL LINK=LOGIT, and also (REFERENCE=FIRST) for the outcome variable, because the default is (REFERENCE=LAST).  You may also have to change (ORDER=ASCENDING) to (ORDER=DESCENDING) for any categorical explanatory variables to match the output you get from LOGISTIC REGRESSION.

HTH.


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Logistic Regression Help Needed: Conf. Intervals for Predicted Prob. for the average individual.

Hector Maletta
In reply to this post by Ryan

I agree. However, remember that the “average individual” is not necessarily a representative one. Predictors are usually correlated and not necessarily normally distributed in the sample, and therefore the “centroid” (the individual with average value in all predictors) is not necessarily modal or very frequent. In particular, binary predictors have a fractional average, giving you an “average gender” of 0.6 in a sample with 60% women (where female=1 and male=0) and an “average afroethnicity” of 0.20 if the percentage of blacks in the sample is 20%. Those “averages” make little empirical sense. Perhaps it is better to compute probabilities for interesting subsets of individuals (by gender, ethnicity or whatever).

On the other hand, probabilities are not predicable of individuals, but of groups. If female high-school dropouts aged 18-25 have a probability of 0.3, that means 30% of them are expected to experience the event in question. It says nothing about any particular member of the group, who either experiences the event or does not (values 1 or 0); the actual outcome is undetermined for each individual case. By the same token, confidence intervals also refer to the GROUP of people whose predicted probability is p. The confidence interval says that ACROSS MANY POSSIBLE SAMPLES OF INDIVIDUALS, that particular group will have various estimated values of p, with the average across samples tending to the “true” value of p in the population and the sample values of p approximately normally distributed around that across-samples average. The confidence intervals do not refer to each individual’s estimated value of p.

 

Hector

From: SPSSX(r) Discussion [mailto:[hidden email]] On behalf of R B
Sent: Monday, March 07, 2011 13:22
To: [hidden email]
Subject: Re: Logistic Regression Help Needed: Conf. Intervals for Predicted Prob. for the average individual.

[...]


Re: Logistic Regression Help Needed: Conf. Intervals for Predicted Prob. for the average individual.

Ryan
In reply to this post by Ryan
All,

I am not convinced that the EMMEANS subcommand in the GENLIN procedure
estimates the confidence limits about the predicted probabilities
accurately. I agree with Bruce that the best approach is to
first compute the lower and upper limits on the log-odds scale,
exponentiate them, and then apply the odds-to-probability
transformation. I had assumed GENLIN was doing this when estimating
probabilities via the EMMEANS subcommand; I now believe that
assumption was incorrect. (I actually think I looked into this before.)
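
For reference, the calculation described here can be written out as below. This is the usual Wald interval built on the log-odds scale and then back-transformed. The symbols are introduced only for this sketch: x_0 is the covariate pattern of interest (with a leading 1 for the intercept), \hat\beta the estimated coefficients, \hat V their estimated covariance matrix, and z the standard normal quantile (about 1.96 for a 95% interval).

```latex
\begin{align*}
\hat\eta &= x_0^{\top}\hat\beta, &
\widehat{\mathrm{SE}}(\hat\eta) &= \sqrt{x_0^{\top}\hat V x_0}, \\
\eta_L &= \hat\eta - z_{1-\alpha/2}\,\widehat{\mathrm{SE}}(\hat\eta), &
\eta_U &= \hat\eta + z_{1-\alpha/2}\,\widehat{\mathrm{SE}}(\hat\eta), \\
\hat p &= \frac{e^{\hat\eta}}{1+e^{\hat\eta}}, &
(p_L,\ p_U) &= \left(\frac{e^{\eta_L}}{1+e^{\eta_L}},\ \frac{e^{\eta_U}}{1+e^{\eta_U}}\right).
\end{align*}
```

The limits are asymmetric around \hat p on the probability scale, and they cannot fall outside the 0-1 range.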

I provide a simulation program BELOW my name to help show the
relationship between the log-odds and probability in the context of
logistic regression.

Also, here's a brief explanation of logistic regression by one of the
brilliant posters on SAS-L.

http://www.listserv.uga.edu/cgi-bin/wa?A2=ind0705C&L=sas-l&P=R12617

HTH.

Ryan

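* Simulate 1000 cases in two groups with known event probabilities.
* True values: p = 0.55 when group = 0 and p = 0.25 when group = 1.
* The model is eta = b0 + b1*(group=0), with p = exp(eta)/(1+exp(eta)).
set seed 35963452.

new file.
input program.

loop ID = 1 to 1000.

   compute p1 = 0.55.
   compute p2 = 0.25.

   compute logodds1 = ln(p1/(1-p1)).
   compute logodds2 = ln(p2/(1-p2)).

   compute b0 = logodds2.
   compute b1 = logodds1 - logodds2.

   compute group = rv.bernoulli(0.5).

   compute eta = b0 + b1*(group=0).
   compute p = exp(eta) / (1+exp(eta)).
   compute y = rv.bernoulli(p).

   end case.
end loop.
end file.
end input program.
execute.

* Drop the simulation scaffolding so only ID, group and y remain.
delete variables p1 p2 logodds1 logodds2 b0 b1 eta p.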

GENLIN y (REFERENCE=FIRST) BY group (ORDER=ASCENDING)
  /MODEL group INTERCEPT=YES
 DISTRIBUTION=BINOMIAL LINK=LOGIT
  /EMMEANS TABLES=group SCALE=TRANSFORMED
  /PRINT CPS DESCRIPTIVES MODELINFO FIT SUMMARY SOLUTION.


Re: Logistic Regression Help Needed: Conf. Intervals for Predicted Prob. for the average individual.

jmdpulido
In reply to this post by Hector Maletta
Dear Hector,

Thanks for your clarifications about the "average person". Of course it is not modal or even very frequent. And of course, logistic regression is not linear in the probability, so a coefficient has a different effect on the probability depending on where you evaluate the function.

Nevertheless, I find it useful to show the differential effect on the probability of moving from one category to another (for categorical predictors).

Usually, when I present my results I calculate predicted probabilities for certain groups (e.g. women aged 25-34, working class, etc.). In that case, however (when I do not use the centroids), I do not know how to calculate the "confidence interval" for the group's predicted probability. The EMMEANS subcommand in GENLIN (as Bruce kindly explained) can do it for the centroids, but not for specific values of the other independent variables.

Do you have any advice on how to calculate confidence intervals for certain groups (e.g. women age 25-34, working class)?
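
One way to get such an interval by hand is sketched below with MATRIX, under stated assumptions. The coefficient vector b, the covariance matrix v and the covariate pattern x are all hypothetical numbers for a model with an intercept, a female dummy and age; in practice b would come from the parameter estimates table and v from the covariance matrix of the estimates (GENLIN documents a COVB keyword on /PRINT for this, which is worth confirming in the syntax reference for your version). The steps are: linear predictor eta = x*b, standard error se = SQRT(x*v*T(x)), limits eta -/+ 1.96*se on the log-odds scale, then the inverse logit of each.

```spss
* Hypothetical values for illustration only, not output from a real model.
* b holds intercept, female, age. x is the pattern female = 1, age = 30.
MATRIX.
COMPUTE b = {-2.0; 0.5; 0.03}.
COMPUTE v = { 0.2500, -0.0300, -0.0050;
             -0.0300,  0.0400,  0.0001;
             -0.0050,  0.0001,  0.0002}.
COMPUTE x = {1, 1, 30}.
COMPUTE eta = x * b.
COMPUTE se = SQRT(x * v * T(x)).
COMPUTE etal = eta - 1.96 * se.
COMPUTE etau = eta + 1.96 * se.
COMPUTE pl = 1 / (1 + EXP(-etal)).
COMPUTE p = 1 / (1 + EXP(-eta)).
COMPUTE pu = 1 / (1 + EXP(-etau)).
COMPUTE ci = {pl, p, pu}.
PRINT ci /FORMAT = "F8.4" /TITLE = "Lower limit, estimate, upper limit (probability)".
END MATRIX.
```

The GENLIN syntax reference also lists a CONTROL keyword on /EMMEANS for fixing covariates at chosen values rather than at their means; that may be a simpler route where it applies, and is worth checking before computing by hand.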

Re: Logistic Regression Help Needed: Conf. Intervals for Predicted Prob. for the average individual.

jmdpulido
In reply to this post by Ryan
Dear All,

I have just checked what Ryan suggested. Indeed, the EMMEANS subcommand in GENLIN does not produce the same confidence intervals with SCALE=ORIGINAL as with SCALE=TRANSFORMED.

The difference is small, but it is not clear in which direction it goes. I tried it in a very simple example, regressing a binary outcome (no=0, yes=1) on one categorical independent variable with 4 categories (academic level: 0 = primary or less, 1 = secondary, etc.).

What I found is that the predicted probability for each group is the same with either option (TRANSFORMED or ORIGINAL), but with TRANSFORMED instead of ORIGINAL the confidence interval is slightly wider in 2 categories and slightly narrower in the other 2.

I will look for documentation to find out the reason for this difference. For now, I will follow Bruce's advice and use TRANSFORMED, and then convert the limits of the confidence intervals to probabilities using the inverse logit transformation.
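
One possible explanation, offered as an assumption and not confirmed from the SPSS documentation: if SCALE=ORIGINAL builds a symmetric delta-method interval directly on the probability scale (p -/+ 1.96 * p * (1 - p) * se), while SCALE=TRANSFORMED builds the interval on the log-odds scale, the two share the same point estimate but differ slightly in width. The sketch below compares the two constructions for two hypothetical groups; the log-odds values and standard errors are invented.

```spss
* Two hypothetical groups: log-odds estimate (eta) and its standard error (se).
DATA LIST FREE / eta se.
BEGIN DATA
0.0 0.30
-2.5 0.30
END DATA.

COMPUTE p = 1 / (1 + EXP(-eta)).

* Assumed ORIGINAL-style interval: delta method, symmetric around p.
COMPUTE d_lower = p - 1.96 * p * (1 - p) * se.
COMPUTE d_upper = p + 1.96 * p * (1 - p) * se.

* TRANSFORMED-style interval: limits on the log-odds scale, then inverse logit.
COMPUTE t_lower = 1 / (1 + EXP(-(eta - 1.96 * se))).
COMPUTE t_upper = 1 / (1 + EXP(-(eta + 1.96 * se))).

COMPUTE d_width = d_upper - d_lower.
COMPUTE t_width = t_upper - t_lower.
EXECUTE.

FORMATS p d_lower d_upper t_lower t_upper d_width t_width (F6.4).
LIST VARIABLES = p d_lower d_upper t_lower t_upper d_width t_width.
```

In this sketch the back-transformed interval comes out narrower than the delta-method one for the group at p = 0.5 and wider for the group near p = 0.08. That would be consistent with intervals that are wider in some categories and narrower in others, though it does not prove that this is what GENLIN does.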

I will appreciate any documentation on this topic.

I will post again when I find the explanation of this difference.

Kind Regards.