logistic regression: number of cases per category

student09
Hello everybody,

I would be glad for advice concerning the following problem:

The sample size for my logistic regression with four predictor variables appears somewhat small (N = 160) to me. Regarding the dichotomous dependent variable, there are 20 cases for category (a) and 140 cases for category (b). I wonder whether this might cause any problems for the results: is it OK to conduct a logistic regression with just 20 cases in one of the two categories of the dependent variable? Are there any references available regarding adequate sample sizes in logistic regression?

Many thanks!
Jan

Re: logistic regression: number of cases per category

Bruce Weaver
Administrator
Frank Harrell, author of the book "Regression Modeling Strategies", advocates a 20:1 rule, meaning 20 events per candidate predictor variable.  See the section on overfitting here:

   http://biostat.mc.vanderbilt.edu/wiki/Main/ManuscriptChecklist

Personally, I am comfortable relaxing that to 15:1, or even 10:1 at times, although 10:1 is really pushing it.  For more on overfitting, see the nice article by Mike Babyak, available here:

   http://www.class.uidaho.edu/psy586/Course%20Readings/Babyak_04.pdf
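
For the numbers in the original question, the ratio can be worked out directly. The following is only a minimal Python sketch, not anything taken from SPSS or from the references above. It assumes that "events" means the count in the smaller outcome category (20 here) and that all four predictors count as candidate predictors; the function names are made up for illustration.

```python
def events_per_variable(n_cat_a: int, n_cat_b: int, n_predictors: int) -> float:
    """Events per candidate predictor.

    Assumption: the smaller of the two outcome categories is treated
    as the number of "events" that limits the model.
    """
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    return min(n_cat_a, n_cat_b) / n_predictors


def max_candidate_predictors(n_cat_a: int, n_cat_b: int, ratio: int) -> int:
    """Largest number of candidate predictors allowed under a ratio:1 rule."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return min(n_cat_a, n_cat_b) // ratio


# Figures from the original post: 20 cases in (a), 140 in (b), 4 predictors.
epv = events_per_variable(20, 140, 4)
print(f"Events per candidate predictor: {epv:.1f}")

# Compare against the 20:1, 15:1 and 10:1 rules of thumb mentioned above.
for ratio in (20, 15, 10):
    allowed = max_candidate_predictors(20, 140, ratio)
    print(f"{ratio}:1 rule -> at most {allowed} candidate predictor(s)")
```

Under these assumptions, four predictors with 20 events gives 5 events per predictor, which falls short of even the 10:1 rule.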

HTH.
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: logistic regression: number of cases per category

waqas imran
Mr Jan!

The biggest problem with small samples is testing. When the model is estimated, the parameter estimates often turn out very large, badly over- or underestimated, and the model shows no significance. You may then have trouble justifying your interpretation of the model estimates.

In SPSS, the minimum number of cases accepted for logistic regression is 60. It does not matter whether any single category of your dependent variable has fewer than 60 cases, but the total across all categories of the dependent variable (whether it is dichotomous, nominal, ordinal or categorical) should not be less than 60. But in that situation you will face the problems I mentioned in my first paragraph.

Do not use more than two categories for the indicators; try to collapse them so that the cases are not spread too thinly.

or

Try probit or tobit models, etc., or another approach from the literature on models for qualitative data or for dummy/dichotomous dependent variables.

Best of luck

Re: logistic regression: number of cases per category

Ryan
In reply to this post by Bruce Weaver
I, too, have read these rules of thumb, and I generally adhere to them
when fitting a logistic regression model using maximum likelihood
estimation. However, it is worth noting that there are exact methods
that have been shown to perform well with small sample sizes/rare
events. Having said that, as far as I'm aware, the latest version of
SPSS does not offer a procedure to fit logistic regression using exact
methods.

Ryan

Re: logistic regression: number of cases per category

Bruce Weaver
Administrator
In reply to this post by Ryan
Good point, Ryan.  Here's the website for LogXact:

   http://www.cytel.com/Software/LogXact.aspx

I believe the demo is good for 30 days.


SPSS Community Website Progress

Jon K Peck
In reply to this post by Bruce Weaver
The SPSS Community website at www.ibm.com/developerworks/spssdevcentral is the successor to the SPSS Developer Central website, which will be closed down later in 2011.  The new site includes downloads, articles, forums, a blog, and other useful items for IBM SPSS Statistics and other IBM SPSS products.

We have now made available on the Community site the IBM SPSS Statistics Essentials for programmability using Python and .NET for Statistics versions 18 and 19.  The site also has the SDKs (Software Development Kits) underlying these items.  The Essentials for the Statistics patch release, version 19.0.1, are available only on the new site.  The R Essentials, however, are not yet available there.

Items that are available on the SPSS Community site are not available on the Developer Central site, which is no longer being updated.  Older plugins and Essentials should still be obtained from Developer Central.

There is also a new IBM Modeler article, "Mining Your Warranty Data – Finding Anomalies (Part 1)", in the articles section of the site.

Regards,
Jon Peck
Senior Software Engineer, IBM
[hidden email]