Overfitted Model

Overfitted Model

E. Bernardo
What is an "overfitted model" in classification modeling?
 
Thanks for any input.
Eins



Re: Overfitted Model

SR Millis-3
Too many variables/covariates with too few subjects/observations/events.
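
A minimal sketch of what this looks like in practice, assuming pure-noise data (50 random predictors, 20 training cases, labels unrelated to the predictors). The perceptron here is only a stand-in for any linear classifier; the function names, sample sizes, and seed are illustrative choices, not anything from an SPSS procedure. Because there are more predictors than cases, the classifier can separate the training cases perfectly even though there is nothing to learn, so training accuracy reaches 1.00 while accuracy on new cases stays near chance.

```python
import random

def make_noise_data(n_rows, n_features, rng):
    """Pure-noise data: features and labels are independent by construction."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_rows)]
    y = [rng.choice([-1, 1]) for _ in range(n_rows)]
    return X, y

def predict(weights, bias, row):
    """Classify one case as +1 or -1 from a linear score."""
    score = bias + sum(w * x for w, x in zip(weights, row))
    return 1 if score >= 0 else -1

def fit_perceptron(X, y, max_epochs=1000):
    """Classic perceptron updates; stops once every training case is correct."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for row, label in zip(X, y):
            if predict(weights, bias, row) != label:
                weights = [w + label * x for w, x in zip(weights, row)]
                bias += label
                mistakes += 1
        if mistakes == 0:
            break
    return weights, bias

def accuracy(weights, bias, X, y):
    hits = sum(predict(weights, bias, row) == label for row, label in zip(X, y))
    return hits / len(y)

rng = random.Random(0)                            # fixed seed, illustrative only
X_train, y_train = make_noise_data(20, 50, rng)   # few cases, many predictors
X_test, y_test = make_noise_data(2000, 50, rng)   # fresh cases, same noise process

weights, bias = fit_perceptron(X_train, y_train)
train_acc = accuracy(weights, bias, X_train, y_train)
test_acc = accuracy(weights, bias, X_test, y_test)
print(f"training accuracy: {train_acc:.2f}")
print(f"test accuracy:     {test_acc:.2f}")
```

The gap between the two accuracies is the overfitting: the apparent fit describes the training sample, not the population.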



Scott R Millis, PhD, ABPP (CN,CL,RP), CStat, CSci
Professor & Director of Research
Dept of Physical Medicine & Rehabilitation
Dept of Emergency Medicine
Wayne State University School of Medicine
261 Mack Blvd
Detroit, MI 48201
Email:  [hidden email]
Tel: 313-993-8085
Fax: 313-966-7682


--- On Sat, 8/22/09, Eins Bernardo <[hidden email]> wrote:

> From: Eins Bernardo <[hidden email]>
> Subject: Overfitted Model
> To: [hidden email]
> Date: Saturday, August 22, 2009, 8:38 PM
> What is "overfitted model" in classification modeling?
>
> Thanks for any input.
> Eins

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Overfitted Model

E. Bernardo
This means that, even before building the model, we already know that it will be overfitted.  Is there a way to avoid overfitting without changing the number of variables or the number of respondents?

--- On Sun, 8/23/09, SR Millis <[hidden email]> wrote:

From: SR Millis <[hidden email]>
Subject: Re: Overfitted Model
To: [hidden email]
Date: Sunday, 23 August, 2009, 12:42 AM

Too many variables/covariates with too few subjects/observations/events.




Re: Overfitted Model

SR Millis-3
I think that we need to know more about your model: for example, the type of response variable (continuous, binary, count, censored, etc.), the total number of subjects (and, if the response is binary, how many are in each group), and the number of covariates/predictors entered into the model.

Harrell (2001) provides excellent advice on model development and evaluation. As he notes, when a model is overfitted, that is, when “…it has too many parameters to estimate [for] the amount of information in the data, the worth of the model (e.g., R²) will be exaggerated and future values will not agree with predicted values” (Harrell, 2001, p. 60).  On the basis of models validated on independent datasets and of simulation studies, sample size requirements are expressed as events per variable (EPV).  Several studies (Harrell et al., 1984, 1985, 1996) have shown that the minimum EPV for obtaining reliable predictions is 10.  For binary outcome variables, the number of events used to compute the EPV is the size of the smaller of the two outcome groups (Harrell, 2001).
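
As a rough worked example of the events-per-variable rule described above, here is a small sketch. The counts (60 events, 240 non-events, 12 candidate predictors) are hypothetical, the function names are made up for illustration, and "predictors" is taken to mean the number of candidate parameters considered for the model.

```python
def events_per_variable(n_events, n_nonevents, n_predictors):
    """Events per variable for a binary outcome.

    The limiting sample size is the smaller of the two outcome groups.
    """
    limiting_size = min(n_events, n_nonevents)
    return limiting_size / n_predictors

def max_predictors(n_events, n_nonevents, min_epv=10):
    """Largest number of candidate predictors that keeps the ratio at or above min_epv."""
    return min(n_events, n_nonevents) // min_epv

# Hypothetical study: 60 events, 240 non-events, 12 candidate predictors.
epv = events_per_variable(60, 240, 12)
limit = max_predictors(60, 240)
print(f"events per variable: {epv}")             # 60 / 12
print(f"predictors supportable at 10: {limit}")  # 60 // 10
```

In this hypothetical case the ratio is 5, below the guideline of 10, so the data would support about 6 candidate predictors rather than 12.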

If, after applying these guidelines, you still have too many variables:

--Use the literature to eliminate unimportant variables.

--Eliminate variables whose distributions are too narrow.

--Eliminate variables that have a lot of missing data/observations.

--Consider using a data reduction technique like incomplete principal components regression.

--Examine the degree of collinearity among your covariates and eliminate offending variables.

--Consider using penalized methods.

--In extreme cases of too many variables with too few subjects (e.g., microarray analysis), the penalized lasso seems promising.
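
To make the last two suggestions concrete, here is a minimal sketch of the lasso fitted by coordinate descent. It assumes a squared-error (linear regression) loss, centered predictors and response, and no all-zero predictor column; for a binary outcome the same L1 penalty would be applied to the logistic log-likelihood instead. The function names and the tiny data set are illustrative only, and a real analysis would use an established implementation rather than this code.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values inside the band become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimize (1/(2n)) * sum of squared residuals + lam * sum(|beta_j|).

    Assumes X and y are centered (no intercept is fitted).
    """
    n, p = len(y), len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0  # covariance of predictor j with the partial residual
            z = 0.0    # sum of squares of predictor j
            for i in range(n):
                fit_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n)
    return beta

# Toy data: y = 2.0 * x1 + 0.1 * x2, with two orthogonal, centered predictors.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.1, -1.9, 1.9, -2.1]

beta_unpenalized = lasso_coordinate_descent(X, y, lam=0.0)  # ordinary least squares
beta_lasso = lasso_coordinate_descent(X, y, lam=0.5)        # weak predictor dropped
print(beta_unpenalized)
print(beta_lasso)
```

With the penalty switched on, the strong coefficient is shrunk and the weak one is set exactly to zero, which is how the lasso reduces the effective number of variables without the analyst removing them by hand.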

Scott Millis

References
Harrell FE Jr. (2001). Regression modeling strategies: With applications to linear models, logistic regression, and survival analysis. New York: Springer-Verlag.
Harrell FE Jr., Lee KL, Califf RM, Pryor DB, Rosati RA (1984). Regression modelling strategies for improved prognostic prediction. Stat Med, 3(2), 143-152.
Harrell FE Jr., Lee KL, Mark DB (1996). Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med, 15(4), 361-387.
Harrell FE Jr., Lee KL, Matchar DB, Reichert TA (1985). Regression models for prognostic prediction: advantages, problems, and suggested solutions. Cancer Treat Rep, 69(10), 1071-1077.




--- On Sun, 8/23/09, Eins Bernardo <[hidden email]> wrote:

> From: Eins Bernardo <[hidden email]>
> Subject: Re: Overfitted Model
> To: [hidden email]
> Date: Sunday, August 23, 2009, 8:49 PM
> This means that before the building of the model, we have already
> knowledge that the model is overfitted.  Is there a remedy to avoid
> overfitted without changing the number of variables and number of
> respondents?