Professor N. Balakrishnan, editor-in-chief of the Encyclopedia of Statistical Sciences, recently identified Analysis of High-Dimensional Data as one of the most important emerging areas of statistical inquiry. In the second edition of “The Elements of Statistical Learning” (Springer, 2009), Hastie, Tibshirani, and Friedman added a chapter on High-Dimensional Problems, detailing their work and that of their protégés and students. High-dimensional data are found in genomics, bioinformatics, chemometrics, and other areas. These data are characterized by relatively few samples and anywhere from dozens to hundreds of variables. In this situation, the usual statistical approaches must be modified or new approaches must be developed. In addition, even when the number of cases exceeds the number of variables, the analyst is often interested in feature selection: finding the predictors that matter. While practitioners use forward selection, backward elimination, stepwise selection, and the like, these methods do not have good statistical properties. With this in mind, we offer the following very timely workshop. Note: The workshop includes a discussion of the PLS method, principal components analysis, and other methods…
Regression Modeling with Many Correlated Predictors
Jay Magidson, Statistical Innovations
Tony Babinec, AB Analytics
Friday, April 8, 2011
8:30 am – 4:30 pm
Rush University Medical Center
1653 W Congress Parkway, Chicago, IL 60612
Sponsored by the Chicago Chapter of the American Statistical Association
http://www.chicagoasa.org
Abstract:
Recent advances in analysis of high dimensional data now allow reliable regression models to be developed even when the number of predictors exceeds the number of cases! In this course we begin by reviewing problems and limitations with traditional linear and logistic regression. Our applications-oriented presentation provides insight into how the new approaches work through examples and by providing an overview of the relevant theory, supplemented by the supporting equations. We use real and simulated data sets to illustrate the different approaches.
COURSE OUTLINE
Linear Regression Model
Bias-Variance Tradeoff
Logistic Regression and Discriminant Analysis
(BREAK)
Classification Tables and ROC Curves
Suppressor Variables
(LUNCH)
Penalized Regression Approaches
- Ridge Regression
- Lasso
- Elastic Net
Component Approaches
- Principal Components Regression
- Partial Least Squares Regression
- Correlated Components Regression – NEW Approach
(BREAK)
Ultra-high Dimensional Data
- Variable Reduction
- Extensions
Who Should Attend:
Marketing, biomedical, and other researchers who want to improve their understanding of regression model development in the presence of many correlated predictors.
Prerequisites:
Familiarity with linear and logistic regression analysis at an applied level.
What You Will Learn:
- How to develop reliable models, even in the presence of extreme multicollinearity and when the number of predictors far exceeds the number of sample observations
- Why many popular variable selection techniques are suboptimal
- About a powerful new step-down variable reduction technique in CORExpress™
- About free and commercially available software for handling high-dimensional data
Software Note:
We will discuss software in general, and why stepwise regression performs poorly, both in general and in the presence of multicollinearity or when the number of variables exceeds the number of cases. More specifically, Statistical Innovations is developing software currently named CORExpress. This software implements a version of Naïve Bayes regression as well as a new approach called correlated components regression, along with various other approaches. We will present the results of simulation studies that include approaches implemented as R contributed packages, especially GLMNET, which implements penalized approaches such as LASSO and Elastic Net. These approaches are associated with Hastie, Tibshirani, and Friedman and their protégés. Workshop attendees will receive a demo version of CORExpress.
Registration Fees:
Member $200
Student $50
The Chicago Chapter accepts payment by Visa or Mastercard. Register at: http://www.chicagoasa.org
Tony Babinec