FW: Workshop: Analysis of High-Dimensional Data with many predictors


Anthony Babinec

Professor N. Balakrishnan, editor-in-chief of the Encyclopedia of Statistical Sciences, recently identified Analysis of High-Dimensional Data as one of the most important emerging areas of statistical inquiry. In the second edition of “The Elements of Statistical Learning,” published by Springer in 2009, Hastie, Tibshirani, and Friedman added a chapter on High-Dimensional Problems, detailing their work and that of their protégés and students. High-dimensional data are found in genomics, bioinformatics, chemometrics, and other areas. These data are characterized by relatively few samples and anywhere from dozens to hundreds of variables. In this situation, the usual statistical approaches must be modified or new approaches must be developed. In addition, even when the number of cases exceeds the number of variables, the analyst is often interested in feature selection, that is, finding the predictors that matter. While practitioners use forward selection, backward elimination, stepwise selection, and the like, these methods do not have good statistical properties. With this in mind, we offer the following very timely workshop. Note: Workshop includes a discussion of the PLS method, principal components analysis, and other methods…

 

 

Regression Modeling with Many Correlated Predictors

Jay Magidson, Statistical Innovations

Tony Babinec, AB Analytics

 

Friday, April 8, 2011

8:30 am – 4:30 pm

 

Rush University Medical Center

1653 W Congress Parkway, Chicago, IL 60612

 

Sponsored by the Chicago Chapter of the American Statistical Association

http://www.chicagoasa.org

 

Abstract:
Recent advances in the analysis of high-dimensional data now allow reliable regression models to be developed even when the number of predictors exceeds the number of cases! In this course we begin by reviewing the problems and limitations of traditional linear and logistic regression. Our applications-oriented presentation provides insight into how the new approaches work, both through examples and through an overview of the relevant theory, supplemented by the supporting equations. We use real and simulated data sets to illustrate the different approaches.

COURSE OUTLINE

Linear Regression Model

Bias-Variance Tradeoff

Logistic Regression and Discriminant Analysis

(BREAK)

Classification Tables and ROC Curves

Suppressor Variables

(LUNCH)

Penalized Regression Approaches

  • Ridge Regression
  • Lasso
  • Elastic Net

Component Approaches

  • Principal Components Regression
  • Partial Least Squares Regression
  • Correlated Components Regression – NEW Approach

(BREAK)

Ultra-high Dimensional Data

  • Variable Reduction
  • Extensions

 

Who Should Attend
Marketing, biomedical and other researchers who want to improve their understanding of regression model development in the presence of many correlated predictors.

 

Prerequisites
Familiarity with linear and logistic regression analysis at an applied level.

What you will learn

  • How to develop reliable models, even in the presence of extreme multicollinearity and when the number of predictors greatly exceeds the number of sample observations
  • Why many popular variable selection techniques are suboptimal
  • About a powerful new step-down variable reduction technique in CORExpress™
  • About free and commercially available software for handling high-dimensional data

 

Software Note

We will discuss software in general, including why stepwise regression performs poorly, whether in general, in the presence of multicollinearity, or when the number of variables exceeds the number of cases. More specifically, Statistical Innovations is developing software currently named CORExpress. This software implements a version of Naïve Bayes regression as well as a new approach called correlated components regression, along with various other approaches. We will present the results of simulation studies that include approaches implemented as contributed R packages, in particular glmnet, which implements penalized approaches such as the lasso and elastic net. These penalized approaches are associated with Hastie, Tibshirani, and Friedman and their protégés. Workshop attendees will receive a demo version of CORExpress.
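
For readers who want a feel for what a penalized approach such as the lasso does when predictors outnumber cases, here is a minimal sketch in plain Python. It is not the glmnet implementation and not workshop material; it assumes a textbook cyclic coordinate-descent lasso with no intercept, and the simulated data, the penalty value, and all function names are hypothetical choices made for illustration only.

```python
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def objective(X, y, beta, lam):
    """Lasso objective: (1/2n) * ||y - X beta||^2 + lam * ||beta||_1."""
    n = len(X)
    rss = sum(
        (y[i] - sum(X[i][j] * beta[j] for j in range(len(beta)))) ** 2
        for i in range(n)
    )
    return rss / (2 * n) + lam * sum(abs(b) for b in beta)


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise the lasso objective by cyclic coordinate descent.

    X is a list of n rows with p entries each; no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the residual that excludes column j
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n
            rho += col_sq[j] * beta[j]
            new_b = soft_threshold(rho, lam) / col_sq[j]
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_b
    return beta


# Simulated example (an assumption): 20 cases, 50 predictors,
# and only the first three predictors truly matter.
random.seed(0)
n, p = 20, 50
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [3.0 * row[0] - 2.0 * row[1] + 1.5 * row[2] + random.gauss(0.0, 0.1)
     for row in X]

lam = 0.3  # illustrative penalty; in practice chosen by cross-validation
beta = lasso_coordinate_descent(X, y, lam)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(f"{len(selected)} of {p} coefficients are nonzero:", selected)
```

Ordinary least squares has no unique solution here because there are more predictors than cases, whereas the penalty sets many coefficients exactly to zero. In practice one would use an established package and choose the penalty by cross-validation rather than fixing it by hand.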

 

Registration Fees

Member $200

Student $50

 

The Chicago Chapter accepts payment by Visa or Mastercard. Register at: http://www.chicagoasa.org.

 

 

 

Tony Babinec
