SPSSX Discussion

Help with model methodology

Cardiff Tyke

Help with model methodology

Hi,

I've been tasked with creating a model to predict responses from cases over an 84-month period based on historical data (which only dates back around 30 months). The responses are scale variables (normally 0-1000, but occasionally larger), and I have roughly 100 variables to use to predict the responses of newly added cases. I have a historical sample of around 1 million cases.

I'm not sure where to start with the methodology for this project. Would I be better served creating a regression model for each month's response (months 1-30) and then performing a time series analysis for the remaining months? Should I segment the cases first and then begin the analysis?

Any help would be appreciated before I'm swamped with it all!

Regards,
JC

Maguin, Eugene

Re: Help with model methodology

Cardiff,

>>I've been tasked with creating a model to predict responses from cases
over an 84 month period based on historical data (this data only dates back
around 30 months). The responses are scale (normally 0-1000, but
occasionally larger) and I have roughly 100 variables to use to try and
predict the responses of new cases added. I have a historical sample size
of around 1 million cases.

Perhaps others will understand exactly what you have in mind, but I don't.
More information about the project would be useful, in addition to answers
to some specific questions. The first question concerns the design of the
historical and ongoing datasets. Do the historical and the ongoing datasets
have the same design structure? If not, how do they differ? Were the 1
million cases assessed once (when?) or multiple times (if so, how many
times?), and at regular or irregular intervals? Do all 1 million cases have
30 months of follow-up? What is it that you are trying to predict? What
does the distribution of the response look like? Normal? J-shaped?
Continuous or categorical?

That might do for starters. Please post replies back to the list for all to
see.

Gene Maguin

Ornelas, Fermin

Re: Help with model methodology

In reply to this post by Cardiff Tyke

1) To begin with, since you have one million cases, take a random sample
to create two data sets: one for development and a second one for
validation.

2) Do some exploratory analysis to verify data integrity and accuracy
(means, standard deviations, max and min values).

3) Do graphical analysis to get an idea of how each of the predictors
correlates with the response variable, and compute the correlations among
the predictors.

4) Select a reasonable number of predictors based on (1), (2) and (3) and
attempt some initial model building. A modeling procedure such as stepwise
variable selection would apply. Other useful criteria are PRESS, Mallows'
C_p, and MSE.
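
Steps 1) to 3) can be sketched in a few lines of Python. This is only an
illustration under stated assumptions, not the procedure used in the thread:
the data are synthetic, the names x1, x2 and response are hypothetical, and
the 70/30 split is an arbitrary choice.

```python
import random
import statistics

def split_cases(cases, validation_fraction=0.3, seed=42):
    """Randomly split cases into (development, validation) lists."""
    rng = random.Random(seed)
    shuffled = list(cases)
    rng.shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]

def describe(values):
    """Basic integrity checks: mean, standard deviation, min and max."""
    return {
        "mean": statistics.fmean(values),
        "sd": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Synthetic stand-in for the real cases (assumption: x1 drives the
# response, x2 is unrelated noise).
data_rng = random.Random(0)
cases = []
for _ in range(1000):
    x1 = data_rng.uniform(0, 10)
    x2 = data_rng.uniform(0, 10)
    cases.append({"x1": x1, "x2": x2,
                  "response": 50 * x1 + data_rng.gauss(0, 20)})

# Step 1: development and validation sets.
development, validation = split_cases(cases)

# Step 2: descriptive statistics on the development set.
response = [c["response"] for c in development]
summary = describe(response)
print(summary)

# Step 3: correlation of each predictor with the response.
correlations = {name: pearson([c[name] for c in development], response)
                for name in ("x1", "x2")}
print(correlations)
```

With the real data, the same checks would be run on every candidate
predictor, and the validation set would be left untouched until the final
models are compared.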

For the potentially useful models, you have to verify basic modeling
assumptions: normality, absence of influential outliers, no time effect,
constant variance, and independence. This requires normal probability
plots of the residuals to check for normality, plots of residuals versus
the predictors and the fitted response to check for constant variance,
and plots of residuals versus time to check for a time sequence effect.
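
The residual checks described above can be approximated numerically before
any plots are drawn. The sketch below is an assumption-laden example: it
fits a one-predictor least squares line to synthetic data, and it swaps in
three simple summaries for the plots (a normal probability plot correlation,
the correlation of absolute residuals with fitted values, and the
Durbin-Watson statistic). All function names are hypothetical.

```python
import random
import statistics

def fit_line(x, y):
    """Ordinary least squares with one predictor: (intercept, slope)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def normal_plot_points(residuals):
    """(theoretical quantile, ordered residual) pairs for a normal plot."""
    n = len(residuals)
    dist = statistics.NormalDist()
    return [(dist.inv_cdf((i + 0.5) / n), r)
            for i, r in enumerate(sorted(residuals))]

def durbin_watson(residuals):
    """Near 2 suggests no first-order serial correlation in time order."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    return num / sum(r ** 2 for r in residuals)

# Synthetic data, assumed to be listed in time order.
rng = random.Random(1)
x = [rng.uniform(0, 10) for _ in range(500)]
y = [3.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]

intercept, slope = fit_line(x, y)
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Normality: points on a normal plot should fall close to a straight line.
points = normal_plot_points(residuals)
qq_corr = pearson([q for q, _ in points], [r for _, r in points])

# Constant variance: residual spread should not grow with the fitted value.
spread_corr = pearson(fitted, [abs(r) for r in residuals])

# Time sequence effect: check serial correlation of residuals.
dw = durbin_watson(residuals)
print(qq_corr, spread_corr, dw)
```

These summaries do not replace the plots; they only flag which models
deserve a closer look.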

It is not clear what the purpose of the project is. The purpose is what
tells you how to define or modify the response variable to suit the needs
of the project. Moreover, depending on the problems found in the initial
data analysis, transformations of the response variable or the predictors
may be required to improve the model estimation.
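
As one example of such a transformation, a log transform is a common choice
when the response is non-negative and right-skewed. The thread does not say
what the response distribution looks like, so the skewed synthetic data
below is purely an assumption, as is the choice of log(1 + y) to cope with
zero responses.

```python
import math
import random
import statistics

def skewness(values):
    """Sample skewness: roughly 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)

# Synthetic right-skewed response (assumption: mostly moderate values,
# occasionally much larger ones).
rng = random.Random(2)
response = [rng.lognormvariate(5.0, 1.0) for _ in range(2000)]

# log(1 + y) is defined at y = 0, unlike log(y).
transformed = [math.log1p(v) for v in response]

skew_before = skewness(response)
skew_after = skewness(transformed)
print(skew_before, skew_after)
```

Predictions made on the transformed scale have to be converted back with
math.expm1 before they are compared with the original responses.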

This list is intended only as a brief outline. You may want to check a
textbook for a more formal description of the steps required for model
building and variable selection.

Fermin Ornelas, Ph.D.
Management Analyst III, AZ DES
Tel: (602) 542-5639
E-mail: [hidden email]

