SPSSX Discussion

Fw: Help with model methodology

Cardiff Tyke

Fw: Help with model methodology

OK, thanks for your help, to clarify:

Each case is measured over an 84 month period. The oldest cases we have are 30 months old (the newest one month old) and new cases are being added all the time. All have the same structure and data available.

The project is medical and is to predict the effect of certain chemical compounds on cell structures. The data is continuous (on a percentage scale) and anything but normal, two thirds of the cases will have a score of 0% (i.e. no effect) and probably remain unchanged for the 84 month period, the remainder will have a score of varying degrees above this, but this distribution will not be uniform.
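A response where most cases sit at exactly 0% is often summarised in two parts: whether a case is above zero at all, and how large the score is when it is. The thread does not say that any such approach was used; the sketch below is only an illustration, and the function name `two_part_summary` and the scores are made up.

```python
from statistics import mean

def two_part_summary(scores):
    """Return (share of zeros, mean of the positive scores, overall mean)."""
    positives = [s for s in scores if s > 0]
    share_zero = 1 - len(positives) / len(scores)
    mean_positive = mean(positives) if positives else 0.0
    # Overall expectation = P(score > 0) * E[score | score > 0]
    overall = (1 - share_zero) * mean_positive
    return share_zero, mean_positive, overall

# Hypothetical percentage scores for six cases: four zeros, two positives.
scores = [0.0, 0.0, 0.0, 0.0, 12.0, 30.0]
share_zero, mean_positive, overall = two_part_summary(scores)
print(share_zero, mean_positive, overall)
```

Keeping the two parts separate means the large block of zeros does not distort the description of the cases that do respond.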

The crux of the problem is that I am unsure how to forecast potential monthly results based on a historical sample of various ages. Do I try and forecast out the seasoned data first, then the newer data and then apply this in a regression model to the as-yet untested data? As you can tell, I've never done this before and know of nowhere to go for help (except here). I'm not sure if this is something that would have been done before in a financial field (i.e. potential monthly returns from a loan etc.) or mainly in a scientific field (microbiology? I've never heard of any long-term experiments)

Thanks,
Jack Cardiff

----- Original Message ----
From: Gene Maguin <[hidden email]>
To: [hidden email]
Sent: Monday, 16 April, 2007 5:47:50 PM
Subject: Re: Help with model methodology

Cardiff,

>>I've been tasked with creating a model to predict responses from cases
>>over an 84 month period based on historical data (this data only dates back
>>around 30 months). The responses are scale (normally 0-1000, but
>>occasionally larger) and I have roughly 100 variables to use to try and
>>predict the responses of new cases added. I have a historical sample size
>>of around 1 million cases.

Perhaps others will understand exactly what you have in mind but I don't.
More information about the project would be useful in addition to answers to
some specific questions. Maybe the first question concerns the specific
design of the historical and ongoing dataset. Does the historical and the
ongoing dataset have the same design structure? If no, how do they differ?
Were the 1 million cases assessed once (when?) or multiple times (if yes,
how many times?) at regular intervals or at irregular intervals? Do all 1
million have 30 months of followup? What is it that you are trying to
predict? What does the distribution of the response look like? Normal?
J-shaped? Continuous or categorical?

That might do for starters. Please post replies back to the list for all to
see.

Gene Maguin

___________________________________________________________
Yahoo! Mail is the world's favourite email. Don't settle for less, sign up for
your free account today http://uk.rd.yahoo.com/evt=44106/*http://uk.docs.yahoo.com/mail/winter07.html

Richard Ristow

Re: Fw: Help with model methodology

Your question raises some questions for me.
At 03:28 PM 4/16/2007, Cardiff Tyke wrote:

>Each case is measured over an 84 month period. The oldest cases we
>have are 30 months old (the newest one month old) and new cases are
>being added all the time. All have the same structure and data
>available.

How is the 'age' of a case defined? The termination of its 84-month
period? (If it's the beginning, you can't have 84 months of data for a
30-month old case.)

Or is it, as you've seemed to say, "each case >will be< measured over
an 84 month period"; you now have up to 30 months on each of the
million(!) subjects, but will eventually have more; and you want to
predict the last 54 months based on what's observed in the first 30?
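Under that second reading, the arithmetic of what has been observed can be sketched as follows. This assumes that a case's age is counted from the start of its 84-month period and that there is one measurement per month; neither assumption is confirmed in the thread, and the function names are hypothetical.

```python
STUDY_MONTHS = 84  # length of the follow-up period stated in the thread

def months_observed(age_months, study_months=STUDY_MONTHS):
    """Months of data available so far for a case of the given age."""
    return min(age_months, study_months)

def months_to_forecast(age_months, study_months=STUDY_MONTHS):
    """Months of the follow-up period that are still unobserved."""
    return study_months - months_observed(age_months, study_months)

def cases_informing_each_month(ages, study_months=STUDY_MONTHS):
    """For month-on-study 1..study_months, count the cases old enough to have data."""
    return [sum(1 for a in ages if a >= m) for m in range(1, study_months + 1)]

# Three hypothetical cases aged 1, 12 and 30 months.
ages = [1, 12, 30]
coverage = cases_informing_each_month(ages)
print(months_to_forecast(30))        # months still to come for the oldest case
print(coverage[:31])                 # cases with data at months 1..31
```

With the oldest case at 30 months, no case has any data for months 31 to 84, so anything said about that stretch is extrapolation rather than something estimated from observed outcomes.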

>The project is medical and is to predict the effect of certain
>chemical compounds on cell structures. The crux of the problem is that
>I am unsure how to forecast potential monthly results based on a
>historical sample of various ages.

You may want to post a little of the thinking that leads you to try
this. You're observing for 84 months. You're trying to predict based on
the first 30 months. That sounds like you may be thinking that the last
54 months won't tell you much new. (They won't, if they can be
predicted from the first 30). If that is your thinking, or whatever else
lies behind trying the prediction, you should step back and see what it
implies in the context of the study design.

Finally, observing a million subjects (even if they're Petri dishes or
something) for 84 months, sounds *expensive*. We want to help, here on
the list, but it sounds like you could justify spending anything
remotely reasonable on hands-on high-powered statistical consulting.

Maguin, Eugene

Re: Help with model methodology

In reply to this post by Cardiff Tyke

Jack,

I'd like to ask some follow-up questions about points that are not yet
clear to me. At the same time, I hope that somebody else on the list has
the knowledge to assist you because I don't know that I do.

>>Each case is measured over an 84 month period.

1) How many times over the 84 month period is a case measured?
2) Are cases measured at regular intervals, such as every month?
I'm trying to understand how many datapoints you have per case and I haven't
got a clear answer yet.
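Both questions could be answered directly from the data if it is held one row per measurement. The sketch below assumes a hypothetical long layout of (case id, month on study) pairs; the actual file layout is not described in the thread, and `measurement_profile` is an illustrative name.

```python
from collections import defaultdict

def measurement_profile(records):
    """records: iterable of (case_id, month_on_study).

    Returns {case_id: (number of data points, True if measured every month)}.
    A case with a single measurement counts as regular, since it has no gaps.
    """
    months_by_case = defaultdict(list)
    for case_id, month in records:
        months_by_case[case_id].append(month)

    profile = {}
    for case_id, months in months_by_case.items():
        months = sorted(months)
        gaps = [later - earlier for earlier, later in zip(months, months[1:])]
        profile[case_id] = (len(months), all(gap == 1 for gap in gaps))
    return profile

# Hypothetical records: case A measured monthly, case B with a gap.
records = [("A", 1), ("A", 2), ("A", 3), ("B", 1), ("B", 4)]
profile = measurement_profile(records)
print(profile)
```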

>>The project is medical and is to predict the effect of certain chemical
>>compounds on cell structures. The data is continuous (on a percentage
>>scale) and anything but normal, two thirds of the cases will have a score of
>>0% (i.e. no effect) and probably remain unchanged for the 84 month period,
>>the remainder will have a score of varying degrees above this, but this
>>distribution will not be uniform.

In your original posting you said "The responses are scale (normally 0-1000,
but occasionally larger) ..."

Would you reconcile your two statements regarding the response scale? You
mention a 0-1000 scale in the original posting and then refer to two thirds
of cases having a 0% score.

>>The crux of the problem is that I am unsure how to forecast potential
>>monthly results based on a historical sample of various ages. Do I try and
>>forecast out the seasoned data first, then the newer data and then apply
>>this in a regression model to the as-yet untested data? As you can tell,
>>I've never done this before and know of nowhere to go for help (except
>>here). I'm not sure if this is something that would have been done before
>>in a financial field (i.e. potential monthly returns from a loan etc.) or
>>mainly in a scientific field (microbiology? I've never heard of any
>>long-term experiments)

I am completely unfamiliar with anything like this either and since your
project seems to involve the study of cells, I wonder if you might not get a
better response from a listserv serving the substantive area rather than a
listserv serving the analysis of the data from the substantive area. Perhaps
others have better knowledge.

Gene Maguin