OK, thanks for your help. To clarify:
Each case is measured over an 84-month period. The oldest cases we have are 30 months old (the newest is one month old), and new cases are being added all the time. All have the same structure and data available.

The project is medical and is to predict the effect of certain chemical compounds on cell structures. The data is continuous (on a percentage scale) and anything but normal: two thirds of the cases will have a score of 0% (i.e. no effect) and will probably remain unchanged for the 84-month period; the remainder will have scores of varying degrees above this, but this distribution will not be uniform.

The crux of the problem is that I am unsure how to forecast potential monthly results based on a historical sample of various ages. Do I try to forecast out the seasoned data first, then the newer data, and then apply this in a regression model to the as-yet untested data?

As you can tell, I've never done this before and know of nowhere to go for help (except here). I'm not sure whether this is something that would have been done before in a financial field (e.g. potential monthly returns from a loan) or mainly in a scientific field (microbiology? I've never heard of any long-term experiments).

Thanks,
Jack Cardiff

----- Original Message ----
From: Gene Maguin <[hidden email]>
To: [hidden email]
Sent: Monday, 16 April, 2007 5:47:50 PM
Subject: Re: Help with model methodology

Cardiff,

>>I've been tasked with creating a model to predict responses from cases over an 84 month period based on historical data (this data only dates back around 30 months). The responses are scale (normally 0-1000, but occasionally larger) and I have roughly 100 variables to use to try and predict the responses of new cases added. I have a historical sample size of around 1 million cases.

Perhaps others will understand exactly what you have in mind but I don't. More information about the project would be useful in addition to answers to some specific questions.
Maybe the first question concerns the specific design of the historical and ongoing dataset. Do the historical and the ongoing datasets have the same design structure? If not, how do they differ? Were the 1 million cases assessed once (when?) or multiple times (if so, how many times?), at regular intervals or at irregular intervals? Do all 1 million have 30 months of follow-up?

What is it that you are trying to predict? What does the distribution of the response look like? Normal? J-shaped? Continuous or categorical?

That might do for starters. Please post replies back to the list for all to see.

Gene Maguin

___________________________________________________________
Yahoo! Mail is the world's favourite email. Don't settle for less, sign up for your free account today http://uk.rd.yahoo.com/evt=44106/*http://uk.docs.yahoo.com/mail/winter07.html
Your question raises some questions for me.
At 03:28 PM 4/16/2007, Cardiff Tyke wrote:

>Each case is measured over an 84 month period. The oldest cases we have are 30 months old (the newest one month old) and new cases are being added all the time. All have the same structure and data available.

How is the 'age' of a case defined? By the termination of its 84-month period? (If it's the beginning, you can't have 84 months of data for a 30-month-old case.) Or is it, as you've seemed to say, "each case >will be< measured over an 84 month period": you now have up to 30 months on each of the million(!) subjects, but will eventually have more, and you want to predict the last 54 months based on what's observed in the first 30?

>The project is medical and is to predict the effect of certain chemical compounds on cell structures. The crux of the problem is that I am unsure how to forecast potential monthly results based on a historical sample of various ages.

You may want to post a little of the thinking that leads you to try this. You're observing for 84 months. You're trying to predict based on the first 30 months. That sounds like you may be thinking that the last 54 months won't tell you much new. (They won't, if they can be predicted from the first 30.) If you think that, or whatever else you think by trying the prediction, you should step back and see what it implies in the context of the study design.

Finally, observing a million subjects (even if they're Petri dishes or something) for 84 months sounds *expensive*. We want to help, here on the list, but it sounds like you could justify spending anything remotely reasonable on hands-on, high-powered statistical consulting.
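One reading of the question above can be made concrete with a small sketch. It assumes (and this is only an assumption, since the thread has not settled it) that a case's 'age' counts months since it entered the study, so a case of age 30 has 30 observed months and 54 still to be forecast. The `Case` class and its field names are hypothetical, introduced purely for illustration.

```python
from dataclasses import dataclass

# Planned observation window per case, as stated in the thread.
FOLLOW_UP_MONTHS = 84


@dataclass
class Case:
    """Hypothetical record for one case; names are illustrative only."""
    case_id: int
    age_months: int  # ASSUMPTION: months elapsed since the case entered the study

    @property
    def observed_months(self) -> int:
        # A case cannot have more observed months than the planned window.
        return min(self.age_months, FOLLOW_UP_MONTHS)

    @property
    def months_to_forecast(self) -> int:
        # Whatever has not yet been observed would have to be predicted.
        return FOLLOW_UP_MONTHS - self.observed_months


# The oldest case (30 months), a mid-aged case, and the newest (1 month).
cases = [Case(1, 30), Case(2, 12), Case(3, 1)]
summary = {c.case_id: (c.observed_months, c.months_to_forecast) for c in cases}
print(summary)
```

Under this reading no case has been observed beyond month 30, so any forecast for months 31 to 84 would be an extrapolation past everything in the historical sample, which is why the definition of 'age' matters for the choice of method.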
In reply to this post by Cardiff Tyke
Jack,
I'd like to ask some follow-up questions about points that are not yet clear to me. At the same time, I hope that somebody else on the list has the knowledge to assist you, because I don't know that I do.

>>Each case is measured over an 84 month period.

1) How many times over the 84 month period is a case measured?
2) Are cases measured at regular intervals, such as every month?

I'm trying to understand how many datapoints you have per case, and I haven't got a clear answer yet.

>>The project is medical and is to predict the effect of certain chemical compounds on cell structures. The data is continuous (on a percentage scale) and anything but normal, two thirds of the cases will have a score of 0% (i.e. no effect) and probably remain unchanged for the 84 month period, the remainder will have a score of varying degrees above this, but this distribution will not be uniform.

In your original posting you said "The responses are scale (normally 0-1000, but occasionally larger) ..." Would you reconcile your two statements regarding the response scale? You mention a 0-1000 scale in the original posting and then refer to two thirds of cases having a 0% score.

>>The crux of the problem is that I am unsure how to forecast potential monthly results based on a historical sample of various ages. Do I try and forecast out the seasoned data first, then the newer data and then apply this in a regression model to the as-yet untested data? As you can tell, I've never done this before and know of nowhere to go for help (except here). I'm not sure if this is something that would have been done before in a financial field (i.e. potential monthly returns from a loan etc.) or mainly in a scientific field (microbiology? I've never heard of any long-term experiments)

I am completely unfamiliar with anything like this either, and since your project seems to involve the study of cells, I wonder if you might not get a better response from a listserv serving the substantive area rather than a listserv serving the analysis of the data from the substantive area. Perhaps others have better knowledge.

Gene Maguin