|
Hi all, I need to analyze the last two years of trips for 10,000 frequent
flyers and predict the next trip within a time window (1 month, 2 months, 6 months, 1 year), with associated probabilities. I use SPSS 17. All ideas are welcome.
TIA, Lee
=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
You need to be more specific. Are you trying to predict when, where,
how many, or any trips? What kind of data do you have? What X's to predict what Y's?
Rodrigo A. Guerrero | Director Of Marketing Research and Analysis | The Scooter Store | 830.627.4317
-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Libardo Lopez
Sent: Wednesday, November 04, 2009 8:36 AM
To: [hidden email]
Subject: How to Predict the Next Trip
Hi all, i need to analyze the last two years trips for 10.000 frequent flyers and predict the Next Trip for a window time, 1 month, 2 months, 6 months, 1 year; with associated probabilities. I use SPSS 17. All ideas are welcome. TIA, Lee |
|
In reply to this post by Libardo López Guzmán
Libardo,
I'm sure there are listmembers with direct, practical experience with problems similar to yours. That said, I'm curious as to exactly what defines a 'record' and what data you have for each record. Is a record something like this

PersonID  TripDate    TripTime
1028564   04/05/2007  10:15
1028564   04/05/2007  13:31
1028564   04/06/2007  20:15
1939923   03/14/2007  05:35

where you have a person ID and date-time information on each 'leg' of a trip? Or something different?

What defines a 'next trip'? How can you identify 'trips' in your data? I imagine identifying trips might be quite difficult, especially for people who travel 150 to 200+ days a year.

Assuming that you can clearly identify 'trips', my first idea would be that Cox regression (survival analysis) might be a place to wind up the analysis because you're interested in time to an event--the next trip.

Gene Maguin

>>Hi all, i need to analyze the last two years trips for 10.000 frequent flyers and predict the Next Trip for a window time, 1 month, 2 months, 6 months, 1 year; with associated probabilities. I use SPSS 17. |
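As a rough illustration of the time-to-event framing in the post above, here is a minimal Python sketch. It is not Cox regression and not SPSS syntax: it uses a plain Kaplan-Meier estimate, pooled over all flyers, of the probability that the next trip happens within a given window. It assumes that every distinct travel date counts as one 'trip' (which sidesteps the trip-identification problem raised above), and the function names and `study_end` date are hypothetical.

```python
from collections import defaultdict
from datetime import date

def gap_durations(trips, study_end):
    """Turn (person_id, trip_date) rows into (days, observed) gap durations.

    Gaps between consecutive travel dates are observed events; the gap from a
    flyer's last travel date to study_end is censored (no next trip seen yet).
    """
    dates_by_person = defaultdict(set)
    for person_id, trip_date in trips:
        dates_by_person[person_id].add(trip_date)  # same-day legs collapse
    durations = []
    for dates in dates_by_person.values():
        ordered = sorted(dates)
        for earlier, later in zip(ordered, ordered[1:]):
            durations.append(((later - earlier).days, True))
        durations.append(((study_end - ordered[-1]).days, False))
    return durations

def prob_next_trip_within(durations, window_days):
    """Kaplan-Meier estimate of P(next trip occurs within window_days)."""
    survival = 1.0
    event_days = sorted({d for d, observed in durations
                         if observed and d <= window_days})
    for day in event_days:
        at_risk = sum(1 for d, _ in durations if d >= day)
        events = sum(1 for d, observed in durations if observed and d == day)
        survival *= 1.0 - events / at_risk
    return 1.0 - survival

# The four example records from the post (times dropped).
# study_end is a hypothetical end of the observation period.
trips = [
    (1028564, date(2007, 4, 5)),
    (1028564, date(2007, 4, 5)),
    (1028564, date(2007, 4, 6)),
    (1939923, date(2007, 3, 14)),
]
durations = gap_durations(trips, study_end=date(2007, 4, 30))
for window in (30, 60, 182, 365):
    print(window, round(prob_next_trip_within(durations, window), 3))
```

Pooling all flyers treats them as interchangeable, which is unlikely to hold for real frequent-flyer data. A Cox model, as suggested above, would let covariates such as past trip counts shift each flyer's curve.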
|
Thanks to all,
My data contain:

PersonalID, TripDate, TripTime,
Trip Origen: NY, Barcelona, Oslo, Caracas....
Trip Destine: MI, NY, Pekin, Buenos Aires.....

From the trends found in the historical data, I need to predict the next trip: where (origin-destination) and when. For marketing purposes I will focus on the highest probabilities.

Some frequent flyers make a lot of trips with the same origin and destination, but not necessarily on other routes. In some months the trips are cyclical. If a flyer changes his routine, it may be because he changed jobs or residence, etc.

I think I need different steps, like MBA for associations, correlations, and time series analysis.

TIA, Lee

On Wed, Nov 4, 2009 at 10:05 AM, Gene Maguin <[hidden email]> wrote: Libardo, |
|
Libardo,
Ok, so a data record is PersonalID, TripDate, TripTime, Trip Origen, Trip Destine. If I were given this dataset and the questions you stated, I wouldn't know what to do. I can only guess that this is an area where serious statistical work has been done; however, I have no knowledge of that work.

Ignoring origin and destination, I'd suggest that you begin by aggregating the data by month and plotting it to get an idea of mean, variability, and trend. I'd guess that there would be seasonality effects in the monthly data, but I doubt that two years of data are adequate to model seasonality. If ARIMA were a candidate technique, I think this would give you forecasts of future trip volume, along with confidence intervals. But there might well be better techniques.

When you bring in origin and destination data, the complexity explodes because you can be interested in forecasts of trip volume by origin, by destination, and by origin-destination pair, all of which would be useful to know and all of which require a data series to estimate. However, it is probably also true that certain origins, destinations, and origin-destination pairs dominate the data.

The other issue I wonder about is whether you are interested in trip volume across your pool of frequent fliers, which is what the above discussion is about, or in predicted trips by flier ID, that is, predicting trip volume for each flier.

Gene Maguin |
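The first step suggested above (aggregate by month, then look at mean, variability, and trend) could be sketched in Python roughly as follows. This is only an illustration: the trip dates are made up, the function name is hypothetical, and the "plot" is a crude text bar chart rather than a real graph.

```python
from collections import Counter
from datetime import date
from statistics import mean, stdev

def monthly_trip_counts(trip_dates, first_month, last_month):
    """Return [((year, month), n_trips), ...] for every month in the range,
    including months with zero trips so gaps stay visible."""
    counts = Counter((d.year, d.month) for d in trip_dates)
    series = []
    year, month = first_month
    while (year, month) <= last_month:
        series.append(((year, month), counts[(year, month)]))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return series

# Hypothetical trip dates pooled over all flyers.
trip_dates = [date(2008, 1, 5), date(2008, 1, 20), date(2008, 3, 9)]
series = monthly_trip_counts(trip_dates, (2008, 1), (2008, 3))
volumes = [n for _, n in series]

# Crude text plot: one '#' per trip in the month.
for (year, month), n in series:
    print(f"{year}-{month:02d} {'#' * n}")
print("mean:", mean(volumes), "sd:", stdev(volumes))
```

With the real two-year file the same series would have 24 points, which is the reason given above for doubting that seasonality can be modeled.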
|
In reply to this post by Libardo López Guzmán
Libardo López Guzmán wrote:
> My data contain:
>
> PersonalID, TripDate, TripTime,
> Trip Origen: NY, Barcelona, Oslo, Caracas....
> Trip Destine: MI, NY, Pekin, Buenos Aires.....
>
> With the trends found in the historical data, i need to predict then
> next trip: Where: Origen-Destine, When. And for marketing purposes i
> will focus int he highest probabilities.
>
> Some Frequent Flyers make a lot of trips with the same origen destine,
> but other routs may be not; some months the trips are cyclical: if the
> guy change the routine may be is because of him changes work or
> residence, etc.
>
> I think i need different steps like MBA for Associations, Correlations,
> Time Series Analisis.

This looks like a classic data mining problem. And like most data mining problems, you need to do a lot of restructuring.

First, split your data into a training sample and an evaluation sample. Build your statistical model on the training sample (as described below) and then test it on the evaluation sample.

Then, if you have two years' worth of data, take the first year's worth of data in the training sample for a passenger and create a bunch of independent variables. Then take the second year's worth of data in the training sample and create a bunch of outcome variables. Then use the independent variables to predict the outcomes.

There are a million different ways you can do this, but here's a possible start. In the first year's worth of data, create a variable that counts the number of trips in that one-year span. In the second year's worth of data, create an indicator variable that is 1 if they take a trip in the first month, and zero otherwise. Create another indicator variable that is 1 if they take a trip in the first two months, and zero otherwise. Etc. Then use a logistic regression model with number of trips as the independent variable and the indicators for trips within one month, two months, etc. as the dependent variables.
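The restructuring described above might look roughly like this in Python. It is a minimal sketch under stated assumptions, not a recommended implementation: the four flyers and their dates are invented, `build_row` and `fit_logistic` are hypothetical helper names, the training/evaluation split is left out for brevity, and the logistic regression is fitted with plain gradient ascent instead of a statistics package.

```python
import math
from datetime import date, timedelta

def build_row(trip_dates, cutoff, windows_days=(30, 60, 182)):
    """One flyer: year-1 trip count (predictor) and, for each window,
    1 if any trip falls within that many days after cutoff, else 0."""
    year1_start = cutoff - timedelta(days=365)
    n_trips = sum(1 for d in trip_dates if year1_start <= d < cutoff)
    outcomes = [
        int(any(cutoff <= d < cutoff + timedelta(days=w) for d in trip_dates))
        for w in windows_days
    ]
    return n_trips, outcomes

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(xs, ys, steps=5000, rate=0.01):
    """Fit P(y=1) = sigmoid(a + b*x) by gradient ascent on the log-likelihood."""
    a = b = 0.0
    for _ in range(steps):
        grad_a = sum(y - sigmoid(a + b * x) for x, y in zip(xs, ys))
        grad_b = sum((y - sigmoid(a + b * x)) * x for x, y in zip(xs, ys))
        a += rate * grad_a / len(xs)
        b += rate * grad_b / len(xs)
    return a, b

# Hypothetical flyers: trips on the 15th of some 2008 months, plus at most
# one trip in 2009. The cutoff separates "year 1" from "year 2".
cutoff = date(2009, 1, 1)
flyers = {
    "A": [date(2008, m, 15) for m in range(1, 13)] + [date(2009, 1, 10)],
    "B": [date(2008, m, 15) for m in (2, 4, 6, 8, 10, 12)] + [date(2009, 2, 15)],
    "C": [date(2008, 3, 15), date(2008, 9, 15), date(2009, 5, 1)],
    "D": [date(2008, 7, 15)],
}
table = {fid: build_row(dates, cutoff) for fid, dates in flyers.items()}

# Model the "trip within two months" indicator from the year-1 trip count.
xs = [n for n, _ in table.values()]
ys = [outcomes[1] for _, outcomes in table.values()]
intercept, slope = fit_logistic(xs, ys)
print(table)
print("slope for trip count:", round(slope, 3))
```

With only four made-up flyers the fitted coefficients mean nothing in themselves. The point is the shape of the table: one row per flyer, predictors from the first year, one 0/1 outcome per time window from the second year.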
What you will probably find is that the probability of taking a trip quickly will increase as the number of trips taken increases.

Now this is much too simplistic for a variety of reasons, but you can add more independent variables (total number of miles traveled in a year, indicator variables for particular origins and destinations, and so forth). You should also expand your outcome variables (1 if they take a trip to New York in the first month and 0 otherwise, 1 if they take a trip to Seattle in the first month and 0 otherwise, etc.).

The problem is that you will get dozens or hundreds of variables when you are done, and that means that classic methods like stepwise regression, which don't work well in the best of circumstances, will probably fare miserably in this more difficult setting.

Once you've failed, then hire an expert in data mining and ask them to use specialized software (like Clementine). I just visited the SPSS website and Clementine is now PASW Modeler. What an awful name!

If you do not have the funds to hire an outside expert, then tell your boss that this project is very complicated and it is highly likely that any model you provide will perform poorly in the real world. Then do your best, of course, but this is not a trivial project. Even the best of us is likely to encounter serious problems along the way.

Of course, if you hire an outside expert, there's a good chance that they will fail also. But if somehow you or your consultant comes up with a model that succeeds, then be sure to brag about it at one of the upcoming SPSS conferences. The folks at SPSS/IBM will adore you and put your face and story on all of their promotional literature.

--
Steve Simon, Standard Disclaimer
The Monthly Mean is celebrating its first anniversary. Sign up at www.pmean.com/news |
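The expanded predictors and outcomes mentioned in the post above (counts for particular destinations, plus one 0/1 outcome per destination and window) could be built along these lines. This is a sketch only: the trips are invented, the helper names and the `n_to_` / `goes_to_` column prefixes are hypothetical, and New York and Seattle are used simply because the post names them.

```python
from collections import Counter
from datetime import date, timedelta

def destination_predictors(trips, cutoff, destinations):
    """Count trips to each listed destination in the 365 days before cutoff."""
    start = cutoff - timedelta(days=365)
    counts = Counter(dest for d, dest in trips if start <= d < cutoff)
    return {f"n_to_{dest}": counts[dest] for dest in destinations}

def destination_outcomes(trips, cutoff, window_days, destinations):
    """1 if the flyer travels to the destination within window_days after
    cutoff, else 0; one indicator per listed destination."""
    window_end = cutoff + timedelta(days=window_days)
    visited = {dest for d, dest in trips if cutoff <= d < window_end}
    return {f"goes_to_{dest}": int(dest in visited) for dest in destinations}

# Hypothetical (date, destination) history for a single flyer.
flyer_trips = [
    (date(2008, 3, 15), "New York"),
    (date(2008, 9, 15), "New York"),
    (date(2008, 11, 15), "Seattle"),
    (date(2009, 1, 20), "New York"),
]
cutoff = date(2009, 1, 1)
destinations = ["New York", "Seattle"]

predictors = destination_predictors(flyer_trips, cutoff, destinations)
outcomes = destination_outcomes(flyer_trips, cutoff, 30, destinations)
print(predictors)
print(outcomes)
```

Every extra destination adds one predictor column and one outcome column per window, which is how the variable count reaches the dozens or hundreds warned about above.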
