How to Predict the Next Trip

Libardo López Guzmán
Hi all, I need to analyze the last two years of trips for 10,000 frequent
flyers and predict the next trip within a time window (1 month, 2 months,
6 months, 1 year), with associated probabilities. I use SPSS 17.

All ideas are welcome.

TIA,

Lee

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: How to Predict the Next Trip

Guerrero, Rodrigo
You need to be more specific.  Are you trying to predict when, where,
how many, or any trips?  What kind of data do you have?  What X's to
predict what Y's?


Rodrigo A. Guerrero | Director Of Marketing Research and Analysis | The
Scooter Store | 830.627.4317




-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Libardo Lopez
Sent: Wednesday, November 04, 2009 8:36 AM
To: [hidden email]
Subject: How to Predict the Next Trip


Re: How to Predict the Next Trip

Maguin, Eugene
In reply to this post by Libardo López Guzmán
Libardo,

I'm sure there are listmembers with direct, practical experience with
problems similar to yours. That said, I'm curious as to exactly what defines
a 'record' and what data you have for each record. Is a record something
like this

PersonID  TripDate    TripTime
1028564   04/05/2007  10:15
1028564   04/05/2007  13:31
1028564   04/06/2007  20:15
1939923   03/14/2007  05:35

Where you have a person id and date-time information on each 'leg' of a
trip? Or something different? What defines a 'next trip'? How can you
identify 'trips' in your data? Somehow, I imagine identifying trips might be
quite difficult, especially for people who travel 150 to 200+ days a year.
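One way to turn legs into trips, sketched below in Python purely as an illustration: sort each person's legs by date and time, and start a new trip whenever the gap since the previous leg exceeds a threshold. The 24-hour threshold (`MAX_GAP`) and the name `group_legs_into_trips` are hypothetical, and the record layout follows the example table above. Under this rule an outbound flight and its return a week later count as two separate trips; a rule based on origin and destination (for example, returning to a home airport) may suit the real data better.

```python
from datetime import datetime, timedelta

# Hypothetical rule (an assumption, not from the thread): legs by the same
# person less than MAX_GAP apart belong to the same trip.
MAX_GAP = timedelta(hours=24)


def group_legs_into_trips(legs, max_gap=MAX_GAP):
    """legs: iterable of (person_id, 'MM/DD/YYYY', 'HH:MM') tuples.
    Returns {person_id: [[leg_datetime, ...], ...]}, one inner list per trip."""
    by_person = {}
    for pid, date_s, time_s in legs:
        when = datetime.strptime(f"{date_s} {time_s}", "%m/%d/%Y %H:%M")
        by_person.setdefault(pid, []).append(when)

    trips = {}
    for pid, times in by_person.items():
        times.sort()
        person_trips = [[times[0]]]
        for prev, cur in zip(times, times[1:]):
            if cur - prev < max_gap:
                person_trips[-1].append(cur)  # short gap: same trip
            else:
                person_trips.append([cur])    # long gap: start a new trip
        trips[pid] = person_trips
    return trips


# Records from the example table above
legs = [
    ("1028564", "04/05/2007", "10:15"),
    ("1028564", "04/05/2007", "13:31"),
    ("1028564", "04/06/2007", "20:15"),
    ("1939923", "03/14/2007", "05:35"),
]
trips = group_legs_into_trips(legs)
print({pid: len(t) for pid, t in trips.items()})  # {'1028564': 2, '1939923': 1}
```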

Assuming that you can clearly identify 'trips', my first idea would be that
Cox regression (survival) might be a place to wind up the analysis because
you're interested in time to an event--the next trip.
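Before adding covariates in a Cox model, a Kaplan-Meier estimate of the time between trips gives a baseline probability of the next trip falling within 1, 2, 6 or 12 months. The sketch below is plain Python for illustration, not SPSS syntax (SPSS provides the KM and COXREG procedures for this). It assumes trips are already identified, treats each gap between consecutive trips as an observed event and the time from a flyer's last trip to the end of the data as censored, and loosely assumes gaps are independent, which repeated gaps from the same flyer are not. The function names and the example dates are hypothetical.

```python
from datetime import date


def gap_data(trip_dates_by_person, end_of_data):
    """Return (duration_days, event) pairs: event=1 for an observed gap between
    consecutive trips, event=0 for the censored gap after a flyer's last trip."""
    rows = []
    for dates in trip_dates_by_person.values():
        dates = sorted(dates)
        for a, b in zip(dates, dates[1:]):
            rows.append(((b - a).days, 1))
        rows.append(((end_of_data - dates[-1]).days, 0))
    return rows


def kaplan_meier(rows):
    """Kaplan-Meier estimate: [(t, S(t))] at each distinct event time, where
    S(t) is the estimated probability of no new trip within t days."""
    rows = sorted(rows)
    n_at_risk = len(rows)
    surv = 1.0
    curve = []
    i = 0
    while i < len(rows):
        t = rows[i][0]
        events = censored = 0
        while i < len(rows) and rows[i][0] == t:  # group ties at time t
            if rows[i][1]:
                events += 1
            else:
                censored += 1
            i += 1
        if events:
            surv *= 1 - events / n_at_risk
            curve.append((t, surv))
        n_at_risk -= events + censored  # both leave the risk set after t
    return curve


def prob_trip_within(curve, days):
    """Estimated probability that the next trip happens within `days`: 1 - S(days)."""
    s = 1.0
    for t, s_t in curve:
        if t > days:
            break
        s = s_t
    return 1 - s


# Made-up example: two flyers observed until 31 March 2008
trip_dates = {
    "A": [date(2008, 1, 1), date(2008, 1, 31), date(2008, 3, 1)],
    "B": [date(2008, 2, 1)],
}
curve = kaplan_meier(gap_data(trip_dates, end_of_data=date(2008, 3, 31)))
print(curve)                        # [(30, 0.5)]
print(prob_trip_within(curve, 30))  # 0.5
```

With real data, evaluating `prob_trip_within` at about 30, 61, 182 and 365 days would give rough pooled probabilities for the four windows in the original question.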

Gene Maguin


Re: How to Predict the Next Trip

Libardo López Guzmán
Thanks to all,

My data contain:

PersonalID, TripDate, TripTime,
Trip Origin: NY, Barcelona, Oslo, Caracas...
Trip Destination: MI, NY, Beijing, Buenos Aires...

Using the trends found in the historical data, I need to predict the next trip: where (origin-destination) and when. For marketing purposes I will focus on the highest probabilities.

Some frequent flyers make many trips on the same origin-destination route, but others do not; for some, trips are cyclical by month. If a flyer changes routine, it may be because he changed jobs or residence, etc.

I think I need several steps, such as market basket analysis (MBA) for associations, correlations, and time series analysis.
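As a naive baseline for the "where" part, each flyer's past share of trips on each origin-destination route can serve as a rough probability that the next trip uses that route. This is a plain frequency count, not market basket analysis; the sketch assumes trips are already identified, treats NY-Oslo and Oslo-NY as different routes, and uses a hypothetical function name and made-up data.

```python
from collections import Counter


def route_shares(trips):
    """trips: iterable of (person_id, origin, destination).
    Returns {person_id: [((origin, destination), share), ...]}, most frequent first."""
    counts = {}
    for pid, origin, dest in trips:
        counts.setdefault(pid, Counter())[(origin, dest)] += 1
    shares = {}
    for pid, c in counts.items():
        total = sum(c.values())
        shares[pid] = [(route, n / total) for route, n in c.most_common()]
    return shares


# Made-up history for one flyer
trips = [
    ("1", "NY", "Oslo"),
    ("1", "NY", "Oslo"),
    ("1", "NY", "Oslo"),
    ("1", "Caracas", "Buenos Aires"),
]
print(route_shares(trips)["1"])
# [(('NY', 'Oslo'), 0.75), (('Caracas', 'Buenos Aires'), 0.25)]
```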

TIA,

Lee


On Wed, Nov 4, 2009 at 10:05 AM, Gene Maguin <[hidden email]> wrote:

Re: How to Predict the Next Trip

Maguin, Eugene
Libardo,

OK, so a data record is:
PersonalID, TripDate, TripTime, Trip Origin, Trip Destination.

If I were given this dataset and the questions you stated, I wouldn't know
what to do. I can only guess that this is an area where serious statistical
work has been done; however, I have no knowledge of that work. Ignoring
origin and destination, I'd suggest that you begin by aggregating the data
by month and plotting to get an idea of mean, variability, and trend. I'd
guess that there would be seasonality effects in the monthly data but I
doubt that two years of data are adequate to model seasonality. If ARIMA
were a candidate technique, I think this would give you forecasts of future
trip volume, along with confidence intervals. But there might well be better
techniques.
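The monthly aggregation described above can be sketched as follows, assuming trip dates are already extracted. Months with no trips are filled with zero so the series has no gaps, which matters for any later time series model. The function name and dates are hypothetical, and plotting and ARIMA fitting are left to SPSS or another tool.

```python
import statistics
from collections import Counter
from datetime import date


def monthly_counts(trip_dates, start, end):
    """Trips per calendar month from start to end (inclusive); months with no
    trips get 0 so the series has no gaps."""
    counts = Counter((d.year, d.month) for d in trip_dates)
    series = []
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        series.append(((y, m), counts[(y, m)]))
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return series


# Made-up trip dates
dates = [date(2008, 1, 5), date(2008, 1, 20), date(2008, 3, 2)]
series = monthly_counts(dates, start=date(2008, 1, 1), end=date(2008, 3, 31))
values = [n for _, n in series]
print(series)  # [((2008, 1), 2), ((2008, 2), 0), ((2008, 3), 1)]
print(statistics.mean(values), statistics.stdev(values))
```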

When you bring in origin and destination data, the complexity explodes
because you can be interested in forecasts of trip volume by origin, by
destination and by origin-destination pair, all of which would be useful to
know and all of which require a data series to estimate. However, it is
probably also true that certain origins, destinations and origin-destination
pairs dominate the data.
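To check whether a few origin-destination pairs dominate, one option is to rank the pairs by trip count and keep the smallest set that covers a chosen share of all trips. The 80% cutoff, the name `top_pairs` and the data below are arbitrary assumptions for illustration.

```python
from collections import Counter


def top_pairs(trips, coverage=0.8):
    """trips: iterable of (origin, destination). Returns the most frequent
    pairs, as (pair, count, cumulative_share), until `coverage` is reached."""
    counts = Counter(trips)
    total = sum(counts.values())
    result, cumulative = [], 0
    for pair, n in counts.most_common():
        cumulative += n
        result.append((pair, n, cumulative / total))
        if cumulative / total >= coverage:
            break
    return result


# Made-up pool of 10 trips
trips = ([("NY", "Oslo")] * 5 + [("Caracas", "Buenos Aires")] * 3
         + [("Barcelona", "NY")] * 2)
for pair, n, share in top_pairs(trips):
    print(pair, n, share)
```

If a handful of pairs reach the coverage target, separate monthly series for just those pairs (plus an "all other" series) keeps the number of series to model manageable.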

The other issue I wonder about is whether you are interested in trip volume
across your pool of frequent fliers, which is what the above discussion is
about, or in predicted trips by flier ID, that is, predicting trip volume
for each flier.
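For the per-flier view, one crude way to attach probabilities to the 1-, 2-, 6- and 12-month windows is to assume each flier travels as a Poisson process with a constant monthly rate estimated from their history, so that P(at least one trip in t months) = 1 - exp(-rate * t). That assumption ignores seasonality, trends and changes in routine, so this sketch is only a starting benchmark; `prob_at_least_one_trip` and the example flier are hypothetical.

```python
import math


def prob_at_least_one_trip(n_trips, months_observed, window_months):
    """P(at least one trip in the window), ASSUMING a constant-rate Poisson
    process per flier: rate = n_trips / months_observed."""
    rate = n_trips / months_observed  # trips per month
    return 1 - math.exp(-rate * window_months)


# Hypothetical flier with 12 trips in 24 months (rate 0.5 per month)
for window in (1, 2, 6, 12):
    print(window, round(prob_at_least_one_trip(12, 24, window), 3))
```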

Gene Maguin


Re: How to Predict the Next Trip

Steve Simon, P.Mean Consulting
In reply to this post by Libardo López Guzmán
This looks like a classic data mining problem. And like most data mining
problems, you need to do a lot of restructuring. First split your data
into a training sample and evaluation sample. Build your statistical
model on the training sample (as described below) and then test it on
the evaluation sample. Then, if you have two years' worth of data, you
need to take the first year's worth of data in the training sample for a
passenger, and create a bunch of independent variables. Then take the
second year's worth of data in the training sample and create a bunch of
outcome variables. Then use the independent variables to predict the
outcomes.
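A minimal sketch of the two splits described above, assuming each trip row is reduced to `(person_id, trip_date)`. This sketch splits whole flyers, not individual trips, at random into training and evaluation groups so that one person's history never lands in both, and then splits each flyer's trips at a cutoff date into a predictor year and an outcome year. The 30% evaluation share, the seed, the 1 November 2008 cutoff and the function names are arbitrary illustrative assumptions.

```python
import random
from datetime import date


def split_flyers(person_ids, eval_fraction=0.3, seed=2009):
    """Randomly assign whole flyers to (training, evaluation) sets, so that one
    person's trips never appear in both."""
    ids = sorted(set(person_ids))
    random.Random(seed).shuffle(ids)
    n_eval = round(len(ids) * eval_fraction)
    return set(ids[n_eval:]), set(ids[:n_eval])


def split_by_period(trips, cutoff):
    """trips: list of (person_id, trip_date). Trips before `cutoff` (year one)
    feed the predictors; trips on or after it (year two) feed the outcomes."""
    year1 = [t for t in trips if t[1] < cutoff]
    year2 = [t for t in trips if t[1] >= cutoff]
    return year1, year2


# Made-up example
trips = [
    ("1", date(2008, 3, 1)),
    ("1", date(2009, 2, 1)),
    ("2", date(2008, 12, 31)),
]
training, evaluation = split_flyers([str(i) for i in range(1, 11)])
train_trips = [t for t in trips if t[0] in training]
year1, year2 = split_by_period(train_trips, cutoff=date(2008, 11, 1))
```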

There are a million different ways you can do this, but here's a
possible start.

In the first year's worth of data, create a variable that counts the
number of trips in that one year span. In the second year's worth of
data, create an indicator variable that is 1 if they take a trip in the
first month, and zero otherwise. Create another indicator variable that
is 1 if they take a trip in the first two months, and zero otherwise. Etc.
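The variables described above can be sketched per flyer as a year-one trip count plus year-two indicators for windows of 1, 2, 6 and 12 months. This assumes the cutoff is the first day of a month (keeping calendar-month arithmetic simple) and that every trip before the cutoff belongs to the predictor year; column names such as `trip_within_2m` and the example trips are hypothetical.

```python
from datetime import date

WINDOWS_MONTHS = (1, 2, 6, 12)


def add_months(d, k):
    """First day of the month k months after d (d is assumed to be a 1st)."""
    years, month0 = divmod(d.month - 1 + k, 12)
    return date(d.year + years, month0 + 1, 1)


def build_rows(trips, cutoff, windows=WINDOWS_MONTHS):
    """trips: list of (person_id, trip_date); cutoff: first day of year two.
    Returns {person_id: {"n_trips_y1": count, "trip_within_<k>m": 0 or 1, ...}}."""
    assert cutoff.day == 1, "this sketch assumes the cutoff is the 1st of a month"
    rows = {}
    for pid, d in trips:
        row = rows.setdefault(
            pid, {"n_trips_y1": 0, **{f"trip_within_{k}m": 0 for k in windows}}
        )
        if d < cutoff:
            row["n_trips_y1"] += 1  # predictor: year-one trip count
        else:
            for k in windows:
                if d < add_months(cutoff, k):
                    row[f"trip_within_{k}m"] = 1  # outcome: trip inside window
    return rows


# Made-up trips around a cutoff of 1 November 2008
trips = [
    ("A", date(2008, 3, 3)), ("A", date(2008, 6, 9)), ("A", date(2008, 12, 15)),
    ("B", date(2008, 5, 5)), ("B", date(2009, 7, 1)),
]
rows = build_rows(trips, cutoff=date(2008, 11, 1))
print(rows["A"])
```

Flyers with no trips at all never appear in `trips`, so in practice the rows would be seeded from the full list of the 10,000 member IDs.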

Then fit a logistic regression model for each outcome, with number of
trips as the independent variable and the indicator for a trip within one
month (or two months, etc.) as the dependent variable. What you will
probably find is that the probability of taking a trip quickly increases
as the number of trips taken increases.
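SPSS's LOGISTIC REGRESSION procedure fits these models directly; the plain-Python sketch below only illustrates what is being estimated, fitting one window indicator on the year-one trip count by Newton-Raphson. The toy data are made up, and the code assumes the outcome is not perfectly separated by the predictor (otherwise the estimates diverge).

```python
import math


def fit_logistic(x, y, iters=50, tol=1e-10):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson.
    Assumes the data are not perfectly separated."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = a = b = c = 0.0  # gradient and information-matrix sums
        for xi, yi in zip(x, y):
            p = 1 / (1 + math.exp(-(b0 + b1 * xi)))
            w = p * (1 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            a += w
            b += w * xi
            c += w * xi * xi
        det = a * c - b * b
        d0 = (c * g0 - b * g1) / det  # solve the 2x2 Newton system
        d1 = (a * g1 - b * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Made-up data: year-one trip counts and "trip within 1 month" indicators
n_trips_y1 = [0, 1, 2, 3, 4, 5, 6, 7, 8]
trip_within_1m = [0, 0, 1, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(n_trips_y1, trip_within_1m)
print(f"b0={b0:.3f}, b1={b1:.3f}")
```

In this toy data a positive `b1` corresponds to the pattern described above (more year-one trips, higher chance of a trip in the window); the same fit would be repeated for the 2-, 6- and 12-month indicators.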

Now this is much too simplistic for a variety of reasons, but you can
add more independent variables (total number of miles traveled in a
year, indicator variables for particular originations and destinations)
and so forth. You should also expand your outcome variables (1 if they
take a trip to New York in the first month and 0 otherwise, 1 if they
take a trip to Seattle in the first month and 0 otherwise, etc.) The
problem is that you will get dozens or hundreds of variables when you
are done and that means that classic methods like stepwise regression,
which don't work well in the best of circumstances, will probably fare
miserably in this more difficult setting.

Once you've failed, then hire an expert in data mining and ask them to
use specialized software (like Clementine). I just visited the SPSS
website and Clementine is now PASW Modeler. What an awful name!

If you do not have the funds to hire an outside expert, then tell your
boss that this project is very complicated and it is highly likely that
any model you provide will perform poorly in the real world. Then do
your best, of course, but this is not a trivial project. Even the best
of us is likely to encounter serious problems along the way.

Of course, if you hire an outside expert, there's a good chance that
they will fail also.

But if somehow you or your consultant comes up with a model that
succeeds, then be sure to brag about it at one of the upcoming SPSS
conferences. The folks at SPSS/IBM will adore you and put your face and
story on all of their promotional literature.
--
Steve Simon, Standard Disclaimer
The Monthly Mean is celebrating its first anniversary.
Sign up at www.pmean.com/news
