analyzing customer data in long format


Nancy Rusinak-2
Hello,

I have a client who offers software as a service (SaaS) by subscription, behind a paywall.  They have not been in business very long (since about 2012).  They are interested in learning how to predict churn.  I have a file with all of their new business data in it, mostly 1 record per company but not always.  I also have a renewal data set with multiple rows per client.  I know who has churned and who has not.

I've merged the two data sets (they share the same fields).  One of the fields is 365 Day Value, which is a dollar amount.  It's the sales price/days of contract.  Some customers subscribed in 2012 and have continued throughout.  Others joined later.  Some dropped out early, others later.

I'm interested in knowing the relationship between change in 365 Day Value and likelihood to churn.  Are accounts with increasing amounts of 365 Day Value more likely to churn?  Or because they have $ invested, are they less likely to churn?

I've done analysis work for 20 years but have always worked with a wide format, one-row-per-client data set.  While I understand I can transpose data in SPSS, I'm wondering if there is a statistical analysis that would analyze the data as is with multiple rows per client.  I have multiple variables and the data set would get unwieldy if I have to change it to wide format.

I'm a better data analyst than statistician but I am really trying to learn here so please be kind with your responses.

Thanks!

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: analyzing customer data in long format

David Marso
Administrator
"I'm interested in knowing the relationship between change in 365 Day Value and likelihood to churn.  Are accounts with increasing amounts of 365 Day Value more likely to churn?  Or because they have $ invested, are they less likely to churn? "

Define this as a mathematical expression, and the AGGREGATE code, followed by some COMPUTE glue, may well fall out of that definition.  The question as posted is far too general.
AGGREGATE is the most basic method of building summaries of records connected by a common case ID.
COMPUTE then turns the results of AGGREGATE into meaningful statistical quantities.
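As a rough illustration of the AGGREGATE-then-COMPUTE idea, here is a minimal Python sketch (not SPSS syntax) on made-up long-format data. The field names `client_id`, `year`, `value_365` and `churned`, and the rule that a client counts as churned if any of its rows is flagged, are assumptions for illustration only. "Change" is defined here as last-year value minus first-year value, and churn rates are tabulated by the direction of that change.

```python
from collections import defaultdict

# Hypothetical long-format data: one row per client per contract year.
# Field names and values are illustrative assumptions, not the client's file.
records = [
    {"client_id": "A", "year": 2012, "value_365": 1000.0, "churned": 0},
    {"client_id": "A", "year": 2013, "value_365": 1200.0, "churned": 0},
    {"client_id": "A", "year": 2014, "value_365": 1500.0, "churned": 0},
    {"client_id": "B", "year": 2013, "value_365": 800.0, "churned": 0},
    {"client_id": "B", "year": 2014, "value_365": 600.0, "churned": 1},
    {"client_id": "C", "year": 2014, "value_365": 500.0, "churned": 1},
]


def aggregate_by_client(rows):
    """AGGREGATE-style step: one summary per client_id."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["client_id"]].append(row)
    summary = {}
    for cid, client_rows in groups.items():
        client_rows.sort(key=lambda r: r["year"])  # order records in time
        summary[cid] = {
            "first_value": client_rows[0]["value_365"],
            "last_value": client_rows[-1]["value_365"],
            "n_years": len(client_rows),
            # assumption: a client has churned if any of its rows is flagged
            "churned": max(r["churned"] for r in client_rows),
        }
    return summary


def compute_change(summary):
    """COMPUTE-style step: change in 365 Day Value, last year minus first."""
    for s in summary.values():
        s["change"] = s["last_value"] - s["first_value"]
    return summary


def churn_rate_by_direction(summary):
    """Proportion churned among clients whose value rose, fell, stayed flat,
    or who have only one year of data (no change can be measured)."""
    counts = defaultdict(lambda: [0, 0])  # direction -> [churned, total]
    for s in summary.values():
        if s["n_years"] == 1:
            direction = "one_year"
        elif s["change"] > 0:
            direction = "increase"
        elif s["change"] < 0:
            direction = "decrease"
        else:
            direction = "flat"
        counts[direction][0] += s["churned"]
        counts[direction][1] += 1
    return {d: churned / total for d, (churned, total) in counts.items()}


summary = compute_change(aggregate_by_client(records))
print(churn_rate_by_direction(summary))
```

Clients with a single record cannot show a change, so they are kept in their own group. First-minus-last also ignores the years in between and, for churners, stops at the point of churn, so this is a starting tabulation rather than a model.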

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Do not give what is holy to dogs, nor cast your pearls before swine, lest they trample them under their feet." (Matthew 7:6)
"When are the damned possessed pigs going to jump off the bloody cliff into the abyss?"

Re: analyzing customer data in long format

David Marso
Administrator
You might also want to study the MIXED procedure, which assumes the data are in long format.


Re: analyzing customer data in long format

Rich Ulrich
In reply to this post by Nancy Rusinak-2

I think that looking at means will be more helpful, at least at the start, than running tests that give hard-to-interpret results even when the results appear positive.


It sounds like you have annual data; the first year is not labeled "renewal", but otherwise it is the same as the rest.  That difference can be ignored because the files have been merged.  Somewhere, you have information about who does not renew.


Data start in different years for different companies, and "ends" happen in different years.  The N available, in companies or in dropouts, is unstated.


My first approach would be to see how much of /what/  is suggested by simple means or tabulations.


Consider the dropouts as a group: do they differ from non-dropouts on whatever parameters you have?
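A first tabulation along these lines might look like the Python sketch below: it averages each client's 365 Day Value across its years, then compares dropouts with non-dropouts. The tuple layout `(client_id, year, value_365, churned)` and the sample values are invented for illustration.

```python
from statistics import mean

# Hypothetical long-format rows: (client_id, year, value_365, churned_flag).
rows = [
    ("A", 2012, 1000.0, 0), ("A", 2013, 1200.0, 0), ("A", 2014, 1500.0, 0),
    ("B", 2013, 800.0, 0), ("B", 2014, 600.0, 1),
    ("C", 2014, 500.0, 1),
    ("D", 2013, 900.0, 0), ("D", 2014, 1100.0, 0),
]


def client_means(data):
    """Average 365 Day Value per client, plus whether the client ever churned."""
    values, churned = {}, {}
    for cid, _year, value, flag in data:
        values.setdefault(cid, []).append(value)
        churned[cid] = churned.get(cid, 0) or flag  # once churned, stays churned
    return {cid: (mean(v), churned[cid]) for cid, v in values.items()}


def compare_groups(per_client):
    """(number of clients, mean of per-client means) for non-dropouts (0) and dropouts (1)."""
    groups = {0: [], 1: []}
    for avg, flag in per_client.values():
        groups[flag].append(avg)
    return {flag: (len(vals), mean(vals)) for flag, vals in groups.items() if vals}


for flag, (n, avg) in compare_groups(client_means(rows)).items():
    label = "dropouts" if flag else "non-dropouts"
    print(f"{label}: n={n}, mean 365 Day Value={avg:.2f}")
```

Averaging within each client first keeps long-tenure clients, who contribute more rows, from dominating the group means.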


Take the dropouts and label each record with the drop-out date.  Then compute Time-minus-one, Time-minus-two, etc., and compare these Times.  If you have enough data, you would want to look separately at the companies with one, two, three, ... years of data.
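One way to set up the Time-minus-k labels, sketched in Python on invented data. `drop_year` is assumed to hold the first year each dropout did not renew, so its last paid year becomes T-1; the optional `tenure` argument restricts the comparison to companies with a given number of years of data, as suggested above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows for companies that dropped out: (client_id, year, value_365).
dropout_rows = [
    ("B", 2012, 700.0), ("B", 2013, 800.0), ("B", 2014, 600.0),
    ("E", 2013, 400.0), ("E", 2014, 450.0),
    ("F", 2014, 300.0),
]
# Assumption: the first year each company did NOT renew (its drop-out date).
drop_year = {"B": 2015, "E": 2015, "F": 2015}


def label_time_before_drop(rows, drop_year):
    """Label each record with k in 'Time-minus-k' (drop-out year minus record year)."""
    return [(cid, drop_year[cid] - year, value) for cid, year, value in rows]


def means_by_lag(labelled, tenure=None):
    """Mean 365 Day Value at T-1, T-2, ...; if tenure is given, use only
    companies with exactly that many records (assumed one record per year)."""
    years_per_client = defaultdict(int)
    for cid, _lag, _value in labelled:
        years_per_client[cid] += 1
    by_lag = defaultdict(list)
    for cid, lag, value in labelled:
        if tenure is None or years_per_client[cid] == tenure:
            by_lag[lag].append(value)
    return {lag: mean(values) for lag, values in sorted(by_lag.items())}


labelled = label_time_before_drop(dropout_rows, drop_year)
print(means_by_lag(labelled))            # all dropouts
print(means_by_lag(labelled, tenure=2))  # only companies with two years of data
```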


For the companies that do not drop out: what is their ordinary progression over the years?  This might be tabulated from 2012, or it could use anchors for each company of "Year1", "Year2", etc.
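For the non-dropouts, both tabulations can come from the same long-format rows: keyed by calendar year, or re-anchored so that each company's first observed year becomes "Year1". A minimal Python sketch with made-up data; the row layout and the `anchor` parameter are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows for companies that did not drop out: (client_id, year, value_365).
stayer_rows = [
    ("A", 2012, 1000.0), ("A", 2013, 1200.0), ("A", 2014, 1500.0),
    ("D", 2013, 900.0), ("D", 2014, 1100.0),
]


def progression(rows, anchor="calendar"):
    """Mean 365 Day Value per calendar year ("calendar") or per tenure year
    ("tenure"), where Year k counts from each company's first observed year."""
    if anchor not in ("calendar", "tenure"):
        raise ValueError("anchor must be 'calendar' or 'tenure'")
    start = {}
    for cid, year, _value in rows:
        start[cid] = min(year, start.get(cid, year))
    table = defaultdict(list)
    for cid, year, value in rows:
        key = year if anchor == "calendar" else f"Year{year - start[cid] + 1}"
        table[key].append(value)
    return {key: mean(values) for key, values in table.items()}


print(progression(stayer_rows))                   # anchored on 2012, 2013, ...
print(progression(stayer_rows, anchor="tenure"))  # anchored on Year1, Year2, ...
```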


--

Rich Ulrich



From: SPSSX(r) Discussion <[hidden email]> on behalf of Nancy <[hidden email]>
Sent: Monday, January 16, 2017 7:46:47 PM
To: [hidden email]
Subject: analyzing customer data in long format
 