All correlations are significant


Heidi Green

Hello-

I would love to hear some of your expert suggestions on the following project I am working on. First off, please bear with me because it has been about 10 years or more since I have had to do any actual statistical analysis in my work (lately it has been just data manipulation and straight frequencies/graphs for output).

 

I have been asked to explore some correlations between responses to questions on a Customer Satisfaction Survey. The survey begins with an “Overall Satisfaction” question, and then has more specific questions relating to the store that the respondent chose, their retail experience at that store, their salesperson, their purchase transaction (paperwork, financing, etc.), the product delivery process, and follow-up after the purchase. (This is a large-ticket item we are talking about.)

 

The main goal of my analysis is to be able to say something like: “If a store’s overall satisfaction score is low, they need to work on improving _____, because that is highly correlated with overall satisfaction”, where the _____ is one of the more specific items, such as Store Cleanliness, Knowledge of Product, Integrity of Salesperson, Ability of Staff to Answer Questions, etc.

 

The Overall Satisfaction question is structured like this:

Overall, how satisfied are you with your purchase experience at (your store)?

100 = Completely Satisfied
 80 = Very Satisfied
 50 = Somewhat Satisfied
 20 = Slightly Satisfied
  0 = Not At All Satisfied

 

The more specific questions look like this (examples):

Please rate your salesperson on each of the following:

Being considerate of my time…

100 = Excellent
 80 = Good
 50 = Average
 20 = Fair
  0 = Poor

Ability to answer my questions…

100 = Excellent
 80 = Good
 50 = Average
 20 = Fair
  0 = Poor

 

I have plenty of responses to work with (over 100,000 respondents across thousands of stores). However, the vast majority of people ARE satisfied (90% of respondents are either completely or very satisfied). The distribution is “J” shaped. Because of that, I have been running Bivariate Correlations of the overall satisfaction question against each of the more specific questions, and looking at both the Kendall’s tau-b statistic and Spearman’s rho. (Again, forgive my statistical ignorance.) Right now, I’m looking at all stores lumped together, because with thousands of stores, it would be too time-consuming to look at the correlations for each store.

 

The problem: everything is significant! The correlation coefficients range from approximately .44 to .67. I can obviously pick the highest one, but I’m not sure I feel comfortable singling out one question when its coefficient and the next one are as close as .645 vs. .634. I guess I’m just not seeing an “aha” type of question that stands out above the rest.
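
A side note on why everything comes out significant: with a sample this large, the significance test has very little to say. The sketch below is only a rough illustration. It uses the usual large-sample approximation for testing "no correlation" (the statistic t = r * sqrt((n - 2) / (1 - r^2)) compared against a normal cutoff) and ignores the heavy ties and the clustering of respondents within stores, both of which are simplifying assumptions. The function name `critical_r` is made up for this example.

```python
import math
from statistics import NormalDist


def critical_r(n, alpha=0.05):
    """Smallest |r| that a two-sided test of 'no correlation' would flag.

    Solves r * sqrt((n - 2) / (1 - r**2)) = z for r, where z is the
    two-sided normal cutoff for the chosen alpha (large-sample approximation).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z / math.sqrt(n - 2 + z * z)


# How the cutoff shrinks as the sample grows
for n in (100, 1_000, 100_000):
    print(f"n = {n:>7}: |r| above {critical_r(n):.4f} is 'significant'")
```

Under these assumptions, coefficients of .44 to .67 are far above the cutoff at n = 100,000, so the p-values cannot separate the questions; the sizes of the coefficients have to do that work.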

 

Since I’m more concerned about LOW overall satisfaction scores, is there an accepted procedure or statistically defensible way for me to divide my data or just use a subset, and see if that produces anything more interesting?

 

By the way, I don’t want to use regression or any other more complicated procedure if possible, because I need to be able to come up with a straightforward action plan for my client. (I can’t say something like “you can increase your score if you work on Q1 * X + Q2 * Y + ….etc”.)

 

 Thank you in advance for sharing your thoughts.

 

 

 

 


Re: All correlations are significant

Bob Schacht-3
At 08:45 AM 4/16/2009, Heidi Green wrote:
[snip]
Since I’m more concerned about LOW overall satisfaction scores, is there an accepted procedure or statistically defensible way for me to divide my data or just use a subset, and see if that produces anything more interesting? . . .
 


Heidi,
Your problem is a lot like mine. I am also dealing with a satisfaction survey. Your situation is better than mine in that you've got decent sample sizes, and I don't.  So please allow me to brainstorm with you a little.

A lot of satisfaction surveys show a similar pattern: customers tend to be globally satisfied, or not. If they are globally satisfied, they are probably not going to discriminate much between items in their responses. However, those who are not satisfied are usually not globally dissatisfied, but dissatisfied for a particular reason. In other words, it is the least satisfied respondents who often provide the most valuable feedback. This suggests to me that you divide your cases into those who are globally satisfied (whose questionnaires therefore provide the least useful information, other than their general satisfaction) and those who were least satisfied. Then do your analysis again on this reduced subset (the least-happy campers). You can do this because you've got good sample sizes, and I don't.
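
To make the subsetting idea concrete, here is a minimal sketch in plain Python (not SPSS syntax). It keeps only respondents whose overall score is 50 or below and recomputes Spearman's rho for each specific item. The cutoff of 50, the item names, and the ten data rows are all made up for illustration; the rank correlation is written out by hand so that tied ratings get average ranks.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation; returns NaN if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up rows: (overall, considerate_of_time, answers_questions)
rows = [
    (100, 100, 100), (100, 80, 100), (80, 80, 80), (80, 100, 80),
    (50, 20, 80), (50, 50, 80), (20, 0, 50), (20, 20, 80),
    (0, 0, 50), (0, 20, 20),
]

# Keep only the least satisfied respondents (cutoff of 50 is an assumption)
low = [r for r in rows if r[0] <= 50]
overall = [r[0] for r in low]
for name, col in (("considerate_of_time", 1), ("answers_questions", 2)):
    rho = spearman(overall, [r[col] for r in low])
    print(f"{name}: rho = {rho:.3f} (n = {len(low)})")
```

One caution: selecting cases on the overall score restricts its range, so coefficients from the subset are not directly comparable with the full-sample ones. The comparison of interest is between items within the subset.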

Our current survey is only one page, but it seems to cover varied issues: Overall, counselor helpfulness, customer participation in setting goals, identifying services, and choosing service providers, etc. In fact, as you found out, these all seem to be highly intercorrelated.

One suggestion is to look for which of your questions shows the largest variety of responses -- in other words, the least J-shaped distributions. A crude measure of this is just the standard deviation of the responses. Of course, you have to check the wording of the question, because a badly worded question can produce a variety of confused responses. But you could try weighting each question in proportion to its standard deviation, and see if that helps.
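
As a rough sketch of the spread check, the snippet below computes the standard deviation of each item's responses and lists the items from most to least varied. The item names and scores are invented for the example, and the population form of the standard deviation is an arbitrary choice here.

```python
from statistics import pstdev

# Made-up responses per item on the 0/20/50/80/100 scale
items = {
    "store_cleanliness": [100, 100, 100, 80, 100],
    "answers_questions": [100, 50, 20, 100, 0],
}

# Standard deviation of each item's responses (population form)
spread = {name: pstdev(scores) for name, scores in items.items()}

# Items with the widest spread first
ranked = sorted(spread.items(), key=lambda kv: kv[1], reverse=True)
for name, sd in ranked:
    print(f"{name}: SD = {sd:.1f}")
```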

Also, I'd look at the overall satisfaction scores for each store. If you've got a problem store, you want to know about that, right?
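
A per-store summary along these lines could be sketched as follows. The store IDs, the scores, and the minimum of two responses per store are all assumptions made for the example; with real data the minimum would need to be chosen so that small stores do not produce noisy averages.

```python
from collections import defaultdict
from statistics import mean


def store_means(responses, min_n=2):
    """Mean overall score and response count per store, skipping tiny stores."""
    by_store = defaultdict(list)
    for store_id, overall in responses:
        by_store[store_id].append(overall)
    return {
        store: (mean(scores), len(scores))
        for store, scores in by_store.items()
        if len(scores) >= min_n
    }


# Made-up (store_id, overall_score) pairs
responses = [
    ("A", 100), ("A", 80), ("A", 100),
    ("B", 50), ("B", 20), ("B", 80),
    ("C", 0),
]

summary = store_means(responses)

# Lowest-scoring stores first
for store, (avg, n) in sorted(summary.items(), key=lambda kv: kv[1][0]):
    print(f"store {store}: mean = {avg:.1f} (n = {n})")
```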

Ah, the joys of a decent sample size!

Bob Schacht

Robert M. Schacht, Ph.D. <[hidden email]>
Pacific Basin Rehabilitation Research & Training Center
1268 Young Street, Suite #204
Research Center, University of Hawaii
Honolulu, HI 96814


Re: All correlations are significant

Heidi Green

Thank you so much for your response, Bob.

I will explore the data further in the ways you mentioned.

If I strike gold in a way that could be useful for smaller samples, I will let you know!

I really appreciate your help,

-Heidi

 


From: Bob Schacht [mailto:[hidden email]]
Sent: Thursday, April 16, 2009 12:35 PM
To: Heidi Green; [hidden email]
Subject: Re: All correlations are significant

 

[snip]

Re: All correlations are significant

statisticsdoc
In reply to this post by Heidi Green
Heidi,
 
Given the happy fact that all of the predictors are strongly related to overall satisfaction, you might want to base your recommendations to the client on the distribution of answers to the specific items. Granted, the items all show high levels of satisfaction, but are there any where a higher percentage of the respondents fall in the unhappy tail? I would also consider a factor analysis of the specific items, so that you can speak to your client about some broad underlying dimensions that drive satisfaction.
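
The "unhappy tail" comparison can be sketched in a few lines of plain Python. Here the tail is taken to mean a rating of 20 (Fair) or 0 (Poor), which is an assumption, as are the item names and the scores.

```python
def bottom_box_share(scores, cutoff=20):
    """Fraction of responses at or below the cutoff (Fair or Poor by default)."""
    return sum(1 for s in scores if s <= cutoff) / len(scores)


# Made-up responses per item on the 0/20/50/80/100 scale
items = {
    "considerate_of_time": [100, 80, 100, 20, 100, 0, 80, 100, 100, 50],
    "answers_questions": [100, 100, 80, 100, 100, 80, 100, 20, 100, 100],
}

shares = {name: bottom_box_share(scores) for name, scores in items.items()}

# Items with the largest unhappy tail first
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {share:.0%} rated Fair or Poor")
```

Ranking items by this share gives the client a plain statement ("this is where the most customers are unhappy") without requiring any model.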
 
Just some thoughts - hope they help,
 
Stephen Brand
 
 
 
-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]]On Behalf Of Heidi Green
Sent: Thursday, April 16, 2009 2:45 PM
To: [hidden email]
Subject: All correlations are significant

[snip]