Re: Stats qns

Sibusiso Moyo
Dear All,

I have market share data with 83 cases (products) by 15 sources of information (variables). The matrix is populated with share values, and the 15 sources of information are the possible places where respondents could have heard about the product (before they used it).

Now my simple task is to determine whether the shares differ depending on what the source of information was.

Does anyone have an idea how to approach this?

The data look something like this (a dot marks a missing value):



         info source1  info source2  info source3  info source4  info source5
prod 1           9.67          6.04          2.14          5.10          6.00
prod 2           3.00          6.67             .          0.00          6.25
prod 3          31.17         30.16          0.00         30.00         29.27
prod 4           3.75          0.74          0.00          1.00          3.75
prod 5          25.00         28.33             .          5.00         15.00
prod 6           8.38          2.87          3.14          2.05          2.00
prod 7              .          2.50          0.00          0.00         10.00
prod 8          22.25         17.87         10.04         17.40         18.92
prod 9           6.00          6.83          2.83          1.52          3.67
prod 10          6.33          2.74          3.80          2.73          3.18
prod 11             .          2.00             .             .             .
prod 12             .          0.00          0.00          0.00          0.00


Thanks,

Sibusiso.

Re: Stats qns

Marta García-Granero
Hi Sibusiso

I'm not really sure what you are asking. Are you interested in
finding out whether the sources differ (they don't give the same mean
value for all the products) or whether they are consistent (if product
nr. 3 has the highest value in source 1, then sources 2, 3... should
also give it higher values than the rest of the products)?

The first question would be answered by a repeated measures ANOVA (or
a Friedman test if the data are not normally distributed). The second
question would be answered by Kendall's coefficient of concordance, W
(warning: I'm NOT talking about Kendall's tau correlation coefficient).

In both cases you have a problem: those scattered missing data will
lower your sample size, because only products with a value for every
source enter the analysis.
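To show what these two tests compute, here is a minimal Python sketch (standard library only; in SPSS both come from NPAR TESTS with the /FRIEDMAN and /KENDALL subcommands). It assumes products are the blocks and sources the treatments, and it uses only the seven products in the excerpt above that have no missing source, i.e. the complete-case loss just mentioned. The helper names `rank_row`, `chi2_sf` and `friedman` are illustrative, not part of any library.

```python
import math

# Complete rows (no missing source) from the excerpt posted at the start of
# the thread: prods 1, 3, 4, 6, 8, 9, 10; columns are info sources 1-5.
SHARES = [
    [9.67, 6.04, 2.14, 5.10, 6.00],
    [31.17, 30.16, 0.00, 30.00, 29.27],
    [3.75, 0.74, 0.00, 1.00, 3.75],
    [8.38, 2.87, 3.14, 2.05, 2.00],
    [22.25, 17.87, 10.04, 17.40, 18.92],
    [6.00, 6.83, 2.83, 1.52, 3.67],
    [6.33, 2.74, 3.80, 2.73, 3.18],
]


def rank_row(row):
    """Rank one row (1 = smallest), giving tied values their average rank.

    Returns the ranks and the tie term sum(t**3 - t) over groups of ties.
    """
    order = sorted(range(len(row)), key=lambda j: row[j])
    ranks = [0.0] * len(row)
    ties = 0
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i..j
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        t = j - i + 1
        ties += t ** 3 - t
        i = j + 1
    return ranks, ties


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (closed forms)."""
    if x <= 0:
        return 1.0
    h = x / 2
    if df % 2 == 0:
        return math.exp(-h) * sum(h ** i / math.factorial(i) for i in range(df // 2))
    tail = math.erfc(math.sqrt(h))
    tail += math.exp(-h) * sum(h ** (i - 0.5) / math.gamma(i + 0.5)
                               for i in range(1, (df - 1) // 2 + 1))
    return tail


def friedman(table):
    """Friedman test, rows = blocks (raters), columns = treatments (objects).

    Returns (tie-corrected chi-square, p-value, Kendall's W).
    """
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    tie_total = 0
    for row in table:
        ranks, ties = rank_row(row)
        tie_total += ties
        rank_sums = [s + r for s, r in zip(rank_sums, ranks)]
    numerator = 12 * sum(r * r for r in rank_sums) - 3 * n * n * k * (k + 1) ** 2
    denominator = n * k * (k + 1) - tie_total / (k - 1)
    q = numerator / denominator
    w = q / (n * (k - 1))  # Kendall's W is the Friedman statistic rescaled
    return q, chi2_sf(q, k - 1), w


q, p, w = friedman(SHARES)
print(f"Friedman chi-square = {q:.3f}, df = {len(SHARES[0]) - 1}, p = {p:.4f}")
print(f"Kendall's W, products ordering the sources = {w:.3f}")

# Second question: do the sources agree on ordering the products?
# Transpose so rows = sources (raters) and columns = products (objects).
by_source = [list(col) for col in zip(*SHARES)]
print(f"Kendall's W, sources ordering the products = {friedman(by_source)[2]:.3f}")
```

The tie-corrected statistic is compared with a chi-square on k - 1 degrees of freedom, and W = chi-square / (n(k - 1)) rescales it to the 0-1 range. With products as rows, W says how consistently the products order the sources; the transposed call addresses the second question, whether the sources agree on the ordering of the products.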

--
Regards,
Dr. Marta García-Granero,PhD           mailto:[hidden email]
Statistician

---
"It is unwise to use a statistical procedure whose use one does
not understand. SPSS syntax guide cannot supply this knowledge, and it
is certainly no substitute for the basic understanding of statistics
and statistical thinking that is essential for the wise choice of
methods and the correct interpretation of their results".

(Adapted from WinPepi manual - I'm sure Joe Abramson will not mind)

Re: Stats qns

Sibusiso Moyo
Marta,

Thanks for your response.

What I am trying to find out is which source of information consistently yields a higher score (share) across different products. That is, if people heard about a product from info source 1, are its scores higher than if they heard about it from info source 2, etc.?

I hope this is clearer.

Sibusiso.


-----Original Message-----
From:   SPSSX(r) Discussion on behalf of Marta García-Granero
Sent:   Fri 11/3/2006 1:49 AM
To:     [hidden email]
Cc:
Subject:             Re: Stats qns


Re: Stats qns

Sibusiso Moyo
Marta,

Here is the test I ended up running:

* One-way repeated measures ANOVA: the listed source variables are the levels of Source.
GLM
  w5_mean_1.1 w5_mean_1.2 w5_mean_1.3 w5_mean_1.5 w5_mean_1.6 w5_mean_1.7
  w5_mean_1.8 w5_mean_1.9 w5_mean_1.10 w5_mean_1.11 w5_mean_1.12 w5_mean_1.13
  w5_mean_1.14
  /WSFACTOR = Source 13 Polynomial
  /METHOD = SSTYPE(3)
  /EMMEANS = TABLES(Source)
  /PRINT = DESCRIPTIVE ETASQ OPOWER PARAMETER TEST(MMATRIX) LOF GEF
  /PLOT = RESIDUALS
  /CRITERIA = ALPHA(.05)
  /WSDESIGN = Source .


My justification is that each of my products provides data for all 14 sources of information, so I am treating this as a one-way within-subjects ANOVA.

Thanks for your insight,

Sibusiso.


Re: Stats qns

Marta García-Granero
Hi Sibusiso

Did you lose a lot of sample size after listwise deletion? If you want to
recover the partial data you have for products with missing sources,
you can use VARSTOCASES to change your dataset from wide to long format
and run a two-factor UNIANOVA (source and product) with no interaction
term. The only condition for this approach (besides normality, of
course) is that Mauchly's sphericity test (you got it as part of your
GLM analysis) is non-significant.
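A rough illustration in Python of what the reshaping recovers, using the 12-product excerpt from the first post (assumed to be representative of the full 83): the wide repeated measures GLM keeps only products with a value for every source, while a long layout keeps every observed cell. `wide_to_long` and `complete_products` are hypothetical helpers; the sketch only counts rows and does not fit the UNIANOVA model.

```python
# Excerpt from the first post; None marks a missing share ('.').
WIDE = {
    "prod 1": [9.67, 6.04, 2.14, 5.10, 6.00],
    "prod 2": [3.00, 6.67, None, 0.00, 6.25],
    "prod 3": [31.17, 30.16, 0.00, 30.00, 29.27],
    "prod 4": [3.75, 0.74, 0.00, 1.00, 3.75],
    "prod 5": [25.00, 28.33, None, 5.00, 15.00],
    "prod 6": [8.38, 2.87, 3.14, 2.05, 2.00],
    "prod 7": [None, 2.50, 0.00, 0.00, 10.00],
    "prod 8": [22.25, 17.87, 10.04, 17.40, 18.92],
    "prod 9": [6.00, 6.83, 2.83, 1.52, 3.67],
    "prod 10": [6.33, 2.74, 3.80, 2.73, 3.18],
    "prod 11": [None, 2.00, None, None, None],
    "prod 12": [None, 0.00, 0.00, 0.00, 0.00],
}


def wide_to_long(wide):
    """One (product, source, share) row per observed cell, similar to
    VARSTOCASES with /NULL=DROP."""
    return [
        (product, source, share)
        for product, shares in wide.items()
        for source, share in enumerate(shares, start=1)
        if share is not None
    ]


def complete_products(wide):
    """Products a wide-format repeated measures GLM keeps (listwise deletion)."""
    return [p for p, shares in wide.items() if all(s is not None for s in shares)]


long_rows = wide_to_long(WIDE)
kept = complete_products(WIDE)
n_sources = len(next(iter(WIDE.values())))
print(f"wide GLM uses {len(kept)} of {len(WIDE)} products "
      f"= {len(kept) * n_sources} share values")
# wide GLM uses 7 of 12 products = 35 share values
print(f"long format keeps {len(long_rows)} share values")
# long format keeps 52 share values
```

A product observed on a single source (prod 11 here) adds a row but, once product is a factor in the model, carries no information about differences between sources; every product observed on two or more sources does.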

If you need some help transforming your dataset and running UNIANOVA,
tell me and I'll send a worked sample dataset.

Regards,
Marta


Re: Stats qns

Sibusiso Moyo
Marta,

Thanks for your help.

I have been using "series means" to replace the missing values. I am not sure how accurate that is, but I am willing to try the method you are proposing.

Please send a worked sample data set.

Thanks a million,

Sibusiso.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]]On Behalf Of
Marta García-Granero
Sent: Saturday, November 04, 2006 5:35 AM
To: [hidden email]
Subject: Re: Stats qns



Re: Stats qns

Marta García-Granero
Hi Sibusiso

SM> I have been using "series means" to replace the missing
SM> values.

Houmm... There are several caveats to that approach. Please read this
excellent work on the topic:

http://www.uvm.edu/~dhowell/StatPages/More_Stuff/Missing_Data/Missing.html

There is also a program called NORM that works with SPSS and generates
multiple datasets, each with a different set of imputed replacements
for the missing data (multiple imputation):

http://www.stat.psu.edu/~jls/misoftwa.html

http://www.stat.psu.edu/~jls/mifaq.html
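To make one caveat concrete: SPSS's series-mean replacement (RMV with SMEAN) fills every gap in a variable with the mean of its observed values. The minimal Python sketch below applies that to the source 3 column of the excerpt from the first post. The mean is unchanged, but n goes up and the variance goes down, so standard errors computed afterwards are too small, and the filled value ignores how high or low that product scores on the other sources.

```python
from statistics import mean, variance

# Source 3 column from the excerpt in the first post (prods 1-12);
# None marks a missing share.
SOURCE3 = [2.14, None, 0.00, 0.00, None, 3.14, 0.00, 10.04, 2.83, 3.80, None, 0.00]

observed = [x for x in SOURCE3 if x is not None]
fill = mean(observed)

# Series-mean replacement: every gap gets the same column mean,
# regardless of the product's level on the other sources.
imputed = [fill if x is None else x for x in SOURCE3]

print(f"observed: n={len(observed)}, mean={mean(observed):.3f}, var={variance(observed):.3f}")
print(f"imputed:  n={len(imputed)}, mean={mean(imputed):.3f}, var={variance(imputed):.3f}")
```

Multiple imputation (as in NORM) instead draws several plausible values per gap, so the extra uncertainty is carried into the final standard errors.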

SM> I am not sure how accurate that is, but I am willing to
SM> try the method you are proposing.

SM> Please send a worked sample data set.

Here it is (ignore the error message when the dataset is defined; it's
caused by the missing data).

* Sample data: copper in control skin and 3 burned-skin zones, 10 rats.
DATA LIST LIST/control zone1 zone2 zone3 (4 F8.1).
BEGIN DATA
15.0 17.9 16.5 16.7
23.5 26.5 35.4 34.1
20.1 45.2 22.6  .
26.1 39.1 33.4 30.6
26.5 35.2 37.6 30.1
19.4 35.1 30.4 24.6
16.4 31.8  .   20.1
21.1 21.4 20.8 18.4
19.8 33.1 29.4 24.3
17.4 31.1 28.4 29.6
END DATA.
VAR LABEL control 'Cu levels in Control skin'
         /zone1   'Cu levels in Zone 1 burned skin'
         /zone2   'Cu levels in Zone 2 burned skin'
         /zone3   'Cu levels in Zone 3 burned skin'.

* Wide to long: one row per rat and zone; NULL=KEEP keeps missing cells.
VARSTOCASES
 /ID = rats
 /MAKE copper FROM control TO zone3
 /INDEX = zones 'Burned skin zones'(4)
 /KEEP =
 /NULL = KEEP.
VAR LAB copper 'Copper levels'.
VAL LAB zones 1'Control' 2'Zone 1' 3'Zone 2' 4'Zone 3'.

* Two-factor model with no interaction: zones fixed, rats random.
UNIANOVA
  copper  BY zones rats
  /RANDOM = rats
  /METHOD = SSTYPE(4)
  /INTERCEPT = EXCLUDE
  /EMMEANS = TABLES(zones) COMPARE ADJ(BONFERRONI)
  /PLOT = RESIDUALS
  /CRITERIA = ALPHA(.05)
  /DESIGN = zones rats .
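The point estimates behind this UNIANOVA (zones + rats, no interaction, every observed cell used) can also be sketched in plain Python by least squares. This is an assumption-laden sketch, not a replacement for the SPSS run: it treats rats as fixed block effects, so it gives the zone differences but not the random-effects F test, and it has not been checked against SPSS output. `fit_additive` is a hypothetical helper that alternately re-estimates the zone and rat effects (backfitting) until they settle.

```python
# Sample data above: rows = rats, columns = control, zone 1, zone 2, zone 3;
# None marks the two missing cells.
COPPER = [
    [15.0, 17.9, 16.5, 16.7],
    [23.5, 26.5, 35.4, 34.1],
    [20.1, 45.2, 22.6, None],
    [26.1, 39.1, 33.4, 30.6],
    [26.5, 35.2, 37.6, 30.1],
    [19.4, 35.1, 30.4, 24.6],
    [16.4, 31.8, None, 20.1],
    [21.1, 21.4, 20.8, 18.4],
    [19.8, 33.1, 29.4, 24.3],
    [17.4, 31.1, 28.4, 29.6],
]


def fit_additive(table, sweeps=500):
    """Least-squares fit of y[i][j] = rat[i] + zone[j] on observed cells only.

    Alternately sets each zone effect to the mean of its rat-adjusted values
    and each rat effect to the mean of its zone-adjusted values; this
    coordinate descent converges to the least-squares solution.
    """
    n_rows, n_cols = len(table), len(table[0])
    rat = [0.0] * n_rows
    zone = [0.0] * n_cols
    for _ in range(sweeps):
        for j in range(n_cols):
            resid = [table[i][j] - rat[i] for i in range(n_rows) if table[i][j] is not None]
            zone[j] = sum(resid) / len(resid)
        for i in range(n_rows):
            resid = [table[i][j] - zone[j] for j in range(n_cols) if table[i][j] is not None]
            rat[i] = sum(resid) / len(resid)
    return rat, zone


rat, zone = fit_additive(COPPER)
labels = ["Control", "Zone 1", "Zone 2", "Zone 3"]
for name, effect in zip(labels[1:], zone[1:]):
    # Only differences between zones are estimable; compare each with control.
    print(f"{name} - Control: {effect - zone[0]:+.2f}")
```

Rats 3 and 7, each missing one zone, still contribute their three observed values, which is what listwise deletion would have thrown away.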


Regards,
Marta

Re: Stats qns

Sibusiso Moyo
Marta,

When I ran the GLM (WSFACTOR) in wide format I had 83 cases, and that number dropped to 23 usable ones because of missing values.

My data were actually in long format before I used CASESTOVARS to transform them to wide format, so I have both formats. I will now go ahead and run the second method you suggested!

Thank you,

Sibusiso.


Re: Stats qns

Swank, Paul R
This would be better as a mixed models analysis.

________________________________

From: SPSSX(r) Discussion on behalf of Sibusiso Moyo
Sent: Mon 11/6/2006 10:00 AM
To: [hidden email]
Subject: Re: Stats qns




Re: Stats qns

Sibusiso Moyo
Paul,

It is my understanding that Linear Mixed Models are related to the GLM Univariate and GLM Repeated Measures procedures, that is, that they can be used interchangeably (once correlation issues have been taken care of).

So, Paul, I was wondering why you think a mixed model analysis would be better in this case.

Thanks,

Sibusiso.

-----Original Message-----
From: Swank, Paul R [mailto:[hidden email]]
Sent: Monday, November 06, 2006 3:30 PM
To: Sibusiso Moyo; [hidden email]
Subject: RE: Stats qns


This would be better as a mixed models analysis.
