non-parametric one-sample test?

Chris Smith-49
Hi

I have a data set of roughly 50 companies; each was asked to state the
percentage of their work that they would classify as 'green' in nature.
I want to run a test to show that, over the sample, this differs from
50%, i.e. on average there is not an even balance between green and
non-green work being done.

If this were a normally distributed variable, I'd run a standard
one-sample t-test. However, the distribution is fairly positively skewed
(most companies seem to report between 10% and 30%, and a few report higher),
with a small spike at 100%. Therefore, at the very least, trying a
non-parametric equivalent alongside a t-test seems appropriate, but
going through the SPSS menus I can't find the non-parametric equivalent
of the one-sample t-test. Any ideas/references? I'm sure I'm missing
something obvious here!

Thanks in advance for any help

Chris


Re: non-parametric one-sample test?

Martin Holt
Hi Chris,
 
Would a sign test be the one?
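A minimal sketch of an exact two-sided sign test against 50%, written in Python rather than SPSS and assuming the responses are plain percentages in a list. The function name `sign_test` and the sample values are hypothetical. Responses exactly equal to 50 are dropped, as the classical sign test does, and the two-sided p-value doubles the smaller binomial tail.

```python
from math import comb


def sign_test(values, reference=50.0):
    """Exact two-sided sign test of H0: population median == reference."""
    above = sum(1 for v in values if v > reference)
    below = sum(1 for v in values if v < reference)
    n = above + below  # responses equal to the reference are discarded
    k = min(above, below)
    # P(X <= k) for X ~ Binomial(n, 0.5); doubled for a two-sided test, capped at 1
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return above, below, min(1.0, 2 * tail)


# Made-up percentages for illustration only (not the survey data)
green = [10, 15, 20, 25, 30, 12, 18, 22, 28, 100, 100, 60, 50]
print(sign_test(green))  # -> (3, 9, 0.14599609375)
```

The test only looks at which side of 50 each company falls on, so the skew and the spike at 100% do not affect it.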
 
Best Wishes,
 
Martin Holt
----- Original Message -----
Sent: Thursday, March 18, 2010 10:28 AM
Subject: non-parametric one-sample test?


Re: non-parametric one-sample test?

statisticsdoc
In reply to this post by Chris Smith-49
Chris,

Consider using the one-sample Wilcoxon signed-rank test, which tests
whether the median value in the population is equal to a hypothesized
value (50 in your case). It is available starting with SPSS 18.
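For illustration, a minimal Python sketch of the one-sample Wilcoxon signed-rank test against a hypothesized median of 50. It assumes the large-sample normal approximation with a tie correction and no continuity correction; SPSS and other packages may use exact p-values or slightly different conventions, so numbers can differ. The function name and sample values are hypothetical.

```python
from math import erfc, sqrt


def wilcoxon_signed_rank(values, reference=50.0):
    """One-sample Wilcoxon signed-rank test of H0: distribution symmetric about `reference`."""
    diffs = [v - reference for v in values if v != reference]  # zero differences dropped
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied absolute differences and give each the average rank
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)  # sum of positive ranks
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean_w) / sqrt(var_w)
    p = erfc(abs(z) / sqrt(2))  # two-sided p-value from the standard normal
    return w_plus, z, p


# Made-up percentages for illustration only (not the survey data)
print(wilcoxon_signed_rank([10, 15, 20, 25, 30, 12, 18, 22, 28, 100, 100, 60, 50]))
```

Unlike the sign test, this uses the sizes of the differences from 50, which is why it relies on the differences being roughly symmetric.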

HTH,

Steve Brand

www.StatisticsDoc.com

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: non-parametric one-sample test?

Garry Gelade
In reply to this post by Chris Smith-49

Dear Chris,

 

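You might try a data transformation such as 1/x or log(x) to see if that makes the distribution closer to normal, and then use a t-test (both transformations need strictly positive values, so any 0% responses would have to be handled separately). Or, if you have the stomach for it, bootstrapping.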
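A minimal sketch of the bootstrapping route, assuming a percentile bootstrap confidence interval for the mean is what's wanted (one of several bootstrap variants). The function name, seed, and number of resamples are illustrative choices, and the data are made up. If 50 falls outside the 95% interval, that is usually read as evidence that the mean differs from 50%.

```python
import random
from statistics import mean


def bootstrap_mean_ci(values, n_boot=10_000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(values)
    # Resample with replacement n_boot times and record each resample's mean
    boot_means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_boot))
    lo = boot_means[int(alpha / 2 * n_boot)]
    hi = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Made-up percentages for illustration only (not the survey data)
green = [10, 15, 20, 25, 30, 12, 18, 22, 28, 100, 100, 60, 50]
lo, hi = bootstrap_mean_ci(green)
print(f"95% CI for the mean: {lo:.1f} to {hi:.1f}; excludes 50: {not lo <= 50 <= hi}")
```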

 

Garry Gelade

Business Analytic Ltd.

Re: non-parametric one-sample test?

Marta Garcia-Granero
In reply to this post by Chris Smith-49
Hi Chris:

You have already received a lot of responses, but I think that none
focused on the problem that SPSS can't (won't?) do a one-sample
non-parametric test, like the Wilcoxon signed-rank test or the sign test. At
the end of this message I will give you a workaround to get both.
Concerning which one is better for your data, I would rule out the Wilcoxon
test: you have mentioned that your data are positively skewed, and
symmetry (but not normality) is a condition for the Wilcoxon test. You can
either try what has already been suggested to you (log-transform your data
and check symmetry again) or use the sign test.
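One rough way to "check symmetry again" is to compare the sample skewness before and after the log transform; a value near 0 suggests approximate symmetry. A minimal Python sketch, assuming the simple moment-based skewness (SPSS may report a small-sample-adjusted version, so its figures can differ somewhat). Zero responses are excluded before logging because log(0) is undefined; how to treat them is a separate decision. Names and data are hypothetical.

```python
from math import log
from statistics import mean


def skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (no small-sample adjustment)."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up percentages for illustration only (not the survey data)
green = [10, 15, 20, 25, 30, 12, 18, 22, 28, 100, 100, 60, 50]
logged = [log(v) for v in green if v > 0]  # log(0) is undefined, so zeros are dropped here
print(f"raw skewness {skewness(green):.2f}, log skewness {skewness(logged):.2f}")
```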

Workaround with SPSS: you have to compute a new variable holding the
reference value (50% in your case), and then run a two-related-samples
non-parametric test:

* Constant variable holding the hypothesized value.
COMPUTE RefValue=50.
* Paired tests of PercentGreen against that constant.
NPAR TESTS
  /WILCOXON=RefValue WITH PercentGreen
  /SIGN=RefValue WITH PercentGreen.

HTH,
Marta GG

--
For miscellaneous SPSS related statistical stuff, visit:
http://gjyp.nl/marta/


Re: non-parametric one-sample test?

Richard Ristow
In reply to this post by Chris Smith-49
At 06:28 AM 3/18/2010, Chris Smith wrote:

> I have a data set of roughly 50 companies - each were asked to state the percentage of their work that they would classify as 'green' in nature. I want to run a test to show that over the sample, this differs from 50%, i.e. on average there is not an even balance between green and non-green work being done.

This is one of those problems that raise at least as many problems of interpretation as of methodology.

You want to show that "on average there is not an even balance between green and non-green work".

First, there's your measure: "the percentage of their work that they would classify as 'green'" (emphasis added). I dare say your actual question was more rigorous than this sounds; but as phrased, the terms 'green' and 'percentage' are both free-floating, with respondents using whatever definitions they choose. 'Green' is a notoriously flexible word, but 'percentage' isn't much better: Percent by total revenue? By number of product lines? By labor-hours applied? By square feet of factory floor?

Do you know what definition each respondent used, of either 'green' or 'percentage'?

On top of that, are the percentages measured, or estimated? By common experience (and, I believe, psychological experiments), humans don't estimate proportions well; particularly, low-proportion categories are commonly either inflated or ignored in estimates.

And this lays aside the question of bias, for example the highly likely desire to report one's own company's work as 'green' as possible. I lay it aside, but you'll have to discuss it carefully when you report the results.

Then, if you can argue you have a meaningful measure of the "percentage of their work that [is] 'green'", and want to test whether "on average there is an even balance" on that measure, then what does "on average" mean? Is there any statistic on your data that you can defend as a measure that may legitimately be called an 'average'? Among other things, a data transformation that doesn't have theoretical backup makes any 'average' hard to defend. (What is the meaning of the geometric mean of a set of measures of proportion? But if you log-transform, you're saying that that is the meaningful measure.)

I guess you could classify each company as "<50%" and ">50%", and test whether the counts showed a significant difference from equal proportion (see BINOMIAL or CHISQUARE in NPAR TESTS), but what does the answer tell you?
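The CHISQUARE route essentially amounts to a goodness-of-fit test of the below/above-50 counts against an even split. A minimal Python sketch, assuming responses of exactly 50% have already been set aside. For a chi-square with 1 degree of freedom the p-value can be computed from the complementary error function, so no statistics library is needed. The function name and counts are hypothetical.

```python
from math import erfc, sqrt


def chisq_even_split(below, above):
    """Chi-square goodness-of-fit test of two counts against a 50/50 split (1 df)."""
    expected = (below + above) / 2
    chi2 = ((below - expected) ** 2 + (above - expected) ** 2) / expected
    # For 1 df, P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p = erfc(sqrt(chi2 / 2))
    return chi2, p


# Hypothetical counts: 9 companies below 50%, 3 above
print(chisq_even_split(9, 3))
```

With counts this small, the chi-square approximation can differ noticeably from the exact binomial p-value, which is why the BINOMIAL option is generally the safer choice for a sample of about 50.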

OK, let's see what you might do with your data. You say that "most companies seem to report between 10-30% - a few report higher, but with a small spike at 100%". That's a 'cloud and outlier' distribution -- the "10-30%" is the 'cloud', the 100% the 'outlier'. In the absence of a good theoretical model, it's best to analyze that in two steps: outlier vs. cloud, and variations within cloud. Unfortunately, that takes a goodly sample size; 50 companies is probably far too few.

But it sounds like any remotely reasonable summary statistic will give a value within the 'cloud', i.e. between 10% and 30%. That settles whether the value is above or below 50%.
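A minimal sketch of the two-step 'cloud and outlier' view, assuming the spike sits exactly at 100% and treating every other response as the cloud. It reports the share of companies at the spike, the median and mean within the cloud, and the overall mean and median. The function name and data are hypothetical.

```python
from statistics import mean, median


def cloud_and_spike(values, spike=100):
    """Split responses into the spike value and the remaining 'cloud' and summarize each part."""
    cloud = [v for v in values if v != spike]
    return {
        "share_at_spike": (len(values) - len(cloud)) / len(values),
        "cloud_median": median(cloud),
        "cloud_mean": mean(cloud),
        "overall_mean": mean(values),
        "overall_median": median(values),
    }


# Made-up percentages for illustration only (not the survey data)
summary = cloud_and_spike([10, 15, 20, 20, 25, 30, 100])
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

On these toy numbers the spike pulls the overall mean above the cloud, yet every summary still stays below 50.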

The only question is whether the "small spike at 100%" can be used to argue against that conclusion. Either you say no, that spike doesn't change the picture; or, you address two more questions:

First, do you believe those "100%" reports? Are they making up self-congratulatory numbers? If they're companies with specifically 'green' lines of work like wind energy, do you agree that makes them 100% 'green'? (What about pollution from a wind-turbine factory?)

If you do believe them, you can look for a summary statistic that you can claim is reasonable, and that assigns enough weight to the 'green' work by the companies reporting high numbers, to offset the work of the many that report much below 50%.

-Best of luck to you,
 Richard Ristow

predicting winners/losers and ranks?

Chris Smith-49
In reply to this post by Richard Ristow
Hi all
I have a couple of situations where one respondent's response on an outcome variable is systematically related to other respondents' responses, and I'm not sure how to model them:

i) I have 20 teams of 5 individuals i.e. 100 people nested within 20 teams. Within each team there is a competition to find the best employee (winner coded 1, 4 losers coded 0). I want to (attempt to) predict the winners from their demographic characteristics.

At the moment I have the data at the employee level i.e.

ID  TEAM  WINNER  AGE  SEX
1   1     0       25   0
2   1     0       27   0
3   1     1       21   1
4   1     0       45   0
5   1     0       27   1
6   2     0       35   0
7   2     1       55   0
8   2     0       17   1
etc

My initial thought was to use logistic regression, but within each team the outcome variable, WINNER, is systematically related between employees, in that if one employee = 1, the rest must equal 0. Any ideas how to proceed?
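One standard way to encode "exactly one winner per team" (not suggested in this thread; named here as an assumption) is conditional logistic regression: within each team, P(member i wins) = exp(b*x_i) / sum over team members j of exp(b*x_j), so any team-level constant cancels and the model has no intercept. The minimal Python sketch below only evaluates that log-likelihood for given coefficients on the rows shown in the table (team 2 is incomplete there). The coefficient values and function name are hypothetical; actually fitting the model would need an optimizer or dedicated software.

```python
from collections import defaultdict
from math import exp, log

# Rows from the table above: (ID, TEAM, WINNER, AGE, SEX); team 2 is only partly shown
ROWS = [
    (1, 1, 0, 25, 0), (2, 1, 0, 27, 0), (3, 1, 1, 21, 1), (4, 1, 0, 45, 0),
    (5, 1, 0, 27, 1), (6, 2, 0, 35, 0), (7, 2, 1, 55, 0), (8, 2, 0, 17, 1),
]


def conditional_logit_loglik(rows, b_age, b_sex):
    """Sum over teams of log P(observed winner), with a softmax over team members."""
    teams = defaultdict(list)
    for _id, team, winner, age, sex in rows:
        teams[team].append((winner, b_age * age + b_sex * sex))
    ll = 0.0
    for members in teams.values():
        scores = [s for _, s in members]
        m = max(scores)  # subtract the max before exponentiating, for numerical stability
        log_denom = m + log(sum(exp(s - m) for s in scores))
        winner_score = next(s for w, s in members if w == 1)  # assumes exactly one winner
        ll += winner_score - log_denom
    return ll


# Hypothetical coefficients, just to show the call
print(conditional_logit_loglik(ROWS, b_age=-0.05, b_sex=0.3))
```

With both coefficients at 0 every member of a team is equally likely to win, so the log-likelihood reduces to minus the sum of log(team size).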


ii) is very similar, in that the data are identical, but instead of a winner I have a ranking of employees' performance in the competition from 1 to 5, i.e.

ID  TEAM  CPRANK  AGE  SEX
1   1     2       25   0
2   1     3       27   0
3   1     1       21   1
4   1     4       45   0
5   1     5       27   1
6   2     4       35   0
7   2     1       55   0
8   2     3       17   1
etc

I could use ordinal or standard multiple regression, but again the outcomes are related (each team can have only one '1', one '2', one '3', etc.).
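Similarly (again an assumption, not something proposed in the thread), a rank-ordered or "exploded" logit treats a full ranking as a sequence of choices: the rank-1 member is picked from the whole team, rank 2 from those left, and so on, each step using the same softmax form as a conditional logit. The Python sketch below only evaluates the log-likelihood for hypothetical coefficients on the rows shown in the table; team 2 is incomplete there, so only its three shown members are ranked among themselves.

```python
from collections import defaultdict
from math import exp, log

# Rows from the table above: (ID, TEAM, CPRANK, AGE, SEX); team 2 is only partly shown
ROWS = [
    (1, 1, 2, 25, 0), (2, 1, 3, 27, 0), (3, 1, 1, 21, 1), (4, 1, 4, 45, 0),
    (5, 1, 5, 27, 1), (6, 2, 4, 35, 0), (7, 2, 1, 55, 0), (8, 2, 3, 17, 1),
]


def log_sum_exp(scores):
    m = max(scores)  # subtract the max before exponentiating, for numerical stability
    return m + log(sum(exp(s - m) for s in scores))


def rank_ordered_logit_loglik(rows, b_age, b_sex):
    """Exploded logit: each rank is a choice among the members not yet ranked."""
    teams = defaultdict(list)
    for _id, team, rank, age, sex in rows:
        teams[team].append((rank, b_age * age + b_sex * sex))
    ll = 0.0
    for members in teams.values():
        ordered = sorted(members)  # best rank (1) first; ranks are unique within a team
        for k in range(len(ordered) - 1):  # the last remaining member is chosen with probability 1
            remaining = [s for _, s in ordered[k:]]
            ll += ordered[k][1] - log_sum_exp(remaining)
    return ll


# Hypothetical coefficients, just to show the call
print(rank_ordered_logit_loglik(ROWS, b_age=-0.05, b_sex=0.3))
```

With both coefficients at 0 every ordering is equally likely, so each team contributes minus the log of (team size)!.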


Any help or suggestions for references much appreciated ;-)

