SPSSX Discussion

testing statistical difference between medians of a sample and a subsample extracted from the sample

vini

testing statistical difference between medians of a sample and a subsample extracted from the sample

Hi all!

For a data analysis I am required to perform a (parametric) statistical test of whether the medians of two samples differ significantly, where one is the full sample and the other is a sub-sample extracted from the full sample on a given characteristic (e.g. respondents belonging to a certain age group).

Can anyone suggest how to go about it in SPSS?

regards
vini

Rich Ulrich

Re: testing statistical difference between medians of a sample and a subsample extracted from the sample

"Parametric" and "median" don't usually go together. See the "Third"
point below for a possible meaning. There are other problems.

First - There is no such thing as a "proper" test for a sub-sample
versus the whole sample that it comes from. The necessary logic
says that you compare a sub-sample to the *rest* of the sample.

You may occasionally see a good presentation that does use the
approximate tests of this sort, for convenience and ease, plus a
strong desire to accommodate Ns that are unequal. For sub-samples
with equal Ns, you can use a simple Confidence Interval around the
overall mean.
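
As a rough illustration of the sub-sample-versus-rest comparison described above, here is a minimal Python sketch (Python rather than SPSS syntax, purely for illustration). The group labels, the scores, and the helper name `welch_t` are hypothetical, and Welch's unequal-variance t statistic is just one possible choice of test, not something this thread prescribes. The sketch only computes the statistic and its approximate degrees of freedom; a p-value would still have to be looked up from the t distribution.

```python
import math
import statistics


def welch_t(sub, rest):
    """Return (t, df) for Welch's unequal-variance t statistic, sub vs rest."""
    n1, n2 = len(sub), len(rest)
    # Squared standard errors of the two group means
    v1 = statistics.variance(sub) / n1
    v2 = statistics.variance(rest) / n2
    t = (statistics.mean(sub) - statistics.mean(rest)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical survey records: (age_group, score)
records = [("18-29", 1.0), ("18-29", 2.0), ("18-29", 3.0),
           ("30+", 4.0), ("30+", 5.0), ("30+", 6.0)]

# Split into the sub-sample and the *rest* of the sample (not the whole sample)
sub = [score for group, score in records if group == "18-29"]
rest = [score for group, score in records if group != "18-29"]

t, df = welch_t(sub, rest)
print(f"t = {t:.3f}, df = {df:.1f}")
```

The point of the split is that the two groups do not overlap, so the usual independence assumption of a two-sample test is at least not violated by construction.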

Second - Almost nobody ever actually compares "medians". That
description is more often an erroneous reference to a test of ranks
than it is accurate.

Third - The most "non-parametric" way to put a Confidence Interval
around the median of a single sample (full sample, here?) is to
use the ranks of the scores in the sample to delimit the range.
For instance, for a sample of about N = 100, the scores at roughly the
40th and 60th centiles mark the 95% CI. There is no strong
reason to expect that CI to be symmetrical around the median. If
you wanted a "parametric" version of that, I suppose you would use
the SD to determine a range. Do you want to pick out the samples
whose medians do not fall in that range?
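
The rank-based interval described above can be sketched in a few lines of Python (again only an illustration, not SPSS syntax). The function names `median_ci_ranks` and `median_ci` and the example data are assumptions made for this sketch. It uses the standard distribution-free argument: the number of observations below the true median follows a Binomial(n, 0.5) distribution, so the ranks are chosen to give coverage of at least the requested level.

```python
import math


def median_ci_ranks(n, conf=0.95):
    """1-based ranks (lower, upper) of the order statistics bounding a
    distribution-free CI for the median with coverage of at least `conf`."""

    def binom_cdf(k):
        # P(B <= k) for B ~ Binomial(n, 0.5)
        return sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n

    alpha = 1 - conf
    # Find the largest r with P(B <= r - 1) <= alpha / 2
    r = 0
    while binom_cdf(r) <= alpha / 2:
        r += 1
    if r == 0:
        raise ValueError("sample too small for this confidence level")
    return r, n - r + 1


def median_ci(data, conf=0.95):
    """Return the scores at the lower and upper CI ranks."""
    ordered = sorted(data)
    lower, upper = median_ci_ranks(len(ordered), conf)
    return ordered[lower - 1], ordered[upper - 1]


# Hypothetical scores
scores = [3, 8, 1, 9, 4, 7, 2, 10, 5, 6]
print(median_ci_ranks(len(scores)))  # ranks of the bounding order statistics
print(median_ci(scores))             # the scores at those ranks
```

Because the limits are actual observed scores, the interval follows the shape of the data and need not be symmetrical around the median.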

--
Rich Ulrich

Bruce Weaver

Re: testing statistical difference between medians of a sample and a subsample extracted from the sample

I too was wondering why you wanted a test comparing medians. People sometimes assume that the Wilcoxon-Mann-Whitney test (aka Mann-Whitney U) compares medians (as opposed to means). But that is only true if the two populations being compared are identical apart from a shift in location. And in that case, the test could be said to be comparing means, medians, or any other percentile point you might choose.
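
A small Python sketch may make this concrete. The two samples below are made up so that their medians are identical while their shapes differ, and `prob_superiority` is a hypothetical helper name. It computes the Mann-Whitney U count divided by n1 * n2, which estimates P(X < Y) + 0.5 * P(X = Y); this is the quantity the rank test responds to, and it equals 0.5 when the two distributions are identical.

```python
import statistics


def prob_superiority(x, y):
    """Estimate P(X < Y) + 0.5 * P(X == Y): the Mann-Whitney U count
    (pairs where y exceeds x, ties counted as half) divided by n1 * n2."""
    wins = sum((a < b) + 0.5 * (a == b) for a in x for b in y)
    return wins / (len(x) * len(y))


# Hypothetical samples: same median, different spread and skewness
x = [1, 2, 3, 4, 5]
y = [2.5, 2.6, 3, 9, 10]

print(statistics.median(x), statistics.median(y))
print(prob_superiority(x, y))
```

Here the medians agree, yet the estimate is well away from 0.5, which is the sense in which the test is not a comparison of medians unless the shift-only assumption holds.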

By the way, the WMW is quite sensitive to small differences in variance or skewness in the populations, which can cause it to reject H0 far too often when it is used purely as a test of differences in location. See for example the nice article by Fagerland & Sandvik (2009).

http://www.ncbi.nlm.nih.gov/pubmed/19247980

HTH.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Marta Garcia-Granero

Re: testing statistical difference between medians of a sample and a subsample extracted from the sample

Hi:

1) Bruce, see also Anna Hart, "Mann-Whitney test is not just a test of
medians: differences in spread can be important" (BMJ 2001;323:391-3). I
lost track of a reference that stated the same for the Kruskal-Wallis test;
I'll try to dig it up (too many files on my external hard disk).

2) Maybe vinikalra could compute a 95% CI for the median of the
subsample, and check whether the full-sample median falls within the
limits.
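
One way to try suggestion 2 is sketched below in Python. The thread does not say how the CI should be obtained; a percentile bootstrap is swapped in here purely as an assumption, and the data, the seed, and the name `bootstrap_median_ci` are all hypothetical. Since the subsample is part of the full sample, the check is best read as descriptive rather than as a formal test.

```python
import random
import statistics


def bootstrap_median_ci(values, n_boot=2000, conf=0.95, seed=12345):
    """Percentile bootstrap CI for the median of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(values)
    # Median of each resample (drawn with replacement), sorted ascending
    medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    tail = (1 - conf) / 2
    lo = medians[int(tail * n_boot)]
    hi = medians[int((1 - tail) * n_boot) - 1]
    return lo, hi


# Hypothetical data: the subsample is the top end of the full sample
full_sample = list(range(1, 31))
subsample = [v for v in full_sample if v >= 25]

full_median = statistics.median(full_sample)
lo, hi = bootstrap_median_ci(subsample)
contains = lo <= full_median <= hi
print(f"full-sample median = {full_median}, subsample CI = ({lo}, {hi})")
print("full-sample median inside the subsample CI:", contains)
```

In this made-up example every resampled median lies between 25 and 30, so the full-sample median of 15.5 necessarily falls outside the interval.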

Best regards,
Marta GG

vini

Re: testing statistical difference between medians of a sample and a subsample extracted from the sample

In reply to this post by vini

Thanks for your replies. In light of the discussion above, it seems to me that testing the difference between the two samples' means would be a better idea, in which case I can use a t-test. (As the data come from a field survey, they can 'safely' be considered normally distributed. Any comments?)

As for the relevance of comparing a full sample with a sub-sample, the idea is to analyse whether a particular sub-sample (extracted on some characteristic, e.g. age group or education) influences the full sample.
However, my question was how to do this in SPSS (which steps?), i.e. how to compare the full sample with a sub-sample of it. Any suggestions?

regards,
vini

Bruce Weaver

Re: testing statistical difference between medians of a sample and a subsample extracted from the sample

In reply to this post by Marta Garcia-Granero

Thanks Marta. Another good article on the same topic is Don Zimmerman's simulation study:

http://www.tandfonline.com/doi/abs/10.1207/S15328031US0204_03

Unfortunately, that journal is now defunct, so it can be difficult tracking down a PDF. I managed to get one from a colleague at another university, where the library still had access to it.

Cheers,
Bruce

Zuluaga, Juan

Re: testing statistical difference between medians of a sample and a subsample extracted from the sample

In reply to this post by vini

Perhaps it is just an exercise to understand the properties of such methods, useful to convince ourselves of the theoretical appropriateness of it.
:)
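BEGIN PROGRAM R.
# Read the active dataset; suppose the variable of interest is called x.
mydata <- spssdata.GetDataFromSPSS()
fullsample <- mydata$x
median.fullsample <- median(fullsample)
# Loop over the next two steps if you want some kind of bootstrapping.
# Note that subset() takes the data first, then the logical condition.
subsample <- subset(fullsample, [whatever condition])
# One-sample t-test of the subsample mean against the full-sample median,
# which is treated here as a fixed constant.
t.test(subsample, mu = median.fullsample)
END PROGRAM.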

Marta Garcia-Granero

Re: testing statistical difference between medians of a sample and a subsample extracted from the sample

In reply to this post by Bruce Weaver

Hi Bruce:

On 22/08/2012 14:49, Bruce Weaver wrote:
> Thanks Marta. Another good article on the same topic is Don Zimmerman's
> simulation study:
>
> http://www.tandfonline.com/doi/abs/10.1207/S15328031US0204_03
>
> Unfortunately, that journal is now defunct, so it can be difficult tracking
> down a PDF. I managed to get one from a colleague at another university,
> where the library still had access to it.

I have been able to get a copy too, using my University's access to that
journal's archives.

When I saw the name (Zimmerman) I realized that the paper I was trying
to remember (where the Kruskal-Wallis test was also discussed) was by
Zimmerman too (The Journal of General Psychology, 2000, 127(4), 354-364,
http://dx.doi.org/10.1080/00221300009598589). That narrowed the search
and I was able to find the copy on the backup drive.

Thanks again and best regards,
Marta

Bruce Weaver

Re: testing statistical difference between medians of a sample and a subsample extracted from the sample

In reply to this post by vini

There is still "no such thing as a 'proper' test for a sub-sample versus the whole sample that it comes from", as Rich Ulrich said.

It sounds to me like you want a linear regression model with age, education, etc. (and perhaps some product terms to look at interactions) included as explanatory variables. (If all explanatory variables are categorical, you could run it as an ANOVA model instead.)
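
To show what the simplest case of such a model looks like, here is a minimal Python sketch with a single 0/1 indicator as the only explanatory variable. The data, the variable names, and the helper `simple_ols` are hypothetical, and a real analysis would include several predictors and be run in SPSS or another statistics package. With one indicator, the fitted slope is simply the difference between the two group means and the intercept is the mean of the reference group.

```python
import statistics


def simple_ols(x, y):
    """Least-squares (intercept, slope) for the model y = a + b * x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical data: score for each respondent, and an indicator that is
# 1 for respondents in the age group of interest and 0 for everyone else
scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
in_group = [1, 1, 1, 0, 0, 0]

intercept, slope = simple_ols(in_group, scores)
print(f"mean of the rest = {intercept}, group difference = {slope}")
```

Adding further predictors (education, interaction terms) turns this into the multiple regression suggested above, where each coefficient is adjusted for the others.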

HTH.
