Hi all!
In a data analysis I am required to perform a (parametric) statistical test of whether the medians of two samples differ significantly, where one is the full sample and the other is a sub-sample extracted from the full sample based on a given characteristic (e.g. respondents belonging to a certain age group). Can anyone suggest how to go about it in SPSS? Regards, vini |
"Parametric" and "median" don't usually go together. See (3) for a possible meaning. There are other problems.

First - There is no such thing as a "proper" test for a sub-sample versus the whole sample that it comes from. The necessary logic says that you compare a sub-sample to the *rest* of the sample.

You may occasionally see a good presentation that does use approximate tests of this sort, for convenience and ease, plus a strong desire to accommodate Ns that are unequal. For sub-samples with equal Ns, you can use a simple Confidence Interval around the overall mean.

Second - Almost nobody ever actually compares "medians". That description is less often accurate than it is an erroneous reference to a test of ranks.

Third - The most "non-parametric" way to put a Confidence Interval around the median of a single sample (full sample, here?) is to use the ranks of the scores in the sample to delimit the range. For instance, for a sample of a certain N, the 40th and 60th centiles might determine the scores that mark the 95% CI. There is no strong reason to expect that CI to be symmetrical around the median. If you wanted a "parametric" version of that, I suppose you would use the SD to determine a range. Do you want to pick out the samples whose medians do not fall in that range?

--
Rich Ulrich

> Date: Tue, 21 Aug 2012 01:48:46 -0700
> From: [hidden email]
> Subject: testing statistical dfference between medians of a sample and a subsample extracted from the sample
> To: [hidden email]
> ...
|
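Rich Ulrich's first point, comparing the sub-sample with the *rest* of the sample rather than with the whole, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores are made up, the names `welch_t`, `sub`, `rest` and `full` are hypothetical, and Welch's unequal-variance t statistic is my choice of test, not something prescribed in the thread. The sketch also shows why the sub-sample versus full-sample comparison adds nothing new: that difference is just the sub-sample versus rest difference shrunk by the factor 1 - n_sub/N.

```python
import math
import statistics


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance two-sample t statistic."""
    se2_a = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    se2_b = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Made-up (score, in_subgroup) pairs; in_subgroup marks e.g. one age group.
data = [(12, True), (15, True), (14, True), (18, True), (16, True),
        (10, False), (11, False), (13, False), (9, False), (12, False),
        (14, False), (10, False)]

sub = [score for score, in_subgroup in data if in_subgroup]
rest = [score for score, in_subgroup in data if not in_subgroup]
full = [score for score, _ in data]

# The valid comparison: sub-sample against the non-overlapping remainder.
t, df = welch_t(sub, rest)
print(f"sub vs rest: t = {t:.3f}, Welch df = {df:.1f}")

# The sub-vs-full difference is the sub-vs-rest difference, scaled down.
diff_vs_rest = statistics.mean(sub) - statistics.mean(rest)
diff_vs_full = statistics.mean(sub) - statistics.mean(full)
shrink = 1 - len(sub) / len(full)
print(f"sub - rest = {diff_vs_rest:.4f}")
print(f"sub - full = {diff_vs_full:.4f} = {shrink:.4f} * (sub - rest)")
```

The p-value would come from a t distribution with the printed degrees of freedom; the Python standard library has no t distribution, so that lookup is left to a table or to statistical software.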
I too was wondering why you wanted a test comparing medians. People sometimes assume that the Wilcoxon-Mann-Whitney test (aka Mann-Whitney U) compares medians (as opposed to means). But that is only true if the two populations being compared are identical apart from a shift in location. And in that case, the test could be said to be comparing means, medians, or any other percentile point you might choose.
By the way, the WMW is quite sensitive to small differences in variance or skewness in the populations, which can cause it to reject H0 far too often when it is used purely as a test of differences in location. See for example the nice article by Fagerland & Sandvik (2009). http://www.ncbi.nlm.nih.gov/pubmed/19247980 HTH.
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
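Bruce Weaver's remark that the WMW test can reject H0 far too often when the populations differ in spread can be checked with a small simulation. The sketch below is not taken from Fagerland & Sandvik; the setup is my own assumption: two normal populations with the same mean (and therefore the same median), sample sizes 10 and 40, and standard deviations of either 1 and 1 or 4 and 1. The names `wmw_z` and `rejection_rate` are hypothetical, and the test uses the large-sample normal approximation without a continuity or tie correction.

```python
import math
import random


def wmw_z(x, y):
    """Normal-approximation z for the Wilcoxon-Mann-Whitney rank-sum test.

    Assumes continuous data (no ties), so no tie correction is applied.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    # Rank sum of the first sample (ranks start at 1).
    r1 = sum(rank for rank, (_, grp) in enumerate(pooled, start=1) if grp == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u1 - mean_u) / sd_u


def rejection_rate(sd_x, sd_y, n_x=10, n_y=40, reps=2000, seed=1):
    """Share of simulated data sets in which |z| > 1.96 (nominal 5% test).

    Both populations are normal with mean 0, so there is no location shift.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(0, sd_x) for _ in range(n_x)]
        y = [rng.gauss(0, sd_y) for _ in range(n_y)]
        if abs(wmw_z(x, y)) > 1.96:
            hits += 1
    return hits / reps


equal = rejection_rate(1.0, 1.0)    # same spread in both populations
unequal = rejection_rate(4.0, 1.0)  # smaller sample has the larger spread
print(f"rejection rate, equal SDs:   {equal:.3f}")
print(f"rejection rate, unequal SDs: {unequal:.3f}")
```

With equal spreads the rejection rate should sit near the nominal 5%. With the larger spread in the smaller sample it should be clearly higher, even though the two populations have identical means and medians.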
Hi:
1) Bruce, see also Anna Hart, "Mann-Whitney test is not just a test of medians: differences in spread can be important" (BMJ 2001;323:391-3). I lost track of a reference that stated the same for the Kruskal-Wallis test; I'll try to dig it up (too many files on my external hard disk).

2) Maybe vinikalra could compute a 95% CI for the median of the subsample, and check whether the full-sample median is included within the limits.

Best regards,
Marta GG

On 21/08/2012 23:32, Bruce Weaver wrote:
> I too was wondering why you wanted a test comparing medians.
> ...
|
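Marta's suggestion 2) can be sketched with the rank-based interval that Rich Ulrich describes in his third point: the confidence limits are order statistics of the sample, chosen from the Binomial(n, 0.5) distribution. The code below is a minimal sketch under assumptions: `median_ci` is a hypothetical helper name, the scores are made up, and the interval is the standard conservative one (coverage at least the nominal level). Because the sub-sample is part of the full sample, the final check is only a rough screen, not a proper test.

```python
import math
import statistics


def median_ci(values, conf=0.95):
    """Distribution-free confidence interval for the median.

    Returns the order statistics (x_(k), x_(n-k+1)), where k is the largest
    integer with P(B <= k - 1) <= (1 - conf) / 2 for B ~ Binomial(n, 0.5).
    """
    xs = sorted(values)
    n = len(xs)
    alpha = 1 - conf
    cum = 0.0  # running value of P(B <= j)
    k = 0
    for j in range(n // 2):
        cum += math.comb(n, j) / 2 ** n  # add P(B = j)
        if cum > alpha / 2:
            break
        k = j + 1
    if k == 0:
        raise ValueError("sample too small for this confidence level")
    return xs[k - 1], xs[n - k]


# Made-up scores: 30 respondents, the last 15 forming the sub-sample.
full = [3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 20,
        21, 22, 24, 25, 27, 28, 30, 31, 33, 35, 36, 38, 40, 42, 45]
sub = full[15:]

low, high = median_ci(sub)
full_median = statistics.median(full)
inside = low <= full_median <= high
print(f"sub-sample median: {statistics.median(sub)}")
print(f"95% CI for the sub-sample median: ({low}, {high})")
print(f"full-sample median {full_median} inside CI: {inside}")
```

For n = 15 the limits are the 4th and 12th ordered scores, which illustrates Rich's remark that the interval is set by ranks and need not be symmetrical around the median.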
In reply to this post by vini
Thanks for your reply. In light of the discussion above, it seems to me that testing the statistical difference between the two samples' means would be a better idea, in which case I can go for a t-test. (As the data are based on a field survey, they can 'safely' be considered normally distributed. Any comment?)
As far as the relevance of comparing a full sample and a sub-sample is concerned, the idea is to analyse whether a particular sub-sample (extracted based on a certain parameter, e.g. age group, education etc.) has an influence on the full sample. However, my question was how to go about it in SPSS (steps?), i.e. comparing the full sample and a sub-sample of the full sample. Any suggestions? Regards, vini |
In reply to this post by Marta Garcia-Granero
Thanks Marta. Another good article on the same topic is Don Zimmerman's simulation study:
http://www.tandfonline.com/doi/abs/10.1207/S15328031US0204_03 Unfortunately, that journal is now defunct, so it can be difficult tracking down a PDF. I managed to get one from a colleague at another university, where the library still had access to it. Cheers, Bruce
--
Bruce Weaver |
In reply to this post by vini
Perhaps it is just an exercise to understand the properties of such methods, useful to convince ourselves of the theoretical appropriateness of it.
:)

BEGIN PROGRAM R.
# Read the active dataset; suppose the variable of interest is called x.
mydata <- spssdata.GetDataFromSPSS()
fullsample <- mydata$x
median.fullsample <- median(fullsample)
# Loop here if you want some kind of bootstrapping.
# Replace [whatever condition] with a logical condition that selects the subsample.
subsample <- subset(fullsample, [whatever condition])
# One-sample t-test of the subsample mean against the full-sample median.
t.test(subsample, mu = median.fullsample)
END PROGRAM.
|
In reply to this post by Bruce Weaver
Hi Bruce:
On 22/08/2012 14:49, Bruce Weaver wrote:
> Thanks Marta. Another good article on the same topic is Don Zimmerman's simulation study:
> ...

I have been able to get a copy too, using my University's access to that journal's archives. When I saw the name (Zimmerman) I realized that the paper I was trying to remember (where the Kruskal-Wallis test was also discussed) was by Zimmerman too (The Journal of General Psychology, 2000, 127(4), 354-364, http://dx.doi.org/10.1080/00221300009598589). That narrowed the search and I was able to find the copy on the backup drive.

Thanks again and best regards,
Marta
|
In reply to this post by vini
There is still "no such thing as a "proper" test for a sub-sample versus the whole sample that it comes from", as Rich Ulrich said.
It sounds to me like you want a linear regression model with age, education etc (and perhaps some product terms to look at interactions) included in the model. (If all explanatory variables are categorical, you could run it as an ANOVA model instead.) HTH.
--
Bruce Weaver |
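Bruce's regression suggestion connects directly to the "sub-sample versus the rest" logic: with a single 0/1 indicator as the explanatory variable, the least-squares slope is exactly the difference between the subgroup mean and the mean of everyone else, and the intercept is the mean of everyone else. The sketch below shows this with made-up data and hypothetical names (`ols_simple`, `older`, `scores`); it is plain Python, not SPSS syntax, and it covers only the one-predictor case, not the fuller model with education and product terms.

```python
import statistics


def ols_simple(x, y):
    """Least-squares (intercept, slope) for the model y = b0 + b1 * x."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up survey scores and a 0/1 indicator for one age group.
scores = [22, 25, 27, 30, 31, 18, 20, 21, 19, 23, 24, 17, 26, 20]
older = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

b0, b1 = ols_simple(older, scores)

mean_older = statistics.mean(s for s, g in zip(scores, older) if g == 1)
mean_rest = statistics.mean(s for s, g in zip(scores, older) if g == 0)

print(f"intercept = {b0:.4f} (mean of the rest = {mean_rest:.4f})")
print(f"slope     = {b1:.4f} (difference in means = {mean_older - mean_rest:.4f})")
```

Adding further predictors such as education, or product terms for interactions, needs a multiple-regression routine, which is what the regression or ANOVA procedures in a statistics package provide.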