SPSSX Discussion

Re: Sample Means

Posted by statisticsdoc on
URL: http://spssx-discussion.165.s1.nabble.com/Sample-Means-tp1072828p1072840.html

Stephen Brand
www.statisticsdoc.com

Jan,

I think there is still some confusion about the research question that Samir is trying to answer, and even the sampling design. I will try to cover the various possibilities.

Your posting is quite correct if we assume that there are only 500 students in the population of 1,000 cases (i.e., the 1,000 cases are made up of 500 students and 500 non-students). If in fact the larger population of 1,000 cases contains only 500 students, then there is no need to use inferential statistics: there is nothing to infer. There is no null hypothesis to test, both means are constants, and what you say is correct. Samir may still want to know whether the difference between the two subpopulation means is large enough to be meaningful, but that is not a question of statistical significance in the sense of testing an inference about population parameters from sample statistics. As another poster pointed out, computing the effect size would be informative.
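
As a rough illustration of the effect-size idea, here is a minimal Python sketch. It uses the means quoted elsewhere in this thread (3.5 for all 1,000 cases, 2.9 for the 500 students); the standard deviation of 1.0 is purely an assumed value, since the thread never states it, and the function name is hypothetical.

```python
# Sketch only: descriptive comparison of two subpopulations, no inference.
# The means (3.5 overall, 2.9 for students) come from the thread;
# ASSUMED_SD = 1.0 is a placeholder, because the real SD was never posted.

ASSUMED_SD = 1.0

def standardized_difference(mean_a, mean_b, sd):
    """Difference between two means expressed in standard deviation units."""
    return (mean_a - mean_b) / sd

overall_mean, student_mean = 3.5, 2.9
n_total, n_students = 1000, 500

# If the 1,000 cases are 500 students plus 500 non-students, the
# non-student mean follows from the overall mean by simple arithmetic.
non_student_mean = (n_total * overall_mean - n_students * student_mean) / (
    n_total - n_students
)

d = standardized_difference(non_student_mean, student_mean, ASSUMED_SD)
print(f"non-student mean = {non_student_mean:.2f}, standardized difference = {d:.2f}")
```

With a different standard deviation the standardized difference changes in proportion, so the number is only as good as the assumed SD.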

On the other hand, if the 500 students comprise a sample of students that was drawn in some way from a larger population, the additional procedures are justified (either the z-statistic or the one-sample t-test).

One possibility is that the 500 students were a sample drawn from the population of 1,000. That is, there are 1,000 students, and Samir has drawn a subsample of 500 of them. Samir may be interested in knowing whether the sampling process that he used was unbiased. Assuming that he knows not only the mean but also the standard deviation of the population of 1,000, he can compute the sampling distribution of the mean for n=500 and apply the z-statistic to calculate the likelihood of obtaining the observed sample mean if the sampling process was random and unbiased.
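
The z-statistic calculation described above can be sketched in Python as follows. The means 3.5 and 2.9 are the ones quoted in this thread; the population SD of 1.0 is an assumption made only for illustration, and the function name is hypothetical. Because 500 out of 1,000 is a large sampling fraction, the sketch also offers an optional finite population correction, which the post does not mention and which is added here as an assumption about sampling without replacement.

```python
import math
from statistics import NormalDist

def z_test_sample_mean(sample_mean, pop_mean, pop_sd, n, pop_size=None):
    """Return (z, two-sided p) for a sample mean, given a known population mean and SD.

    If pop_size is given, the standard error is multiplied by the finite
    population correction sqrt((N - n) / (N - 1)), which applies when the
    sample is drawn without replacement from a population of size N.
    """
    se = pop_sd / math.sqrt(n)
    if pop_size is not None:
        se *= math.sqrt((pop_size - n) / (pop_size - 1))
    z = (sample_mean - pop_mean) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Means from the thread; pop_sd=1.0 is an ASSUMED value.
z_no_fpc, p_no_fpc = z_test_sample_mean(2.9, 3.5, pop_sd=1.0, n=500)
z_fpc, p_fpc = z_test_sample_mean(2.9, 3.5, pop_sd=1.0, n=500, pop_size=1000)

print(f"without correction: z = {z_no_fpc:.2f}")
print(f"with correction:    z = {z_fpc:.2f}")
```

Under these assumed numbers the correction makes the standard error smaller, so the z-statistic is larger in absolute value than the uncorrected one.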

Another possibility is that the sample of 500 cases was drawn from some population other than the 1,000 cases. This is the scenario that I had in mind when I posted that the one-sample t-test would be justified. In this instance, Samir would be interested in testing the hypothesis that the sample of 500 students was drawn from a population whose mean was equal to the mean of the population of 1,000 cases.
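
For this second scenario, a one-sample t statistic can be computed from summary statistics alone. The following Python sketch uses the thread's means (2.9 for the sample, 3.5 as the hypothesized population mean); the sample SD of 1.0 is an assumed value and the function name is hypothetical. To stay within the standard library, the p-value uses a normal approximation to the t distribution, which is close at 499 degrees of freedom but is not the exact t-based p-value that SPSS would report.

```python
import math
from statistics import NormalDist

def one_sample_t(sample_mean, sample_sd, n, mu0):
    """Return (t, df) for testing H0: population mean == mu0.

    Unlike the z-statistic, the standard error here is estimated from the
    sample SD, and the statistic is referred to a t distribution with n - 1 df.
    """
    se = sample_sd / math.sqrt(n)
    t = (sample_mean - mu0) / se
    return t, n - 1

# Means from the thread; sample_sd=1.0 is an ASSUMED value.
t, df = one_sample_t(sample_mean=2.9, sample_sd=1.0, n=500, mu0=3.5)

# Normal approximation to the two-sided p-value (reasonable for large df).
p_approx = 2 * (1 - NormalDist().cdf(abs(t)))

print(f"t = {t:.2f} on {df} df, approximate two-sided p = {p_approx:.3g}")
```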

My conclusion is that Richard and I are both right, and so are you.

Cheers,

Stephen Brand

P.S. I think that I might use this example as an extra-credit question on my next stats exam :)

Jan Spousta Wrote:

Now it is my turn to support Richard a bit :-)

If the 1,000 cases were the whole available population and 500 of them were students, then the one-sample procedure would still be _unjustified_, because then both the population mean and the subpopulation mean are constants and it is nonsense to test the difference between two constants. If the two are different, then the difference is always "significant" in the exact meaning of the word.

The interesting case is when the sampled population is _almost_ the whole available population (e.g. five students from the 500 are missing), but then the statistics start to be rather complicated and you still cannot use the "standard" techniques under Compare Means in SPSS. Ask Marta, she will tell you...

Jan

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Statisticsdoc
Sent: Tuesday, December 19, 2006 10:42 PM
To: [hidden email]
Subject: Sample Means

Stephen Brand
www.statisticsdoc.com

Richard,

Thanks for citing me on both sides of this discussion :) Let me say a little more about why I would accept that 1,000 cases can constitute a population, and under what conditions.

It is not too hard to imagine population definitions that encompass small numbers of people (e.g., all of the left-handed residents of the town of Exeter, Rhode Island; the Fall 2006 intake of a small college).

The question of whether you accept that 1,000 cases make up a population depends on the definition of the population. If these 1,000 cases are all of the potential members of the population, then the mean of those cases constitutes the population mean. Whatever random processes might have influenced the mean score of that population, that score is the population parameter. We are not trying to estimate a parameter of a wider population from which we have obtained the 1,000 cases. In this instance, the one-sample procedure is justified.

Granted, you might say that the left-handed residents of Exeter, or the 2006 intake of a small college, constitute a sub-set of your population of interest, but then I think that you have to allow that these cases do not exhaust the potential membership of the population (which might constitute the left-handed population of Rhode Island, or the various cohorts of potential incoming first year students), and then your means become sample statistics, not population parameters. BTW, in this instance, the Exeter sample is not a very random one :)

It all depends on where the boundaries of the population are drawn.

Best,

Stephen Brand

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Richard Ristow
Sent: Tuesday, December 19, 2006 2:59 PM
To: [hidden email]
Subject: Re: Significant difference - Means

To weigh in with two comments:

At 03:54 AM 12/19/2006, Spousta Jan wrote:

>The error of that 3.5 is about sqrt(1/1000) = 0.03 while the error of
>2.9 for students is about sqrt(1/500) = 0.045. That is, both errors are
>of the same order of magnitude and the population error cannot be
>neglected in this case.

I'd like to second, and emphasize, this. Jan is clearly right here, where the two groups are the same size. However, the same thing holds when the sizes are quite different.

First, the t-test algorithm correctly allows for the increased precision in measuring the mean in the larger group. Replacing it by a constant only 'gains' you a little precision you don't really have.

Second, inequality of group size matters less than one might think. Roughly, precision goes as the square root of sample size. (Under 'nice' conditions, that's exact: standard error of estimate goes inversely as the square root of sample size.) That means increasing the sample size ten-fold leaves the SEE still about 1/3 of the size it had - quite a long way from letting it be considered a constant.
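
The arithmetic behind these figures can be checked with a short Python sketch. It assumes a standard deviation of 1.0, which is what Jan's sqrt(1/n) figures imply but which is not stated anywhere in the thread; the function name is hypothetical.

```python
import math

def standard_error(sd, n):
    """Standard error of a mean: SD divided by the square root of the sample size."""
    return sd / math.sqrt(n)

ASSUMED_SD = 1.0  # implied by the sqrt(1/n) figures quoted above, not stated outright

se_all = standard_error(ASSUMED_SD, 1000)      # error of the 3.5 mean
se_students = standard_error(ASSUMED_SD, 500)  # error of the 2.9 mean

# A ten-fold increase in sample size shrinks the standard error by a
# factor of 1/sqrt(10), i.e. to roughly a third of its former size.
ratio_tenfold = standard_error(ASSUMED_SD, 10000) / standard_error(ASSUMED_SD, 1000)

print(f"SE for n=1000: {se_all:.4f}")
print(f"SE for n=500:  {se_students:.4f}")
print(f"SE ratio after ten-fold increase: {ratio_tenfold:.3f}")
```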

And at 10:42 AM 12/19/2006, Statisticsdoc (Stephen Brand) wrote:

>If your population consists of the 1000 students, then the mean of 3.5
>is a population parameter, and you would be justified in using the
>one-sample t-test suggested by John.

(This won't be quite fair to Stephen Brand, who'd also written "Formally one should test the null hypothesis that the two samples have the same mean, by using the independent groups t-test.")

There's a philosophical position, which I agree with, that will hardly ever accept something like "[my] population consists of the 1000 students." The argument is that, even if those 1,000 students are all you've ever seen or ever will see, their observed values constitute a set generated by an underlying random mechanism, and that randomness must be allowed for in estimation exactly as if you were aware of 100,000 similar students.

('Generated by an underlying random mechanism' is sometimes expressed as 'drawn from a conceptually infinite population.' However, while this is technically accurate, I don't blame anyone who considers a 'conceptually infinite population' a very odd notion.)

--
For personalized and experienced consulting in statistics and research design, visit www.statisticsdoc.com
