Can anyone help me get a population standard deviation in SPSS? It automatically assumes a sample, and I do not know how to get SPSS to give me a standard deviation for a population. Many thanks in advance.
Nancy
|
Here's one way to get it.
1. Use AGGREGATE to write the sample SD and N for the variable of interest to the working data file.
2. Compute SS = SD^2 x (n-1)
3. Compute Pop. Variance = SS/n
4. Compute Pop. SD = SQRT(Pop. Variance)

E.g.,
```
data list free / x (f2.0) .
begin data
2 5 4 9 8 7 4 3 1
end data.

AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=
  /ssd_x=SD(x)
  /n = nu(x).

compute #svarx = ssd_x**2 .        /* sample variance of X .
compute #SS_x = #svarx * (n-1) .   /* SS for X .
compute #pvarx = #SS_x / n .       /* population variance of X .
compute psd_x = SQRT(#pvarx).      /* population SD .
formats ssd_x psd_x (f8.3).
var lab
  ssd_x 'Sample SD of X'
  psd_x 'Population SD of X'
.
descrip ssd_x psd_x / stat = mean.
```
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
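For readers who want to check the arithmetic outside SPSS, here is a minimal Python sketch (standard library only) that mirrors the four steps on the same nine values. It is not SPSS syntax: `statistics.stdev` stands in for AGGREGATE's SD function, on the assumption that both divide by n-1, and the function name `population_sd_via_ss` is purely illustrative.

```python
import math
import statistics

def population_sd_via_ss(values):
    """Mirror the four steps: sample SD -> SS -> SS/n -> square root."""
    n = len(values)
    ssd = statistics.stdev(values)      # step 1: sample SD (n-1 denominator)
    ss = ssd ** 2 * (n - 1)             # step 2: SS = SD^2 x (n-1)
    pop_var = ss / n                    # step 3: population variance = SS/n
    return ssd, math.sqrt(pop_var)      # step 4: population SD

x = [2, 5, 4, 9, 8, 7, 4, 3, 1]         # same data as the SPSS example
ssd_x, psd_x = population_sd_via_ss(x)
print(f"Sample SD of X: {ssd_x:.3f}")       # 2.728
print(f"Population SD of X: {psd_x:.3f}")   # 2.572
```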
|
In reply to this post by Nancy Rusinak
Hi Nancy,
You can do this by weighting your cases. Suppose you have N cases. COMPUTE a new variable w equal to N/(N-1). WEIGHT the cases by w. Then run your DESCRIPTIVES command, and the reported SD is the population SD you need.
Garry Gelade
Business Analytic Ltd.
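One way to see what a constant case weight does is to write out a frequency-weighted SD. The Python sketch below ASSUMES that weighted SDs use (sum of weights - 1) as the divisor, the usual convention for frequency weights; that is an assumption, not something confirmed here against SPSS's own formulas, and `weighted_sd` is a hypothetical helper. Under that assumption a constant weight c turns the divisor into c*N - 1, so the variance becomes SS/(N - 1/c).

```python
import math

def weighted_sd(values, weights):
    """Hypothetical helper: frequency-weighted SD.

    ASSUMPTION: the divisor is (sum of weights - 1), the usual convention
    for frequency weights; not checked against SPSS's own formulas.
    """
    total_w = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total_w
    ss_w = sum(w * (v - mean) ** 2 for v, w in zip(values, weights))
    return math.sqrt(ss_w / (total_w - 1))

x = [2, 5, 4, 9, 8, 7, 4, 3, 1]
n = len(x)
mean_x = sum(x) / n
pop_sd = math.sqrt(sum((v - mean_x) ** 2 for v in x) / n)   # SS/N, then root

# A constant weight c makes the divisor c*n - 1, i.e. variance = SS / (n - 1/c).
for c in (1.0, n / (n - 1), (n - 1) / n):
    print(f"w = {c:.4f}: weighted SD = {weighted_sd(x, [c] * n):.3f}")
print(f"population SD = {pop_sd:.3f}")
```

Under that assumption the three weights give roughly 2.728, 2.710 and 2.750, none of which equals the population SD of 2.572; the divisor N - 1/c only approaches N as c grows very large.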
|
In reply to this post by Bruce Weaver
I am not sure about Bruce's formula. The sample SD is SS divided by n. The ESTIMATE of the population SD, based on the sample, is SS divided by n-1. If I'm right about this, the SPSS SD should be multiplied by n and divided by n-1 to get the estimated population SD.
On the other hand, if Nancy's dataset itself represents the whole population (e.g. a census), perhaps Nancy believes that the population variance should be computed directly. But that would not be right:
(a) The directly computed population variance is SS/n, just as in a sample (a population is a sample of size n=N, with a sampling ratio of 1:1).
(b) On the other hand, a measurement of a population (e.g. a census) is just one sample measurement (out of many measurements you could take, with different census takers or at different times of day, etc.), and therefore even a census is a sample, with a standard error of the estimate (possibly quite small unless the census takers are very sloppy). From this viewpoint, again the SD of the one (sampled) take of the census is SS/n; the estimate of the SD for the whole "population" of various possible measurements of the same (human) population would be SS/(n-1), which in this case is also SS/(N-1).
Hector
|
Hector, FWIW, Excel gives the same results I obtained with SPSS:
2.728 from STDEV (with division by n-1)
2.572 from STDEVP (with division by N)
I don't have time to try Gerry's weighting suggestion right now, but off the top of my head, I think that it will work for variances, but not standard deviations.
--
Bruce Weaver
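Python's standard library draws the same distinction: `statistics.stdev` and `statistics.variance` divide by n-1 (like Excel's STDEV), while `statistics.pstdev` and `statistics.pvariance` divide by N (like STDEVP). A small sketch on the same nine values; the ratio of the two variances should come out as (n-1)/n.

```python
import statistics

x = [2, 5, 4, 9, 8, 7, 4, 3, 1]
n = len(x)

s = statistics.stdev(x)        # n-1 denominator, like Excel STDEV
sigma = statistics.pstdev(x)   # N denominator, like Excel STDEVP
ratio = statistics.pvariance(x) / statistics.variance(x)

print(f"STDEV-style:  {s:.3f}")      # 2.728
print(f"STDEVP-style: {sigma:.3f}")  # 2.572
print(f"variance ratio {ratio:.4f} vs (n-1)/n = {(n - 1) / n:.4f}")
```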
|
In reply to this post by Garry Gelade
Hi Garry.
First, I apologize for misspelling your name in my other post.
Second, SPSS always divides by n-1 when computing a variance, so your weight would have to be (n-1)/n. But even then, it doesn't work, as far as I can tell. It is the squared deviation scores that would have to be so weighted, not the raw scores themselves. I.e.,
1. Use AGGREGATE to add the mean of X to the file
2. Compute the squared deviation from the mean for each score
3. Let W = (n-1)/n, and weight by W
4. Use DESCRIPTIVES (or MEANS) to get the mean of the squared deviation score from step 2
```
* Steps 1-2 (mean_x is an illustrative name; n is assumed to exist from the earlier AGGREGATE).
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK= /mean_x=MEAN(x).
compute sqdev = (x - mean_x)**2.
compute w = (n-1)/n.
weight by w.
descrip sqdev / stat = mean .
```
This weighted mean of squared deviations = the population variance (with division by N).
--
Bruce Weaver
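The weighting step can be checked by hand: a weighted mean divides by the sum of the weights, so when every case carries the same weight (n-1)/n, the weight cancels and the result is SS/N. A minimal Python sketch of steps 1 to 4 (variable names are illustrative):

```python
import math
import statistics

x = [2, 5, 4, 9, 8, 7, 4, 3, 1]
n = len(x)

mean_x = sum(x) / n                          # step 1: mean of X
sqdev = [(v - mean_x) ** 2 for v in x]       # step 2: squared deviations
w = (n - 1) / n                              # step 3: the same weight for every case
weighted_mean = sum(w * d for d in sqdev) / (w * n)  # step 4: weighted mean
plain_mean = sum(sqdev) / n                  # same statistic without any weight

print(math.isclose(weighted_mean, plain_mean))                # True
print(math.isclose(weighted_mean, statistics.pvariance(x)))   # True
```

In this sketch the constant weight cancels, so the unweighted mean of the squared deviations already gives the N-denominator variance.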
|
In reply to this post by Bruce Weaver
Bruce,
According to many standard textbooks, such as the classic H. Blalock "Social Statistics", the sample standard deviation is s = sqrt(SS/n) (equation 6.3), but a footnote specifies: "Some texts define s with n-1 in the denominator instead of n. We shall later define delta=sqrt(SS/(n-1)) with delta being an unbiased estimate of sigma [the population SD] for RANDOM samples." (note 1 of section 6.4, 1980 edition). Blalock notes that some authors present s directly with n-1 in the denominator, but he prefers using n, then noting the bias of that formula before introducing the correction in the denominator to get an unbiased estimate of the population SD. The difference is, of course, significant only for small samples.
Hector
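The two conventions under discussion can be set side by side. In the notation of the posts, with SS the sum of squared deviations about the sample mean, Blalock's s corresponds to s_n below and his delta to s_{n-1}, which is also what SPSS's SD and Excel's STDEV report. The second line states the standard result for a random sample of independent observations: the n-1 version is unbiased for the variance, a property that does not automatically carry over to its square root.

```latex
\mathrm{SS} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
s_{n} = \sqrt{\frac{\mathrm{SS}}{n}}, \qquad
s_{n-1} = \sqrt{\frac{\mathrm{SS}}{n-1}}, \qquad
s_{n} = s_{n-1}\sqrt{\frac{n-1}{n}}

\mathbb{E}\!\left[\frac{\mathrm{SS}}{n-1}\right] = \sigma^2
```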
|
OK, I see that we are just having a problem with terminology. Here is how I understood the original question.
1. SPSS computes the SD with n-1 in the denominator.
2. Nancy wants a standard deviation with n in the denominator.
I showed a way to do that. It now occurs to me that I took an extremely scenic route to the answer--Rube Goldberg would be very proud of that code. Here's a much more direct way.
```
data list free / x (f2.0) .
begin data
2 5 4 9 8 7 4 3 1
end data.
* SPSS gives the sample SD -- i.e., denominator is n-1, not N .
* Use AGGREGATE to get SD and N for X.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=
  /s_x=SD(x)
  /n = nu(x).
compute sigma_x = s_x * SQRT((n-1)/n).
format s_x sigma_x (f5.3).
descrip x s_x sigma_x.
```
Regarding the issue of terminology, I respectfully disagree with that excerpt from Blalock. I have always been taught to use the terminology as it is given on the Wikipedia page on the standard deviation:
http://en.wikipedia.org/wiki/Standard_deviation
The population variance has N as the denominator, and is only computed if one has the entire population of scores (which occurs only rarely). The sample variance, on the other hand, uses n-1 as the denominator, and is used when you have a sample from some population. For random samples, the sample variance is an unbiased estimator of the population variance. Taking the square roots of those variances yields the corresponding standard deviations. But note that the sample SD (with n-1 in the denominator) is not an unbiased estimator of the population SD. The Wikipedia page says, "s is not an unbiased estimator for the standard deviation [sigma]; it tends to underestimate the population standard deviation". There is another Wikipedia page on "unbiased estimation of the standard deviation" here:
http://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation
Cheers,
Bruce
--
Bruce Weaver
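The one-line COMPUTE, sigma_x = s_x * SQRT((n-1)/n), can be cross-checked in Python. This sketch assumes `statistics.stdev` matches SPSS's n-1 SD, and the helper name `sigma_from_sample_sd` is illustrative.

```python
import math
import statistics

def sigma_from_sample_sd(s, n):
    """Rescale an n-1 SD to an N-denominator SD, as in sigma_x = s_x * SQRT((n-1)/n)."""
    return s * math.sqrt((n - 1) / n)

x = [2, 5, 4, 9, 8, 7, 4, 3, 1]
s_x = statistics.stdev(x)                       # SD with n-1, as SPSS reports it
sigma_x = sigma_from_sample_sd(s_x, len(x))     # SD with N
print(f"{s_x:.3f} {sigma_x:.3f}")               # 2.728 2.572
```

The rescaling works because both SDs are built from the same SS: s_x^2 (n-1) = sigma_x^2 N.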
|
Both approaches are right. The sample SD divided by n is just the average squared deviation. It has been shown, however, that this is a biased estimate of the population SD, because one degree of freedom has already been used up when computing the mean (a prior step for computing the SD). Hence the correction of the denominator to n-1 in order to estimate the population SD. Now, with the theoretical (and usually unknown) population SD, dubbed sigma, the correct denominator is N, not N-1 (where n is the sample size and N the population size). This is because the mean and the SD are simply the first- and second-order moments of the variable's distribution (the kth moment of a variable centred on its mean uses the variable's deviation raised to the kth power). For instance, the skewness and kurtosis of a distribution depend on the third and fifth moments. All those moments are averages of deviations raised to the kth power, and are therefore divided by N.
Regarding the case of data covering the whole population: as I explained before, if the variable is itself a random variable, even a full population enumeration is still a "sample" from the universe of possible measurements, and is affected by random measurement error. Therefore an unbiased ESTIMATE of the true population SD requires dividing by n-1, including the extreme case of n=N. Since nothing is exempt from random measurement error, not even in physics (let alone the social sciences), all unbiased estimates of the true SD use n-1 in the denominator.
In analysis of variance, too, total variance is computed with n-1 degrees of freedom, including k degrees of freedom for the k groups determined by factors, interactions and covariates, and n-k-1 for the residual variance, for a total of n-1 DF. The remaining degree of freedom has been used up by the mean.
Thus the essential distinction is between the "true" population SD (divided by N) and the "unbiased estimate" of that population SD based on a sample (divided by n-1). For the sample, its "intrinsic" or "descriptive" SD is SS divided by n, giving the SD of sample units around the sample mean, but for inferential purposes (unbiased estimate of the population SD) the denominator should be n-1.
Hector
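The idea of moments can be made concrete with a small Python sketch of N-denominator central moments and the plain moment ratios for skewness (third central moment) and kurtosis (fourth central moment). These are the textbook moment definitions, not necessarily the bias-adjusted formulas a statistics package prints; packages also commonly report kurtosis as excess kurtosis (this ratio minus 3).

```python
def central_moment(values, k):
    """k-th central moment, averaging over all N values (N in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** k for v in values) / n

def moment_skewness(values):
    """Third central moment scaled by the variance to the power 3/2."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def moment_kurtosis(values):
    """Fourth central moment scaled by the squared variance (not excess kurtosis)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

x = [2, 5, 4, 9, 8, 7, 4, 3, 1]
print(central_moment(x, 1))      # first central moment: 0 up to rounding
print(central_moment(x, 2))      # population variance, SS/N
print(moment_skewness(x), moment_kurtosis(x))
```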
|
Hector Maletta wrote:
> For instance, the skewness and kurtosis of a distribution depend on the third and fifth moments.
A minor clarification. Kurtosis depends on the FOURTH moment, not the fifth.
Best regards and happy weekend.
Marta
--
For miscellaneous SPSS related statistical stuff, visit: http://gjyp.nl/marta/
|
Thanks, Marta, for correcting my involuntary lapsus mentis.
I take it you agree with the rest.
Hector
Marta Garcia Granero wrote:
> A minor clarification. Kurtosis depends on the FOURTH moment, not the fifth.
|
In reply to this post by Hector Maletta
Who wants a biased estimate of population SD?
I'd be a bit dubious about any analysis which provided such an estimate. But Blalock says some sociologists DO prefer the biased estimate. We wouldn't want to interfere with these sociologists' freedom of choice, if they do exist, just as long as they tell us what they are doing and provide full information, including N, in their report.
Best,
Diana
Professor Diana Kornbrot
email: d.e.kornbrot@...
web: http://web.me.com/kornbrot/KornbrotHome.html
School of Psychology, University of Hertfordshire, College Lane, Hatfield, Hertfordshire AL10 9AB, UK
|
I concur, as long as data are considered as sample results and are used for inferential purposes. However, the distinction is irrelevant for most practical purposes, except for EXTREMELY small samples. Dividing an SS of, say, 0.5 or 5 by 100 or by 99 usually gives the same value down to the third decimal at least. The difference becomes noticeable if your sample is, say, smaller than 30 cases or so. With SS = 0.5 and n = 30, dividing by 30 or by 29 gives a variance of 0.0167 or 0.0172 respectively (SD 0.129 or 0.131), and the standard error of the mean (SD divided by the square root of n) would be 0.0236 or 0.0240, a difference hardly important for most purposes although somewhat noticeable. Now, if you have a sample of 10 or 15 cases, beware of the denominator you use.
Hector
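The small-sample figures above can be recomputed with a short Python sketch (SS = 0.5 and n = 30 as in the example; the helper name `compare_denominators` is illustrative). It also prints the ratio sqrt(n/(n-1)) between the two SDs, which grows as the sample shrinks.

```python
import math

def compare_denominators(ss, n):
    """Variance, SD and SE of the mean using n and n-1 as the divisor for SS."""
    rows = []
    for divisor in (n, n - 1):
        var = ss / divisor
        sd = math.sqrt(var)
        se = sd / math.sqrt(n)   # SE of the mean always uses the sample size n
        rows.append((divisor, var, sd, se))
    return rows

for divisor, var, sd, se in compare_denominators(0.5, 30):
    print(f"SS/{divisor}: variance={var:.4f}  SD={sd:.3f}  SE={se:.4f}")

for n in (10, 15, 30, 100):
    print(f"n={n}: SD ratio sqrt(n/(n-1)) = {math.sqrt(n / (n - 1)):.4f}")
```

The SD ratio is about 1.054 at n = 10 but about 1.017 at n = 30, which is why the choice of denominator matters mainly for very small samples.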
|
In reply to this post by Hector Maletta
Hi Hector.
You are still saying that an SD computed using a random sample and with n-1 in the denominator provides an unbiased estimate of the "true" population SD. This is not correct--it underestimates the population SD. See the Background section of the second Wikipedia page I gave in my previous post:
http://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation#Background
Note especially this bit: "The use of n − 1 instead of n in the formula for the sample variance is known as Bessel's correction, which corrects the bias in the estimation of the sample variance, and some, but not all of the bias in the estimation of the sample standard deviation."
As for why virtually no one uses an unbiased version of the sample SD, I suspect the answer is that it's considerably more complicated than just taking the square root of the sample variance, and for most applications, the difference doesn't matter that much.
Cheers,
Bruce
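Bruce's point can be made concrete for normally distributed data. In that case the expected value of the usual sample SD s (n-1 denominator) is c4(n)·σ, where c4(n) = √(2/(n-1))·Γ(n/2)/Γ((n-1)/2). The Python sketch below is an illustration, not something from the thread. It computes c4 and divides s by it. The correction is exact only under the normality assumption, and `c4` and `unbiased_sd_normal` are hypothetical helper names.

```python
import math
import statistics


def c4(n):
    """Bias factor for the sample SD under normality: E[s] = c4(n) * sigma."""
    # Use log-gamma so that large n does not overflow math.gamma.
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)


def unbiased_sd_normal(data):
    """Sample SD (n-1 denominator) divided by c4(n); unbiased only for normal data."""
    return statistics.stdev(data) / c4(len(data))


for n in (2, 5, 10, 30, 100):
    print(f"n = {n:3d}: c4 = {c4(n):.4f}")

x = [2, 5, 4, 9, 8, 7, 4, 3, 1]  # the data from Bruce's SPSS example
print(f"s = {statistics.stdev(x):.3f}, s / c4 = {unbiased_sd_normal(x):.3f}")
```

c4(n) approaches 1 quickly: it is roughly 0.97 at n = 10 and 0.99 at n = 30. The correction therefore matters mainly for very small samples, which is consistent with the remark that the difference usually doesn't matter much.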
In reply to this post by Kornbrot, Diana
Diana, most of us use a biased estimate of the population SD all the time. See the page I mentioned in my response to Hector.

http://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation#Background

From that site:

--- start excerpt ---
The use of n − 1 instead of n in the formula for the sample variance is known as Bessel's correction, which corrects the bias in the estimation of the sample variance, and some, but not all of the bias in the estimation of the sample standard deviation. It is not possible to find an estimate of the standard deviation which is unbiased for all population distributions, as the bias depends on the particular distribution. Much of the following relates to estimation assuming a normal distribution.
--- end excerpt ---

Cheers,
Bruce
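The excerpt says the bias depends on the population distribution, and a small simulation can check that. This Python sketch is an illustration, not something from the thread; the seed, sample size and replication count are arbitrary choices. It draws many samples of n = 5 from two populations that both have a true SD of 1: a standard normal distribution and an exponential distribution with rate 1. It then averages s and s² over the samples.

```python
import math
import random


def sample_variance(values):
    """Variance with the n-1 denominator (Bessel's correction)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def average_sd_and_variance(draw, n, reps, rng):
    """Mean of s and of s^2 over `reps` independent samples of size n."""
    total_sd = total_var = 0.0
    for _ in range(reps):
        var = sample_variance([draw(rng) for _ in range(n)])
        total_var += var
        total_sd += math.sqrt(var)
    return total_sd / reps, total_var / reps


def draw_normal(rng):
    return rng.gauss(0.0, 1.0)       # true SD = 1


def draw_exponential(rng):
    return rng.expovariate(1.0)      # rate 1, so true SD = 1


rng = random.Random(2010)
results = {}
for name, draw in (("normal", draw_normal), ("exponential", draw_exponential)):
    results[name] = average_sd_and_variance(draw, n=5, reps=50_000, rng=rng)
    mean_sd, mean_var = results[name]
    print(f"{name:12s} mean s = {mean_sd:.3f}   mean s^2 = {mean_var:.3f}")
```

With these settings the average s² should land close to 1 for both populations. The average s, by contrast, should fall short of 1 by different amounts: near 0.94 in the normal case (the c4 factor for n = 5) and further below that in the skewed exponential case.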
In reply to this post by Hector Maletta
In a previous message to this thread I referred to the case when the entire population is measured (e.g. in a census). That measurement is but one of the many measurements that could be taken with random variations (different census takers, different time of day, different household member being interviewed as main informant, different data entry workers, etc.). If several such measurements are made of the same objects (say, a national population census or the measurement of the diameters of all buttons in a large jar), slightly different results would come out for the mean of any variable (e.g. mean age in a human population, or mean diameter for buttons) due to random differences in the individual measurements of the various persons or buttons. Thus even a census is to be considered a “sample” drawn from the “universe” of possible such “samples” of size n=N. The same is valid for a smaller sample.

Random error involved in passing from sample to universe can be seen as composed of two main parts: (1) the random selection of a particular subset of n cases (n<N), and (2) random error incurred while studying that subset with a certain group of interviewers, entering the data with a definite group of data punchers, and other random differences between samples, even if two samples are composed of the same subset of cases. The standard error of an estimate, i.e. SD/n^(1/2), would probably be very small for n approaching or equal to N, but would very rarely be zero. The first component (the particular subset) will disappear with n=N, but not the second one. The SD of a variable in the census divided by sqrt(N-1) gives an unbiased estimate of this standard error of census results. (Notice that unless N is very small, subtracting or not subtracting 1 in the denominator would cause no perceptible difference in that std error.)

The standard error of population estimates derived from one single random sample exists for all sample sizes, from 1 to N, although for large N it is probably very small. Imagine dividing an SD of, say, age (an SD probably not larger than 15 years) by the SQRT of the population of the entire country, which probably runs into millions. For a relatively small country of 25 million people, the SQRT is 5000 and the std error would be 15/5000 = 0.003 years, or about 1 day of age (26 hours). For a mean age of 30 years, this would be about 0.01% of the mean. For a large country like the …

One may want to consider random measurement error as distinct from “pure” sampling error (i.e. defining sampling error by just the first component), but this is a matter of words that can hardly make the second part disappear. In fact, when one infers population values from a sample, the error involved is a conflation of the two components. One may further distinguish between “intrinsic” measurement error (e.g. related to the precision of the type of instrument used for the measurement) and “sample-related” measurement error related to the particular circumstances of the actual measurement (choice of particular interviewers and data-entry workers, identity of the informant, time of day, etc.). The examples above do not include the intrinsic imprecision of the instruments (meter, questionnaire), because the instrument is supposed to be the same for all, whatever its intrinsic precision.

Hector

From: James Parry
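Hector's back-of-the-envelope figures for the census case can be reproduced directly. This Python snippet uses his values: an SD of age of 15 years, N = 25 million and a mean age of 30 years. The 365.25-day year is my own assumption. The snippet also shows that using √(N-1) instead of √N makes no perceptible difference at this size.

```python
import math

SD_AGE_YEARS = 15.0        # Hector's assumed SD of age
POPULATION = 25_000_000    # "a relatively small country"
MEAN_AGE_YEARS = 30.0      # assumed mean age
HOURS_PER_YEAR = 365.25 * 24

se_years = SD_AGE_YEARS / math.sqrt(POPULATION)          # SD / sqrt(N)
se_years_n1 = SD_AGE_YEARS / math.sqrt(POPULATION - 1)   # SD / sqrt(N-1)
se_hours = se_years * HOURS_PER_YEAR
pct_of_mean = 100 * se_years / MEAN_AGE_YEARS

print(f"SE = {se_years:.4f} years, about {se_hours:.1f} hours")
print(f"SE as a share of the mean age: {pct_of_mean:.3f}%")
print(f"relative change from using N-1: {se_years_n1 / se_years - 1:.1e}")
```

The standard error comes to 0.003 years, about 26 hours, which is 0.01% of a 30-year mean.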
In reply to this post by Bruce Weaver
http://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation#Background

Actually, most of us don’t use estimates of the SD most of the time at all. Most inferential estimates are based on unbiased estimates of variance, not biased estimates of SD. But it's a salutary reminder that tests of heterogeneity of variance are based on mean variance, not mean SD. Hence if one is testing the hypothesis that one group is more variable than another, then mean variances for each group, rather than mean SDs, should be provided. Similarly, when people use the currently fashionable confidence limits, they are based on a t-distribution for se = SD/√n (presumably that’s unbiased? Bruce).

NEW POINT: Having read the Wikipedia article, I wondered if it is ALWAYS true that families of distributions other than the normal have mean and variance not independent. It is certainly true of many families, but ALL?

Best
Diana

On 22/01/2010 18:24, "Bruce Weaver" <bruce.weaver@...> wrote: kornbrot wrote:
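The confidence limits mentioned above rest on the standard error of the mean, SE = SD/√n. As an illustration (not from the thread), this Python sketch builds a 95% t-based interval for the mean of the data in Bruce's SPSS example. The critical value 2.306 for 8 degrees of freedom is hard-coded from standard t tables so that the snippet needs only the standard library; `scipy.stats.t.ppf(0.975, 8)` would compute it if SciPy is available.

```python
import math
import statistics

x = [2, 5, 4, 9, 8, 7, 4, 3, 1]   # data from Bruce's SPSS example
n = len(x)

mean = statistics.mean(x)
s = statistics.stdev(x)          # sample SD, n-1 denominator
se = s / math.sqrt(n)            # SD divided by sqrt(n)
T_CRIT_DF8 = 2.306               # two-sided 95% critical t, df = n - 1 = 8 (from tables)

lower = mean - T_CRIT_DF8 * se
upper = mean + T_CRIT_DF8 * se
print(f"mean = {mean:.3f}, SE = {se:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

For a different n, the critical value must be looked up for df = n - 1.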
See note 2 here: http://en.wikipedia.org/wiki/Standard_error_%28statistics%29#Standard_error_of_the_mean Bruce
In reply to this post by Bruce Weaver
You're right. My suggestion was rubbish! It doesn't even work for variances.
Sorry - long day and all that.

Garry

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Bruce Weaver
Sent: 21 January 2010 23:44
To: [hidden email]
Subject: Re: Can anyone help me get a population standard deviation?

Hector, FWIW, Excel gives the same results I obtained with SPSS:

2.728 from STDEV (with division by n-1)
2.572 from STDEVP (with division by N)

I don't have time to try Garry's weighting suggestion right now, but off the top of my head, I think that it will work for variances, but not standard deviations.

Hector Maletta wrote:
> I am not sure about Bruce's formula. The sample SD is SS divided by n. The ESTIMATE of the population SD, based on the sample, is SS divided by n-1. If I'm right about this, the SPSS SD should be multiplied by n and divided by n-1 to get the estimated population SD.
> On the other hand, if Nancy's dataset itself represents the whole of the population (e.g. a census), perhaps Nancy believes that the population variance should be computed directly. But that would not be right:
> (a) The directly computed population variance is SS/n, just as in a sample (a population is a sample of size n=N, with a sampling ratio of 1:1).
> (b) On the other hand, a measurement of a population (e.g. a census) is just one sample measurement (out of many measurements you can take, with different census takers or at different times of day, etc.), and therefore even a census is a sample, with a standard error of the estimate (possibly quite small unless the census takers are very sloppy). From this viewpoint, again the SD of the one (sampled) take of the census is SS/n; the estimate of the SD for the whole "population" of various possible measurements of the same (human) population would be SS/(n-1), which in this case is also SS/(N-1).
> Hector
>
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Bruce Weaver
> Sent: 21 January 2010 18:57
> To: [hidden email]
> Subject: Re: Can anyone help me get a population standard deviation?
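The figures quoted in this exchange can be cross-checked outside SPSS. The Python sketch below is an illustration, not something from the thread. It reproduces Bruce's conversion from the sample SD to the population SD and compares the results with the standard library's `statistics.stdev` and `statistics.pstdev`, the analogues of Excel's STDEV and STDEVP. It then tries Garry's weighting idea with a constant weight of n/(n-1) or (n-1)/n. The weighted formula is an assumption: it treats WEIGHT values as frequency weights and divides the weighted sum of squares by the sum of weights minus 1. `weighted_sd` is a hypothetical helper.

```python
import math
import statistics

x = [2, 5, 4, 9, 8, 7, 4, 3, 1]   # data from Bruce's SPSS example
n = len(x)

# Bruce's steps: sample SD -> SS -> population variance -> population SD.
sample_sd = statistics.stdev(x)          # n-1 denominator, like Excel STDEV
ss = sample_sd ** 2 * (n - 1)            # sum of squared deviations
pop_sd = math.sqrt(ss / n)               # n denominator, like Excel STDEVP
# Equivalent one-step shortcut.
shortcut = sample_sd * math.sqrt((n - 1) / n)


def weighted_sd(data, w):
    """SD with the same frequency weight w on every case.

    Assumption: weighted SS divided by (sum of weights - 1), as for
    SPSS frequency weights.
    """
    m = sum(data) / len(data)            # a constant weight leaves the mean unchanged
    wss = w * sum((v - m) ** 2 for v in data)
    return math.sqrt(wss / (w * len(data) - 1))


print(f"sample SD = {sample_sd:.3f}, population SD = {pop_sd:.3f}")
print(f"statistics.pstdev = {statistics.pstdev(x):.3f}, shortcut = {shortcut:.3f}")
for w in (n / (n - 1), (n - 1) / n):
    print(f"weight {w:.4f}: weighted SD = {weighted_sd(x, w):.3f}")
```

Under the stated assumption, neither weight reproduces 2.572: the weighted SDs come out near 2.71 and 2.75. The corresponding variances (about 7.34 and 7.56) also miss the population variance of about 6.62. This fits Garry's conclusion that the trick does not work for variances either.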
