Hi,
I am planning to sample cases from a known dataset so that the sample has a fixed mean and SD. The sample size should be between 300 and 500, and sampling with replacement is not allowed. Can I do this in SPSS? If so, how?
Thank you for your attention.
Sincerely, Jialin Huang
|
Jialin,
As I remember, SPSS has a specific command, SAMPLE, for sampling cases. If that command is inadequate, please explain the significance of ‘fixed mean and SD’ and ‘sample size is from 300-500’. I understand replacement is not allowed.
Gene Maguin
|
In reply to this post by huang jialin
You can sample in SPSS with:
sample <n> from <N>
where n is the sample size you want and N is the number of cases in the data set, or you can use:
sample <p>
where p is the proportion you want to sample, expressed as a decimal.
John Hall
Email: [hidden email] Website: www.surveyresearch.weebly.com Skype: surveyresearcher1 Phone: (+33) (0) 2.33.45.91.47
|
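For anyone who wants to see the difference between the two forms outside SPSS, here is a rough analogue sketched in Python rather than SPSS syntax. The case numbers and the seed are made up for illustration, and the second form assumes each case is kept independently with probability p, which is why its count is only approximately the requested proportion.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
case_ids = list(range(1, 1001))  # hypothetical case numbers 1..1000

# Analogue of "sample 300 from 1000": exactly 300 cases, no replacement.
exact = rng.sample(case_ids, 300)

# Analogue of "sample .3": each case is kept with probability 0.3,
# so the resulting count is only approximately 300.
approx = [c for c in case_ids if rng.random() < 0.3]

print(len(exact), len(set(exact)), len(approx))
```

The first two printed numbers should agree (no case is drawn twice), while the third will usually differ a little from 300.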
In reply to this post by huang jialin
Should have said you do that in syntax.
From data editor:
File > New > Syntax
... to open a new syntax file. Write the command, but make sure you put a full stop (period) at the end of it, then press the green triangle on the toolbar to run it.
John Hall
|
Hi everyone,
Thanks for your replies. Let me elaborate on what I am planning to do. I have a dataset of 1000 cases, which I am treating as a population (M = 17, SD = 5.1). I am trying to pull out a sample of roughly 300 cases, but the mean needs to be around 15 and the SD around 5.7.
I was wondering whether SPSS has any syntax that I can use. Your help is much appreciated. Thank you again.
Sincerely, Jialin Huang
|
SAMPLE 300 FROM 1000.
Depending on your definition of "around", the mean and standard deviation will probably meet your requirement.
Rick Oliver
Senior Information Developer
IBM Business Analytics (SPSS)
E-mail: [hidden email]
Phone: 312.893.4922 | T/L: 206-4922
|
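How far "around" can stretch is easy to put a number on. The sketch below is Python, not SPSS syntax, and uses only the figures quoted in the thread (N = 1000, n = 300, M = 17, SD = 5.1, target mean 15). It assumes the quoted SD is the population SD and applies the usual finite population correction for sampling without replacement.

```python
import math

N = 1000            # population size (from the thread)
n = 300             # sample size (from the thread)
pop_mean = 17.0     # population mean (from the thread)
pop_sd = 5.1        # population SD (from the thread, assumed N divisor)
target_mean = 15.0  # mean the poster wants in the sample

# Standard error of the sample mean when sampling without replacement,
# including the finite population correction.
fpc = math.sqrt((N - n) / (N - 1))
se_mean = pop_sd / math.sqrt(n) * fpc

# Distance of the target from the expected sample mean, in standard errors.
z = (target_mean - pop_mean) / se_mean

print(f"standard error of the mean: {se_mean:.3f}")
print(f"target mean is {z:.1f} standard errors from {pop_mean}")
```

Under these assumptions the standard error comes out at roughly a quarter of a point, so a simple random sample of 300 should land close to 17, and a mean of 15 would be about eight standard errors away.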
Rick,
Thanks for your response. Actually, it would be nice to have the exact mean and SD as listed. I am concerned that random sampling may not pull out enough cases from the lower end of the distribution. Thanks.
Sincerely, Jialin Huang
|
Someone who knows something about complex samples might have some suggestions.
Rick Oliver
|
In reply to this post by huang jialin
Huang,
You don't even have to use syntax! From the menu: Data > Select Cases > Random sample of cases > Exactly (and then specify the number of cases you want to select).
=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
|
In reply to this post by huang jialin
Ok, that’s what I figured you had in mind. There might be some sort of optimal solution to your question, but I don’t know what it is; perhaps others do. In lieu of that optimal solution, this is my first-round suggestion. I’ll assume your dataset has 1000 records of a variable y, with mean = 17 and sd = 5.1.
Compute ydev=abs(y-15).
Compute rannum=uniform(1).
Sort cases by rannum.
Select if ($casenum le 300).
Descriptives y.
This will give you a sample of 300 cases randomly selected without replacement. Note that ydev is not used in the selection yet, so the mean will be near 17 and the standard deviation near 5.1, not 15 and 5.7. Sorting on ydev instead of rannum would pull the mean toward 15, but it would also shrink the sd, and the distribution would probably be skewed, if it is not already. As an experiment, you could also compute the squared deviation (ydev**2) and select 300 cases on that. I think the result will be similar, i.e., mean near 15 but a reduced sd and a skewed distribution.
I think you may have to subdivide the y distribution into, say, 20 five-percentile-wide ‘bars’. The roughly 50 cases within each bar are randomly numbered, but the number of cases retained from each bar varies in such a way that more cases are selected from bars near the target mean and fewer the further you move away from it. I’d guess the number to be selected is proportional to the ratio of the midpoint height (or area) of the bar for a normal distribution with a mean of 15 to the midpoint height (or area) of the bar for a normal distribution with a mean of 17. This won’t be too easy to code, but it won’t be too hard either. I’d try that and see what I get, and hope that there really is an optimal solution that smarter people know about.
Gene Maguin
|
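The density-ratio idea can be tried without binning at all. What follows is only a sketch in Python (not SPSS syntax), and it swaps in a different mechanism from the one described: every case gets a weight equal to the target normal density (mean 15, sd 5.7) divided by the population normal density (mean 17, sd 5.1), and 300 cases are then drawn without replacement with a weighted key method (Efraimidis-Spirakis). The population is simulated, and the function names `normal_pdf` and `weighted_sample_without_replacement` are made up for this illustration.

```python
import math
import random
import statistics


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def weighted_sample_without_replacement(values, weights, k, rng):
    """Pick k items without replacement, favouring larger weights.

    Each item gets the key u ** (1 / w) for a uniform random u,
    and the k items with the largest keys are kept.
    """
    keyed = [(rng.random() ** (1.0 / w), v) for v, w in zip(values, weights)]
    keyed.sort(reverse=True)
    return [v for _, v in keyed[:k]]


rng = random.Random(2012)  # fixed seed so the run is repeatable

# Simulated stand-in for the 1000-case population (mean 17, sd 5.1).
population = [rng.gauss(17.0, 5.1) for _ in range(1000)]

# Weight = target density / population density at each observed value.
weights = [normal_pdf(y, 15.0, 5.7) / normal_pdf(y, 17.0, 5.1) for y in population]

sample = weighted_sample_without_replacement(population, weights, 300, rng)

print("population mean/sd:", statistics.mean(population), statistics.stdev(population))
print("sample mean/sd:    ", statistics.mean(sample), statistics.stdev(sample))
```

Because 300 of 1000 is a large sampling fraction, the inclusion probabilities are not exactly proportional to the weights, so the sample should lean toward the target without hitting 15 and 5.7 exactly.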
In reply to this post by Michael Kruger
Mr. Kruger,
I tried the command, but it gave a different sample from what I expected. I need cases with the exact mean and SD I want. Thank you. Sincerely,
Jialin Huang
|
In reply to this post by huang jialin
Hi,
Thanks for your help. I appreciate it. I will try the suggestions and see whether they work.
Sincerely, Jialin Huang
|
In reply to this post by huang jialin
First of all, note that it might not even be possible to exactly match the mean and sd with the existing cases.
Now, is this the only variable involved? If so, just do this.
1. Draw your sample of 300 at random and then compute the mean and sd.
2. Add the difference between the exact (target) mean and the sample mean to each case.
You can make a similar adjustment to the sd with another linear transform.
Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621
|
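The two linear steps (shift the mean, rescale the sd) can be combined into one transform. This is a minimal sketch in Python rather than SPSS syntax, on a simulated population; the helper name `rescale_to` is made up for the example, and the sd is taken with the n - 1 divisor.

```python
import random
import statistics


def rescale_to(values, target_mean, target_sd):
    """Linearly transform values so they have exactly the target mean and SD."""
    m = statistics.mean(values)
    s = statistics.stdev(values)  # sample SD (n - 1 divisor)
    return [(v - m) * (target_sd / s) + target_mean for v in values]


rng = random.Random(1)  # fixed seed so the run is repeatable
population = [rng.gauss(17.0, 5.1) for _ in range(1000)]  # simulated stand-in
sample = rng.sample(population, 300)                       # without replacement

adjusted = rescale_to(sample, target_mean=15.0, target_sd=5.7)

print(round(statistics.mean(adjusted), 6), round(statistics.stdev(adjusted), 6))
```

The adjusted values match the targets up to floating-point rounding, but they are no longer the observed scores, so this only helps when a transformed variable is acceptable.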
In reply to this post by huang jialin
<BEGIN PROCESS: Opening rusty can of worms with sharp rock!>
*WHY*: This sounds very close to manufacturing data.
Your sample is what it is.
FWIW: You would be restricting sampling away from the higher end of the distribution. Ergo, you would be *REDUCING* the variability, not increasing it as you seem to request.
Sounds *FISHY*.
--------
<Retiring sharp rock>
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
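The point about restricting the upper end is easy to check on simulated data. This is a small Python sketch, not SPSS syntax; the population is simulated with the mean and sd quoted in the thread, and the cutoff of 20 is an arbitrary value chosen only for illustration.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable
population = [rng.gauss(17.0, 5.1) for _ in range(1000)]  # simulated stand-in

# Restrict the range: keep only cases at or below a hypothetical cutoff.
cutoff = 20.0
restricted = [y for y in population if y <= cutoff]

print("full:      ", statistics.mean(population), statistics.stdev(population))
print("restricted:", statistics.mean(restricted), statistics.stdev(restricted))
```

Dropping the high scores lowers the mean and the sd together, so a restricted sample with a lower mean and a larger sd than the population would need something other than simple truncation.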
In reply to this post by Jon K Peck
Hi Jon,
Thanks for your suggestion. Unfortunately, I am dealing with more than one variable.
Sincerely, Jialin Huang
|
In reply to this post by David Marso
David,
I think you are totally right. My mistake for choosing the wrong word. What I am trying to do is to see the effects of range restriction. That is why I have exact values for the mean and SD.
Thanks.
Sincerely, Jialin Huang
|
In reply to this post by huang jialin
If you want to be sure that separate "strata" are represented, you can do stratified sampling: N1 from stratum 1, N2 from stratum 2, and so on.
Do you have a good reason *not* to use the whole sample? The main reason for drawing a random subsample is that the next step of necessary data collection is too expensive. When your file has all the data you will use, that is what you should use, almost all of the time, except for validation strategies.
If you are doing cross-validation, a common practice is to draw *many* random samples... and then to report their variability. You might want to read up on cross-validation, or on "bootstrap" samples.
If you do, in fact, achieve *exactly* the original mean and SD on some criterion variable, drawing your sample from a population, you will be in a position where every statistician who reads your work will be (rightfully) highly skeptical. Unless you hide that achievement. An exact match is not a likely outcome for a randomized sample. You may need to adjust what you expect, or to adjust the expectations of whoever is requesting the analysis.
--
Rich Ulrich
|
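Drawing many random samples and looking at their spread takes only a few lines. The sketch below is Python rather than SPSS syntax and runs on a simulated population with the mean and sd quoted in the thread; the 500 repetitions and the seed are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.gauss(17.0, 5.1) for _ in range(1000)]  # simulated stand-in

# Draw many samples of 300 without replacement and record each mean and SD.
means, sds = [], []
for _ in range(500):
    s = rng.sample(population, 300)
    means.append(statistics.mean(s))
    sds.append(statistics.stdev(s))

print("sample means range:", min(means), max(means))
print("sample SDs range:  ", min(sds), max(sds))
```

In a setup like this the sample means should cluster tightly around the population mean, which gives a feel for how unusual a random sample with a mean of 15 would be.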
In reply to this post by David Marso
I agree with David that this is a strange request, and it would be very difficult to obtain a sample with a specific mean and SD from a larger sample/population.
That being said, let's change perspectives on the problem. Let's say you have 3000 cases in the larger sample/pop. First, convert them to z-scores and rank order them from smallest to largest value, assuming you have a symmetric distribution that is more or less normal. If you want a sample of 300 cases, then select 150 cases with negative z-scores and 150 cases with positive z-scores such that
abs(sum(negative z-scores)) = sum(positive z-scores)
The sum of deviations around the mean is zero, so when the absolute value of the sum of negative deviations equals the sum of the positive deviations, you have a sample of N=300 that will produce the specified mean. Reconvert to the original scale by using a formula like:
original scale score = z-score*(SD) + Mean
You should now have a sample whose mean is equal to the specified mean.
Note that if you have 150 pairs of z-scores that are the same in absolute value but one is positive and the other is negative, then the sample of 300 would reproduce the desired mean. But this might be overly restrictive.
I'm less clear on how to make sure that your sample has the same SD or variance as the larger sample/pop, but maybe someone else will have an idea.
-Mike Palij
New York University
[hidden email]
|
Huang,
Probably this question should have been asked earlier. What is the full purpose of this project? I think I saw something about range restriction, but the statement seemed like a comment in passing. And I think you mentioned to Jon that more than one variable is involved. Please elaborate on this part of the project as well.
Gene Maguin
|