SPSSX Discussion

sampling with fixed mean and SD

huang jialin

sampling with fixed mean and SD

Hi,

I am planning to sample cases from a known dataset so that the sample has a fixed mean and SD. The sample size should be between 300 and 500, and sampling with replacement is not allowed. Can I do this in SPSS? If so, how?

Thank you for your attention.

Sincerely,

Jialin Huang

Maguin, Eugene

Re: sampling with fixed mean and SD

Jialin,

As I remember, SPSS has a specific command, SAMPLE, for sampling cases. If that command is inadequate, please explain the significance of ‘fixed mean and SD’ and ‘sample size is from 300-500’. I understand replacement is not allowed.

Gene Maguin

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of huang jialin
Sent: Tuesday, January 24, 2012 11:46 AM
To: [hidden email]
Subject: sampling with fixed mean and SD

Hi,

I am planning to sample cases from a known dataset with fixed mean and SD. The sample size is from 300-500. The replacement is not allowed. Can I do it in SPSS? If so, how can I do it?

Thank you for your attention.

Sincerely,

Jialin Huang

John F Hall

Re: sampling with fixed mean and SD

In reply to this post by huang jialin

You can sample in SPSS with:

sample <n> from <N>

where n is the sample size you want and N is the number of cases in the data set, or you can use:

sample <p>

where p is the proportion you want to sample expressed as a decimal.

John Hall

Email: [hidden email]

Website: www.surveyresearch.weebly.com

Skype: surveyresearcher1

Phone: (+33) (0) 2.33.45.91.47

John F Hall

Re: sampling with fixed mean and SD

In reply to this post by huang jialin

Should have said you do that in syntax.

From data editor:

File > New > Syntax

. . to open a new syntax file. Write the command, but make sure you put a full stop (period) at the end of it, then press the green triangle etc.

huang jialin

Re: sampling with fixed mean and SD

Hi everyone,

Thanks for your replies. Let me elaborate on what I am planning to do.

I have a dataset of 1000 cases, which I treat as a population (M = 17, SD = 5.1). I am trying to draw a sample of roughly 300 cases, but the mean needs to be around 15 and the SD around 5.7.

I was wondering whether SPSS has any syntax that I can use. Your help is much appreciated.

Thank you again.

Sincerely,

Jialin Huang

Rick Oliver-3

Re: sampling with fixed mean and SD

SAMPLE 300 FROM 1000.

Depending on your definition of "around", the mean and standard deviation will probably meet your requirement.

Rick Oliver
Senior Information Developer
IBM Business Analytics (SPSS)
E-mail: [hidden email]
Phone: 312.893.4922 | T/L: 206-4922

huang jialin

Re: sampling with fixed mean and SD

Rick,

Thanks for your response. Actually, it would be nice to have the exact mean and sd as listed. I am concerned that random sampling may not pull out enough cases from the lower end of the distribution.

Thanks.

Sincerely,

Jialin Huang

Rick Oliver-3

Re: sampling with fixed mean and SD

Someone who knows something about complex samples might have some suggestions.

Rick Oliver
Senior Information Developer
IBM Business Analytics (SPSS)
E-mail: [hidden email]
Phone: 312.893.4922 | T/L: 206-4922

Michael Kruger

Re: sampling with fixed mean and SD

In reply to this post by huang jialin

Huang,

You don't even have to use syntax! From the menu, 'Data, Select Cases,
Random Sample of Cases, Exactly (and then specify no. of cases you want
to select)...'

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Maguin, Eugene

Re: sampling with fixed mean and SD

In reply to this post by huang jialin

Ok, that’s what I figured you had in mind. There might be some sort of optimal solution to your question but I don’t know what it is; perhaps, others do. In lieu of that optimal solution, this is my first round suggestion. I’ll assume your dataset of 1000 records of a variable y, with mean=17, sd=5.1.

Compute ydev=abs(y-15).

Compute rannum=uniform(1).

Sort cases by rannum.

Select if ($casenum le 300).

Descriptives y.

This will give you a sample of 300 cases randomly selected without replacement. Note that, as written, ydev is computed but does not enter the selection, so this is a simple random sample: the mean will be near 17, not 15, and the standard deviation near 5.1, not 5.7. To pull the mean toward 15, the selection has to favor cases with small ydev, and then the distribution will probably be skewed, if it is not already. As an experiment, you could also compute the squared deviation (ydev**2) and let that drive the selection instead. I think the result will be similar, i.e., a mean near 15 and an sd no larger than about 5.1, but skewed.

I think you may have to subdivide the ydev distribution into, say, 20 five-percentile-wide ‘bars’. The roughly 50 cases within each bar are randomly numbered, but the number of cases retained from each bar varies in such a way that more cases are selected from bars near the target mean and fewer the further you move away from it. I’d guess the number to be selected is a ratio of the midpoint height (or area) of the bar for a normal distribution with a mean of 15 to the midpoint height (or area) of the bar for a normal distribution with a mean of 17. This won’t be too easy to code but it won’t be too hard either.

I’d try that and see what I get and hope that there really is an optimal solution that smarter people know about.

Gene Maguin
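
As a rough illustration of the density-ratio idea in this post, here is a minimal sketch in Python rather than SPSS syntax. It skips the bars and weights each case directly by the ratio of a target normal density (mean 15, SD 5.7) to a source normal density (mean 17, SD 5.1), then draws 300 cases without replacement. The simulated data, the seed, the helper names, and the use of the Efraimidis-Spirakis key trick for weighted sampling are all assumptions made for illustration; nothing here comes from the original data.

```python
import math
import random
import statistics

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def weighted_sample_without_replacement(values, weights, n, rng):
    """Efraimidis-Spirakis scheme: key each case with u**(1/w), keep the n largest keys."""
    keyed = [(rng.random() ** (1.0 / w), i) for i, w in enumerate(weights)]
    keyed.sort(reverse=True)
    return [values[i] for _, i in keyed[:n]]

rng = random.Random(2012)  # arbitrary seed, only for reproducibility
# Hypothetical stand-in for the 1000-case file (M = 17, SD = 5.1).
population = [rng.gauss(17.0, 5.1) for _ in range(1000)]

# Weight = target density / source density, so under-represented regions are favoured.
weights = [normal_pdf(y, 15.0, 5.7) / normal_pdf(y, 17.0, 5.1) for y in population]
sample = weighted_sample_without_replacement(population, weights, 300, rng)

print(round(statistics.mean(sample), 2), round(statistics.stdev(sample), 2))
```

With only 1000 cases to choose from, a draw like this moves the mean and SD toward the targets but should not be expected to hit them exactly.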

huang jialin

Re: sampling with fixed mean and SD

In reply to this post by Michael Kruger

Mr. Kruger,

I tried that command, but it gives a different sample from the one I expected. I need cases with the exact mean and SD I want.

Thank you.

Sincerely,

Jialin Huang

huang jialin

Re: sampling with fixed mean and SD

In reply to this post by huang jialin

Hi,

Thanks for your help. I appreciate it. I will try these suggestions and see whether they work.

Sincerely,

Jialin Huang

On Tue, Jan 24, 2012 at 2:23 PM, John F Hall <[hidden email]> wrote:

Jialin

I’m not quite sure what you are doing, but I used to do something like this when I was teaching. I had data from a sample survey with 3100 cases (British Social Attitudes series) and wanted to demonstrate the idea of sampling from a population. I used the “sample” as the population and got students to draw successive samples of size n from N to demonstrate sampling variation of the mean, proportions etc. In those days we were limited by technology and classroom time, so even with 24 students by 2 samples each, it wasn’t always possible to show that the sampling variation of the mean was approximately normal. I think SPSS always started with the same seed, so to avoid all students getting the same sample I got them to SET the SEED to a very high integer, usually their date of birth in yymmdd format. In one class I had three students with the same birth date!

I also discovered that you need pretty large samples, e.g. 400 or 500 from 3100, to get anywhere near the results I needed: 100 from 3100 produced some very erratic means and percentages, but the students learned a lot about sampling variation.

You could try using TEMPORARY to select successive samples, but you may need to set the seed first.

SET SEED 401207 .

TEMP.
SAMPLE 300 FROM 1000 .
~ ~ ~ ~ ~

TEMP.
SAMPLE 300 FROM 1000 .
~ ~ ~ ~ ~

TEMP.
SAMPLE 300 FROM 1000 .
~ ~ ~ ~ ~

Happy sampling

John Hall
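
The classroom exercise described in this message can be imitated outside SPSS. The following is a minimal Python sketch, assuming a simulated 3100-case variable (the real survey data are not available here) and a hypothetical helper name. It draws repeated samples without replacement and compares how much the sample means vary for n = 100 versus n = 500.

```python
import random
import statistics

def sampling_sd_of_mean(population, n, replications, rng):
    """Draw `replications` samples of size n without replacement; return the SD of their means."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(replications)]
    return statistics.stdev(means)

rng = random.Random(401207)  # seed borrowed from the SET SEED line above
# Hypothetical stand-in for a 3100-case survey variable.
population = [rng.gauss(50.0, 10.0) for _ in range(3100)]

sd_small = sampling_sd_of_mean(population, 100, 200, rng)
sd_large = sampling_sd_of_mean(population, 500, 200, rng)
print(f"SD of sample means, n=100: {sd_small:.3f}; n=500: {sd_large:.3f}")
```

The smaller samples should show visibly more erratic means, which matches the experience reported above.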

Jon K Peck

Re: sampling with fixed mean and SD

In reply to this post by huang jialin

First of all, note that it might not even be possible to exactly match the mean and sd with the existing cases.

Now is this the only variable involved? If so, just do this.
1. Draw your sample of 300 at random and then compute the mean and sd.
2. Add the difference between the exact (target) mean and the sample mean to each case.

You can make a similar adjustment to the sd with another linear transform.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621
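
The two-step recipe in this message amounts to a single linear transform. Here is a minimal sketch in Python (not SPSS syntax), assuming simulated data and a hypothetical helper `rescale_to`. Note that the adjusted values are no longer the original observed cases, which may or may not matter for the purpose at hand.

```python
import random
import statistics

def rescale_to(values, target_mean, target_sd):
    """Linearly transform values so their mean and sample SD match the targets exactly."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [target_mean + target_sd * (v - m) / s for v in values]

rng = random.Random(1)
population = [rng.gauss(17.0, 5.1) for _ in range(1000)]  # hypothetical data
sample = rng.sample(population, 300)                      # step 1: random draw
adjusted = rescale_to(sample, 15.0, 5.7)                  # step 2: shift and rescale

print(round(statistics.mean(adjusted), 6), round(statistics.stdev(adjusted), 6))
```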

David Marso

Re: sampling with fixed mean and SD

Administrator

In reply to this post by huang jialin

<BEGIN PROCESS: Opening rusty can of worms with sharp rock!>
*WHY*: This sounds very close to manufacturing data.
Your sample is what it is.
FWIW: You would be restricting sampling away from the higher end of the distribution. ergo, you would be *REDUCING* the variability, not increasing it as you seem to request.
Sounds *FISHY* .
--------
<Retiring sharp rock>

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
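
The point about variability can be checked with a quick simulation. This is a minimal Python sketch under stated assumptions: the data are simulated, and the cut-off of 20 is a hypothetical value chosen only to remove the upper end of the distribution.

```python
import random
import statistics

rng = random.Random(7)
population = [rng.gauss(17.0, 5.1) for _ in range(1000)]  # hypothetical data

cutoff = 20.0  # hypothetical cut-off: drop the upper end of the distribution
restricted = [y for y in population if y <= cutoff]

print("full:      ", round(statistics.mean(population), 2), round(statistics.stdev(population), 2))
print("restricted:", round(statistics.mean(restricted), 2), round(statistics.stdev(restricted), 2))
```

Restricting the range this way lowers the mean, but it also lowers the SD, so a lower mean combined with a higher SD cannot come from simple truncation.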

huang jialin

Re: sampling with fixed mean and SD

In reply to this post by Jon K Peck

Hi Jon,

Thanks for your suggestion. Unfortunately, I am dealing with more than one variable.

Sincerely,

Jialin Huang

huang jialin

Re: sampling with fixed mean and SD

In reply to this post by David Marso

David,

I think you are totally right. My bad for choosing the wrong word.

What I am trying to do is to see the effects of range restriction. That is why I have the exact mean and sd.

Thanks.

Sincerely,

Jialin Huang

Rich Ulrich

Re: sampling with fixed mean and SD

In reply to this post by huang jialin

If you want to be sure that separate "strata" are represented,
you can do stratified sampling - N1 from stratum 1, N2 from stratum 2,
and so on.

Do you have a good reason *not* to use the whole sample?
The main reason for randomizing a whole selection is that
the next step of necessary data collection is too expensive.
When your file has all the data you will use, that is what you
should use, almost all of the time, except for validation strategies.

If you are doing cross-validation, a common practice is to draw
*many* random samples... and then, to report their variability.
You might want to read up on cross-validation, or on "bootstrap"
samples.

If you do, in fact, achieve *exactly* the original mean and SD on
some criterion variable, drawing your sample from a population,
you will be in a position where every statistician who reads your
work will be (rightfully) highly skeptical. Unless you hide that
achievement. An exact match is not a likely outcome for a
randomized sample. You may need to adjust what you expect, or
to adjust the expectations of whoever is requesting the analysis.

--
Rich Ulrich
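
For readers unfamiliar with stratified sampling, here is a minimal Python sketch (not SPSS syntax). The three-way split of the score range, the quotas, the simulated data, and the helper names are all hypothetical choices made for illustration.

```python
import random
from collections import Counter

def stratified_sample(cases, stratum_of, quotas, rng):
    """Draw quotas[k] cases without replacement from each stratum k."""
    by_stratum = {}
    for case in cases:
        by_stratum.setdefault(stratum_of(case), []).append(case)
    picked = []
    for stratum, n in quotas.items():
        picked.extend(rng.sample(by_stratum[stratum], n))
    return picked

rng = random.Random(3)
population = [rng.gauss(17.0, 5.1) for _ in range(1000)]  # hypothetical data

def stratum_of(y):
    """Hypothetical three-way split of the score range."""
    return "low" if y < 14.0 else "high" if y > 20.0 else "middle"

quotas = {"low": 120, "middle": 130, "high": 50}  # illustrative N1, N2, N3
sample = stratified_sample(population, stratum_of, quotas, rng)
print(Counter(stratum_of(y) for y in sample))
```

Over-sampling the low stratum, as these quotas do, guarantees that the lower end of the distribution is represented, at the cost of the sample no longer being a simple random one.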

Mike

Re: sampling with fixed mean and SD

In reply to this post by David Marso

I agree with David that this is a strange request and
it would be very difficult to obtain a sample with a specific
mean and SD from a larger sample/population.

That being said, let's change perspectives on the problem.
Let's say you have 3000 cases in the larger sample/pop.
First, convert them to z-scores and rank order them from
smallest to largest value, assuming you have a symmetric
distribution that is more or less normal.

If you want a sample of 300 cases, then select 150 cases
with a negative z-score and 150 cases with positive z-scores
such that

absolute value(sum(negative z-scores)) = sum(positive z-scores)

The sum of deviations around the mean is zero, so when the
absolute value of the sum of negative deviations equals the
sum of the positive deviations, you have a sample of N=300 that
will produce the specified mean. Reconvert to original scale
by using a formula like:

original scale score = z-score*(SD) + Mean

You should now have a sample whose mean is equal to the
specified mean.

Note that if you have 150 pairs of z-scores that are the
same in absolute value but one is positive and the other
is negative, then the sample of 300 would reproduce the
desired mean. But this might be overly restrictive.

I'm less clear on how to make sure that your sample has the
same SD or variance as the larger sample/pop but maybe
someone else will have an idea.

-Mike Palij
New York University
[hidden email]
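
The matched-pairs variant mentioned in this post can be sketched as follows, in Python rather than SPSS syntax. The simulated 3000-case data and the nearest-match rule (pair each randomly chosen negative z-score with the unused positive z-score closest to it in absolute value) are assumptions made for illustration. As the post notes, this reproduces the population mean only approximately and does nothing to control the SD.

```python
import bisect
import random
import statistics

rng = random.Random(11)
population = [rng.gauss(17.0, 5.1) for _ in range(3000)]  # hypothetical data
mean, sd = statistics.mean(population), statistics.stdev(population)
z_scores = [(y - mean) / sd for y in population]

negatives = [z for z in z_scores if z < 0]
positives = sorted(z for z in z_scores if z > 0)

sample_z = []
for z_neg in rng.sample(negatives, 150):
    # Find the unused positive z-score closest in size to |z_neg|.
    pos = bisect.bisect_left(positives, -z_neg)
    candidates = [i for i in (pos - 1, pos) if 0 <= i < len(positives)]
    best = min(candidates, key=lambda i: abs(positives[i] + z_neg))
    sample_z.extend([z_neg, positives.pop(best)])

sample = [z * sd + mean for z in sample_z]  # back to the original scale
print(len(sample), round(statistics.mean(sample) - mean, 3))
```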

On Tue, Jan 24, 2012 at 4:01 PM, David Marso <[hidden email]> wrote:

> <BEGIN PROCESS: Opening rusty can of worms with sharp rock!>
> *WHY*: This sounds very close to manufacturing data.
> Your sample is what it is.
> FWIW: You would be restricting sampling away from the higher end of the
> distribution. ergo, you would be *REDUCING* the variability, not increasing
> it as you seem to request.
> Sounds *FISHY* .
> --------
> <Retiring sharp rock>
>
>
>
> huang jialin wrote
>>
>> Hi,
>>
>> I am planning to sample cases from a known dataset with fixed mean and SD.
>> The sample size is from 300-500. The replacement is not allowed. Can I do
>> it in SPSS? If so, how can I do it?
>>
>> Thank you for your attention.
>>
>> Sincerely,
>> Jialin Huang
>>
>
>
> --
> View this message in context: http://spssx-discussion.1045642.n5.nabble.com/sampling-with-fixed-mean-and-SD-tp5315312p5386749.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD


Maguin, Eugene

Re: sampling with fixed mean and SD

Huang,

This question probably should have been asked earlier: what is the full
purpose of this project? I think I saw something about the range, but it
read like a comment in passing. I also think you told Jon that more than
one variable is involved. Please elaborate on that part of the project
as well.

Gene Maguin

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Michael Palij
Sent: Tuesday, January 24, 2012 4:32 PM
To: [hidden email]
Subject: Re: sampling with fixed mean and SD

I agree with David that this is a strange request and
it would be very difficult to obtain a sample with a specific
mean and SD from a larger sample/population.

That being said, let's change perspectives on the problem.
Let's say you have 3000 cases in the larger sample/pop.
First, convert them to z-scores and rank order them from
smallest to largest value. This assumes you have a symmetric
distribution that is more or less normal.

If you want a sample of 300 cases, then select 150 cases
with a negative z-score and 150 cases with positive z-scores
such that

absolute value(sum(negative z-scores)) = sum(positive z-scores)

The sum of deviations around the mean is zero, so when the
absolute value of the sum of negative deviations equals the
sum of the positive deviations, you have a sample of N=300 that
will produce the specified mean. Reconvert to original scale
by using a formula like:

original scale score = z-score*(SD) + Mean

You should now have a sample whose mean is equal to the
specified mean.

Note that if you have 150 pairs of z-scores that are the
same in absolute value but one is positive and the other
is negative, then the sample of 300 would reproduce the
desired mean. But this might be overly restrictive.

I'm less clear on how to make sure that your sample has the
same SD or variance as the larger sample/pop but maybe
someone else will have an idea.

-Mike Palij
New York University
[hidden email]

