sampling with fixed mean and SD

Re: sampling with fixed mean and SD

huang jialin
Hi Mike,

Thanks for your response. I will try to see how it goes.

Thank you.

Sincerely,
Jialin Huang


On Tue, Jan 24, 2012 at 3:31 PM, Michael Palij <[hidden email]> wrote:
I agree with David that this is a strange request and
it would be very difficult to obtain a sample with a specific
mean and SD from a larger sample/population.

That being said, let's change perspectives on the problem.
Let's say you have 3000 cases in the larger sample/pop.
First, convert them to z-scores and rank order them from
smallest to largest value.  This assumes you have a symmetric
distribution that is more or less normal.

If you want a sample of 300 cases, then select 150 cases
with a negative z-score and 150 cases with positive z-scores
such that

|sum(negative z-scores)| = sum(positive z-scores)

The sum of deviations around the mean is zero, so when the
absolute value of the sum of negative deviations equals the
sum of the positive deviations, you have a sample of N=300 that
will reproduce the mean used to compute the z-scores.  Reconvert
to the original scale by using a formula like:

original scale score = z-score*(SD) + Mean

You should now have a sample whose mean equals the Mean in
that formula.

Note that if you have 150 pairs of z-scores that are the
same in absolute value but one is positive and the other
is negative, then the sample of 300 would reproduce the
desired mean.  But this might be overly restrictive.
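Here is a Python sketch of the pairing idea. It is not SPSS syntax and it is not Mike's code. Because z = (x - Mean)/SD is only a shift and a rescale, pairing raw scores around Mean is equivalent to pairing z-scores around zero.

The function `mirrored_pair_sample` and its `center` argument are hypothetical names. `center` would normally be the larger sample's mean, although any target value can be used. Each randomly chosen low case is matched to the unused high case whose deviation comes closest to mirroring it. As a result, the sample mean equals `center` exactly only when exact mirrors exist, and is approximate otherwise.

```python
import bisect
import random


def mirrored_pair_sample(values, n_pairs, center, seed=None):
    """Return 2 * n_pairs values drawn without replacement: n_pairs random
    values below `center`, each paired with the unused value above `center`
    whose deviation from `center` best mirrors it."""
    rng = random.Random(seed)
    below = [v for v in values if v < center]
    above = sorted(v for v in values if v > center)  # values equal to center are skipped
    if n_pairs > min(len(below), len(above)):
        raise ValueError("not enough cases on one side of center")
    rng.shuffle(below)
    chosen = []
    for low in below[:n_pairs]:
        mirror = 2 * center - low  # the value above center with the same |deviation|
        i = bisect.bisect_left(above, mirror)
        # the closest unused value sits at position i - 1 or i
        candidates = [j for j in (i - 1, i) if 0 <= j < len(above)]
        best = min(candidates, key=lambda j: abs(above[j] - mirror))
        chosen.append(low)
        chosen.append(above.pop(best))  # pop: each case is used at most once
    return chosen
```

For example, `mirrored_pair_sample(scores, 150, 17)` would return 300 cases, 150 on each side of 17, with no case used twice.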

I'm less clear on how to make sure that your sample has the
same SD or variance as the larger sample/pop but maybe
someone else will have an idea.
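One possible way to control the SD as well is a simple hill-climbing search. This is offered as an assumption, not as anything proposed in this thread. Start from a random sample of n cases, then keep any swap between a selected and an unselected case that moves the sample mean and SD closer to the targets.

The function `match_mean_sd` and its `iters` parameter are hypothetical. A local search like this does not guarantee that both targets are hit exactly. A sample built this way is also no longer a random sample of the pool.

```python
import math
import random


def match_mean_sd(values, n, target_mean, target_sd, iters=50_000, seed=None):
    """Hill-climbing sketch: choose n cases (without replacement) whose mean and
    SD are close to the targets. Returns (sample, mean, sd)."""
    rng = random.Random(seed)
    pool = list(values)
    if not 2 <= n < len(pool):
        raise ValueError("need 2 <= n < len(values)")
    rng.shuffle(pool)
    chosen, rest = pool[:n], pool[n:]
    s = sum(chosen)                    # running sum of the sample
    ss = sum(v * v for v in chosen)    # running sum of squares

    def stats(s, ss):
        mean = s / n
        var = max((ss - n * mean * mean) / (n - 1), 0.0)  # sample variance
        return mean, math.sqrt(var)

    def loss(s, ss):
        mean, sd = stats(s, ss)
        return (mean - target_mean) ** 2 + (sd - target_sd) ** 2

    best = loss(s, ss)
    for _ in range(iters):
        i, j = rng.randrange(n), rng.randrange(len(rest))
        out, new_in = chosen[i], rest[j]
        s2 = s - out + new_in
        ss2 = ss - out * out + new_in * new_in
        candidate = loss(s2, ss2)
        if candidate < best:           # accept only swaps that improve the fit
            chosen[i], rest[j] = new_in, out
            s, ss, best = s2, ss2, candidate
    mean, sd = stats(s, ss)
    return chosen, mean, sd
```

For example, with a simulated pool of 1000 scores (mean 17, SD 5.1), `match_mean_sd(pool, 300, 15, 5.7, seed=2)` searches for 300 cases with a mean near 15 and an SD near 5.7, the figures discussed later in this thread.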

-Mike Palij
New York University
[hidden email]


On Tue, Jan 24, 2012 at 4:01 PM, David Marso <[hidden email]> wrote:
> <BEGIN PROCESS: Opening rusty can of worms with sharp rock!>
> *WHY*:  This sounds very close to manufacturing data.
> Your sample is what it is.
> FWIW:  You would be restricting sampling away from the higher end of the
> distribution.  Ergo, you would be *REDUCING* the variability, not increasing
> it as you seem to request.
> Sounds *FISHY* .
> --------
> <Retiring sharp rock>
>
>
>
> huang jialin wrote
>>
>> Hi,
>>
>> I am planning to sample cases from a known dataset with a fixed mean and SD.
>> The sample size is 300-500, and replacement is not allowed. Can I do
>> it in SPSS? If so, how can I do it?
>>
>> Thank you for your attention.
>>
>> Sincerely,
>> Jialin Huang
>>
>
>
> --
> View this message in context: http://spssx-discussion.1045642.n5.nabble.com/sampling-with-fixed-mean-and-SD-tp5315312p5386749.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD

Re: sampling with fixed mean and SD

Bruce Weaver
Administrator
In reply to this post by huang jialin
I'm still trying to work out exactly what it is you are trying to accomplish.

In the quoted message below, you described what kind of sample you want to achieve.  In a later message, you said:

"What I am trying to do is to see the effects of range restriction. That is why I have the exact mean and sd."

Looking at the effect of restricted range suggests you are working with some kind of regression model.  Is that right?  It might help if you backed up a few steps, and started by telling us what kind of analysis you are doing in the first place, what you saw in the results that made you want to "see the effects of range restriction", etc.  It may be that someone will suggest a different way of seeing those effects.

HTH.

p.s. - I agree with David's comments about this looking "fishy".


huang jialin wrote
Hi everyone,

Thanks for your reply. Let me elaborate on what I am planning to do.

I have a dataset of 1000 cases, which I am treating as a population (M = 17,
SD = 5.1). I am trying to pull out a sample of roughly 300 cases, but the
mean needs to be around 15 and the SD around 5.7.

I was wondering whether SPSS has any syntax that I can use. Your help is
very much appreciated.

Thank you again.

Sincerely,
Jialin Huang



On Tue, Jan 24, 2012 at 11:54 AM, John F Hall <[hidden email]> wrote:

> Should have said you do that in syntax.
>
> From the Data Editor:
>
> File > New > Syntax
>
> ... to open a new syntax file.  Write the command, but make sure you put a
> full stop (period) at the end of it, then press the green triangle etc.
>
> Email:     [hidden email]
> Website: www.surveyresearch.weebly.com <http://surveyresearch.weebly.com/>
> Skype:   surveyresearcher1
> Phone:    (+33) (0) 2.33.45.91.47
>
> *From:* John F Hall [mailto:[hidden email]]
> *Sent:* 24 January 2012 18:43
> *To:* 'huang jialin'; '[hidden email]'
> *Subject:* RE: sampling with fixed mean and SD
>
> You can sample in SPSS with:
>
> sample <n> from <N>.
>
> where n is the sample size you want and N is the number of cases in the
> data set, or you can use:
>
> sample <p>.
>
> where p is the proportion you want to sample, expressed as a decimal.
>
> John Hall
>
> *From:* SPSSX(r) Discussion [mailto:[hidden email]] *On Behalf
> Of *huang jialin
> *Sent:* 24 January 2012 17:46
> *To:* [hidden email]
> *Subject:* sampling with fixed mean and SD
>
> Hi,
>
> I am planning to sample cases from a known dataset with a fixed mean and SD.
> The sample size is 300-500, and replacement is not allowed. Can I do
> it in SPSS? If so, how can I do it?
>
> Thank you for your attention.
>
> Sincerely,
>
> Jialin Huang
>
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: sampling with fixed mean and SD

huang jialin
Bruce,

Thanks for your suggestion. I will try to make it clearer.

I did a linear equating on two sets of scores (S1 and S2). They were collected from the same sample. The S1 distribution used for equating has m=17 and sd=5.1. The result (the look-up table of original scores and scaled scores) was applied to a larger sample with m=15 and sd=5.7, which only has S1. What I am trying to explore is whether the equating is affected by the smaller sample, since its mean and sd suggest that its range is restricted. However, there is no way to collect more data at the moment.
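A small Python sketch (not SPSS) may help here. It assumes the usual linear (mean-sigma) form of linear equating, which maps a score x so that equated scores reproduce the other form's mean and SD:

e(x) = M2 + (SD2/SD1)(x - M1)

The S2 mean and SD used below (50 and 10) are hypothetical placeholders, because the thread does not report them. The 0-40 score range is also an assumption.

```python
def linear_equate(x, mean1, sd1, mean2, sd2):
    """Linear (mean-sigma) equating: put score x from scale 1 on scale 2 so that
    equated scores have scale 2's mean and SD."""
    return mean2 + (sd2 / sd1) * (x - mean1)


# S1 mean and SD from the thread; the S2 values are hypothetical placeholders.
M1, SD1 = 17.0, 5.1
M2, SD2 = 50.0, 10.0

# Look-up table: original S1 score -> equated score (0-40 range is hypothetical).
lookup = {x: round(linear_equate(x, M1, SD1, M2, SD2), 2) for x in range(41)}
```

Because M1 and SD1 enter the conversion directly, an equating sample with a different mean and SD gives a different look-up table. Redoing the equating on a re-drawn subsample would also change the S2 statistics.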

Thus, I am planning to pull out cases from the smaller sample with exactly the same mean and sd as the larger one. Then I will redo the equating and check the differences between the samples.

Does it make sense now? What would you suggest I do?

Thank you very much.

Sincerely,
Jialin Huang


On Tue, Jan 24, 2012 at 4:55 PM, Bruce Weaver <[hidden email]> wrote:

Re: sampling with fixed mean and SD

David Marso
Administrator
"Thus, I am planning to pull out cases from the smaller sample with exact mean and sd as the larger one. Then, re-do the equating and check the differences between samples."

I submit the following with some hesitation and do not plan to provide exact code, only a general description.  Say you have 2 files: file1 and file2.  File1 contains the sample whose distribution you wish to replicate using the cases in file2.

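* FLAG = 1 for cases from file1 (the distribution to replicate), 0 for file2.
ADD FILES / FILE 'file1' /IN=FLAG/ FILE='file2'.
COMPUTE TAKEN=0.
**BARELY TESTED/ YMMV/ GOOD LUCK...
* Repeat the following code as needed*.
* Untaken cases sort first, in order of Y.
SORT CASES BY TAKEN Y.
* GRAB marks an untaken file2 case that directly follows a file1 case.
COMPUTE GRAB=NOT(FLAG) AND LAG(FLAG) AND NOT(TAKEN).
* GRABFROM marks the file1 case just before each grabbed file2 case.
CREATE GRABFROM=LEAD(GRAB,1).
* MAX ignores the missing values that LAG and LEAD return at the ends of the file.
COMPUTE TAKEN=MAX(TAKEN, GRAB, GRABFROM).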
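Here is a rough Python analogue of this idea. It is a simplified nearest-neighbor variant, not a line-by-line translation of the SPSS above. For each case in file1, it takes the closest still-unused case from file2, so each file2 case is used at most once.

The function name is hypothetical. Being greedy, the result depends on the order in which the file1 cases are processed.

```python
import bisect


def nearest_neighbor_match(target, pool):
    """For each target value, take the closest still-unused pool value
    (sampling without replacement). Returns the matched pool values."""
    available = sorted(pool)
    if len(target) > len(available):
        raise ValueError("pool is smaller than the target sample")
    matched = []
    for t in target:
        i = bisect.bisect_left(available, t)
        # the closest remaining value sits at position i - 1 or i
        candidates = [j for j in (i - 1, i) if 0 <= j < len(available)]
        best = min(candidates, key=lambda j: abs(available[j] - t))
        matched.append(available.pop(best))  # remove it so it cannot be reused
    return matched
```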

Gotta Go, It's Miller time!
--
HTH, David



Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Give not that which is holy to dogs, neither cast your pearls before swine, lest perhaps they trample them under their feet."
"When are the damned possessed pigs going to jump off the bloody cliff into the abyss?"

Re: sampling with fixed mean and SD

David Marso
Administrator
Here is a somewhat different approach, which may work better for discrete (integer-valued) distributions.
Again, YMMV and I make *NO* guarantee re the validity or advisability of this approach since I really have no freaking idea of what the hell you are really up to, the specifics of your distributions etc...
If you can't figure out what the code is doing then refer to the manual.  If that fails, DELETE it and pretend that it doesn't exist!!!
---
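* Simulated target sample: 300 integer scores, mean 15, SD 5.7.
input program.
loop I=1 to 300.
compute Y=RND(RV.NORMAL(15,5.7)).
end case.
end loop.
end file.
end input program.
SAVE OUTFILE "test.sav".
DESC Y.
FREQ Y.
* One record per value of Y, with N = number of target cases at that value.
AGGREGATE OUTFILE "testAGG.sav" / BREAK Y / N=N.

* Simulated pool: 1000 integer scores, mean 17, SD 5.1, plus another variable.
input program.
loop I=1 to 1000.
compute Y=RND(RV.NORMAL(17,5.1)).
COMPUTE OTHERDAT=UNIFORM(1).
end case.
end loop.
end file.
end input program.
DESC Y.
FREQ Y.

* Put the pool cases in random order within each value of Y.
COMPUTE SCRAMBLE=UNIFORM(1).
SORT CASES BY Y SCRAMBLE.
* Attach the required count N from the target sample to each value of Y.
MATCH FILES / FILE * / IN=RAW / FILE="testAGG.sav" /IN=TAB/ BY Y.
* Number the pool cases 1, 2, 3, ... within each value of Y.
COMPUTE COUNTER=SUM(LAG(COUNTER)*(Y EQ LAG(Y)),  RAW).
* N arrives on one case per value of Y; carry it forward within that value
  only, so pool values that do not occur in the target sample are never drawn.
COMPUTE ReqN=N.
IF SYSMIS(ReqN) AND Y EQ LAG(Y) ReqN=LAG(ReqN).
* Keep the first ReqN pool cases at each value of Y.
COMPUTE DRAWN=RAW AND COUNTER LE ReqN.
TEMPORARY.
SELECT IF DRAWN.
DESC Y.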
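Here is the same logic in Python, as a sketch rather than a translation:

1. Count the target cases at each score (the AGGREGATE step).
2. Shuffle the pool (SCRAMBLE).
3. Keep pool cases at each score until that count is reached (COUNTER LE ReqN).

The function `match_frequencies` is hypothetical. It also reports a shortfall wherever the pool has fewer cases at a score than the target needs. That is worth checking here, because the target mean (15) sits below the pool mean (17).

```python
import random
from collections import Counter


def match_frequencies(target, pool, seed=None):
    """Draw pool values without replacement so that each distinct value appears
    as often as in `target`, as far as the pool allows.
    Returns (drawn, shortfall); shortfall maps value -> cases still missing."""
    rng = random.Random(seed)
    required = Counter(target)        # like AGGREGATE ... /BREAK Y /N=N
    shuffled = list(pool)
    rng.shuffle(shuffled)             # like SCRAMBLE: random order within each value
    taken = Counter()
    drawn = []
    for y in shuffled:
        if taken[y] < required[y]:    # like COUNTER LE ReqN; unseen values need 0
            taken[y] += 1
            drawn.append(y)
    shortfall = {y: n - taken[y] for y, n in required.items() if taken[y] < n}
    return drawn, shortfall
```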



Re: sampling with fixed mean and SD

huang jialin
David,

Thanks all the same.

Jialin Huang


On Thu, Jan 26, 2012 at 3:38 AM, David Marso <[hidden email]> wrote:
Somewhat different approach which will possibly work better for discrete
(integer valued) distributions.
Again, YMMV and I make *NO* guarantee re the validity or advisability of
this approach since I really have no freaking idea of what the hell you are
really up to, the specifics of your distributions etc...
If you can't figure out what the code is doing then refer to the manual.  If
that fails, DELETE it and pretend that it doesn't exist!!!
---
input program.
loop I=1 to 300.
compute Y=RND(RV.NORMAL(15,5.7)).
end case.
end loop.
end file.
end input program.
SAVE OUTFILE "test.sav".
DESC Y.
FREQ Y.
AGGREGATE OUTFILE "testAGG.sav" / BREAK Y / N=N.

input program.
loop I=1 to 1000.
compute Y=RND(RV.NORMAL(17,5.1)).
COMPUTE OTHERDAT=UNIFORM(1).
end case.
end loop.
end file.
end input program.
DESC Y.
FREQ Y.

COMPUTE SCRAMBLE=UNIFORM(1).
SORT CASES BY Y SCRAMBLE.
MATCH FILES / FILE * / IN=RAW / FILE="testAGG.sav" /IN=TAB/ BY Y.
COMPUTE COUNTER=SUM(LAG(COUNTER)*(Y EQ LAG(Y)),  RAW).
COMPUTE ReqN=N.
IF SYSMIS(ReqN) ReqN=LAG(ReqN).
COMPUTE DRAWN=RAW AND COUNTER LE ReqN.
TEMPORARY.
SELECT IF DRAWN.
DESC Y.



David Marso wrote
>
> "Thus, I am planning to pull out cases from the smaller sample with exact
> mean and sd as the larger one. Then, re-do the equating and check the
> differences between samples."
>
> I submit the following with some hesitation and do not plan to provide
> exact code but only a general description.  Say you have 2 files: file1
> and file2.  File1 contains the sample you wish to replicate the
> distribution from the cases in file2.
>
> ADD FILES / FILE 'file1' /IN=FLAG/ FILE='file2'.
> COMPUTE TAKEN=0.
> **BARELY TESTED/ YMMV/ GOOD LUCK...
> * Repeat the following code as needed*.
> SORT CASES BY TAKEN Y.
> COMPUTE GRAB=NOT(FLAG) AND LAG(FLAG).
> CREATE GRABFROM=LEAD(GRAB,1).
> COMPUTE TAKEN=TAKEN OR GRAB OR GRABFROM.
>
> Gotta Go, It's Miller time!
> --
> HTH, David
>
>
>
>
> huang jialin wrote
>>
>> Bruce,
>>
>> Thanks for your suggestion. I will try to make it more clearer.
>>
>> I  did a linear equating on two sets of score (S1 and S2). They were
>> collected from the same sample. The S1 distribution used for equating is
>> m=17 and sd=5.1. The result (the look-up table of original scores and
>> scaled scores) was applied on a larger sample with m=15 and sd=5.7, which
>> only has S1. What I am trying to explore is whether the equating is
>> affected by the smaller sample, as the mean and sd showed it is
>> restricted.
>> However, there is no way to collect more data at the moment.
>>
>> Thus, I am planning to pull out cases from the smaller sample with exact
>> mean and sd as the larger one. Then, re-do the equating and check the
>> differences between samples.
>>
>> Does it make sense now? What would you suggest me to do?
>>
>> Thank you very much.
>>
>> Sincerely,
>> Jialin Huang
>>
>>
>> On Tue, Jan 24, 2012 at 4:55 PM, Bruce Weaver &lt;bruce.weaver@&gt;wrote:
>>
>>> I'm still trying to work out exactly what it is you are trying to
>>> accomplish.
>>>
>>> In the quoted message below, you described what kind of sample you want
>>> to
>>> achieve.  In a later message, you said:
>>>
>>> "What I am trying to do is to see the effects of range restriction. That
>>> is
>>> why I have the exact mean and sd."
>>>
>>> Looking at the effect of restricted range suggests you are working with
>>> some
>>> kind of regression model.  Is that right?  It might help if you backed
>>> up a
>>> few steps, and started by telling us what kind of analysis you are doing
>>> in
>>> the first place, what you saw in the results that made you want to "see
>>> the
>>> effects of range restriction", etc.  It may be that someone will suggest
>>> a
>>> different way of seeing those effects.
>>>
>>> HTH.
>>>
>>> p.s. - I agree with David's comments about this looking "fishy".
>>>
>>>
>>>
>>> huang jialin wrote
>>> >
>>> > Hi everyone,
>>> >
>>> > Thanks for your reply. Let me elaborate what I am planning to do.
>>> >
>>> > I have a dataset of 1000 cases, considering it as a population. M= 17,
>>> SD
>>> > =
>>> > 5.1. I am trying to pull out a sample size of roughly 300 cases, but
>>> the
>>> > mean need to be around 15, and SD is around 5.7.
>>> >
>>> > I was wondering whether SPSS has any syntax that I can use. Your helps
>>> are
>>> > very appreciated.
>>> >
>>> > Thank you again.
>>> >
>>> > Sincerely,
>>> > Jialin Huang
>>> >
>>> >
>>> >
>>> > On Tue, Jan 24, 2012 at 11:54 AM, John F Hall <johnfhall@> wrote:
>>> >
>>> >> Should have said you do that in syntax.
>>> >>
>>> >> From data editor:
>>> >>
>>> >> File > New > Syntax
>>> >>
>>> >> . . to open a new syntax file.  Write the command, but make sure you
>>> >> put a full stop (period) at the end of it, then press the green
>>> >> triangle etc.
>>> >>
>>> >> Email:     johnfhall@
>>> >> Website:   www.surveyresearch.weebly.com <http://surveyresearch.weebly.com/>
>>> >> Skype:     surveyresearcher1
>>> >> Phone:     (+33) (0) 2.33.45.91.47
>>> >>
>>> >> *From:* John F Hall [mailto:[hidden email]]
>>> >> *Sent:* 24 January 2012 18:43
>>> >> *To:* 'huang jialin'; 'SPSSX-L@.UGA'
>>> >> *Subject:* RE: sampling with fixed mean and SD
>>> >>
>>> >> You can sample in SPSS with:
>>> >>
>>> >> sample <n> from <N>
>>> >>
>>> >> where n is the sample size you want and N is the number of cases in
>>> >> the data set, or you can use:
>>> >>
>>> >> sample <p>
>>> >>
>>> >> where p is the proportion you want to sample expressed as a decimal.
>>> >>
>>> >> John Hall
>>> >>
>>> >> *From:* SPSSX(r) Discussion [mailto:[hidden email].UGA] *On Behalf
>>> >> Of *huang jialin
>>> >> *Sent:* 24 January 2012 17:46
>>> >> *To:* SPSSX-L@.UGA
>>> >> *Subject:* sampling with fixed mean and SD
>>> >>
>>> >> Hi,
>>> >>
>>> >> I am planning to sample cases from a known dataset so that the sample
>>> >> has a fixed mean and SD. The sample size is 300-500, and sampling must
>>> >> be without replacement. Can I do it in SPSS? If so, how can I do it?
>>> >>
>>> >> Thank you for your attention.
>>> >>
>>> >> Sincerely,
>>> >>
>>> >> Jialin Huang
>>> >>
>>> >
>>>
>>>
>>> -----
>>> --
>>> Bruce Weaver
>>> bweaver@
>>> http://sites.google.com/a/lakeheadu.ca/bweaver/
>>>
>>> "When all else fails, RTFM."
>>>
>>> NOTE: My Hotmail account is not monitored regularly.
>>> To send me an e-mail, please use the address shown above.
>>>
>>> --
>>> View this message in context:
>>> http://spssx-discussion.1045642.n5.nabble.com/sampling-with-fixed-mean-and-SD-tp5315312p5428999.html
>>> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>>>
>>> =====================
>>> To manage your subscription to SPSSX-L, send a message to
>>> LISTSERV@.UGA (not to SPSSX-L), with no body text except the
>>> command. To leave the list, send the command
>>> SIGNOFF SPSSX-L
>>> For a list of commands to manage subscriptions, send the command
>>> INFO REFCARD
>>>
>>
>
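John Hall's SAMPLE suggestion draws a plain random sample without replacement, which by itself will not land on a chosen mean and SD; a random subsample simply tracks its parent file. A minimal Python sketch of the two SAMPLE forms, assuming the scores sit in a plain list. The function names and the simulated 1000-case file (mean 17, SD 5.1) are hypothetical illustrations, not SPSS internals.

```python
import random
import statistics

def sample_n_from(n, big_n, rng):
    """Sorted indices of n cases drawn at random, without replacement,
    from the first big_n cases (a rough analogue of SAMPLE n FROM N)."""
    return sorted(rng.sample(range(big_n), n))

def sample_proportion(num_cases, p, rng):
    """Keep each case independently with probability p (a rough analogue of
    SAMPLE p); the resulting size is only approximately p * num_cases."""
    return [i for i in range(num_cases) if rng.random() < p]

rng = random.Random(2012)
# Hypothetical stand-in for the 1000-case file (mean 17, SD 5.1).
scores = [round(rng.gauss(17, 5.1)) for _ in range(1000)]

idx = sample_n_from(300, len(scores), rng)
drawn = [scores[i] for i in idx]
# The subsample's mean and SD stay close to the parent file's values;
# nothing in a plain random draw steers them toward 15 and 5.7.
print(len(drawn), round(statistics.mean(drawn), 2), round(statistics.stdev(drawn), 2))
print(len(sample_proportion(len(scores), 0.3, rng)))
```

The point of the sketch is the contrast: controlling the mean and SD needs a selection rule that looks at the scores, not just a random draw.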


--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/sampling-with-fixed-mean-and-SD-tp5315312p5432396.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD


Re: sampling with fixed mean and SD

Garry Gelade

Similar idea to David's.

 

Generate a file of 300 random numbers using COMPUTE Y= RV.NORMAL(15,5.7) as he has described (but no need to round)

Then use the SPSS FUZZY command to pull out cases from your database that approximately match on Y.

This is just a suggestion, which I have not tried, but perhaps worth looking into.  You will need to install Python and the SPSS FUZZY extension if you haven’t already done so.
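FUZZY is an SPSS extension command and its syntax is not reproduced here. The underlying idea (match each generated target value to an unused case within some tolerance, no replacement) can be sketched in Python as below. The tolerance of 0.25, the function name and the simulated data are hypothetical. Because matching is greedy, target values in the tails may find no unused case, so the matched sample's mean and SD will only approximate 15 and 5.7.

```python
import random
import statistics

def caliper_match(targets, pool, tol, rng):
    """For each target value, draw one not-yet-used pool case whose value is
    within +/- tol of it; targets with no such case are skipped.
    Returns (target_index, pool_index) pairs; no pool case is used twice."""
    unused = set(range(len(pool)))
    pairs = []
    for t_idx, t in enumerate(targets):
        candidates = sorted(i for i in unused if abs(pool[i] - t) <= tol)
        if candidates:
            p_idx = rng.choice(candidates)
            unused.remove(p_idx)
            pairs.append((t_idx, p_idx))
    return pairs

rng = random.Random(27)
pool = [rng.gauss(17, 5.1) for _ in range(1000)]     # hypothetical source file
targets = [rng.gauss(15, 5.7) for _ in range(300)]   # the RV.NORMAL(15,5.7) step
pairs = caliper_match(targets, pool, tol=0.25, rng=rng)
picked = [pool[p] for _, p in pairs]
print(len(picked), round(statistics.mean(picked), 2), round(statistics.stdev(picked), 2))
```

A smaller tolerance tracks the targets more tightly but leaves more of them unmatched.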

 

Garry Gelade

 

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of huang jialin
Sent: 26 January 2012 16:42
To: [hidden email]
Subject: Re: sampling with fixed mean and SD

 

David,

 

Thanks all the same.

 

Jialin Huang

 

On Thu, Jan 26, 2012 at 3:38 AM, David Marso <[hidden email]> wrote:

Somewhat different approach which will possibly work better for discrete
(integer valued) distributions.
Again, YMMV and I make *NO* guarantee re the validity or advisability of
this approach since I really have no freaking idea of what the hell you are
really up to, the specifics of your distributions etc...
If you can't figure out what the code is doing then refer to the manual.  If
that fails, DELETE it and pretend that it doesn't exist!!!
---
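* Simulated stand-in for the target file: 300 integer scores from N(15, 5.7).
* With real data, run the AGGREGATE below on the actual target file instead.
input program.
loop I=1 to 300.
compute Y=RND(RV.NORMAL(15,5.7)).
end case.
end loop.
end file.
end input program.
SAVE OUTFILE "test.sav".
DESC Y.
FREQ Y.
* N = number of target cases at each value of Y.
AGGREGATE OUTFILE "testAGG.sav" / BREAK Y / N=N.

* Simulated stand-in for the 1000-case source file: scores from N(17, 5.1).
input program.
loop I=1 to 1000.
compute Y=RND(RV.NORMAL(17,5.1)).
COMPUTE OTHERDAT=UNIFORM(1).
end case.
end loop.
end file.
end input program.
DESC Y.
FREQ Y.

* Put source cases in random order within each value of Y and attach N.
COMPUTE SCRAMBLE=UNIFORM(1).
SORT CASES BY Y SCRAMBLE.
MATCH FILES / FILE * / IN=RAW / FILE="testAGG.sav" /IN=TAB/ BY Y.
* COUNTER numbers the source cases 1, 2, 3 and so on within each value of Y.
COMPUTE COUNTER=SUM(LAG(COUNTER)*(Y EQ LAG(Y)),  RAW).
COMPUTE ReqN=N.
* Carry ReqN down only within the same Y, so Y values absent from the target get no cases.
IF SYSMIS(ReqN) AND Y EQ LAG(Y) ReqN=LAG(ReqN).
* Keep the first ReqN source cases at each value of Y.
COMPUTE DRAWN=RAW AND COUNTER LE ReqN.
TEMPORARY.
SELECT IF DRAWN.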
DESC Y.
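The syntax above reproduces the target's frequency table value by value: count the target cases at each Y, shuffle the source cases within each Y, and keep that many. A minimal Python sketch of the same idea, assuming integer scores in plain lists; the function name and simulated data are hypothetical. Where the source has fewer cases at some value than the target asks for (likely in the lower tail here, since the source mean is higher), that value ends up under-represented, so the resulting mean may sit somewhat above the target's.

```python
import random
import statistics
from collections import Counter, defaultdict

def match_distribution(target_values, pool_values, rng):
    """Indices of pool cases that reproduce the target's frequency table as
    closely as the pool allows: for each value y, up to count(y) pool cases
    with value y are drawn at random, without replacement."""
    need = Counter(target_values)              # AGGREGATE ... / BREAK Y / N=N
    by_value = defaultdict(list)
    for i, y in enumerate(pool_values):
        by_value[y].append(i)
    drawn = []
    for y, k in sorted(need.items()):
        cases = by_value.get(y, [])
        rng.shuffle(cases)                     # SORT CASES BY Y SCRAMBLE
        drawn.extend(cases[:k])                # DRAWN = COUNTER LE ReqN
    return sorted(drawn)

rng = random.Random(26)
target = [round(rng.gauss(15, 5.7)) for _ in range(300)]   # simulated target file
pool = [round(rng.gauss(17, 5.1)) for _ in range(1000)]    # simulated source file
idx = match_distribution(target, pool, rng)
sample = [pool[i] for i in idx]
print(len(sample), round(statistics.mean(sample), 2), round(statistics.stdev(sample), 2))
```

Pool values that never occur in the target are never drawn, which mirrors the within-Y carry-down of ReqN.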



David Marso wrote

>


> "Thus, I am planning to pull out cases from the smaller sample with exact
> mean and sd as the larger one. Then, re-do the equating and check the
> differences between samples."
>
> I submit the following with some hesitation and do not plan to provide
> exact code but only a general description.  Say you have 2 files: file1
> and file2.  File1 contains the sample whose distribution you wish to
> replicate using the cases in file2.
>
> ADD FILES / FILE='file1' /IN=FLAG/ FILE='file2'.
> COMPUTE TAKEN=0.
> **BARELY TESTED/ YMMV/ GOOD LUCK...
> * Repeat the following code as needed*.
> SORT CASES BY TAKEN Y.
> COMPUTE GRAB=NOT(FLAG) AND LAG(FLAG) AND NOT(TAKEN).
> CREATE GRABFROM=LEAD(GRAB,1).
> COMPUTE TAKEN=SUM(TAKEN,GRAB,GRABFROM) GT 0.
>
> Gotta Go, It's Miller time!
> --
> HTH, David
>
>
>
>
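The sort-and-grab loop pairs each file1 (target) case with the file2 case that sorts directly after it, marks both as taken, and repeats on whatever is left. A Python sketch of that loop, assuming plain lists of scores; it only considers cases not yet taken, and the function name is hypothetical. One property worth knowing: a target case can only be paired with a pool case at the same or a higher value, so target values above every remaining pool case stay unmatched.

```python
def adjacent_pairing(target_values, pool_values, max_passes=1000):
    """Sort-and-grab sketch: keep only untaken cases, sort by value, and pair
    every pool case with the target case sitting directly before it; repeat
    until a pass finds no new pairs. Returns (target_index, pool_index)."""
    # Records are (is_target, index, value); is_target plays the role of FLAG.
    remaining = [(True, i, y) for i, y in enumerate(target_values)]
    remaining += [(False, i, y) for i, y in enumerate(pool_values)]
    pairs = []
    for _ in range(max_passes):
        # Sort by value; on ties the target case goes first (file1 is added first).
        remaining.sort(key=lambda r: (r[2], not r[0]))
        # GRAB: a pool case whose immediate predecessor is a target case.
        new = [(a[1], b[1]) for a, b in zip(remaining, remaining[1:])
               if a[0] and not b[0]]
        if not new:
            break
        pairs.extend(new)
        used_t = {t for t, _ in new}
        used_p = {p for _, p in new}
        # TAKEN: drop both members of every new pair before the next pass.
        remaining = [r for r in remaining
                     if r[1] not in (used_t if r[0] else used_p)]
    return pairs
```

For example, `adjacent_pairing([10, 12], [11, 13, 9])` returns `[(0, 0), (1, 1)]`: 10 pairs with 11, 12 pairs with 13, and the pool case 9 is never used.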
> huang jialin wrote
>>
>> Bruce,
>>
>> Thanks for your suggestion. I will try to make it clearer.
>>
>> I did a linear equating on two sets of scores (S1 and S2). They were
>> collected from the same sample. The S1 distribution used for equating has
>> m=17 and sd=5.1. The result (the look-up table of original scores and
>> scaled scores) was applied to a larger sample with m=15 and sd=5.7, which
>> only has S1. What I am trying to explore is whether the equating is
>> affected by the smaller sample, since its mean and sd suggest that its
>> range is restricted.
>> However, there is no way to collect more data at the moment.
>>
>> Thus, I am planning to pull out cases from the smaller sample with the
>> same mean and sd as the larger one, then re-do the equating and check the
>> differences between samples.
>>
>> Does it make sense now? What would you suggest I do?
>>
>> Thank you very much.
>>
>> Sincerely,
>> Jialin Huang
>>
>>
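A linear equating conversion is fixed entirely by the means and SDs of the two score sets in the equating sample: a score k SDs above the S1 mean maps to k SDs above the S2 mean. That is why the mean and SD of the sample used for equating feed straight into the look-up table. A minimal Python sketch of the standard linear equating formula; the S1 moments come from the message above, while the S2 mean and SD (50 and 10) are purely hypothetical because the thread does not give them.

```python
def linear_equating(mean_x, sd_x, mean_y, sd_y):
    """Linear equating function l(x) = mean_y + (sd_y / sd_x) * (x - mean_x)."""
    slope = sd_y / sd_x
    return lambda x: mean_y + slope * (x - mean_x)

# S1 moments from the thread; the S2 moments (50, 10) are hypothetical.
to_s2 = linear_equating(17, 5.1, 50, 10)
lookup = {score: round(to_s2(score), 2) for score in range(0, 31)}
print(lookup[17], lookup[22])
```

Re-estimating on a subsample whose S1 mean and SD are 15 and 5.7 would change `mean_x` and `sd_x` (and, in a single-group design, the S2 moments too), and hence the slope and intercept of the table.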



Re: sampling with fixed mean and SD

huang jialin
Garry,

Thanks for your response. 

Sincerely,
Jialin Huang


On Fri, Jan 27, 2012 at 8:57 AM, Garry Gelade <[hidden email]> wrote:

Similar idea to David.

 

Generate a file of 300 random numbers using COMPUTE Y= RV.NORMAL(15,5.7) as he has described (but no need to round)

Then use the SPSS FUZZY command to pull out cases from your database that approximately match on Y.

This is just a suggestion, which I have not tried, but perhaps worth looking into.  You will need to install Python and the SPSS FUZZY extension if you haven’t already done so.

 

Garry Gelade

 


Re: sampling with fixed mean and SD

David Marso
In reply to this post by Garry Gelade
The intention of my posting was to use the available data as the source table for the sampling ;-)
No sim needed!
Garry Gelade wrote
Similar idea to David.

 

Generate a file of 300 random numbers using COMPUTE Y= RV.NORMAL(15,5.7) as he has described (but no need to round)

Then use the SPSS FUZZY command to pull out cases from your database that approximately match on Y.

This is just a suggestion, which I have not tried, but perhaps worth looking into.  You will need to install Python and the SPSS FUZZY extension if you haven’t already done so.

 

Garry Gelade

 

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"