random sample within variable values

random sample within variable values

Greg
Hi everyone,

I have a binary variable and want to select a random sample of cases for one
value. The variable I'm referring to is the following:

Variable One
A: 126,728
B: 10,997

I want to draw a random sample of value A cases limiting them to 10,000 and
leave the cases of value B as is. My goal is for the new variable to look
like this:

Variable One recoded
A: 10,000 (random sample from the original 126,728 cases)
B: 10,997 (the same as original variable)


I appreciate your help!!



--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: random sample within variable values

Maguin, Eugene
I'm not understanding what you're wanting to do. Is it that you have two files (A with 126,728 cases and B with 10,997 cases) and you want to draw a sample of exactly 10,000 cases from A? And then what about B? How about making a little sample dataset to show what you want to do?
Gene Maguin


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Greg
Sent: Monday, January 8, 2018 9:00 AM
To: [hidden email]
Subject: random sample within variable values

Re: random sample within variable values

Andy W
In reply to this post by Greg
Example below -- you can just use a DO IF plus the SAMPLE command to do what
you describe. You may also want to check out the FUZZY extension if you want
to match your random folks based on other characteristics.

******************************************************.
*Example data.
SET SEED 10.
INPUT PROGRAM.
LOOP Id = 1 TO (126728 + 10997).
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
COMPUTE V = (Id > 10997).
FREQ V.

*Simple as this.
DO IF V = 1.
SAMPLE 10000 FROM 126728.
END IF.
FREQ V.
******************************************************.
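
A variation, in case you would rather flag the sampled cases than drop the rest. This sketch is a different technique from SAMPLE (a random draw ranked within each group), meant to be run in place of the DO IF block above. The variable names draw, drawrank and insample are made up for illustration, and V is assumed to be coded as in the example data. Ties among the random draws are very unlikely, but they could shift the count slightly.

******************************************************.
*Sketch only: flag a random 10,000 of the V = 1 cases plus all V = 0 cases.
SET SEED 10.
*One uniform random number per case.
COMPUTE draw = RV.UNIFORM(0, 1).
*Rank the draws separately within each value of V.
RANK VARIABLES = draw BY V /RANK INTO drawrank.
*Flag every V = 0 case and the 10,000 lowest-ranked V = 1 cases.
COMPUTE insample = (V = 0 OR drawrank <= 10000).
FILTER BY insample.
FREQ V.
******************************************************.

FILTER only hides the unflagged cases from later procedures, so the full file stays intact; FILTER OFF brings them back.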



-----
Andy W
[hidden email]
http://andrewpwheeler.wordpress.com/

Re: random sample within variable values

Jon Peck
Note that Data > Select Cases > Random sample of cases can do this as long as your data are sorted by Variable one.

On Mon, Jan 8, 2018 at 9:55 AM, Andy W <[hidden email]> wrote:

--
Jon K Peck
[hidden email]

Re: random sample within variable values

Greg
In reply to this post by Maguin, Eugene
I apologize for the confusion. I have one file (total N size: 137,725). The
variable I’m interested in transforming is Variable A, which is binary, with
category 1: 126,728 cases and category 2: 10,997 cases. I want to draw a
random sample of the above variable where the new categories will be:

New Variable A
category 1: 10,000
category 2: 10,997

In other words, category 2 of the new variable will remain the same, but
category 1 will change. (Category 1 of the new variable will be a random
sample totaling only 10,000 cases.)

Hope this is not confusing.




Re: random sample within variable values

David Marso
Administrator
Did you bother to run Andy's code?
You only need the last four lines.
The top part was data simulation.
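
For reference, a minimal sketch of those four lines pointed at the real file rather than the simulated one. It assumes the variable is named VarA (a hypothetical name) and is coded 1 and 2 as Greg describes above; the SET SEED line is an optional addition so that a rerun draws the same cases.

******************************************************.
*Optional: fix the seed so the draw is reproducible.
SET SEED 10.
*Sample 10,000 of the 126,728 category 1 cases and leave category 2 alone.
DO IF VarA = 1.
SAMPLE 10000 FROM 126728.
END IF.
*The table should show 10,000 cases in category 1 and 10,997 in category 2.
FREQ VarA.
******************************************************.

SAMPLE removes the unsampled cases from the active dataset, so save the result under a new file name rather than over the original.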

-----
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: random sample within variable values

Greg
Yes, it worked! Sorry for my delayed response - for some odd reason I could
not reply from my phone.



Re: random sample within variable values

Greg
In reply to this post by Andy W
Thank you!


