How to create a large data set with varying proportions for different categories


msherman

Dear List: I am in the process of creating a dummy data set of 10,000 cases. The different categories need to be apportioned according to the following weights.

Proportion of 1’s = .13
Proportion of 2’s = .14
Proportion of 3’s = .13
Proportion of 4’s = .24
Proportion of 5’s = .20
Proportion of 6’s = .16

That is, in the end I want to have 1300 1’s, 1400 2’s, 1300 3’s, 2400 4’s, 2000 5’s and 1600 6’s.

If someone knows how to do this without much bother I would appreciate it. Thanks, mfs


Martin F. Sherman, Ph.D.
Professor of Psychology
Director of Masters Education in Psychology: Thesis Track
Maryland
Department of Psychology
222 B Beatty Hall
4501 North Charles Street
Baltimore, MD 21210

410-617-2417
[hidden email]

 


Re: How to create a large data set with varying proportions for different categories

Maguin, Eugene

The main question is whether you want exact proportions or expected proportions. For exact proportions, just recode the looping variable from your INPUT PROGRAM code. For expected proportions, add a new variable computed as a draw from a uniform distribution and recode that variable, as in

Compute y=rv.uniform(0,1).

Recode y(…

 

Gene Maguin
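
The RECODE in the message above is left unfinished, so what follows is only one guess at how both suggestions might be written out, not Gene's actual code. The variable names id, y, exact and approx are assumptions, and the cut points are the cumulative totals of the requested weights (.13, .27, .40, .64, .84, 1.00, or 1300, 2700, 4000, 6400, 8400, 10000 as counts). It relies on RECODE applying the first value specification that matches.

```spss
* Sketch only: id, y, exact and approx are illustrative names.
INPUT PROGRAM.
LOOP #i = 1 TO 10000.
COMPUTE id = #i.
COMPUTE y = RV.UNIFORM(0,1).
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.

* Exact counts: cut the case counter at the cumulative counts.
RECODE id (LO THRU 1300=1) (LO THRU 2700=2) (LO THRU 4000=3)
          (LO THRU 6400=4) (LO THRU 8400=5) (LO THRU HI=6) INTO exact.

* Expected proportions: cut the uniform draw at the cumulative proportions.
RECODE y (LO THRU .13=1) (LO THRU .27=2) (LO THRU .40=3)
         (LO THRU .64=4) (LO THRU .84=5) (LO THRU HI=6) INTO approx.

FREQUENCIES VARIABLES=exact approx.
```

The frequencies for exact should match the requested counts, while those for approx will vary a little from run to run. Note that exact comes out in blocks (all the 1's first, then the 2's, and so on).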

 

 

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Martin Sherman
Sent: Wednesday, February 20, 2013 11:03 AM
To: [hidden email]
Subject: How to create a large data set with varying proportions for different categories

 


Re: How to create a large data set with varying proportions for different categories

David Marso
Administrator
You don't even need an INPUT PROGRAM.
MATRIX is becoming my IP of random choice these days (for small files such as this) ;-)
--
MATRIX.
COMPUTE X=RND({.13;.14;.13;.24;.20;.16} * 10000).
COMPUTE ##=NROW(X).
LOOP #=1 TO ##.
COMPUTE X={X;MAKE(X(#),1,#)}.
END LOOP.
SAVE X((##+1):NROW(X)) / OUTFILE */VARIABLES X.
END MATRIX.
FREQ X.
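
For readers less used to MATRIX, the same idea might be spelled out a little more explicitly as below. This is a sketch under assumptions, not David's code: the names counts, x, k and cat are made up for illustration, and the result is built directly rather than appended to the count vector and sliced afterwards.

```spss
* Sketch only: counts, x, k and cat are illustrative names.
* counts holds the six target frequencies and x stacks each value k counts(k) times.
MATRIX.
COMPUTE counts = RND({.13;.14;.13;.24;.20;.16} * 10000).
COMPUTE x = MAKE(counts(1),1,1).
LOOP k = 2 TO NROW(counts).
COMPUTE x = {x; MAKE(counts(k),1,k)}.
END LOOP.
SAVE x /OUTFILE=* /VARIABLES=cat.
END MATRIX.
FREQUENCIES VARIABLES=cat.
```

As with the original, the cases come out ordered by category, so if a random order matters the file would still need to be sorted on a random variable afterwards.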

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: How to create a large data set with varying proportions for different categories

Andy W
In reply to this post by Maguin, Eugene
FYI, Gene's advice about generating a random variable and recoding it can be generalized to give exact proportions as well. It basically involves ranking any random variable and then transforming the ranks into the desired distribution. Example below:

***********************************************.
input program.
loop #i = 1 to 10000.
compute rec = #i.
end case.
end loop.
end file.
end input program.

*Need to make a random variable to transform.
compute ran = RV.UNIFORM(0,1).
sort cases by ran.

*Here I specify the cumulative ranking ranges and then assign 1 to 6 if within range.
compute #cum_rank1 = 0.
compute #check = $casenum/10000.
do repeat rank = 1 to 6 /prop = .13 .14 .13 .24 .20 .16.
compute #cum_rank0 = #cum_rank1.
compute #cum_rank1 = #cum_rank1 + prop.
if #check > #cum_rank0 and #check <= #cum_rank1 new = rank.
end repeat.
freq var new.
delete variables ran.
***********************************************.
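
A possible variation on the same ranking idea, offered as a sketch rather than a tested replacement: let the RANK command produce the ranks, so the file never has to be sorted, and recode the ranks at integer cut points, which avoids comparing accumulated decimal proportions at the category boundaries. The names ran, pos and cat are assumptions.

```spss
* Sketch only: ran, pos and cat are illustrative names.
INPUT PROGRAM.
LOOP #i = 1 TO 10000.
COMPUTE ran = RV.UNIFORM(0,1).
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.

* Rank the random draws, giving each case a position from 1 to 10000.
RANK VARIABLES=ran /RANK INTO pos /PRINT=NO.

* Cut the ranks at the cumulative counts.
RECODE pos (LO THRU 1300=1) (LO THRU 2700=2) (LO THRU 4000=3)
           (LO THRU 6400=4) (LO THRU 8400=5) (LO THRU HI=6) INTO cat.
FREQUENCIES VARIABLES=cat.
DELETE VARIABLES ran pos.
```

Because the cases stay in their original order while the ranks are random, cat ends up with the exact counts and is already randomly ordered. Tied draws would get a shared mean rank by default, but with 10,000 uniform draws ties are very unlikely.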

Also, David's MATRIX program is more concise and avoids sorting. I didn't even know about the REREAD command (so thanks, Tony!)
Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/