SPSSX Discussion

Quickly Create a Dataset Consisting of Two Variables

Jim Moffitt

Quickly Create a Dataset Consisting of Two Variables

How would one write the syntax to quickly create a 300-case dataset
containing an ID variable where each case has a unique ID ranging from 1
to 300 and a second variable named VBX for which each case receives a
random value ranging from 1 to 25?

I found some code written by Michael Roberts (see below) that will
create the id variable, but how do I create the second variable (VBX)?

Here's Michael's code:

INPUT PROGRAM.
LOOP #I=1 TO 300.
COMPUTE id=#I.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
FORMATS id (F3.0).
EXECUTE.

Thanks.

Marks, Jim

Re: Quickly Create a Dataset Consisting of Two Variables

You can add other commands after COMPUTE id=#I, and the loop will cycle
through them as well.

** tested.
set seed = 2007020511.

INPUT PROGRAM.
LOOP #I=1 TO 300 .
COMPUTE id=#I.
COMPUTE vbx =mod( rnd(uniform(1)*100),25)+1.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
FORMATS id (F3.0).
EXECUTE.

I used the uniform distribution rounded to a whole number, then took the
modulus (25) of that. (The SET SEED allows you to reproduce the results.)
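One way to check what this formula does to the probabilities is to enumerate the 101 possible rounded values instead of sampling them. The sketch below is illustrative only: the names k, w and total_w are made up here, and the half-weights for 0 and 100 assume that RND rounds a continuous uniform value to the nearest integer, so the two end values each cover only half an interval.

```spss
* Enumerate every value that RND(UNIFORM(1)*100) can return, 0 to 100.
* The variable w is the relative chance of each value, in hundredths.
INPUT PROGRAM.
LOOP k = 0 TO 100.
COMPUTE w = 1.
IF (k = 0 OR k = 100) w = 0.5.
COMPUTE vbx = MOD(k,25) + 1.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.

* Add up the weights that land on each value of vbx.
AGGREGATE OUTFILE=* /BREAK=vbx /total_w = SUM(w).
LIST.
```

If the enumeration is right, each of the 25 rows should show total_w = 4: the two half-weights at 0 and 100 both wrap to vbx = 1 and add up to one full unit there.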

--jim

Art Kendall-2

Re: Quickly Create a Dataset Consisting of Two Variables

In reply to this post by Jim Moffitt

INPUT PROGRAM.
LOOP id=1 TO 300.
COMPUTE vbx = rv.uniform(1,25).
COMPUTE IVBX =RND(VBX).
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
FORMATS id (F3.0) vbx (f7.4) IVBX(F2).
FREQUENCIES VARS= IVBX.

Is this what you need?

Art Kendall
Social Research Consultants

Oliver, Richard

Re: Quickly Create a Dataset Consisting of Two Variables

In reply to this post by Jim Moffitt

There are a variety of random number functions, for example:

compute vbx=rv.uniform(1,25).

will create continuous random uniform values between 1 and 25.

If you want to restrict the values to integers:

compute vbx=rnd(rv.uniform(1,25)).

You can include this command in the input program (before the End Case command) or after the input program.

For more information on random number functions, search for "random number functions" or "random variable functions" in the help.
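As a sketch of the second placement (the COMPUTE after the input program rather than inside it), something like the following should work. The seed value is arbitrary, and the F7.4 display format is just one reasonable choice.

```spss
* Arbitrary seed, only so the run can be repeated.
SET SEED = 12345.

* Build the 300 cases with the ID only.
INPUT PROGRAM.
LOOP id = 1 TO 300.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.

* Add the random variable afterwards, as an ordinary transformation.
COMPUTE vbx = RV.UNIFORM(1,25).
FORMATS id (F3.0) vbx (F7.4).
EXECUTE.
```

Either placement gives one random value per case; keeping the COMPUTE outside the input program just separates building the cases from filling in the variable.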

Oliver, Richard

Re: Quickly Create a Dataset Consisting of Two Variables

In reply to this post by Marks, Jim

I think compute vbx=rnd(rv.uniform(1,25)) will produce the same result with slightly less code.

Jim Moffitt

Re: Quickly Create a Dataset Consisting of Two Variables

In reply to this post by Art Kendall-2

Thanks, Art. This will do the trick.

Jim Moffitt

Re: Quickly Create a Dataset Consisting of Two Variables

In reply to this post by Marks, Jim

Thanks, Jim. This works fine.

Gary Oliver

If and Count syntax

In reply to this post by Jim Moffitt

Colleagues

I am struggling to combine IF and COUNT. My objective is to create a
separate new variable to record the frequency of a score as 0, 1 or 2
that may occur across 30 existing variables.

The syntax I tried fails at the first line and no rewrites this morning
have overcome the problem.
IF ( A01_ROUND THRU A30_ROUND = 0) A_ROUND_RESPONSEO1 = COUNT A01_ROUND
THRU A30_ROUND (0) .
IF ( A01_ROUND THRU A30_ROUND = 1) A_ROUND_RESPONSEO1 = COUNT A01_ROUND
THRU A30_ROUND (1) .
IF ( A01_ROUND THRU A30_ROUND = 2) A_ROUND_RESPONSEO1 = COUNT A01_ROUND
THRU A30_ROUND (2) .
EXECUTE .

Suggestions greatly appreciated.

Warm regards/gary

Judith Saebel

Re: If and Count syntax

Gary,

I'd do (not tested):

Do IF ( A01_ROUND THRU A30_ROUND = 0).
count A_ROUND_RESPONSEO1 = A01_ROUND THRU A30_ROUND (0) .
end if.

Do IF ( A01_ROUND THRU A30_ROUND = 1).
count A_ROUND_RESPONSEO1 = A01_ROUND THRU A30_ROUND (1).
End if.

Do IF ( A01_ROUND THRU A30_ROUND = 2).
count A_ROUND_RESPONSEO1 = A01_ROUND THRU A30_ROUND (2).
End if.

HTH,

Judith

Richard Ristow

Re: If and Count syntax

In reply to this post by Gary Oliver

At 08:01 PM 2/5/2007, Gary Oliver wrote:

>The syntax I tried [reformatted, below] fails at the first line and no
>rewrites this morning have overcome the problem.
>IF ( A01_ROUND THRU A30_ROUND = 0)
> A_ROUND_RESPONSEO1 = COUNT A01_ROUND THRU A30_ROUND (0) .
>IF ( A01_ROUND THRU A30_ROUND = 1)
> A_ROUND_RESPONSEO1 = COUNT A01_ROUND THRU A30_ROUND (1) .
>IF ( A01_ROUND THRU A30_ROUND = 2)
> A_ROUND_RESPONSEO1 = COUNT A01_ROUND THRU A30_ROUND (2) .

Skipping the syntax for a moment, I'm not sure what you do want.

You write,
>to create a separate new variable to record the frequency of a score
>as 0, 1 or 2 that may occur across 30 existing variables.

If you mean you want, separately, the number of 0's, 1's, and 2's in
the list of variables, then (a) you need the results in three
variables, not the one variable A_ROUND_RESPONSEO1; and (b) you don't
need IF at all. Something like this (not tested); note each COUNT
creates a different variable, and that a list of variables is written
with TO, not THRU:

COUNT A_ROUND_RESPONSEO0 = A01_ROUND TO A30_ROUND (0) .
COUNT A_ROUND_RESPONSEO1 = A01_ROUND TO A30_ROUND (1) .
COUNT A_ROUND_RESPONSEO2 = A01_ROUND TO A30_ROUND (2) .

Your syntax problem is, I think, the expressions like
"A01_ROUND THRU A30_ROUND = 0".

That isn't valid, but you've found that out. But if the above isn't
what you wanted (so you don't need IF at all), I'm at a loss what you
mean, so I can't say how to fix it.

It looks like you're trying to say "all of the 30 variables are 0" or
"any one of the 30 variables is 0". For both of these, see below. But I
don't see your computations making sense in either case.

If you mean "all the variables are 0", etc., variable
A_ROUND_RESPONSEO1 will be 30 if all 30 variables have the same one of
the values 0, 1, or 2, and will be unchanged (probably SYSMIS)
otherwise.

If you mean "any of the variables is 0", etc., variable
A_ROUND_RESPONSEO1 will be the number of 0's, 1's, or 2's in your list
of 30 variables; but it will be the count of the last (largest) of
these values which occurs at all, with no indication whether it's 0's,
1's, or 2's that are counted.
..............................
APPENDIX (code not tested):

Test that *all* of A01_ROUND TO A30_ROUND are 1:

IF ( MIN(A01_ROUND TO A30_ROUND) EQ 1
AND MAX(A01_ROUND TO A30_ROUND) EQ 1)

(Thanks, Jan Spousta.) Be careful: this will test 'true' if any of the
30 variables are missing, as long as all the non-missing ones have
value 1. If you want the test to be 'false' if there are missing values,
change to

IF ( MIN (A01_ROUND TO A30_ROUND) EQ 1
AND MAX (A01_ROUND TO A30_ROUND) EQ 1
AND NMISS(A01_ROUND TO A30_ROUND) EQ 0)

Test that *any* of A01_ROUND TO A30_ROUND is 1:

IF ANY(1,A01_ROUND TO A30_ROUND)
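To see how these tests behave, here is a small sketch on made-up data, with three variables standing in for the 30. The names v1 to v3, all_ones, all_ones_strict and any_one are invented for the illustration, and like the rest of this appendix the code is not tested.

```spss
DATA LIST FREE / v1 v2 v3.
BEGIN DATA
1 1 1
1 2 1
0 0 2
END DATA.

* Logical expressions return 1 for true and 0 for false.
COMPUTE all_ones = (MIN(v1 TO v3) EQ 1 AND MAX(v1 TO v3) EQ 1).
COMPUTE all_ones_strict = (all_ones AND NMISS(v1 TO v3) EQ 0).
COMPUTE any_one = ANY(1, v1 TO v3).
LIST.
```

With these three cases, all_ones and all_ones_strict should be 1 only for the first case, and any_one should be 1 for the first two cases and 0 for the third.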

Max Jasper

Re: If and Count syntax

COUNT freq0 = A01_ROUND TO A30_ROUND (0).
COUNT freq1 = A01_ROUND TO A30_ROUND (1).
COUNT freq2 = A01_ROUND TO A30_ROUND (2).
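A quick way to convince yourself what the three COUNT commands produce is to run them on a few made-up cases. In this sketch, q1 to q5 are hypothetical stand-ins for the 30 variables; the data values are invented.

```spss
DATA LIST FREE / q1 q2 q3 q4 q5.
BEGIN DATA
0 0 1 2 2
1 1 1 1 1
2 0 2 0 2
END DATA.

* One new variable per value being counted.
COUNT freq0 = q1 TO q5 (0).
COUNT freq1 = q1 TO q5 (1).
COUNT freq2 = q1 TO q5 (2).
LIST.
```

For the three cases above, freq0, freq1 and freq2 should come out as 2, 1, 2 for the first case, 0, 5, 0 for the second, and 2, 0, 3 for the third.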

Richard Ristow

Random from 1 to 25 (was, re: Quickly Create a Dataset...)

In reply to this post by Jim Moffitt

At 05:54 PM 2/5/2007, Jim Moffitt wrote:

>How would one create a 300-case dataset where each case has a unique
>ID Ranging from 1 to 300 and a second variable named VBX which
>receives a random value ranging from 1 to 25?

Responders solved this as a programming problem, adding the second
variable in the INPUT PROGRAM.

The following computations were suggested by different posters for
calculating the "random values ranging from 1 to 25." All posters
assumed integers were wanted, which I accept as well.

"Random ... ranging from 1 to 25" in principle allows any distribution,
but is commonly taken to mean equi-distribution, that all values are
equally likely.

One suggested calculation doesn't meet this criterion:
. compute vbx=rnd(RV.uniform(1,25)).
Values 1 and 25 are only half as likely to be selected as are the other
23 values.

Another looks as if it should have the same problem, but it does not:
. COMPUTE vbx =mod( rnd(uniform(1)*100),25)+1.
That's the above logic, but applied to the range 0-100 and then
'wrapped', using MOD function, down to 1-25. Values 0 and 100 each
have half the probability of the other 99 values, but both wrap to the
same result (1), along with 25, 50 and 75. The two halves add up, so
every value in range 1-25 has probability 4/100. It works, but it is
harder to see why.

I recommend, as being simple and giving the desired equi-distribution,
. COMPUTE VBX = TRUNC(RV.UNIFORM(1,26)).
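Put back into the original 300-case problem, that would look something like the sketch below. The seed value is arbitrary and is only there so the run can be repeated.

```spss
* RV.UNIFORM(1,26) is continuous from 1 to 26, so TRUNC gives 1 to 25.
SET SEED = 12345.

INPUT PROGRAM.
LOOP id = 1 TO 300.
COMPUTE vbx = TRUNC(RV.UNIFORM(1,26)).
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
FORMATS id (F3.0) vbx (F2.0).
FREQUENCIES VARIABLES=vbx.
```

Each integer from 1 to 25 corresponds to an interval of width 1 in the continuous draw, which is what makes the values equally likely.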

The following is SPSSX 14 draft output, with some hand editing:

NEW FILE.
INPUT PROGRAM.
. COMPUTE #N_CASES = 10000.
. NUMERIC ID (F5)
/VBX_MOD VBX_RND VBX_TRNC (F3).
. LOOP ID = 1 TO #N_CASES.
. COMPUTE VBX_MOD = mod( rnd(uniform(1)*100),25)+1.
. COMPUTE VBX_RND = rnd(RV.uniform(1,25)).
. COMPUTE VBX_TRNC = TRUNC(RV.UNIFORM(1,26)).
. END CASE.
. END LOOP.
END FILE.
END INPUT PROGRAM.
FREQUENCIES VBX_MOD VBX_RND VBX_TRNC.

Frequencies
|--------------------------|------------------------|
|Output Created |06-FEB-2007 01:49:34 |
|--------------------------|------------------------|
Statistics [suppressed - no missing data]

Frequency Table

VBX_MOD
|-----|-----|---------|-------|-------------|---------------|
| | |Frequency|Percent|Valid Percent|Cumulative |
| | | | | |Percent |
|-----|-----|---------|-------|-------------|---------------|
|Valid|1 |385 |3.9 |3.9 |3.9 |
| |2 |382 |3.8 |3.8 |7.7 |
| |3 |416 |4.2 |4.2 |11.8 |
| |4 |377 |3.8 |3.8 |15.6 |
| |5 |401 |4.0 |4.0 |19.6 |
| |-----|---------|-------|-------------|---------------|
| |6 |393 |3.9 |3.9 |23.5 |
| |7 |424 |4.2 |4.2 |27.8 |
| |8 |404 |4.0 |4.0 |31.8 |
| |9 |398 |4.0 |4.0 |35.8 |
| |10 |420 |4.2 |4.2 |40.0 |
| |-----|---------|-------|-------------|---------------|
| |11 |375 |3.8 |3.8 |43.8 |
| |12 |450 |4.5 |4.5 |48.3 |
| |13 |428 |4.3 |4.3 |52.5 |
| |14 |396 |4.0 |4.0 |56.5 |
| |15 |364 |3.6 |3.6 |60.1 |
| |-----|---------|-------|-------------|---------------|
| |16 |412 |4.1 |4.1 |64.3 |
| |17 |402 |4.0 |4.0 |68.3 |
| |18 |381 |3.8 |3.8 |72.1 |
| |19 |419 |4.2 |4.2 |76.3 |
| |20 |376 |3.8 |3.8 |80.0 |
| |-----|---------|-------|-------------|---------------|
| |21 |397 |4.0 |4.0 |84.0 |
| |22 |411 |4.1 |4.1 |88.1 |
| |23 |420 |4.2 |4.2 |92.3 |
| |24 |353 |3.5 |3.5 |95.8 |
| |25 |416 |4.2 |4.2 |100.0 |
| |-----|---------|-------|-------------|---------------|
| |Total|10000 |100.0 |100.0 | |
|-----|-----|---------|-------|-------------|---------------|

VBX_RND
|-----|-----|---------|-------|-------------|---------------|
| | |Frequency|Percent|Valid Percent|Cumulative |
| | | | | |Percent |
|-----|-----|---------|-------|-------------|---------------|
|Valid|1 |207 |2.1 |2.1 |2.1 |
| |2 |426 |4.3 |4.3 |6.3 |
| |3 |432 |4.3 |4.3 |10.7 |
| |4 |399 |4.0 |4.0 |14.6 |
| |5 |418 |4.2 |4.2 |18.8 |
| |-----|---------|-------|-------------|---------------|
| |6 |422 |4.2 |4.2 |23.0 |
| |7 |420 |4.2 |4.2 |27.2 |
| |8 |415 |4.2 |4.2 |31.4 |
| |9 |421 |4.2 |4.2 |35.6 |
| |10 |424 |4.2 |4.2 |39.8 |
| |-----|---------|-------|-------------|---------------|
| |11 |403 |4.0 |4.0 |43.9 |
| |12 |385 |3.9 |3.9 |47.7 |
| |13 |417 |4.2 |4.2 |51.9 |
| |14 |436 |4.4 |4.4 |56.3 |
| |15 |397 |4.0 |4.0 |60.2 |
| |-----|---------|-------|-------------|---------------|
| |16 |413 |4.1 |4.1 |64.4 |
| |17 |411 |4.1 |4.1 |68.5 |
| |18 |393 |3.9 |3.9 |72.4 |
| |19 |408 |4.1 |4.1 |76.5 |
| |20 |419 |4.2 |4.2 |80.7 |
| |-----|---------|-------|-------------|---------------|
| |21 |422 |4.2 |4.2 |84.9 |
| |22 |387 |3.9 |3.9 |88.8 |
| |23 |434 |4.3 |4.3 |93.1 |
| |24 |464 |4.6 |4.6 |97.7 |
| |25 |227 |2.3 |2.3 |100.0 |
| |-----|---------|-------|-------------|---------------|
| |Total|10000 |100.0 |100.0 | |
|-----|-----|---------|-------|-------------|---------------|

VBX_TRNC
|-----|-----|---------|-------|-------------|---------------|
| | |Frequency|Percent|Valid Percent|Cumulative |
| | | | | |Percent |
|-----|-----|---------|-------|-------------|---------------|
|Valid|1 |391 |3.9 |3.9 |3.9 |
| |2 |441 |4.4 |4.4 |8.3 |
| |3 |407 |4.1 |4.1 |12.4 |
| |4 |407 |4.1 |4.1 |16.5 |
| |5 |399 |4.0 |4.0 |20.5 |
| |-----|---------|-------|-------------|---------------|
| |6 |369 |3.7 |3.7 |24.1 |
| |7 |393 |3.9 |3.9 |28.1 |
| |8 |404 |4.0 |4.0 |32.1 |
| |9 |390 |3.9 |3.9 |36.0 |
| |10 |416 |4.2 |4.2 |40.2 |
| |-----|---------|-------|-------------|---------------|
| |11 |398 |4.0 |4.0 |44.2 |
| |12 |390 |3.9 |3.9 |48.1 |
| |13 |408 |4.1 |4.1 |52.1 |
| |14 |396 |4.0 |4.0 |56.1 |
| |15 |392 |3.9 |3.9 |60.0 |
| |-----|---------|-------|-------------|---------------|
| |16 |425 |4.3 |4.3 |64.3 |
| |17 |435 |4.4 |4.4 |68.6 |
| |18 |379 |3.8 |3.8 |72.4 |
| |19 |418 |4.2 |4.2 |76.6 |
| |20 |401 |4.0 |4.0 |80.6 |
| |-----|---------|-------|-------------|---------------|
| |21 |413 |4.1 |4.1 |84.7 |
| |22 |385 |3.9 |3.9 |88.6 |
| |23 |355 |3.6 |3.6 |92.1 |
| |24 |404 |4.0 |4.0 |96.2 |
| |25 |384 |3.8 |3.8 |100.0 |
| |-----|---------|-------|-------------|---------------|
| |Total|10000 |100.0 |100.0 | |
|-----|-----|---------|-------|-------------|---------------|

Art Kendall-2

Re: Random from 1 to 25 (was, re: Quickly Create a Dataset...)

Richard is right. It should be
COMPUTE IVBX = TRUNC(RV.UNIFORM(1,26)).

Art

Art Kendall

Re: Random from 1 to 25 (was, re: Quickly Create a Dataset...)

Just to show that there is more than one way to skin a cat, try:
INPUT PROGRAM.
LOOP id=1 TO 30000.
COMPUTE IVBX = rnd(rv.uniform(.5,25.5)).
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
FORMATS id (F5.0) IVBX(F2).
FREQUENCIES VARS= IVBX.
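To see that this agrees with the TRUNC version, both can be computed from the same underlying draw. In the sketch below the scratch variable #u and the names v_trunc, v_rnd and differ are made up for the illustration. RND(.5 + 25*#u) is the same rescaling as RND(RV.UNIFORM(.5,25.5)), and TRUNC(1 + 25*#u) the same as TRUNC(RV.UNIFORM(1,26)).

```spss
INPUT PROGRAM.
LOOP id = 1 TO 30000.
COMPUTE #u = RV.UNIFORM(0,1).
COMPUTE v_trunc = TRUNC(1 + 25*#u).
COMPUTE v_rnd = RND(.5 + 25*#u).
COMPUTE differ = (v_trunc NE v_rnd).
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
FORMATS id (F5.0) v_trunc v_rnd (F2.0) differ (F1.0).
FREQUENCIES VARIABLES=differ.
```

Rounding x + .5 and truncating x + 1 pick the same integer for positive x, so differ should be 0 throughout, barring floating-point edge cases right at an interval boundary.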

Art Kendall
Social Research Consultants
