How to generate 1000 samples?

How to generate 1000 samples?

L T-3
Hi!
I would like to find out how I can generate 100 samples. Each sample has
1000 cases with 2 columns of data (RV.N(0,1) and a Filter variable).

Below is the syntax for 1 sample which generates 2 columns. How can I
create 200 (100 samples x 2) columns of data?

INPUT PROGRAM.
    LOOP #Case = 1 to 1000.
      COMPUTE X = RV.NORMAL(0,1).
      END CASE.
    END LOOP.
  END FILE.
END INPUT PROGRAM.
do if $casenum=1.
  compute #s1=10.
  compute #s2=1000.
end if.
do if #s2 > 0.
  compute filter_$ = uniform(1)* #s2 < #s1.
  compute #s1 = #s1 - filter_$.
  compute #s2 = #s2 - 1.
else.
  compute filter_$ = 0.
end if.
EXECUTE.

Really appreciate your help!

Sincerely,
Louis

Re: How to generate 1000 samples?

Maguin, Eugene
Louis,

I think a small modification to your syntax will give you what you want.


INPUT PROGRAM.
   LOOP SAMPLE=1 TO 100.
      LOOP #Case = 1 to 1000.
         COMPUTE X = RV.NORMAL(0,1).
         END CASE.
      END LOOP.
   END LOOP.
   END FILE.
END INPUT PROGRAM.

You mention a filter variable. I'm confused by this. What is the filter
variable supposed to do? Please explain further.

Gene Maguin

Re: How to generate 1000 samples?

Richard Ristow
In reply to this post by L T-3
At 02:54 PM 7/2/2007, Louis wrote:

>I would like to find out how I can generate 100 samples. Each sample
>has 1000 cases with 2 columns of data (RV.N(0,1) and a Filter
>variable).
>
>How can I create 200 (100 samples x 2) columns of data?

I think Gene's on top of the syntax on this one; but I, also, am
curious what you want.

When you say "200 columns of data", it sounds like you mean 200
variables. ("Columns" isn't an SPSS concept, but if you've come from
spreadsheets, 'columns' and 'variables' look similar.) But you also
said, "1000 cases with 2 columns of data (RV.N(0,1) and a Filter
variable)," which looks like it might be the same thing, but in 'long'
organization (many cases, few variables) rather than 'wide' (the
reverse).

And, of course, the 'filter' variable: should it go from 1 to 100, for
the 100 samples in the file, or what?

-Best wishes,
  Richard

Re: How to generate 1000 samples?

L T-3
Hi, Thanks so much for the replies. As you can tell, I am new to syntax in
SPSS.

I am interested in drawing 100 samples of 1000. But at the same time I would
like to have an indicator variable (filter variable) of X number of people
that have some aberrant characteristics which I will overlay later on. The
filter variable is just a convenient way to obtain such an indicator.

Perhaps I do not need to have 100 x 2 columns. But the main obstacle I have
with using "LOOP SAMPLE=1 TO 100." is how to create an indicator
variable to randomly identify X number of people for each sample of 1000.

Best,
Louis


Louis,

I'm confused about what you want to do. Ok. So you draw 100 samples of 1000
cases per sample. So, 100,000 cases total. Now you want to select 10 cases
without replacement from each of the 100 samples. My question is why would
you do it this way rather than drawing 100 samples of 10 cases each?

Is there some larger project going on that this is just a small piece of?
Something else?

Gene Maguin



Re: How to generate 1000 samples?

Maguin, Eugene
Louis,

Ok, now I understand much better what you want to do. This should work.
There may be a much more elegant way to do it but I don't know what it is.
Let me pose this question: Why not just take the first m cases in the set of
1000 as your 'selected' cases. There is nothing special about them.

Gene Maguin

INPUT PROGRAM.
VECTOR FILTER(100).
+  LOOP SAMPLE=1 TO 100.
+     LOOP #Case = 1 to 1000.
+        COMPUTE X = RV.NORMAL(0,1).
*  HERE IS THE KEY STATEMENT. 10 IS THE NUMBER OF CASES TO BE SELECTED.
+        IF (SAMPLE LE 10) FILTER(SAMPLE)=1.
+        END CASE.
+     END LOOP.
+  END LOOP.
+  END FILE.
END INPUT PROGRAM.

Re: How to generate 1000 samples?

Richard Ristow
Gene -

At 10:06 AM 7/3/2007, you wrote:

>Ok, now I understand much better what you want to do. This should
>work.
>
>INPUT PROGRAM.
>VECTOR FILTER(100).
>+  LOOP SAMPLE=1 TO 100.
>+     LOOP #Case = 1 to 1000.
>+        COMPUTE X = RV.NORMAL(0,1).
>*  HERE IS THE KEY STATEMENT. 10 IS THE NUMBER OF CASES TO BE SELECTED.
>+        IF (SAMPLE LE 10) FILTER(SAMPLE)=1.
>+        END CASE.
>+     END LOOP.
>+  END LOOP.
>+  END FILE.
>END INPUT PROGRAM.

I'm not quite clear about the problem, but is what this gives you, what
you had in mind?

It gives 100,000 cases, which is a reasonable interpretation of "100
samples of 1000". All have the variable X, drawn from the normal
distribution.

Variable SAMPLE has values in only 100 cases: 1 in case 1, 2 in case 1001, 3
in case 2001, etc.  You likely wanted SAMPLE to be 1 in cases 1-1000, 2
in cases 1001-2000, etc., and ran into a known glitch with LOOP and
INPUT PROGRAM. If you specify "LEAVE" for SAMPLE, you'll get the values
you wanted.
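
To make the effect concrete, here is a toy Python model of the behavior described above. It is only a sketch: it assumes that the working case is reset to missing after every END CASE unless a variable is named on LEAVE, and the function name run_input_program and the stand-in values for X are made up for illustration. It is not SPSS itself.

```python
def run_input_program(n_samples, n_cases, leave=()):
    """Toy model (assumption): after each END CASE every variable is reset
    to missing (None) unless it is listed in `leave`."""
    record = {"SAMPLE": None, "X": None}
    cases = []
    for sample in range(1, n_samples + 1):
        record["SAMPLE"] = sample        # the loop index is assigned once per outer pass
        for case in range(1, n_cases + 1):
            record["X"] = float(case)    # stand-in for RV.NORMAL(0,1)
            cases.append(dict(record))   # END CASE: write the case out
            for name in record:          # then reset everything not LEAVE'd
                if name not in leave:
                    record[name] = None
    return cases


without_leave = [c["SAMPLE"] for c in run_input_program(2, 3)]
with_leave = [c["SAMPLE"] for c in run_input_program(2, 3, leave=("SAMPLE",))]
print(without_leave)  # [1, None, None, 2, None, None]
print(with_leave)     # [1, 1, 1, 2, 2, 2]
```

With two samples of three cases, the model without LEAVE fills SAMPLE only in the first case of each sample, which mirrors the "1 in case 1, 2 in case 1001" pattern.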

I'm not sure about the FILTERxxx variables. Your logic will never
populate any but the first 10 of those. Those 10 are essentially
surrogates for SAMPLE: with SAMPLE fully populated, FILTER1 is 1 in the
1,000 cases of the first sample, system-missing otherwise; FILTER2 is 1
in the 1,000 cases of the second sample, system-missing otherwise; etc.

If you want to mark 10 cases in each sample, which *sounds* like what
was wanted, you can use a single variable FILTER, and set it to 1 if
your counter #Case is 10 or less, 0 otherwise; that marks the first 10.
Or of course there are ways of distributing FILTER randomly through the
samples, probably by k/n logic; but, as you say, there's no obvious
reason for doing that.

That gives something like this, though I'm still not sure it's what's
desired. SPSS 15 draft output (WRR:not saved separately):


INPUT PROGRAM.
+  LOOP SAMPLE=1 TO 100.
.     LEAVE SAMPLE.
.     FORMATS SAMPLE(F4).
.     NUMERIC FILTER(F2).
+     LOOP #Case = 1 to 1000.
+        COMPUTE X = RV.NORMAL(0,1).
*  HERE IS THE KEY STATEMENT. 10 IS THE NUMBER OF CASES TO BE SELECTED.
*  ----- IF (SAMPLE LE 10) FILTER(SAMPLE)=1.  ---- .
.        RECODE #Case
                (1 THRU 10 = 1) (ELSE = 0)
                INTO FILTER.
+        END CASE.
+     END LOOP.
+  END LOOP.
+  END FILE.
END INPUT PROGRAM.

LIST FORMAT = NUMBERED/CASES FROM 1 TO 15.

List
|-----------------------------|---------------------------|
|Output Created               |03-JUL-2007 11:40:08       |
|-----------------------------|---------------------------|
        SAMPLE FILTER        X

      1     1     1      -1.45
      2     1     1       1.51
      3     1     1       -.86
      4     1     1       1.08
      5     1     1       1.19
      6     1     1       -.59
      7     1     1       -.26
      8     1     1        .65
      9     1     1      -1.01
     10     1     1       -.62
     11     1     0       1.39
     12     1     0        .66
     13     1     0       -.52
     14     1     0        .12
     15     1     0        .81

Number of cases read:  15    Number of cases listed:  15


CROSSTABS
   /TABLES=SAMPLE  BY FILTER
   /FORMAT= AVALUE TABLES
   /CELLS= COUNT
   /COUNT ROUND CELL .


Crosstabs
|-----------------------------|---------------------------|
|Output Created               |03-JUL-2007 11:40:08       |
|-----------------------------|---------------------------|
Case Processing Summary [suppressed - no missing data]

SAMPLE * FILTER Crosstabulation
Count
|------|---|----------|------|
|      |   |FILTER    |Total |
|      |   |-----|----|------|
|      |   |0    |1   |      |
|------|---|-----|----|------|
|SAMPLE|1  |990  |10  |1000  |
|      |---|-----|----|------|
|      |2  |990  |10  |1000  |
|      |---|-----|----|------|
|      |3  |990  |10  |1000  |
|      |---|-----|----|------|
|      |4  |990  |10  |1000  |
|      |---|-----|----|------|
|      |5  |990  |10  |1000  |
|      |---|-----|----|------|
|      |6  |990  |10  |1000  |
|      |---|-----|----|------|
|      |7  |990  |10  |1000  |
|      |---|-----|----|------|
|      |8  |990  |10  |1000  |
|      |---|-----|----|------|
|      |9  |990  |10  |1000  |
|      |---|-----|----|------|
|      |10 |990  |10  |1000  |
|      |---|-----|----|------|
|      |11 |990  |10  |1000  |
|      |---|-----|----|------|
|      |12 |990  |10  |1000  |
|      |---|-----|----|------|
|      |13 |990  |10  |1000  |
|      |---|-----|----|------|
|      |14 |990  |10  |1000  |
|      |---|-----|----|------|
|      |15 |990  |10  |1000  |
|      |---|-----|----|------|
|      |16 |990  |10  |1000  |
|      |---|-----|----|------|
|      |17 |990  |10  |1000  |
|      |---|-----|----|------|
|      |18 |990  |10  |1000  |
|      |---|-----|----|------|
|      |19 |990  |10  |1000  |
|      |---|-----|----|------|
|      |20 |990  |10  |1000  |
|      |---|-----|----|------|
|      |21 |990  |10  |1000  |
|      |---|-----|----|------|
|      |22 |990  |10  |1000  |
|      |---|-----|----|------|
|      |23 |990  |10  |1000  |
|      |---|-----|----|------|
|      |24 |990  |10  |1000  |
|      |---|-----|----|------|
|      |25 |990  |10  |1000  |
|      |---|-----|----|------|
|      |26 |990  |10  |1000  |
|      |---|-----|----|------|
|      |27 |990  |10  |1000  |
|      |---|-----|----|------|
|      |28 |990  |10  |1000  |
|      |---|-----|----|------|
|      |29 |990  |10  |1000  |
|      |---|-----|----|------|
|      |30 |990  |10  |1000  |
|      |---|-----|----|------|
|      |31 |990  |10  |1000  |
|      |---|-----|----|------|
|      |32 |990  |10  |1000  |
|      |---|-----|----|------|
|      |33 |990  |10  |1000  |
|      |---|-----|----|------|
|      |34 |990  |10  |1000  |
|      |---|-----|----|------|
|      |35 |990  |10  |1000  |
|      |---|-----|----|------|
|      |36 |990  |10  |1000  |
|      |---|-----|----|------|
|      |37 |990  |10  |1000  |
|      |---|-----|----|------|
|      |38 |990  |10  |1000  |
|      |---|-----|----|------|
|      |39 |990  |10  |1000  |
|      |---|-----|----|------|
|      |40 |990  |10  |1000  |
|      |---|-----|----|------|
|      |41 |990  |10  |1000  |
|      |---|-----|----|------|
|      |42 |990  |10  |1000  |
|      |---|-----|----|------|
|      |43 |990  |10  |1000  |
|      |---|-----|----|------|
|      |44 |990  |10  |1000  |
|      |---|-----|----|------|
|      |45 |990  |10  |1000  |
|      |---|-----|----|------|
|      |46 |990  |10  |1000  |
|      |---|-----|----|------|
|      |47 |990  |10  |1000  |
|      |---|-----|----|------|
|      |48 |990  |10  |1000  |
|      |---|-----|----|------|
|      |49 |990  |10  |1000  |
|      |---|-----|----|------|
|      |50 |990  |10  |1000  |
|      |---|-----|----|------|
|      |51 |990  |10  |1000  |
|      |---|-----|----|------|
|      |52 |990  |10  |1000  |
|      |---|-----|----|------|
|      |53 |990  |10  |1000  |
|      |---|-----|----|------|
|      |54 |990  |10  |1000  |
|      |---|-----|----|------|
|      |55 |990  |10  |1000  |
|      |---|-----|----|------|
|      |56 |990  |10  |1000  |
|      |---|-----|----|------|
|      |57 |990  |10  |1000  |
|      |---|-----|----|------|
|      |58 |990  |10  |1000  |
|      |---|-----|----|------|
|      |59 |990  |10  |1000  |
|      |---|-----|----|------|
|      |60 |990  |10  |1000  |
|      |---|-----|----|------|
|      |61 |990  |10  |1000  |
|      |---|-----|----|------|
|      |62 |990  |10  |1000  |
|      |---|-----|----|------|
|      |63 |990  |10  |1000  |
|      |---|-----|----|------|
|      |64 |990  |10  |1000  |
|      |---|-----|----|------|
|      |65 |990  |10  |1000  |
|      |---|-----|----|------|
|      |66 |990  |10  |1000  |
|      |---|-----|----|------|
|      |67 |990  |10  |1000  |
|      |---|-----|----|------|
|      |68 |990  |10  |1000  |
|      |---|-----|----|------|
|      |69 |990  |10  |1000  |
|      |---|-----|----|------|
|      |70 |990  |10  |1000  |
|      |---|-----|----|------|
|      |71 |990  |10  |1000  |
|      |---|-----|----|------|
|      |72 |990  |10  |1000  |
|      |---|-----|----|------|
|      |73 |990  |10  |1000  |
|      |---|-----|----|------|
|      |74 |990  |10  |1000  |
|      |---|-----|----|------|
|      |75 |990  |10  |1000  |
|      |---|-----|----|------|
|      |76 |990  |10  |1000  |
|      |---|-----|----|------|
|      |77 |990  |10  |1000  |
|      |---|-----|----|------|
|      |78 |990  |10  |1000  |
|      |---|-----|----|------|
|      |79 |990  |10  |1000  |
|      |---|-----|----|------|
|      |80 |990  |10  |1000  |
|      |---|-----|----|------|
|      |81 |990  |10  |1000  |
|      |---|-----|----|------|
|      |82 |990  |10  |1000  |
|      |---|-----|----|------|
|      |83 |990  |10  |1000  |
|      |---|-----|----|------|
|      |84 |990  |10  |1000  |
|      |---|-----|----|------|
|      |85 |990  |10  |1000  |
|      |---|-----|----|------|
|      |86 |990  |10  |1000  |
|      |---|-----|----|------|
|      |87 |990  |10  |1000  |
|      |---|-----|----|------|
|      |88 |990  |10  |1000  |
|      |---|-----|----|------|
|      |89 |990  |10  |1000  |
|      |---|-----|----|------|
|      |90 |990  |10  |1000  |
|      |---|-----|----|------|
|      |91 |990  |10  |1000  |
|      |---|-----|----|------|
|      |92 |990  |10  |1000  |
|      |---|-----|----|------|
|      |93 |990  |10  |1000  |
|      |---|-----|----|------|
|      |94 |990  |10  |1000  |
|      |---|-----|----|------|
|      |95 |990  |10  |1000  |
|      |---|-----|----|------|
|      |96 |990  |10  |1000  |
|      |---|-----|----|------|
|      |97 |990  |10  |1000  |
|      |---|-----|----|------|
|      |98 |990  |10  |1000  |
|      |---|-----|----|------|
|      |99 |990  |10  |1000  |
|      |---|-----|----|------|
|      |100|990  |10  |1000  |
|------|---|-----|----|------|
|Total     |99000|1000|100000|
|----------|-----|----|------|






has the normally distributed variable


>There may be a much more elegant way to do it but I don't know what it
>is.
>Let me pose this question: Why not just take the first m cases in the
>set of
>1000 as your 'selected' cases. There is nothing special about them.





Re: How to generate 1000 samples?

Maguin, Eugene
Richard,

Thank you very much for taking the time to look over my posting and to point
out my errors. I didn't know about the problem with Loop in an Input
program.

I understand your question about setting up 100 filter variables. Although
Louis will have to say whether this is what he wanted, he did say:

>Perhaps I do not need to have 100 x 2 columns. But the main obstacle I have
>with using "LOOP SAMPLE=1 TO 100." is how to create an indicator
>variable to randomly identify X number of people for each sample of 1000.

I interpreted this statement along with the syntax he originally posted to
mean he wanted to uniquely identify the set of 10 cases in each sample.

Gene Maguin

Re: How to generate 1000 samples?

Richard Ristow
In reply to this post by L T-3
Let's see if we're getting there. At 02:54 PM 7/2/2007, Louis wrote:

>I would like to find out how I can generate 100 samples. Each sample
>has 1000 cases with 2 columns of data (RV.N(0,1) and a Filter
>variable).
>
>Below is the syntax for 1 sample which generates 2 columns. How can I
>create 200 (100 samples x 2) columns of data?

Now, I think I see what you were going for. Something like this, if you
wanted 3 samples:

    X1  FILTER1    X2  FILTER2    X3  FILTER3
  1.21        0 -0.35        0 -1.35        1

All the X variables are drawn from the standard normal distribution; the
FILTER variables are chosen to give value 1 exactly 10 times in each column.

First, you know syntax better than you give yourself credit for. What
you have is pretty good, though you do still have that problem of
generating 100 pairs of variables, instead of only 1 pair.

For the random selection, you have a neat implementation of the "k/n"
algorithm - your #s1 is what's normally called k, your #s2 is what's
called n. You can do that within the INPUT PROGRAM, if you like, rather
than in a separate pass.
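
For readers who don't know the "k/n" idea, here is a minimal Python sketch of it. This is an illustration, not the SPSS code from the thread: the function name select_k_of_n and the seed are made up, and random.Random stands in for UNIFORM. Each case is picked with probability (picks still needed) / (cases still left), which yields exactly k ones among n cases.

```python
import random


def select_k_of_n(k, n, rng):
    """Return n flags (0/1) containing exactly k ones, chosen at random."""
    flags = []
    still_needed, still_left = k, n      # these play the roles of #s1 and #s2
    for _ in range(n):
        # Same test as: uniform(1) * #s2 < #s1
        pick = 1 if rng.random() * still_left < still_needed else 0
        flags.append(pick)
        still_needed -= pick
        still_left -= 1
    return flags


rng = random.Random(2007)                # arbitrary seed, for repeatability only
flags = select_k_of_n(10, 1000, rng)
print(sum(flags), len(flags))            # 10 1000
```

The count is exact rather than approximate: once the picks still needed equal the cases still left, every remaining case is picked, and once none are needed, none are picked.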

Second: are you sure you want it to come out this way? The other way is
what Gene and I have been looking at: Three 'columns' (variables),
where the first is the sample number, and the other two correspond to
your variables X and FILTER_$. That's easier to use, for most purposes.
(It's called 'long' organization; what you were thinking of, is called
'wide'.)

I've taken Gene's logic as I modified it, added a row count within each
sample (you'll see why, later), and added your sampling logic (modified
a little, to use RV.BERNOULLI instead of UNIFORM). Here's
selecting 3 samples of 6 cases each, filtering to select 2 in each
sample. (You can expand to 100 samples of 1,000 easily.)

This is SPSS 15 draft output (WRR:not saved separately), giving output
in 'long' format:

INPUT PROGRAM.
.  NUMERIC SAMPLE ROW (F4).
.  LEAVE   SAMPLE.
+  LOOP SAMPLE=1 TO 3.
.     NUMERIC FILTER_$(F2).
.     COMPUTE #S1 = 2  /* Number of cases within filter */.
.     COMPUTE #S2 = 6  /* Size of each sample           */.
+     LOOP #Case = 1 to #S2.
.        COMPUTE ROW = #Case.
+        COMPUTE X   = RV.NORMAL(0,1).
.        compute filter_$ = RV.BERNOULLI(#s1/#s2).
.        compute #s1 = #s1 - filter_$.
.        compute #s2 = #s2 - 1.
+        END CASE.
+     END LOOP.
+  END LOOP.
+  END FILE.
END INPUT PROGRAM.
DATASET NAME     LongForm WINDOW=FRONT.
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |03-JUL-2007 14:47:21       |
|-----------------------------|---------------------------|
[LongForm]

SAMPLE  ROW FILTER_$        X

     1     1     1       -1.37
     1     2     0        -.74
     1     3     0         .90
     1     4     1        1.13
     1     5     0         .36
     1     6     0        -.89
     2     1     0         .69
     2     2     0         .97
     2     3     0        -.22
     2     4     1        1.93
     2     5     0        -.84
     2     6     1        -.04
     3     1     1         .81
     3     2     0        1.95
     3     3     0        -.83
     3     4     0       -1.48
     3     5     1         .60
     3     6     0        1.79

Number of cases read:  18    Number of cases listed:  18

Now, to get 'wide' format, you could use VECTOR and LOOP, as you were
planning to. But I think it's easier to generate 'long' form and
convert to 'wide' with CASESTOVARS. (The DATASET commands are not
necessary; they made testing easier):

DATASET ACTIVATE LongForm WINDOW=FRONT.
DATASET COPY     WideForm.
DATASET ACTIVATE WideForm WINDOW=FRONT.

SORT CASES BY ROW SAMPLE .
CASESTOVARS
  /ID = ROW
  /INDEX = SAMPLE
  /GROUPBY = INDEX .

Cases to Variables
|----------------------------|---------------------------|
|Output Created              |03-JUL-2007 14:48:52       |
|----------------------------|---------------------------|
[WideForm]

Generated Variables
|--------|------|----------|
|Original|SAMPLE|Result    |
|Variable|      |----------|
|        |      |Name      |
|--------|------|----------|
|FILTER_$|1     |FILTER_$.1|
|        |2     |FILTER_$.2|
|        |3     |FILTER_$.3|
|--------|------|----------|
|X       |1     |X.1       |
|        |2     |X.2       |
|        |3     |X.3       |
|--------|------|----------|

Processing Statistics
|---------------|---|
|Cases In       |18 |
|Cases Out      |6  |
|---------------|---|
|Cases In/Cases |3.0|
|Out            |   |
|---------------|---|
|Variables In   |4  |
|Variables Out  |7  |
|---------------|---|
|Index Values   |3  |
|---------------|---|


LIST.


List
|-----------------------------|---------------------------|
|Output Created               |03-JUL-2007 14:48:52       |
|-----------------------------|---------------------------|
[WideForm]

  ROW FILTER_$.1      X.1 FILTER_$.2      X.2 FILTER_$.3      X.3

    1      1        -1.37      0          .69      1          .81
    2      0         -.74      0          .97      0         1.95
    3      0          .90      0         -.22      0         -.83
    4      1         1.13      1         1.93      0        -1.48
    5      0          .36      0         -.84      1          .60
    6      0         -.89      1         -.04      0         1.79

Number of cases read:  6    Number of cases listed:  6
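
As a cross-check on what the long-to-wide step does, here is a small Python sketch of the same reshaping. It is an analogue only, not CASESTOVARS itself: the function name cases_to_vars is made up, and the data are just rows 1 and 2 of each sample, copied from the LongForm listing.

```python
# (SAMPLE, ROW, FILTER, X): rows 1-2 of each sample from the LongForm listing
long_rows = [
    (1, 1, 1, -1.37), (1, 2, 0, -0.74),
    (2, 1, 0, 0.69), (2, 2, 0, 0.97),
    (3, 1, 1, 0.81), (3, 2, 0, 1.95),
]


def cases_to_vars(rows):
    """Collapse long rows into one record per ROW, with one FILTER/X pair
    per SAMPLE (named FILTER.<sample> and X.<sample>)."""
    wide = {}
    for sample, row, flt, x in rows:
        rec = wide.setdefault(row, {"ROW": row})
        rec[f"FILTER.{sample}"] = flt
        rec[f"X.{sample}"] = x
    return [wide[r] for r in sorted(wide)]


wide_rows = cases_to_vars(long_rows)
for rec in wide_rows:
    print(rec)
```

ROW is what lines the cases up across samples, which is why the row count was added to the long form.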

Re: How to generate 1000 samples?

L T-3
Wow, Richard and Gene,
Thanks so much for solving this problem so quickly and efficiently! I'm very
impressed with the solution. Thanks again.

Best,
Louis


Re: How to generate 1000 samples?

L T-3
Hi,
Just a quick question,

+     LOOP #Case = 1 to #S2.
.        COMPUTE ROW = #Case.
+        COMPUTE X   = RV.NORMAL(0,1).
.        compute filter_$ = RV.BERNOULLI(#s1/#s2).
.        compute #s1 = #s1 - filter_$.
.        compute #s2 = #s2 - 1.
+        END CASE.
+     END LOOP.

I don't understand how 'rows' can be enumerated properly (e.g., 1, 2, 3, 4,
5, 6) when each pass through the LOOP takes #S2 (originally 6) a step down
(e.g., 1 to 6, 1 to 5... and finally 1 to 1)?
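
The LongForm listing in the thread shows ROW running from 1 to 6 in every sample, which is consistent with the loop limit being read once, when the loop is entered. Assuming that is how LOOP treats its limit (the listing suggests it, but this is an inference), the situation has a close Python analogue. The function name enumerate_rows is made up for illustration.

```python
def enumerate_rows(sample_size):
    """Count rows while shrinking the variable that supplied the loop limit."""
    remaining = sample_size                 # plays the role of #S2
    rows = []
    for case in range(1, remaining + 1):    # the range is built once, before the first pass
        rows.append(case)                   # like COMPUTE ROW = #Case
        remaining -= 1                      # shrinking it later does not shorten the loop
    return rows, remaining


rows, remaining = enumerate_rows(6)
print(rows)       # [1, 2, 3, 4, 5, 6]
print(remaining)  # 0
```

Here the limit is fixed at 6 before the first pass, so decrementing the variable inside the loop changes the selection probability but not the number of passes.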

> > |-----------------------------|---------------------------|
> > [WideForm]
> >
> > ROW FILTER_$.1      X.1 FILTER_$.2      X.2 FILTER_$.3      X.3
> >
> >    1      1        -1.37      0          .69      1          .81
> >    2      0         -.74      0          .97      0         1.95
> >    3      0          .90      0         -.22      0         -.83
> >    4      1         1.13      1         1.93      0        -1.48
> >    5      0          .36      0         -.84      1          .60
> >    6      0         -.89      1         -.04      0         1.79
> >
> > Number of cases read:  6    Number of cases listed:  6
> >
> >
> >
>

Re: How to generate 1000 samples?

Richard Ristow
At 01:54 PM 7/7/2007, Sien Chieh Tay wrote:

>Just a quick question,
>
>+     LOOP #Case = 1 to #S2.
>.        COMPUTE ROW = #Case.
>+        COMPUTE X   = RV.NORMAL(0,1).
>.        compute filter_$ = RV.BERNOULLI(#s1/#s2).
>.        compute #s1 = #s1 - filter_$.
>.        compute #s2 = #s2 - 1.
>+        END CASE.
>+     END LOOP.
>
>I don't understand how 'rows' can be enumerated properly (e.g., 1, 2,
>3, 4, 5, 6) when each LOOP takes #S2 (originally 6) a step down (e.g.,
>1 to 6, 1 to 5... and finally 1 to 1).

In other words: how can the loop work at all, and create the desired
number of records, when I'm changing the loop's upper limit on every
pass?

The fact is, it would have been better general coding practice *not* to
modify, or seem to modify, the loop upper limit: that is, instead of
+     LOOP #Case = 1 to #S2.
something like
+     COMPUTE #N_Cases = #S2.
+     LOOP #Case = 1 to #N_Cases.

I thought of that afterward, but I didn't re-post because what I posted
works.
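
Spelled out in full, that safer form might look like the following. This
is a sketch only, not re-run; #N_Cases is just a name I've picked for
the frozen upper limit, and the sizes (3 samples of 6 cases, 2 selected
in each) are the ones from my earlier post.

INPUT PROGRAM.
   NUMERIC SAMPLE ROW (F4).
   NUMERIC FILTER_$ (F2).
   LEAVE   SAMPLE.
   LOOP SAMPLE = 1 TO 3.
      COMPUTE #S1 = 2        /* Cases still to select in this sample */.
      COMPUTE #S2 = 6        /* Cases still to generate              */.
      COMPUTE #N_Cases = #S2 /* Loop limit; never changed below      */.
      LOOP #Case = 1 TO #N_Cases.
         COMPUTE ROW = #Case.
         COMPUTE X   = RV.NORMAL(0,1).
         COMPUTE FILTER_$ = RV.BERNOULLI(#S1/#S2).
         COMPUTE #S1 = #S1 - FILTER_$.
         COMPUTE #S2 = #S2 - 1.
         END CASE.
      END LOOP.
   END LOOP.
   END FILE.
END INPUT PROGRAM.
LIST.

The logic is unchanged; the only difference is that the variable in the
LOOP statement is no longer the one being counted down, so a reader
doesn't have to know how SPSS treats loop limits to trust the code.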

It works, though it looks like it shouldn't, because SPSS implemented
loops in a slightly unusual way.

The lower and upper limits look like expressions, and the loop counter
looks like a variable; but -

When SPSS encounters a loop, it evaluates the lower and upper limits
once and keeps their *values*. It uses those values for the whole loop,
even if later changes in variable values would change the values of the
expressions for the limits.

SPSS also keeps a loop counter that is internal and invisible; the
variable that looks like the loop counter, isn't. Instead,
a.) SPSS increments the invisible loop counter at the beginning of
every loop pass;
b.) It then assigns the 'loop counter' variable the new value of the
loop counter.
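
If you'd like to see this for yourself, a small test along these lines
should do it. It's a sketch I haven't run; #Limit, PASS and LIMIT are
names made up for the illustration. The upper-limit variable is reduced
on every pass, and each pass writes one case recording the pass number
and the current value of that variable.

INPUT PROGRAM.
   COMPUTE #Limit = 5.
   LOOP #I = 1 TO #Limit.
      COMPUTE #Limit = #Limit - 1 /* Change the 'upper limit' */.
      COMPUTE PASS   = #I.
      COMPUTE LIMIT  = #Limit.
      END CASE.
   END LOOP.
   END FILE.
END INPUT PROGRAM.
LIST.

If the description above is right, I'd expect five cases, not two or
three: the limit of 5 was fixed when the loop started, so PASS should
run from 1 to 5 while LIMIT counts down.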

Among other things, see Gene's code for your problem, posted Tue, 3 Jul
2007 <10:06:02 -0400>, which I commented on Tue, 3 Jul 2007 <11:42:46
-0400>. I wrote,

>>Variable SAMPLE has values in 100 cases: 1 in case 1, 2 in case 1001,
>>3 in case 2001, etc.  You likely wanted SAMPLE to be 1 in cases
>>1-1000, 2 in cases 1001-2000, etc., and ran into a known glitch with
>>LOOP and INPUT PROGRAM. If you specify "LEAVE" for SAMPLE, you'll get
>>the above.

Here's what's going on:

. END CASE in an INPUT PROGRAM starts a new SPSS 'case'. As in
transformation programs, at the start of a new 'case' SPSS makes all
numeric variables system-missing and all string variables blank, except
those for which LEAVE has been specified.
   That's why, without LEAVE, the 'loop counter' variable SAMPLE is
system-missing in every case written after the first END CASE of a pass
through the outer loop (there are 1,000 END CASEs per pass).

By that logic, *all* values of SAMPLE after the very first case should
be missing. Instead, the correct value is set again at the beginning of
each pass through the loop
+  LOOP SAMPLE=1 TO 100.

SAMPLE is set correctly then, because (remember) SAMPLE is *not* the
loop counter. That counter is incremented internally, and its new value
assigned to SAMPLE, at the beginning of each loop pass.
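
A cut-down version of Gene's program shows the effect. Again this is a
sketch, not a tested run; the sizes (3 samples of 4 cases) are chosen
only to keep the listing short.

INPUT PROGRAM.
   NUMERIC SAMPLE (F4).
   LEAVE   SAMPLE  /* Keep SAMPLE from being reset after END CASE */.
   LOOP SAMPLE = 1 TO 3.
      LOOP #Case = 1 TO 4.
         COMPUTE X = RV.NORMAL(0,1).
         END CASE.
      END LOOP.
   END LOOP.
   END FILE.
END INPUT PROGRAM.
LIST.

With the LEAVE line, SAMPLE should be filled in on every case. Take the
LEAVE line out and, by the reasoning above, SAMPLE should appear only on
the first case of each sample and be system-missing on the rest.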

Is all this clear? (Right)
..........................
Documentation:

There is a little about this; see example "* Modifying the loop
iteration variable" in the Command Syntax Reference (p.934ff in the
SPSS 15 edition).

Unfortunately, there's also this, earlier in the Command Syntax
Reference:

>The program sets the *indexing variable* to the *initial value* and
>increases it by the specified increment each time the loop is executed
>for a case. When the indexing variable reaches the specified *terminal
>value*, the loop is terminated for that case.
>(SPSS 15 edition, p.932)

A full explanation at that point would confuse the dickens out of new
users; but the above is a bit of a pitfall for those more experienced.

-Best of luck,
  Richard