Random sampling & matrix of histograms problem


Petro Poutanen
Dear people on the list,

I have been trying to get my head around some syntax.

My problem is:

1) I have an empirical distribution of a variable x with, say, 1000 observations

2) I want to draw 100 (or n) random samples with replacement from x, each of size, for example, 10% of the number of observations in x.

3) I need those random samples saved as variables x1...x100 in a new dataset.


Part 4 is a somewhat separate issue:

4) Is there a way to plot several histograms (of different variables) as a matrix with set dimensions (say, 3x3)? This is a common way to present results, but so far the only approach I have found is to restructure the whole dataset into one long list, with a grouping variable identifying the original variables, and then use that grouping variable as the panel variable in a paneled chart for the whole list...
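For readers outside SPSS, here is a minimal Python sketch of what steps 1 to 3 ask for. It draws 100 samples with replacement, each 10% the size of x, and stores them as columns x1...x100. The data are simulated normal values, and the function name `wide_resamples` and the seeds are hypothetical choices made for illustration. This is not SPSS syntax.

```python
import random

def wide_resamples(x, n_samples=100, frac=0.10, seed=2013):
    """Return {'x1': [...], ..., 'xN': [...]}; each column is drawn WITH replacement."""
    rng = random.Random(seed)
    size = round(len(x) * frac)  # e.g. 10% of 1000 observations -> 100 values
    # random.choices samples with replacement, so one observation can appear more than once
    return {f"x{i}": rng.choices(x, k=size) for i in range(1, n_samples + 1)}

# Simulated stand-in for the empirical distribution: 1000 observations
gen = random.Random(5)
x = [gen.gauss(0, 1) for _ in range(1000)]

samples = wide_resamples(x)
print(len(samples), len(samples["x1"]))  # 100 100
```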

Thanks a lot!

-Petro P.

Re: Random sampling & matrix of histograms problem

Andy W
I don't know of an easy way to draw random samples. Below is an idea I had a while ago: generate a second dataset of randomly drawn case IDs and then table-match the original data to it. Unfortunately this approach can't be written into a MACRO (you can't include INPUT PROGRAM in a MACRO), so I look forward to other solutions.

This simply draws case IDs from a uniform distribution between 1 and n (the size of the original sample) and makes 9 runs of 100 draws each. The data aren't returned in the wide format you requested, but IMO it is frequently better to have the data in this long format anyway and use SPLIT FILE to get statistics for each subset.

************************************************************************************************.
*Original Dataset.
set seed = 5.
input program.
loop #i = 1 to 1600.
    compute X = RV.NORMAL(0,1).
    compute id = #i.
    end case.
end loop.
end file.
end input program.
dataset name orig.
exe.


*Making a dataset with random samples with replacement - need to know N of original dataset beforehand.
set seed = 10.
input program.
loop #iter = 1 to 9.             /*This is the number of replications */.
    loop #rand = 1 to 100.    /*Number of draws per replication (the size of each sample, drawn with replacement) */.
        compute #n = 1600.   /*You need to supply this info - this is the number of records in the original dataset */.
        compute id = TRUNC(RV.UNIFORM(1,#n + 1)).
        compute run = #iter.
        end case.
    end loop.
end loop.
end file.
end input program.
dataset name rand_samps.
exe.

*now just table match the orig dataset to the random samples dataset.
dataset activate rand_samps.
sort cases by id.
match files file = *
/table = 'orig'
/by id.
exe.
************************************************************************************************.
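The same logic can be restated as a minimal Python sketch. The function name `resample_long` and the dictionary layout are hypothetical. The sketch draws IDs uniformly with replacement and keeps the result in long format with a run variable. It then looks up X by ID, which is what the table match does. The per-run grouping at the end stands in for SPLIT FILE. `randint(1, n)` is used in place of truncating a uniform draw on [1, n+1), and both give every ID from 1 to n with equal probability.

```python
import random
from statistics import mean

def resample_long(orig, n_runs=9, draws_per_run=100, seed=10):
    """One row per draw: dicts with keys run, id, X (long format)."""
    rng = random.Random(seed)
    n = len(orig)  # like #n = 1600: the number of records in the original data
    rows = []
    for run in range(1, n_runs + 1):
        for _ in range(draws_per_run):
            # randint(1, n) plays the role of TRUNC(RV.UNIFORM(1, #n + 1)):
            # every id from 1 to n is equally likely, and ids may repeat
            rows.append({"run": run, "id": rng.randint(1, n)})
    # The "table match": attach X to each drawn id, like MATCH FILES ... /TABLE ... /BY id
    for row in rows:
        row["X"] = orig[row["id"]]
    return rows

gen = random.Random(5)
orig = {i: gen.gauss(0, 1) for i in range(1, 1601)}  # stand-in for dataset orig
rows = resample_long(orig)

# Per-run statistics, in the spirit of SPLIT FILE BY run
by_run = {}
for row in rows:
    by_run.setdefault(row["run"], []).append(row["X"])
run_means = {run: mean(xs) for run, xs in by_run.items()}
print(len(rows), len(run_means))  # 900 9
```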

I wonder whether the newer bootstrapping procedures (or maybe even the old one in NLR) can be hacked to return the needed IDs with replacement. A MATRIX program should also be possible (and it can be called within a macro), but I'm not savvy enough with MATRIX to give a quick answer.

I'm not sure what you're asking in 4; SPSS can produce small multiples, if that is what you mean.
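Small-multiple (paneled) charts generally need long data. That means one value column plus a variable recording which original variable each value came from, which is the restructuring Petro describes. In SPSS, the VARSTOCASES command performs that kind of restructuring. The sketch below does the reshape in Python and also assigns each of nine variables to a row and column of a 3x3 grid. The names `wide_to_long`, `panel_row` and `panel_col` are hypothetical.

```python
def wide_to_long(wide, ncols=3):
    """Stack {'x1': [...], ...} into rows of (variable, value, panel_row, panel_col)."""
    long_rows = []
    for k, (name, values) in enumerate(wide.items()):
        panel_row, panel_col = divmod(k, ncols)  # 0-based cell in the grid
        for v in values:
            long_rows.append((name, v, panel_row, panel_col))
    return long_rows

# Toy wide data: nine variables with two values each
wide = {f"x{i}": [float(i), float(i) + 0.5] for i in range(1, 10)}
long_rows = wide_to_long(wide)
print(long_rows[0])   # ('x1', 1.0, 0, 0)
print(long_rows[-1])  # ('x9', 9.5, 2, 2)
```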
Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Re: Random sampling & matrix of histograms problem

David Marso
Administrator
"you can't include INPUT PROGRAM in a MACRO) - so I look forward to other solutions. "
Sure you can.
Err... Have you tried?
INPUT PROGRAM is perfectly usable within MACRO!!
BEGIN DATA and END DATA are not permitted (for reasons JoNoH can perhaps elaborate on).

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Do not give what is holy to the dogs, nor cast your pearls before swine, lest perhaps they trample them under their feet."
"When are the damned possessed pigs going to jump off the bloody cliff into the abyss?"

Re: Random sampling & matrix of histograms problem

Mike
In reply to this post by Andy W
Perhaps I missed it, but has anyone suggested the old
SPSS command SAMPLE? Here is the entry from the SPSS v18
syntax reference:

SAMPLE

SAMPLE {decimal value} or {n FROM m }

This command does not read the active dataset. It is stored, pending
execution with the next command that reads the dataset. For more
information, see the topic Command Order on p. 40.
Example
SAMPLE .25.

Overview

SAMPLE permanently draws a random sample of cases for
processing in all subsequent procedures.

For a temporary sample, use a TEMPORARY command before SAMPLE.

Basic Specification

The basic specification is either a decimal value between 0 and 1 or the
sample size followed by keyword FROM and the size of the active dataset.
- To select an approximate percentage of cases, specify a decimal
value between 0 and 1.
- To select an exact-size random sample, specify a positive integer that
is less than the file size, and follow it with keyword FROM and the file
size.

Operations
- SAMPLE is a permanent transformation.

- Sampling is based on a pseudo-random-number generator that
depends on a seed value that is established by the program. On
some implementations of the program, this number defaults to a
fixed integer, and a SAMPLE command that specifies n FROM m
will generate the identical sample whenever a session is rerun.
To generate a different sample each time, use the SET command
to reset SEED to a different value for each session. See the SET command
for more information.
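The point about seeds is easy to check with any pseudo-random generator. Fixing the seed reproduces the same sample, and changing the seed gives a different one. Here is a minimal Python sketch using `random.Random`, not SPSS's own generator. The function name `sample_n_from_m` is hypothetical, and the sketch draws an exact-size sample without replacement, by analogy with SAMPLE n FROM m.

```python
import random

def sample_n_from_m(n, m, seed):
    # Exactly n distinct case numbers out of 1..m (no replacement), for a given seed
    return random.Random(seed).sample(range(1, m + 1), n)

a = sample_n_from_m(100, 1000, seed=20130307)
b = sample_n_from_m(100, 1000, seed=20130307)
c = sample_n_from_m(100, 1000, seed=42)  # a different seed gives (almost surely) a different sample
print(a == b)  # True: same seed, same sample on every rerun
```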

So, I think something like:

Temporary.
Sample 100 from 1000.
save outfile= etc.

You would probably embed this in a loop or other structure to generate as
many samples as you want (perhaps creating a sample-number variable,
ranging from 1 to 100, in every file, so that you could then use
MATCH FILES to combine them all into a single file).
Or something like that.
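The two forms of SAMPLE behave differently. A decimal selects an approximate percentage, so the sample size varies from run to run. The n FROM m form returns exactly n cases. Both forms sample without replacement. The quoted reference does not describe SAMPLE's internal algorithm. This Python sketch therefore assumes that the decimal form keeps each case independently with that probability. It also shows Mike's loop idea, tagging each sample with its number. The function names are hypothetical.

```python
import random

def sample_fraction(cases, p, rng):
    # Approximate-size sample (assumed rule): keep each case independently with probability p
    return [c for c in cases if rng.random() < p]

def sample_exact(cases, n, rng):
    # Exact-size sample: n distinct cases, no replacement
    return rng.sample(cases, n)

rng = random.Random(2013)
cases = list(range(1, 1001))
approx = sample_fraction(cases, 0.10, rng)   # size varies from run to run, near 100
exact = sample_exact(cases, 100, rng)        # always exactly 100 cases

# Mike's loop idea: repeat the draw, tagging each sample with its number 1..100
tagged = [(s, c) for s in range(1, 101) for c in sample_exact(cases, 100, rng)]
print(len(exact), len(tagged))  # 100 10000
```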

-Mike Palij
New York University
[hidden email]


----- Original Message -----
From: "Andy W" <[hidden email]>
To: <[hidden email]>
Sent: Thursday, March 07, 2013 8:54 AM
Subject: Re: Random sampling & matrix of histograms problem


=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Random sampling & matrix of histograms problem

Andy W
Hi Mike,

I slightly misspoke in my original post. SAMPLE would be fine (and preferable) for sampling WITHOUT replacement, but it doesn't work if you want to sample with replacement. Bootstrapping, for example, typically samples with replacement, so SAMPLE wouldn't work there.
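A quick Python comparison shows why a without-replacement tool can't stand in for a bootstrap. A with-replacement draw of n from n typically repeats some cases and omits others. On average, only about 63% of cases appear at least once, since 1 - (1 - 1/n)^n is close to 1 - 1/e. A without-replacement draw of n from n just returns every case once. The seed and n = 1000 are arbitrary.

```python
import random

rng = random.Random(7)
n = 1000
cases = list(range(1, n + 1))

with_repl = rng.choices(cases, k=n)   # bootstrap-style: duplicates allowed
without_repl = rng.sample(cases, n)   # SAMPLE-style: each case at most once

print(len(set(without_repl)))  # 1000: just a reshuffle of the original cases
print(len(set(with_repl)))     # fewer than 1000 distinct cases, roughly 632 on average
```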

Andy


On Thu, Mar 7, 2013 at 9:11 AM, Mike Palij <[hidden email]> wrote:




--
Andrew Wheeler
Doctoral Student
School of Criminal Justice
University at Albany, SUNY
http://andrewpwheeler.wordpress.com/

Re: Random sampling & matrix of histograms problem

David Marso
Administrator
In reply to this post by Petro Poutanen
http://spssx-discussion.1045642.n5.nabble.com/Sampling-WITH-replacement-td5618318.html#a5620719

Re: Random sampling & matrix of histograms problem

Art Kendall
In reply to this post by Petro Poutanen
This may not be elegant, but your population is small, so I would not worry about machine efficiency.


new file.
input program.
   loop id = 1 to 1000.
      compute x = rv.normal(0,1).
      end case.
   end loop.
   end file.
end input program.
formats x(f6.3).
execute.
dataset name madeup.


new file.
set seed 20130307.
input program.
   vector sampleflag (100,f1).
   loop id = 1 to 1000.
      loop #sample= 1 to 100.
         compute sampleflag(#sample) =rv.uniform(0,1) le .10.
      end loop.
      end case.
   end loop.
   end file.
end input program.
dataset name sampleflags.
descriptives vars= sampleflag1 to sampleflag100.

match file file= madeup/file=sampleflags /by id.
dataset name combined.
do repeat
   xsample = xsample1 to xsample100
   /flag = sampleflag1 to sampleflag100.
   do if not flag.
      compute xsample = 99999.
   else if flag.
      compute xsample = x.
   ELSE.
      print 'oops!'.
   end if.
end repeat.
formats xsample1 to xsample100 (f6.3).
missing values xsample1 to xsample100 (99999).
descriptives vars = xsample1 to xsample100.
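Here is Art's approach in outline, restated in Python. For each case and each of the 100 samples, set a flag that is 1 with probability .10. Then copy x into xsampleK only where the flag is 1, and leave everything else missing. In the sketch, None stands in for the 99999 missing-value code, and the names mirror Art's variables loosely. Each case enters a given sample at most once, so this is sampling WITHOUT replacement, with a sample size that varies around 100.

```python
import random

rng = random.Random(20130307)
x = [rng.gauss(0, 1) for _ in range(1000)]  # stand-in for dataset madeup

n_samples = 100
# flags[i][k] is True when case i belongs to sample k + 1 (like sampleflag1 ... sampleflag100)
flags = [[rng.random() <= 0.10 for _ in range(n_samples)] for _ in x]

# xsample[k][i] is x[i] when flagged, else None (Art codes this as 99999, declared missing)
xsample = [[xi if row[k] else None for xi, row in zip(x, flags)] for k in range(n_samples)]

sizes = [sum(v is not None for v in col) for col in xsample]
print(min(sizes), max(sizes))  # sample sizes scatter around 100
```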









Art Kendall
Social Research Consultants
On 3/6/2013 6:54 PM, Petro Poutanen wrote:

Re: Random sampling & matrix of histograms problem

Jon K Peck
In reply to this post by Andy W
Random sampling with replacement is simple to do with the Complex Samples procedures. Just set the "with replacement" button in the Sampling Wizard (Analyze > Complex Samples > Select a Sample), specify your sample size, and give it a dataset name. The wizard generates CSPLAN and CSSELECT commands.


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621





Re: Random sampling & matrix of histograms problem

David Marso
Administrator
Let's see... looking around my plain version 21 Base system, I see no Complex Samples option ;-(((
OH, I see: it is a separate **ADD-ON**, and for a mere $583.00 USD + tax it can be added to your system!
https://www-112.ibm.com/software/howtobuy/buyingtools/paexpress/Express?part_number=D0ELBLL%2CD0EE2LL%2CD0EL9LL%2CD0EDSLL&catalogLocale=en_US&Locale=null&country=USA&PT=html&TACTICS=%26S_TACT%3D%26S_CMP%3D%26brand%3D&ibm-submit=View+US+prices+%26+buy
Hey JoNoH, I thought you worked in the development group, not SALES!!!
The simple answer: build vectors, pull values by randomly chosen case ID, and rock and roll (see Art's post and the thread I linked to previously).
I posted this sort of thing about 15 years ago; sampling with replacement (SWR) is utterly trivial to do with a tiny bit of code!!!!!

Re: Random sampling & matrix of histograms problem

Andy W
In reply to this post by David Marso
Ahh yes, David, you are correct: I was confusing INPUT PROGRAM with BEGIN DATA/END DATA (and I even have INPUT PROGRAM in some of my macros!). I should have known to search the list to see whether you had already posted a solution!

Art's solution still only produces random samples WITHOUT replacement. As Mike said, if that is all you want, SPSS's SAMPLE command within a (macro) loop will probably be fine.

David's MATRIX solution avoids building the massive dataset that mine does, which is certainly preferable for many big-data problems (or a large number of repetitions). Also, I'm not sure: is it usual in bootstrapping for each resample to be the same size as the original dataset?
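On that question, the standard nonparametric bootstrap draws each resample with replacement, and each resample is the same size as the original data. Here is a minimal Python sketch that estimates the standard error of a mean this way. The function name `bootstrap_se_of_mean`, the 1000 resamples, and the simulated data are arbitrary choices made for illustration. For comparison, the sketch also computes the textbook estimate s / sqrt(n).

```python
import random
import statistics

def bootstrap_se_of_mean(data, n_boot=1000, seed=1):
    rng = random.Random(seed)
    n = len(data)
    # Each resample: n draws WITH replacement from the original n observations
    boot_means = [statistics.mean(rng.choices(data, k=n)) for _ in range(n_boot)]
    return statistics.stdev(boot_means)

gen = random.Random(5)
data = [gen.gauss(0, 1) for _ in range(400)]
se = bootstrap_se_of_mean(data)
textbook_se = statistics.stdev(data) / len(data) ** 0.5
print(se, textbook_se)  # the two estimates should be close
```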

One annoyance with MATRIX, though, is that you can't run the more complex regression procedures inside it (or at least I don't have the chops to write them myself in the MATRIX language). If you have a Complex Samples license, this all seems somewhat moot (although I don't have one, so it is not moot for me personally!)

Andy


Re: Random sampling & matrix of histograms problem

Petro Poutanen
Thanks a lot for the advice! I'm sure I can figure it out using the logic Andy (or Art) suggested.

(And by the way, it seems I do have access to the Complex Samples procedures, so maybe I'll learn to use them as well.)

-P


(In reply to Andy W's message of 2013/3/7.)

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD


Re: Random sampling & matrix of histograms problem

Art Kendall
In reply to this post by Petro Poutanen
Try this.

new file.
set seed 20130407.
* Build one case holding a simulated population of 1000 values, PopX1 to PopX1000.
input program.
   vector PopX(1000,f6.3).
   loop #i = 1 to 1000.
      compute PopX(#i) = rv.normal(0,1).
   end loop.
   end case.
   end file.
end input program.
execute.
dataset name madeup.
dataset activate madeup.
* from pop of 1000 draw 100 samples of size 50 with replacement.
* Re-declare the vector over the variables created above.
vector PopX = PopX1 to PopX1000.
display vector.
numeric SampledX (f6.3).
* Each draw picks a random case number from 1 to 1000, so a case can repeat within a sample.
* XSAVE writes one case (sample_id, draw, SampledX) per draw to the long file.
loop sample_id = 1 to 100.
   loop draw = 1 to 50.
      compute SampledX = PopX(rnd(rv.uniform(.5,1000.5))).
      xsave outfile = 'c:\project\long.sav' /keep =sample_id draw SampledX.
   end loop.
end loop.
execute.
get file= 'c:\project\long.sav'.
dataset name longy.
descriptives variables = SampledX.



Art Kendall
Social Research Consultants
(In reply to Petro Poutanen's post of 3/6/2013 6:54 PM.)

Re: Random sampling & matrix of histograms problem

Andy W
Art,

This is something I was trying to tell Mike as well (in an off-Nabble email correspondence). Sampling multiple times WITHOUT REPLACEMENT is not the same as drawing a sample with replacement. Your code, still, is just a sample without replacement, conducted 50 times.

Please look at the code I posted and/or David's MATRIX bootstrapping procedure. You should see the difference: within a single iteration the same id can be sampled multiple times, whereas your approaches (and the SAMPLE command) can never resample the same record more than once in any iteration.
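The distinction is easy to check outside SPSS. A small Python sketch, assuming case ids 1 to 1000 as in the thread; `draw` and `max_times_picked` are hypothetical helper names. `random.sample` never repeats a record within one draw, while `random.choices` can:

```python
import random
from collections import Counter


def draw(population, k, replace, rng):
    """Draw k items from population, with or without replacement."""
    if replace:
        return rng.choices(population, k=k)  # same record may appear several times
    return rng.sample(population, k)  # each record appears at most once


def max_times_picked(sample):
    """Largest number of times any single record appears in one sample."""
    return max(Counter(sample).values())


rng = random.Random(20130407)
ids = list(range(1, 1001))  # case ids 1..1000
without = draw(ids, 100, replace=False, rng=rng)
with_rep = draw(ids, 100, replace=True, rng=rng)
print(max_times_picked(without))  # → 1
print(max_times_picked(with_rep))
```

With 20 draws from only 10 cases, sampling with replacement must repeat at least one case (pigeonhole principle), while `rng.sample` would refuse such a request outright.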

Andy

Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Re: Random sampling & matrix of histograms problem

Zuluaga, Juan
In reply to this post by Petro Poutanen
For some things, SPSS is great. But in this case, do yourself a favor and install the R plugin.

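BEGIN PROGRAM R.

numsamples = 100            # number of resamples (columns)
samplesize = 100            # 10% of the 1000 observations
x = rnorm(1000)             # stand-in for the empirical variable
NewDataSet = matrix(nrow=samplesize,ncol=numsamples)
for (i in 1:numsamples)
  {  sampled.x = sample(x, size=samplesize, replace = TRUE)  # draw with replacement
     NewDataSet[,i] = sampled.x                              # column i holds sample i
  }

plot(NewDataSet[,23], NewDataSet[,57])   # scatter plot of two of the samples

END PROGRAM.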
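For readers without the R plugin, a plain-Python analogue of the loop above (standard library only) that returns the wide layout from the original question: one column per resample, named x1 to x100. Simulated data stands in for the real variable, and `wide_bootstrap_samples` is a hypothetical name:

```python
import random


def wide_bootstrap_samples(x, num_samples, sample_size, rng):
    """Return {'x1': [...], ..., 'xK': [...]}, one column per resample,
    each drawn with replacement from x (like the columns of NewDataSet)."""
    return {
        f"x{i}": rng.choices(x, k=sample_size)
        for i in range(1, num_samples + 1)
    }


rng = random.Random(2013)
x = [rng.gauss(0, 1) for _ in range(1000)]  # hypothetical stand-in data
cols = wide_bootstrap_samples(x, num_samples=100, sample_size=len(x) // 10, rng=rng)
print(len(cols), len(cols["x1"]))  # → 100 100
```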



Re: Random sampling & matrix of histograms problem

David Marso
Administrator
Here is an incomplete thought which has a tidy relationship to my other code but is a little bit oblique.
I will leave it to those who actually need to do this to contemplate its essence and adapt it to their pain.
Hint: multiply X by a factor of your choice and select a critical cumulative cutoff (CW) for inclusion.
The weighted X will then be the bootstrap sampling frequencies for each case in each of the samples.
* This might actually be one of my rare brain farts, but I think it is on the right track *.
* No time to verify if it's a gem or a train wreck waiting to happen *.
--
MATRIX.
SAVE UNIFORM(1000,100)/ OUTFILE * /VAR X001 TO X100.
END MATRIX.
VARSTOCASES  /ID=id  /MAKE X FROM X001 TO X100 /INDEX=Index1(X).
SORT CASES BY Index1 (A) X (D).
COMPUTE X=X * 'your call here....Must be > 2 since E (uniform)=.5'
SPLIT FILE BY Index1.
CREATE CW=CSUM(X).
SELECT IF CW LE 'your call here....' (probably 1000 for this example).
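The hint aims at per-case bootstrap frequencies. A different, standard way to get such frequencies (not David's cumulative-cutoff approach) is to count how many times each case is hit in n uniform draws; the counts follow a multinomial distribution and sum to n. A minimal Python sketch with illustrative names and simulated data. Each weight column could then serve as a frequency weight (for example via SPSS's WEIGHT BY), though whether a given procedure treats it as intended is an assumption to check:

```python
import random


def bootstrap_weights(n, rng):
    """Frequency weight for each of n cases in one resample of size n.

    Each of the n draws picks a case uniformly at random, so the
    weights are multinomial(n; 1/n, ..., 1/n) and sum to n.
    """
    weights = [0] * n
    for _ in range(n):
        weights[rng.randrange(n)] += 1
    return weights


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


rng = random.Random(7)
x = [rng.gauss(0, 1) for _ in range(1000)]  # hypothetical stand-in data
# One weight column per resample, like 100 weight variables in a dataset.
weight_sets = [bootstrap_weights(len(x), rng) for _ in range(100)]
boot_means = [weighted_mean(x, w) for w in weight_sets]
```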


Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Do not give what is holy to dogs, nor cast your pearls before swine, lest they trample them under their feet."
"When are the pigs possessed by the damned going to jump off the bloody cliff into the abyss?"

Automatic reply: Random sampling & matrix of histograms problem

Fuller, Matthew
I will be out of the office until March 11th, with limited access to e-mail.


Re: Random sampling & matrix of histograms problem

Art Kendall
In reply to this post by Andy W
This is the last set of syntax with four tweaks:
1) I created a new variable, CasePicked.
2) I added CasePicked to the list of variables written out.
3) I split the file by sample_id.
4) There is a FREQUENCIES on CasePicked for each sample.

new file.
set seed 20130407.
input program.
   vector PopX(1000,f6.3).
   loop #i = 1 to 1000.
      compute PopX(#i) = rv.normal(0,1).
   end loop.
   end case.
   end file.
end input program.
execute.
dataset name madeup.
dataset activate madeup.
* from pop of 1000 draw 100 samples of size 50 with replacement.
vector PopX = PopX1 to PopX1000.
display vector.
numeric SampledX (f6.3).
* CasePicked records which population case each draw used, so repeats can be counted.
loop sample_id = 1 to 100.
   loop draw = 1 to 50.
      compute CasePicked = rnd(rv.uniform(.5,1000.5)).
      compute SampledX = PopX(CasePicked).
      xsave outfile = 'c:\project\long1.sav' /keep =sample_id draw CasePicked SampledX.
   end loop.
end loop.
execute.
get file= 'c:\project\long1.sav'.
dataset name longy.
descriptives variables = SampledX.
* One frequency table of CasePicked per sample.
split file by sample_id.
frequencies vars = casepicked.
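The same long layout (one row per draw, recording which case was picked) as a plain-Python sketch, assuming a simulated population of 1000 normal values. `random.randint(1, N)` is inclusive at both ends and plays the role of `rnd(rv.uniform(.5,1000.5))` here; the function name is illustrative:

```python
import random


def long_samples(pop, num_samples, size, rng):
    """Rows of (sample_id, draw, case_picked, sampled_x), one per draw.

    case_picked is 1-based, matching SPSS case numbering.
    """
    rows = []
    for sample_id in range(1, num_samples + 1):
        for draw in range(1, size + 1):
            case_picked = rng.randint(1, len(pop))  # inclusive: 1..N
            rows.append((sample_id, draw, case_picked, pop[case_picked - 1]))
    return rows


rng = random.Random(20130407)
pop_x = [rng.gauss(0, 1) for _ in range(1000)]  # simulated population
rows = long_samples(pop_x, num_samples=100, size=50, rng=rng)
print(len(rows))  # → 5000
```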



Art Kendall
Social Research Consultants
(In reply to Andy W's message of 3/7/2013 4:28 PM.)

Re: Random sampling & matrix of histograms problem

Art Kendall
It is easier to see that some cases are picked more than once if you change the FREQUENCIES command in the last syntax I posted to:
frequencies vars = casepicked /format=dfreq.

That moves the cases picked more than once in a sample to the top of the frequency table.
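The same check as a short Python sketch: count each case id within one sample and sort by descending count, so repeats come first. The sample values are hypothetical, and ties are broken by case id here, which may not match the order SPSS uses:

```python
from collections import Counter


def dfreq(case_ids):
    """(case, count) pairs sorted by descending count, then by case id."""
    return sorted(Counter(case_ids).items(), key=lambda kv: (-kv[1], kv[0]))


def repeated_cases(case_ids):
    """Case ids picked more than once in this sample."""
    return [case for case, count in dfreq(case_ids) if count > 1]


sample = [17, 4, 993, 17, 250, 4, 17, 61]  # hypothetical CasePicked values
print(dfreq(sample)[:3])  # → [(17, 3), (4, 2), (61, 1)]
```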
Art Kendall
Social Research Consultants
(In reply to Art Kendall's message of 3/8/2013 8:12 AM.)

Automatic reply: Random sampling & matrix of histograms problem

Buhi, Eric
Banned User
CONTENTS DELETED
The author has deleted this message.

Re: Random sampling & matrix of histograms problem

Andy W
In reply to this post by Art Kendall
Yes, Art, I see!

I walked through the code, and this time it certainly is sampling with replacement. I tried to amend it to work with data in long format (instead of having the data to be sampled in wide column format), but I was unsuccessful.

In general, though, I don't see why I would prefer this to the approach I posted at the start of this series of emails (feel free to enlighten me). To make your approach work you would need to flip (transpose) the original data, which is an expensive procedure. You also need to write an external file with XSAVE.

While you are right that these things aren't a big deal for small datasets, this is more code, making it intrinsically more complicated. So again, why exactly would your approach be preferable?

David,

I liked your prior MATRIX bootstrap code better than the new snippet (and the code I provided at the beginning of the thread is almost an exact duplicate of what you wrote in 1996, holy poopers!).

Mainly I'm concerned about VARSTOCASES when either the number of original cases or the number of samples needed is larger. I wouldn't want to stack the dataset and then sample if the OP's request were for a population of 40,000 cases and 1,000 samples (i.e., a stacked dataset of 40 million cases). The problem grows with the size of the original population even if the number or size of the samples needed does not. It does plug away like a charm, though, even with 40,000 cases and 1,000 samples!
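The size argument as arithmetic, in a Python sketch with illustrative function names: stacking generates one row per population case per sample (N * B), while generating only the drawn case numbers needs B * m rows. For a full-size bootstrap (m = N) the two counts are equal; the saving appears when m is much smaller than N:

```python
def stacked_rows(n_cases, n_samples):
    """Rows when every population case is repeated once per sample."""
    return n_cases * n_samples


def drawn_rows(n_samples, sample_size):
    """Rows when only the drawn case numbers are generated."""
    return n_samples * sample_size


print(stacked_rows(40_000, 1_000))  # → 40000000
print(drawn_rows(1_000, 4_000))     # → 4000000 (10% subsamples)
print(drawn_rows(1_000, 40_000))    # → 40000000 (full-size bootstrap)
```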

Of course, whichever procedure individuals use will depend on the nature of the task and the size of the data. I believe your MATRIX procedure could be modified to work in a lot of situations, either by calculating the stats right within a MATRIX loop, or by piping out to a new dataset, calculating the stats, and iterating for the number of repetitions one wants.

I'm thinking here of problems that are too big to practically stack the data and use SPLIT FILE. Otherwise, I'm personally pretty cool with the solution you posted over 16 years ago!

Andy



Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/