Sampling WITH replacement used in bootstrapping, complex sampling etc. Demo syntax.

Sampling WITH replacement used in bootstrapping, complex sampling etc. Demo syntax.

Art Kendall
Based on part of an earlier discussion I put together this demo syntax.
* this syntax shows one approach to sampling WITH replacement.
* it does so by generating a random number which is used as
* an index into the vector of pop values.
* it has been tweaked based on a discussion on SPSSX-l.
* see the archives for a discussion entitled
* " Random sampling & matrix of histograms problem".
* Jon Peck mentioned that those who have the complex sampling module can use that
  to draw samples with replacement.
* David Marso uses MATRIX.
* Andy W has an approach with more compact syntax that works with the original data in long format.
* to use this syntax you need to put your data in wide format and set the vector to the pop size.
new file.
set seed 20130407.
input program.
    vector PopX(1000,f6.3).
    loop #i = 1 to 1000.
       compute PopX(#i) = rv.normal(0,1).
    end loop.
    end case.
    end file.
end input program.
execute.
dataset name madeup.
dataset activate madeup.

* this next command needs to be adapted to your data.
vector PopX = PopX1 to PopX1000.

* from pop of 1000 draw 100 samples of size 50 WITH replacement.
compute #PopSize = 1000.
compute #nSamples=100.
compute #nDraws=50.
numeric SampledX (f6.3) CasePicked (N7).
loop sample_id = 1 to #nSamples.
    loop draw = 1 to #nDraws.
       compute CasePicked = trunc(rv.uniform(1, (#PopSize+1))).
       compute SampledX = PopX(CasePicked).
       xsave outfile = 'c:\project\long1.sav'
          /keep = sample_id draw CasePicked SampledX.
    end loop.
end loop.
execute.
get file= 'c:\project\long1.sav'.
dataset name longy.
split file by sample_id.
frequencies vars = casepicked /format=dfreq.
descriptives variables = SampledX.
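
For readers without SPSS at hand, the core idea of the syntax above (draw a random integer between 1 and the population size, then use it as an index into the population values) can be sketched in Python. This is only an illustration under assumptions: the function name `draw_samples` and the tuple layout are invented here, and Python's `random` module will not reproduce the numbers SPSS generates from `set seed 20130407`.

```python
import random

def draw_samples(pop, n_samples, n_draws, seed=None):
    """Draw n_samples samples of size n_draws WITH replacement.

    Returns long-format rows (sample_id, draw, case_picked, sampled_x).
    case_picked is 1-based to mirror the SPSS vector index.
    """
    rng = random.Random(seed)
    pop_size = len(pop)
    rows = []
    for sample_id in range(1, n_samples + 1):
        for draw in range(1, n_draws + 1):
            # Same idea as trunc(rv.uniform(1, PopSize + 1)):
            # a uniform value on [0, 1) is scaled to an index in 1..pop_size.
            case_picked = int(rng.random() * pop_size) + 1
            rows.append((sample_id, draw, case_picked, pop[case_picked - 1]))
    return rows

# Made-up population of 1000 standard-normal values, as in the demo.
pop_rng = random.Random(20130407)
population = [pop_rng.gauss(0, 1) for _ in range(1000)]

rows = draw_samples(population, n_samples=100, n_draws=50, seed=1)
print(len(rows))  # 5000 rows: 100 samples x 50 draws
```

Because every draw picks from the full population again, the same case can appear more than once within a sample, which is what distinguishes this from sampling without replacement.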

--
Art Kendall
Social Research Consultants


Re: Sampling WITH replacement used in bootstrapping, complex sampling etc. Demo syntax.

David Marso
Administrator

Art,
  Not to ruin your day ;-)
"* David Marso uses MATRIX."
This is why ;-)
* Note, no writing case index file to disc!!

"Andy W has an approach with more compact syntax that works with
original in long format."

This works in Long Format ;-) .

It is presently Beer O'Clock and counting !
y'all  have a great weekend ;-)

---
DEFINE SIM (!POS !TOKENS(1)).
MATRIX.
SAVE UNIFORM(!1,1) /OUTFILE * .
END MATRIX.
!ENDDEFINE .

DEFINE Boot (!POS !TOKENS(1) ).
SET MXLOOPS 1000000.
MATRIX.
GET Data / FILE * / VAR ALL.
COMPUTE SIndex=MAKE(NROW(DATA) *!1,3,0).
LOOP #=1 TO !1.
COMPUTE Offset=(#-1)*NROW(DATA) .
+  LOOP ##=1 TO NROW(DATA).
+    COMPUTE SIndex(Offset + ##,1)=#.
+    COMPUTE SIndex(Offset + ##,2)=##.
+  END LOOP.
END LOOP.
COMPUTE CaseInd=TRUNC(UNIFORM(NROW(DATA) *!1,1)*NROW(DATA)  + 1).
LOOP #=1 TO NROW(SIndex).
+  COMPUTE SIndex(#,3)=Data(CaseInd(#)).
END LOOP.
SAVE ({SIndex,CaseInd})/ OUTFILE * /VARIABLES Sample Index DataY OrigCase.
END MATRIX.
!ENDDEFINE .
NEW FILE.
** Call as follows ** .
* Simulates 1000 draws from Uniform *.
SIM 1000.

**Bootstrap 100 samples of Size (N: NROWS in data )from whatever active file.
Boot 100.
----
SPLIT FILE BY Sample .
Blah Blah Blah....
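
As a rough reading aid for the Boot macro above, the same steps can be written out in Python: preallocate one row per draw, fill in the sample number and within-sample index by row offset, draw all random case numbers at once, then look up the data values. This is a sketch under assumptions, not a translation that reproduces SPSS output; the function name `boot_long` and the tuple layout (sample, index, data_y, orig_case) are chosen here to mirror the variable names in the SAVE statement.

```python
import random

def boot_long(data, n_boot, seed=None):
    """Return rows (sample, index, data_y, orig_case) for n_boot resamples.

    Each resample has len(data) draws taken WITH replacement.
    """
    rng = random.Random(seed)
    p = len(data)
    # Preallocate p * n_boot rows of [sample, index, data_y], like MAKE(p*B, 3, 0).
    s_index = [[0, 0, 0.0] for _ in range(p * n_boot)]
    for s in range(1, n_boot + 1):
        offset = (s - 1) * p  # number of rows that precede sample s
        for i in range(1, p + 1):
            s_index[offset + i - 1][0] = s
            s_index[offset + i - 1][1] = i
    # All random case numbers in one go: TRUNC(UNIFORM * p + 1), values 1..p.
    case_ind = [int(rng.random() * p) + 1 for _ in range(p * n_boot)]
    for r, case in enumerate(case_ind):
        s_index[r][2] = data[case - 1]
    # Append the original case number as a fourth column.
    return [tuple(row) + (case,) for row, case in zip(s_index, case_ind)]

data = [10.0, 20.0, 30.0, 40.0]
rows = boot_long(data, n_boot=3, seed=42)
for row in rows[:4]:
    print(row)
```

The result is already in long format, so a grouped analysis (the SPLIT FILE BY Sample step) can follow directly.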

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
"Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Sampling WITH replacement used in bootstrapping, complex sampling etc. Demo syntax.

David Marso
Administrator
Even more concise:

DEFINE SIM (!POS !TOKENS(1)).
MATRIX.
SAVE UNIFORM(!1,1) /OUTFILE * .
END MATRIX.
!ENDDEFINE .

DEFINE Boot (!POS !TOKENS(1) ).
SET MXLOOPS 1000000.

MATRIX.
+  GET Data / FILE * / VAR ALL.
+  COMPUTE p=NROW(DATA).
+  COMPUTE SIndex=MAKE(p*!1,3,0).

+  LOOP #=1 TO !1.
+    LOOP ##=1 TO p.
+      COMPUTE SIndex((#-1) * p + ##,1:2)={#,##}.
+    END LOOP.
+  END LOOP.

+  COMPUTE CaseInd=TRUNC(UNIFORM(p *!1,1)* p  + 1).
+  LOOP #=1 TO NROW(SIndex).
+    COMPUTE SIndex(#,3)=Data(CaseInd(#)).
+  END LOOP.

+  SAVE ({SIndex,CaseInd}) / OUTFILE * /VARIABLES Sample Index DataY OrigCase.
END MATRIX.
!ENDDEFINE .

NEW FILE.
SIM 1000.
Boot 100.

Re: Sampling WITH replacement used in bootstrapping, complex sampling etc. Demo syntax.

Art Kendall
How much experience/expertise does it take to understand what is going on? Most people would have to take it on faith.
It is probably very machine-efficient, but I'm a curmudgeon about readability.

In the '60s the saying was "Machines are expensive, people are cheap."

Nowadays machines are cheap.

However, I do have to admit that I learn from the eloquence of your solutions.

Of course, once SPSS lets one save to a dataset rather than a file, none of our solutions will have to write to the disk.

It would be interesting to hear from list members how long it took them to understand my syntax and your syntax.
Art Kendall
Social Research Consultants
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Sampling-WITH-replacement-used-in-bootstrapping-complex-sampling-etc-Demo-syntax-tp5718495p5718503.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD


===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Art Kendall
Social Research Consultants
Reply | Threaded
Open this post in threaded view
|

Re: Sampling WITH replacement used in bootstrapping, complex sampling etc. Demo syntax.

kwame woei
To answer your question, it would help to know the background. I don't have the faintest idea of the problem underlying the syntax. Perhaps you can send a link to the discussion on which this is based?


Re: Sampling WITH replacement used in bootstrapping, complex sampling etc. Demo syntax.

Art Kendall
Art Kendall
Social Research Consultants

Re: Sampling WITH replacement used in bootstrapping, complex sampling etc. Demo syntax.

David Marso
Administrator
In reply to this post by Art Kendall
Perhaps this simplified version will bust through some of the cranial blockage ;-)
Expertise is earned through experience and sometimes hard knocks, and faith is highly overrated when it comes to 'programming'.
Caveat: I post my code for free. I don't take responsibility for others' use of it. Whether they experience an epiphany, a craniogasm or relative frustration is not part of the equation.
FWIW: I do provide training for the perplexed. Contact me for a quote!

--
DEFINE SIM (!POS !TOKENS(1)).
MATRIX.
SAVE UNIFORM(!1,1) /OUTFILE * .
END MATRIX.
!ENDDEFINE .

DEFINE Boot (!POS !TOKENS(1) /!POS !TOKENS(1)).
SET MXLOOPS 1000000.
MATRIX.
+  GET Data / FILE * / VAR ALL.
+  COMPUTE p=NROW(DATA).
+  COMPUTE nrepXp=!1*p.
+  COMPUTE SIndex=MAKE(nrepXp,1,0).
+  COMPUTE CaseInd=TRUNC(UNIFORM(nrepXp,1)* p  + 1).
+  LOOP #=1 TO nrepXp.
+    COMPUTE SIndex(#)=Data(CaseInd(#)).
+  END LOOP.
+  SAVE ({CaseInd,SIndex}) / OUTFILE * /VARIABLES OrigCase DataY .
END MATRIX.
COMPUTE samplenum=TRUNC(($CASENUM-1)/!2)+1 .
EXECUTE /*Optional */.
!ENDDEFINE .

NEW FILE.
SIM 1000.
Boot 100 1000.
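
The simplified macro drops the nested index loop and instead derives the sample number from the row number after the fact, via TRUNC(($CASENUM-1)/!2)+1. As I read it, the second macro argument has to equal the number of cases in the data being resampled (1000 here), since each bootstrap sample has that many rows. The arithmetic can be checked with a small Python sketch; the function name `sample_number` is made up for this illustration.

```python
def sample_number(case_num, sample_size):
    """Return the 1-based sample number for a 1-based row number.

    Mirrors TRUNC((case_num - 1) / sample_size) + 1.
    """
    return (case_num - 1) // sample_size + 1

# With samples of size 3, rows 1-3 belong to sample 1, rows 4-6 to sample 2, ...
print([sample_number(c, 3) for c in range(1, 8)])  # [1, 1, 1, 2, 2, 2, 3]
```

Subtracting 1 before dividing is what keeps the last row of each sample in the right group; without it, row 3 in the example would already be assigned to sample 2.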
