Interquartile Range in Bootstrapping

Bryan Mac
Hi,

I have SPSS 23 without the Bootstrapping module installed on my computer. I am also running Windows 7.

I have looked through various forums and I am confused on how to bootstrap interquartile range without the bootstrap module installed.

Thanks!

Bryan Mac

Re: Interquartile Range in Bootstrapping

Bruce Weaver
Administrator
This thread from a few years ago may give you some ideas.  

http://spssx-discussion.1045642.n5.nabble.com/Sampling-WITH-replacement-td5618318.html

HTH.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Interquartile Range in Bootstrapping

David Marso
Administrator

Yes. Just generate the samples using the MATRIX code (the beginning part), SAVE each sample before the END LOOP, and then use OMS with FREQUENCIES (under SPLIT FILE) with PERCENTILES 25 75. Then mop up the mess from OMS.
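To make the idea concrete, here is a minimal sketch of the resampling step and the per-sample interquartile range. It is written in Python purely as a language-neutral illustration, not as SPSS syntax. The data values and the names `bootstrap_sample` and `iqr` are made up for the example, and the quartile rule used by `statistics.quantiles` may not match what FREQUENCIES reports in every case.

```python
import random
import statistics

def bootstrap_sample(data, rng):
    """Draw len(data) values from data with replacement via random indexes."""
    n = len(data)
    # Same idea as TRUNC(UNIFORM(N,1)*N + 1) in MATRIX, but 0-based here.
    indexes = [rng.randrange(n) for _ in range(n)]
    return [data[i] for i in indexes]

def iqr(values):
    """Interquartile range: 75th percentile minus 25th percentile."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

rng = random.Random(12345)  # fixed seed so the run is repeatable
data = [2.1, 3.4, 3.9, 4.4, 5.0, 5.2, 6.8, 7.1, 8.3, 9.6]  # made-up values

# One IQR per bootstrap sample; 1000 samples is an arbitrary choice.
boot_iqrs = [iqr(bootstrap_sample(data, rng)) for _ in range(1000)]

print("observed IQR:", iqr(data))
print("mean of bootstrap IQRs:", statistics.mean(boot_iqrs))
```

The list `boot_iqrs` plays the role of the 25th/75th percentile output that OMS would capture, one entry per sample.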
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Interquartile Range in Bootstrapping

Bryan Mac
Is this the syntax for what you are suggesting? I'm new to writing SPSS syntax.

LOOP CASEID=1 TO N.
COMPUTE SAMPLES(CASEID)=DATA(RARRAY(CASEID)).
SAVE
OMS FREQ 25 75
SPLIT FILE
OMSEND
END LOOP.

Re: Interquartile Range in Bootstrapping

David Marso
Administrator
You need the top part too. Read up on the MATRIX command.
Also look at OMS. It just captures output; FREQUENCIES is what generates the percentiles.
Procedures can't be run inside loops. Long weekend for you.


Re: Interquartile Range in Bootstrapping

Art Kendall
In reply to this post by Bryan Mac
Although there are technical meanings for "bootstrap", many times there are variations in actual use of the term.

Please explain what you are trying to accomplish.

Please explain what the term "bootstrap" means to you.


An interquartile range is often considered a rather robust descriptive.

If you follow up on the suggestions made on this list, please report back how much difference the bootstrapping makes.
Art Kendall
Social Research Consultants

Re: Interquartile Range in Bootstrapping

Bryan Mac
I want to do case resampling and estimate the distribution of the sample mean. As I understand it, bootstrapping means estimating the mean of a large sample. In general, is the bootstrapped mean larger than the non-bootstrapped mean? Each time I increased the number of samples (i.e., 100, 200, etc.), the mean kept getting larger. I thought the bootstrapped mean would be close to the non-bootstrapped mean.

Also, here is the syntax with the suggestions included. However, I am not getting the interquartile range.

PRESERVE.
DEFINE BOOT (VAR !TOKENS(1) / NSAMP !TOKENS(1)/SPRINT !TOKENS(1) !DEFAULT(F) ).
PRESERVE.
SET MXLOOPS=200000.
*Replace NAR with Desired Variable Name*.
EXAMINE VARIABLES=NAR
  /PLOT NONE
  /STATISTICS DESCRIPTIVES
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.
MATRIX.
GET DATA / VARIABLES !VAR / FILE *.
*N is the number of cases*.
COMPUTE N=NROW(DATA).
*Determine ranks of Median case(s)*.
COMPUTE CRIT={(N/2)+.5, (N/2)+.5 }.
DO IF N/2=TRUNC(N/2).
+  COMPUTE CRIT=CRIT + {-1.5,0.5}.
END IF.
COMPUTE Stats=MAKE(!NSAMP,3,0).
COMPUTE SAMPLES=MAKE(N,1,0).
LOOP SAMPLE=1 TO !NSAMP.
* Construct array of random Indexes (Data pointers).
COMPUTE RArray=TRUNC(UNIFORM(N,1)*N +1).
LOOP CASEID=1 TO N.
COMPUTE SAMPLES(CASEID)=DATA(RARRAY(CASEID)).
END LOOP.
OMS
  /SELECT TABLES
  /IF COMMANDS=['Frequencies'] SUBTYPES=['Frequencies'] LABELS=["Bootstrap"]
  /DESTINATION FORMAT=SAV NUMBERED=TableNumber_
   OUTFILE='Bootstrap' VIEWER=YES.
FREQUENCIES VARIABLES=NAR
  /FORMAT=NOTABLE
  /PERCENTILES=25.0 75.0
  /STATISTICS=STDDEV VARIANCE RANGE MINIMUM MAXIMUM MEAN MEDIAN SKEWNESS SESKEW KURTOSIS SEKURT
  /ORDER=ANALYSIS.
omsend tag = ['Bootstrap'].
** Calculate Median **.
COMPUTE MEDSTAT=GRADE(SAMPLES).
COMPUTE Stats(SAMPLE,3) =0.
LOOP I=1 TO N.
LOOP J=1 TO 2.
DO IF MEDSTAT(I)=CRIT(J).
COMPUTE Stats(SAMPLE,3) =Stats(SAMPLE,3)+SAMPLES(I).
END IF.
END LOOP.
END LOOP.
COMPUTE Stats(SAMPLE,3)=Stats(SAMPLE,3)/2.
*Replace NAR with Desired Variable Name*.
* Store the sample mean and the sum of squares, SUM(NAR**2).
COMPUTE Stats(SAMPLE,1)=CSUM(Samples)/N.
COMPUTE Stats(SAMPLE,2)=T(Samples)*Samples.
END LOOP.
* Calculate StdDev from the sum of squares and the mean *.
COMPUTE Stats(:,2)=SQRT((Stats(:,2)-N*Stats(:,1)&**2)/(N-1)).
!IF (!SPRINT !EQ T) !THEN
PRINT Stats
     /TITLE "Individual Bootstrapped Sample Statistics"
     /CLABELS "Mean","SD","Median".
!IFEND
* Calculate Averages of Bootstrapped statistics *.
PRINT (CSUM(Stats)/!NSAMP)
     /TITLE="Averaged Bootstrapped Statistics"
     /CLABELS "Mean","StdDev","Median".
COMPUTE HCI=(CSUM(Stats)/!NSAMP)  + 1.96/SQRT(N).
COMPUTE LCI=(CSUM(Stats)/!NSAMP)  - 1.96/SQRT(N).
* Calculate 95% Confidence Interval of Bootstrapped statistics *.
PRINT LCI
     /TITLE="Lower Bound for 95% Confidence Interval of Bootstrapped Statistics"
     /CLABELS "Mean","StdDev","Median".
PRINT HCI
     /TITLE="Higher Bound for 95% Confidence Interval of Bootstrapped Statistics"
     /CLABELS "Mean","StdDev","Median".
END MATRIX.
!ENDDEFINE.
*Replace 100 with the desired sample*.
BOOT Var=NAR NSAMP=100 SPRINT=T.
RESTORE.

Re: Interquartile Range in Bootstrapping

David Marso
Administrator
You cannot embed OMS/FREQUENCIES within the MATRIX ... END MATRIX block!
I wrote the base code you are using before SPSS had OMS.
Your best bet is to create the samples in MATRIX (see the SAVE command) and then use SPLIT FILE with OMS and FREQUENCIES to get the biz done.
That is the best suggestion I can offer, and it will work like a charm. Alternatively, shell out $P$$ cash for the Bootstrapping module (a waste, IMN$HO, for something this basic).

Re: Interquartile Range in Bootstrapping

David Marso
Administrator
Very briefly, I have a swarm of alligators messing with me right now (ambiguous non delimited text data).
GET your data.
MATRIX.
GET your data...
LOOP generate ONE bootstrap sample...
SAVE...
END LOOP...
END MATRIX.
SPLIT FILE BY sample.
OMS ....
FREQ...
OMSEND.
parse the OMS, calculate, aggregate or whatever....

DONE!
Have fun.
Read back a few messages and you will find this same advice, which I posted about a month ago.
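The outline above can be sketched end to end. This is a Python illustration of the logic only, under stated assumptions, and not SPSS syntax. The long table `rows` stands in for the file written by SAVE, grouping by `sample` stands in for SPLIT FILE, and the quartile calculation stands in for FREQUENCIES with PERCENTILES 25 75. The data and all variable names are hypothetical.

```python
import random
import statistics
from collections import defaultdict

rng = random.Random(2016)  # fixed seed so the run is repeatable
data = [12, 15, 15, 18, 21, 22, 25, 30, 31, 40]  # made-up values
n_samples = 200  # arbitrary number of bootstrap samples

# "MATRIX ... SAVE" step: one long table of (sample, value) rows,
# with every bootstrap sample stacked under its own sample number.
rows = []
for sample in range(1, n_samples + 1):
    for _ in range(len(data)):
        rows.append((sample, rng.choice(data)))

# "SPLIT FILE BY sample" step: group the rows by sample number.
by_sample = defaultdict(list)
for sample, value in rows:
    by_sample[sample].append(value)

# "FREQUENCIES /PERCENTILES=25 75" step: quartiles within each group.
quartiles = {}
for sample, values in by_sample.items():
    q1, _median, q3 = statistics.quantiles(values, n=4)
    quartiles[sample] = (q1, q3, q3 - q1)

# "Parse the OMS output, calculate, aggregate" step.
iqrs = [q[2] for q in quartiles.values()]
print("mean IQR:", statistics.mean(iqrs))
print("SD of IQR across samples:", statistics.stdev(iqrs))
```

Keeping the samples in one stacked table is what lets a single procedure call handle all of them at once, instead of calling a procedure inside a loop.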




Re: Interquartile Range in Bootstrapping

Art Kendall
In reply to this post by Bryan Mac
What is the population you are sampling from?
Is it an actual population or an abstract population?
What do you know or anticipate the population distribution shape to be?
How is (are) the sample(s) drawn?

What does/do visualizations of the sample(s) show?



Is this an exercise to develop your understanding of the central limit theorem or an application?

Often bootstrapping etc. is used to get a handle on how well the sample mean approximates the population mean.
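As a rough check on what to expect, the sketch below (Python, used only to illustrate the logic, with a made-up sample) averages the bootstrap means for several numbers of resamples. Under ordinary case resampling that average should stay close to the observed sample mean and should not drift upward as the number of resamples grows. The function name `mean_of_bootstrap_means` is hypothetical.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable
data = [rng.gauss(50, 10) for _ in range(60)]  # made-up sample of 60 cases
sample_mean = statistics.mean(data)

def mean_of_bootstrap_means(data, n_samples, rng):
    """Average of the means of n_samples resamples drawn with replacement."""
    means = []
    for _ in range(n_samples):
        resample = rng.choices(data, k=len(data))  # same size as the data
        means.append(statistics.mean(resample))
    return statistics.mean(means)

# Compare the averaged bootstrap mean with the observed mean.
for n_samples in (100, 200, 1000):
    avg = mean_of_bootstrap_means(data, n_samples, rng)
    print(n_samples, "resamples, difference from sample mean:",
          round(avg - sample_mean, 3))
```

If the mean grows steadily with the number of resamples, that points to a problem in the resampling code rather than to a property of the bootstrap.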
Art Kendall
Social Research Consultants

Re: Interquartile Range in Bootstrapping

Jon Peck
If the user's time has any value, purchasing the Bootstrapping option is by far the cheapest route here. The bootstrapping code would look like this:
BOOTSTRAP
  /SAMPLING METHOD=SIMPLE
  /VARIABLES TARGET=salary 
  /CRITERIA CILEVEL=95 CITYPE=PERCENTILE  NSAMPLES=1000
  /MISSING USERMISSING=EXCLUDE.
EXAMINE VARIABLES=salary
  /PLOT NONE
  /STATISTICS DESCRIPTIVES
  /NOTOTAL.

I don't know the current cost of that option, but it is one of the less expensive ones, and it would then be available for other statistics, too.  I'm not trying to sell anything, but the economics seem obvious.
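For readers without the module, the general idea of a percentile-type interval can be sketched as follows. This is a Python illustration under stated assumptions and does not claim to reproduce the BOOTSTRAP command's exact algorithm. The `salary` values are simulated, and the helper names `iqr` and `percentile_ci_95` are hypothetical.

```python
import random
import statistics

def iqr(values):
    """75th percentile minus 25th percentile."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

def percentile_ci_95(replicates):
    """2.5th and 97.5th percentiles of the bootstrap replicates."""
    cuts = statistics.quantiles(replicates, n=40)  # cut points every 2.5%
    return cuts[0], cuts[-1]

rng = random.Random(7)  # fixed seed so the run is repeatable
salary = [rng.lognormvariate(10.5, 0.4) for _ in range(80)]  # simulated data

# 1000 resamples, matching NSAMPLES=1000 in the syntax above.
replicates = [iqr(rng.choices(salary, k=len(salary))) for _ in range(1000)]
low, high = percentile_ci_95(replicates)

print("observed IQR:", round(iqr(salary)))
print("95% percentile interval:", round(low), "to", round(high))
```

The interval is read directly off the sorted bootstrap replicates, so it needs no normality assumption about the IQR.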



--
Jon K Peck
[hidden email]
