Bootstrapping three years of data for each hospital

Bootstrapping three years of data for each hospital

Fiveja
I have three years of data showing the percentage of patients being readmitted to hospitals. For example:

Hospital  Yr1 Yr2 Yr3
Smith     .45 .30 .37
Jones     .32 .11 .20
etc.

For each hospital, I would like to bootstrap the three years of data 1000 times and return an overall mean. So:

1. generate 1000 resamples of the three values
2. get the mean of each resample (i.e., 1000 means)
3. take the grand mean of the 1000 means

This will give me one grand mean for each hospital.
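
The three steps above can be sketched in a few lines. This is a minimal illustration in Python rather than SPSS syntax, using only the standard library; the function name `bootstrap_grand_mean` and the `seed` argument are made up for the example, and the data are the two example rows from the table.

```python
import random
from statistics import mean

def bootstrap_grand_mean(values, n_resamples=1000, seed=None):
    """Return the mean of the means of n_resamples bootstrap resamples."""
    rng = random.Random(seed)  # a seed makes the run repeatable
    resample_means = []
    for _ in range(n_resamples):
        # Step 1: draw len(values) observations with replacement.
        resample = rng.choices(values, k=len(values))
        # Step 2: record the mean of this resample.
        resample_means.append(mean(resample))
    # Step 3: the grand mean of the resample means.
    return mean(resample_means)

# Example data copied from the table above.
hospitals = {"Smith": [0.45, 0.30, 0.37], "Jones": [0.32, 0.11, 0.20]}
grand_means = {
    name: bootstrap_grand_mean(rates, seed=1) for name, rates in hospitals.items()
}
```

Because each resample mean is, on average, equal to the mean of the original three values, the grand mean should land very close to the plain three-year mean for each hospital.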

Below is how I currently do it. It works, but it requires transposing the data, creating variables named after each hospital, and then hard-coding those hospital names in the BOOTSTRAP syntax. If my hospitals change over time, I need to revise the BOOTSTRAP syntax to ensure the correct hospitals are listed. I'm looking for a more automated solution. Any suggestions? I would have thought there is an easy way to bootstrap the three years of data in its current format, and stratify or split the file by Hospital, but I can't seem to get it to work.

Current solution:

FLIP VARIABLES=Yr1 Yr2 Yr3
  /NEWNAMES=Hospital.

This creates a dataset like so:

CASE_LBL Smith  Jones  etc.
Yr1      .45    .32
Yr2      .30    .11
Yr3      .37    .20


BOOTSTRAP
  /SAMPLING METHOD=STRATIFIED(STRATA=Smith Jones etc.)  
  /VARIABLES INPUT=Smith Jones etc.
  /CRITERIA CILEVEL=95 CITYPE=PERCENTILE  NSAMPLES=1000
  /MISSING USERMISSING=EXCLUDE.
FREQUENCIES VARIABLES=Smith Jones etc.
  /STATISTICS=MEAN
  /ORDER=ANALYSIS.

Re: Bootstrapping three years of data for each hospital

Bruce Weaver
Administrator
I would try the following (untested) approach:

1. Make Hospital a numeric variable with value labels (use AUTORECODE if it is currently a string variable).
2. Use VARSTOCASES to restructure the file to long format, with Year as the index variable and Readmit as the name of the variable holding the % readmitted.
3. SPLIT FILE by Hospital.
4. Then issue your BOOTSTRAP & FREQUENCIES commands (you'll have to modify them a bit, as now there is only one variable--Readmit).
5. SPLIT FILE OFF.
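
The logic behind steps 2 to 4 (restructure to long format, then bootstrap within each hospital) can be sketched outside SPSS. The following is a rough Python illustration under stated assumptions, not SPSS syntax: the helper names `to_long` and `bootstrap_by_group` are hypothetical, and it says nothing about how BOOTSTRAP itself behaves under SPLIT FILE, which remains untested here.

```python
import random
from collections import defaultdict
from statistics import mean

# Wide layout from the original post: one row per hospital.
wide_rows = [
    {"Hospital": "Smith", "Yr1": 0.45, "Yr2": 0.30, "Yr3": 0.37},
    {"Hospital": "Jones", "Yr1": 0.32, "Yr2": 0.11, "Yr3": 0.20},
]

def to_long(rows, year_columns=("Yr1", "Yr2", "Yr3")):
    """Return one record per hospital-year, like the long file in step 2."""
    return [
        {"Hospital": row["Hospital"], "Year": year, "Readmit": row[year]}
        for row in rows
        for year in year_columns
    ]

def bootstrap_by_group(long_rows, n_resamples=1000, seed=0):
    """Bootstrap the mean of Readmit separately within each hospital."""
    groups = defaultdict(list)
    for record in long_rows:
        groups[record["Hospital"]].append(record["Readmit"])
    rng = random.Random(seed)
    results = {}
    for hospital, values in groups.items():
        # Resample with replacement inside the group only.
        means = [
            mean(rng.choices(values, k=len(values))) for _ in range(n_resamples)
        ]
        results[hospital] = mean(means)
    return results

long_rows = to_long(wide_rows)
per_hospital = bootstrap_by_group(long_rows)
```

The point of the long layout is that no hospital name appears in the analysis code: groups are discovered from the data, so adding or removing hospitals needs no change to the code.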

HTH.


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Bootstrapping three years of data for each hospital

Jignesh Sutar
In reply to this post by Fiveja
I'm not familiar with the BOOTSTRAP command, but could you recode your hospitals so that, when you flip the dataset, you get generic variable names such as H001 to H589 (assuming 589 hospitals in the dataset) instead of hospital names as variable names (never ideal)?

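* Recode the string Hospital into consecutive integers.
AUTORECODE VARIABLES=Hospital /INTO N_Hospital.
FORMATS N_Hospital (F8.0).
* Name the flipped variables from the numeric code, which yields K_1, K_2, ...
FLIP VARIABLES=Yr1 Yr2 Yr3
  /NEWNAMES=N_Hospital.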

This gives me K_1 to K_<N> variables, where <N> is the number of cases/hospital in my test data, which might be a little more manageable?
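
For the H001-style names mentioned above, the mapping from hospital name to generic name could look like the following. This is a small Python sketch, not SPSS syntax; `generic_names` is a hypothetical helper, and numbering the hospitals in alphabetical order is an assumption made for the example.

```python
def generic_names(hospitals, prefix="H"):
    """Map each distinct hospital name to a zero-padded generic name."""
    ordered = sorted(set(hospitals))  # assumption: number in alphabetical order
    # Pad to at least three digits so H001 and H589 have the same width.
    width = max(3, len(str(len(ordered))))
    return {
        name: f"{prefix}{i:0{width}d}" for i, name in enumerate(ordered, start=1)
    }

lookup = generic_names(["Smith", "Jones"])
# lookup is {'Jones': 'H001', 'Smith': 'H002'}
```

Keeping the lookup around makes it possible to translate the generic names in the output back to hospital names afterwards.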



On 21 April 2015 at 15:05, Fiveja <[hidden email]> wrote:
--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Boostrapping-three-years-of-data-for-each-hospital-tp5729295.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD


Re: Bootstrapping three years of data for each hospital

Jon K Peck
In reply to this post by Fiveja
I'm not sure of the goal here, but two possibilities come to mind.

1) Have you considered using Simulation to generate data matching your input?  That would fit distributions appropriate for each variable and give you the distributions over many replications with summary statistics available.

2) If the issue is that you don't know the names of the variables after the transposition and can't use ALL, you could run the SPSSINC SELECT VARIABLES extension command after the FLIP. It generates a macro listing all the variables selected on the basis of metadata (all numeric variables, the measurement level, etc.), and you can then reference that macro in the BOOTSTRAP and other commands. Negative regular expressions are tricky, but to exclude just a variable named CASE_LBL, you could use the regular expression slot with
 "^((?!CASE_LBL).)*$"

SPSSINC SELECT VARIABLES is installed by default with Statistics 23.  With 22 it can be downloaded from the Utilities menu; for earlier versions you would need to get it from the website, and the Python Essentials must be installed.

It appears on the menus as Utilities > Define Variable Macro.
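
To see what the regular expression above selects, it can be tried on a list of names with Python's `re` module. This is only a sketch of the pattern's behaviour: the variable list and the helper `select_variables` are made up for the example, and it does not reproduce how SPSSINC SELECT VARIABLES applies the pattern internally.

```python
import re

# The pattern from the post: at every position, refuse to continue if
# "CASE_LBL" starts there, so any name containing CASE_LBL fails to match.
EXCLUDE_CASE_LBL = re.compile(r"^((?!CASE_LBL).)*$")

def select_variables(names, pattern=EXCLUDE_CASE_LBL):
    """Keep only the names that the pattern matches."""
    return [name for name in names if pattern.match(name)]

# Hypothetical variable list after FLIP.
flipped = ["CASE_LBL", "Smith", "Jones"]
selected = select_variables(flipped)
# selected is ['Smith', 'Jones']
```

Note that the pattern rejects any name that contains CASE_LBL anywhere, not only the exact name, which is harmless here but worth knowing if other variables share that substring.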


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621





Re: Bootstrapping three years of data for each hospital

Fiveja
Jon K Peck wrote
2) If the issue is that you don't know the names of the variables after
the transposition and can't use ALL, ...
ALL! That works. I simply used ALL in place of the individual hospital names. It also tries to bootstrap the CASE_LBL variable, but I can just ignore that in the output.

Thank you!

Final code:

BOOTSTRAP
  /SAMPLING METHOD=STRATIFIED(STRATA=ALL)  
  /VARIABLES INPUT=ALL
  /CRITERIA CILEVEL=95 CITYPE=PERCENTILE  NSAMPLES=1000
  /MISSING USERMISSING=EXCLUDE.
FREQUENCIES VARIABLES=ALL
  /STATISTICS=MEAN
  /ORDER=ANALYSIS.

Re: Bootstrapping three years of data for each hospital

Bruce Weaver
Administrator
If you don't need CASE_LBL for anything else, delete it before your BOOTSTRAP command.

DELETE VARIABLES CASE_LBL.
BOOTSTRAP
  /SAMPLING METHOD=STRATIFIED(STRATA=ALL)  
  /VARIABLES INPUT=ALL
  /CRITERIA CILEVEL=95 CITYPE=PERCENTILE  NSAMPLES=1000
  /MISSING USERMISSING=EXCLUDE.
FREQUENCIES VARIABLES=ALL
  /STATISTICS=MEAN
  /ORDER=ANALYSIS.


