Hi all,
I wish to calculate the confidence interval (or standard error) of medians from a dataset that has fractional weights. With unweighted data I normally use the RATIO STATISTICS method (resolution #21267 on support.spss.com). However, RATIO STATISTICS only uses frequency weights, so it rounds my fractional weights to the nearest integer.

Is anyone aware of a way to calculate the CI/SE of a median when using fractional weights (in SPSS or elsewhere)?

Thanks for any suggestions.

Cheers,
Kylie.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command INFO REFCARD
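To illustrate why rounding fractional weights matters, here is a minimal Python sketch (not SPSS code) of a weighted median under one common definition: the smallest value whose cumulative weight reaches half of the total weight. The names `weighted_median` and `round_half_up` and the three-case data are hypothetical. The sketch assumes "nearest integer" means halves round up, and SPSS procedures may define or interpolate the weighted median differently.

```python
import math
from typing import Sequence


def weighted_median(values: Sequence[float], weights: Sequence[float]) -> float:
    """Lower weighted median: smallest value whose cumulative weight reaches half the total."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and of equal length")
    pairs = sorted(zip(values, weights))          # sort cases by value
    total = sum(w for _, w in pairs)
    if total <= 0:
        raise ValueError("total weight must be positive")
    half = total / 2.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    return pairs[-1][0]                           # guard against floating-point round-off


def round_half_up(w: float) -> int:
    """Nearest integer, halves rounded up (assumed rule; Python's round() rounds halves to even)."""
    return math.floor(w + 0.5)


values = [10, 20, 30]        # hypothetical data
weights = [1.4, 0.4, 1.4]    # hypothetical fractional weights
print(weighted_median(values, weights))                              # 20
print(weighted_median(values, [round_half_up(w) for w in weights]))  # 10
```

Here the middle case's weight of 0.4 rounds to 0, so that case drops out entirely and the median moves from 20 to 10.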
|
Hi Kylie:
This is not a solution... yet: PASW18 has a bootstrapping module incorporated into many procedures, such as FREQUENCIES and DESCRIPTIVES. You could get a 95% bootstrap CI for the median using PASW18. I think it will be available by the end of September or early October.

Best regards,
Marta GG

--
For miscellaneous SPSS related statistical stuff, visit: http://gjyp.nl/marta/
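For comparison, an ordinary (unweighted) percentile bootstrap CI for a median can be sketched in Python as follows. This is not the PASW bootstrap module: the function name `bootstrap_median_ci`, the seed, the 2000 replicates and the simple order-statistic rule for the 2.5th and 97.5th percentiles are all assumptions.

```python
import random
import statistics
from typing import Sequence, Tuple


def bootstrap_median_ci(data: Sequence[float], n_reps: int = 2000,
                        alpha: float = 0.05, seed: int = 1) -> Tuple[float, float]:
    """Percentile bootstrap CI for the median of unweighted data."""
    rng = random.Random(seed)
    n = len(data)
    # Each replicate: draw n cases with replacement and keep the sample median.
    medians = sorted(statistics.median(rng.choices(data, k=n)) for _ in range(n_reps))
    # Simple order-statistic rule for the alpha/2 and 1 - alpha/2 percentiles.
    lower = medians[int(alpha / 2 * n_reps)]
    upper = medians[int((1 - alpha / 2) * n_reps) - 1]
    return lower, upper


print(bootstrap_median_ci(list(range(1, 102))))  # hypothetical data 1..101
```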
|
Thanks for your reply Marta. I had forgotten about the Bootstrapping module in 18.

I downloaded the trial version to test it out, but unfortunately the bootstrapping ignores fractional weights and only accepts frequency weights. (It makes perfect sense when you imagine the mechanics of what bootstrapping actually does, but I was hoping anyway!)

If anyone has any other suggestions, I'd appreciate them.

Thanks,
Kylie.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Marta García-Granero
Sent: Monday, 21 September 2009 5:12 PM
To: [hidden email]
Subject: Re: CI for median on weighted data
|
Hi Kylie
I like challenges, and I'm fairly free right now (after a hectic week in which I taught statistics with SPSS to a group of haematologists, 7 hours a day). Could you send me a sample of your dataset off list? I think I can write some MATRIX code for it, a modification of the code on my web page, which works with unweighted data.

Best regards,
Marta
|
Hi Kylie
I think I have something that "works" (gives correct results), but it is AWFULLY slow. Do you have a powerful computer? Mine has a single 2 GHz processor and only 1 GB of RAM, and the running time is several minutes, and that is with only 100 bootstrap samples. Perhaps it is time to call for help. I will explain how the code works after lunch.

Best regards,
Marta GG

* Preparing dataset *.
DATASET NAME OriginalData.
SORT CASES BY Iodine (A) .
COMPUTE CumWeights = wtclncS3 .
DO IF $casenum GT 1.
. COMPUTE CumWeights=CumWeights+LAG(CumWeights).
END IF.
EXE. /* Needed *.
TEMPORARY.
SET MXLOOPS=100000.
MATRIX.
GET Data /VAR=Iodine /NAMES=vname.
GET CWeight /VAR=CumWeights.
* Get number of cases *.
COMPUTE K=NROW(Data).
* Get real sample size (last cumulative weight) *.
COMPUTE N=CWeight(K).
* Round it (samples must have integer sample sizes) *.
COMPUTE IntN=RND(N).
********* Start random sampling *********.
* Number of bootstrap samples, increase it if you want more precision (but with higher running time) *.
COMPUTE NReps=100.
* Initialize empty sample vector *.
COMPUTE BootSamp=MAKE(IntN,1,0).
* Initialize vector of bootstrapped medians *.
COMPUTE BootMed=MAKE(NReps,1,0).
* Start bootstrap sampling with non-integer weights *.
LOOP Iter=1 TO NReps.
. LOOP i=1 TO IntN.
.   COMPUTE RndValue=N*UNIFORM(1,1).
.   LOOP j=1 TO K.
.     COMPUTE flag=0.
.     DO IF RndValue LE CWeight(j).
.       COMPUTE BootSamp(i)=Data(j).
.       COMPUTE flag=1.
.     END IF.
.   END LOOP IF flag=1.
* Solve particular case when RndValue is GT last cumulative weight (rare but...) *.
.   DO IF flag=0.
.     COMPUTE BootSamp(i)=Data(K).
.   END IF.
. END LOOP.
* Compute boot sample median (sorting algorithm by R Ristow & J Peck) *.
. COMPUTE SortBoot=BootSamp.
. COMPUTE SortBoot(GRADE(BootSamp))=BootSamp.
. COMPUTE pair=(TRUNC(IntN/2) EQ IntN/2). /* 1 if IntN is even, 0 if odd *.
. DO IF pair EQ 0. /* Median formula for odd samples *.
.   COMPUTE BootMed(Iter)=SortBoot((IntN+1)/2).
. ELSE. /* Median formula for even samples *.
.   COMPUTE BootMed(Iter)=(SortBoot(IntN/2)+SortBoot(1+(IntN/2)))/2.
. END IF.
END LOOP.
* Save bootsample medians to SPSS file *.
SAVE BootMed /OUTFILE='C:\Temp\BootStrappedMedians.sav' /NAMES=vname.
PRINT NReps /FORMAT='F8' /RLABEL='NReps=' /TITLE='Bootstrapped medians saved to C:\Temp\BootStrappedMedians.sav'.
END MATRIX.
* Bootstrap Statistics *.
GET FILE='C:\Temp\BootStrappedMedians.sav'.
DATASET NAME Bootdata WINDOW=FRONT.
FREQUENCIES VARIABLES=Iodine
  /FORMAT=NOTABLE
  /PERCENTILES= 2.5 97.5
  /STATISTICS=STDDEV MEAN MEDIAN
  /ORDER= ANALYSIS .

The 2.5th and 97.5th percentiles of the bootstrapped medians form a 95% CI (nonparametric, percentile method). The standard deviation of the bootstrapped medians is an estimate of SE(median); you can use it to compute median +/- 1.96*SE(median).

Kylie Lange wrote:
> That's very generous of you Marta, I hope you enjoy it. :)
>
> The datafile that I have attached contains 1072 cases. 'Iodine' is the
> measure that I wish to calculate the median of with its standard error or
> confidence interval, and 'wtclncS3' are the weights.
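For anyone working outside SPSS, the steps of the MATRIX code above (sort by value, cumulate the weights, draw RndValue = N*UNIFORM, pick the first case whose cumulative weight is >= RndValue, take the median, repeat) can be mirrored in a short Python sketch. The names `weighted_bootstrap_medians` and `summarize`, the seed and the example data are hypothetical. As in the MATRIX version, each bootstrap sample has RND(sum of weights) cases, and the inner search is deliberately the same linear scan.

```python
import itertools
import random
import statistics
from typing import List, Sequence, Tuple


def weighted_bootstrap_medians(values: Sequence[float], weights: Sequence[float],
                               n_reps: int = 100, seed: int = 12345) -> List[float]:
    """Bootstrap medians, drawing cases with probability proportional to their (fractional) weights."""
    rng = random.Random(seed)
    pairs = sorted(zip(values, weights))                        # SORT CASES BY Iodine (A)
    data = [v for v, _ in pairs]
    cweight = list(itertools.accumulate(w for _, w in pairs))   # CumWeights
    n_total = cweight[-1]                                       # N = last cumulative weight
    int_n = int(n_total + 0.5)                                  # IntN = RND(N)
    if int_n < 1:
        raise ValueError("weights must sum to at least 0.5")
    medians = []
    for _ in range(n_reps):                                     # LOOP Iter=1 TO NReps
        sample = []
        for _ in range(int_n):                                  # LOOP i=1 TO IntN
            rnd_value = n_total * rng.random()                  # RndValue = N*UNIFORM(1,1)
            chosen = data[-1]                                   # fallback, like the flag=0 branch
            for j, cw in enumerate(cweight):                    # linear search, as in the MATRIX loop
                if rnd_value <= cw:
                    chosen = data[j]
                    break
            sample.append(chosen)
        medians.append(statistics.median(sample))               # same odd/even rule as the MATRIX code
    return medians


def summarize(medians: Sequence[float], point: float
              ) -> Tuple[Tuple[float, float], Tuple[float, float], float]:
    """Percentile 95% CI, normal-approximation 95% CI (point +/- 1.96*SE), and SE(median)."""
    ordered = sorted(medians)
    r = len(ordered)
    percentile_ci = (ordered[int(0.025 * r)], ordered[int(0.975 * r) - 1])
    se = statistics.stdev(medians)                              # SD of bootstrapped medians
    normal_ci = (point - 1.96 * se, point + 1.96 * se)
    return percentile_ci, normal_ci, se


# Hypothetical data and weights; point is an assumed full-sample weighted median.
meds = weighted_bootstrap_medians([12.0, 15.5, 9.8, 20.1, 14.2], [0.7, 1.3, 0.9, 1.1, 1.0], n_reps=200)
print(summarize(meds, point=14.2))
```

`statistics.median` applies the same odd/even rule as the MATRIX code, and `summarize` returns both the percentile interval and median +/- 1.96*SE(median) as described above.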
|
Hi Marta,
Thanks! I have run it on a dozen different subsets of the data, and the results agree with the median from FREQUENCIES (which does use fractional weights, but doesn't give the CI). My computer has dual 2.66 GHz processors and 2 GB of RAM, and the macro completes in approximately 2.5 minutes for the complete dataset (so I plan to increase the number of bootstrap samples).

If anyone else is interested in this, I have also discovered that Stata can do this too, using Roger Newson's programs as described at http://www.imperial.ac.uk/nhli/r.newson/papers/somdext.pdf. I do not have access to Stata myself to compare its results with Marta's macro.

Thanks again Marta - I appreciate you spending time on this.

Cheers,
Kylie.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Marta García-Granero
Sent: Monday, 21 September 2009 8:02 PM
To: [hidden email]
Subject: Re: CI for median on weighted data
|
Hi Kylie:
I'm glad it worked. Still, I'm trying to shorten the overall runtime and improve performance. I have located the bottleneck, the step responsible for most of the running time. It's the inner loop, the one that selects data at random with probabilities proportional to their weights:

.   LOOP j=1 TO K.
.     COMPUTE flag=0.
.     DO IF RndValue LE CWeight(j).
.       COMPUTE BootSamp(i)=Data(j).
.       COMPUTE flag=1.
.     END IF.
.   END LOOP IF flag=1.

If the random value is high, the loop has to cycle through many values before the DO IF condition is met, flag is set to 1, and the loop ends. I think I can split the loop in two (the lower half of the K values and the upper half) and make the code more efficient. Once I have polished it, I'll post it on my web page.

Best regards,
Marta GG
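Marta's idea of splitting the search into a lower and an upper half can be pushed all the way to a binary search over the cumulative weights. The Python sketch below swaps in that technique for illustration; it is not Marta's code, and `pick_linear`, `pick_bisect` and `draw_sample` are hypothetical names. Because cumulative weights never decrease (assuming non-negative weights), `bisect.bisect_left` returns the first index whose cumulative weight is >= the random value. That is the same case the linear loop selects, found in O(log K) rather than O(K) comparisons.

```python
import bisect
import random
from typing import List, Sequence


def pick_linear(data: Sequence[float], cweight: Sequence[float], rnd_value: float) -> float:
    """Linear scan, as in the MATRIX inner loop: first case with rnd_value <= cumulative weight."""
    for j, cw in enumerate(cweight):
        if rnd_value <= cw:
            return data[j]
    return data[-1]                       # same fallback as the flag=0 branch


def pick_bisect(data: Sequence[float], cweight: Sequence[float], rnd_value: float) -> float:
    """Binary search: bisect_left gives the first j with cweight[j] >= rnd_value."""
    j = bisect.bisect_left(cweight, rnd_value)
    return data[min(j, len(data) - 1)]    # clamp mirrors the fallback


def draw_sample(data: Sequence[float], cweight: Sequence[float], k: int, seed: int = 0) -> List[float]:
    """Draw k cases with probability proportional to weight, using the binary search."""
    rng = random.Random(seed)
    total = cweight[-1]
    return [pick_bisect(data, cweight, total * rng.random()) for _ in range(k)]


# Hypothetical sorted values and their cumulative weights (weights 0.4, 1.3, 0.8, 1.4).
print(draw_sample([10, 20, 30, 40], [0.4, 1.7, 2.5, 3.9], k=10))
```

In Python, `random.choices(data, cum_weights=cweight, k=...)` is a standard-library alternative for the same kind of weighted draw.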
