SPSSX Discussion

CI for median on weighted data

Kylie

CI for median on weighted data

Hi all,

I wish to calculate the confidence interval (or standard error) of medians,
from a dataset that has fractional weights. With unweighted data I normally
use the RATIO STATISTICS method (resolution #21267 on support.spss.com).
However, RATIO STATISTICS only uses frequency weights and so rounds my
fractional weights to the nearest integer.

Is anyone aware of a way to calculate the CI/SE of a median, when using
fractional weights (in SPSS or elsewhere)?

Thanks for any suggestions.

Cheers,
Kylie.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Marta Garcia-Granero

Re: CI for median on weighted data

Hi Kylie:

Kylie Lange wrote:

> Hi all,
>
> I wish to calculate the confidence interval (or standard error) of medians,
> from a dataset that has fractional weights. With unweighted data I normally
> use the RATIO STATISTICS method (resolution #21267 on support.spss.com).
> However, RATIO STATISTICS only uses frequency weights and so rounds my
> fractional weights to the nearest integer.
>
> Is anyone aware of a way to calculate the CI/SE of a median, when using
> fractional weights (in SPSS or elsewhere)?
>
>

This is not a solution... yet: PASW 18 has a bootstrapping module
incorporated into many procedures, such as FREQUENCIES and DESCRIPTIVES. You
could get a 95% bootstrap CI for the median using PASW 18. I think it
will be available by early October or the end of September.

Best regards,
Marta GG

--
For miscellaneous SPSS related statistical stuff, visit:
http://gjyp.nl/marta/

Kylie

Re: CI for median on weighted data

Thanks for your reply Marta. I had forgotten about the Bootstrapping module
in 18.

I downloaded the trial version to test it out, but unfortunately the
bootstrapping ignores fractional weights and accepts only frequency weights.
(It makes perfect sense when you imagine the mechanics of what bootstrapping
actually does, but I was hoping anyway!)

If anyone has any other suggestions, I'd appreciate them.

Thanks,
Kylie.

Marta Garcia-Granero

Re: CI for median on weighted data

Hi Kylie

I like challenges, and I'm fairly free right now (after a hectic week
when I had to teach statistics using SPSS to a bunch of haematologists,
7 hours a day). Could you send me, off list, a sample of your dataset? I
think I can write some MATRIX code for it, a modification of the code on
my web page, which works on unweighted data.

Best regards,
Marta

Kylie Lange wrote:
> Thanks for your reply Marta. I had forgotten about the Bootstrapping module
> in 18.
>
> I downloaded the trial version to test it out, but unfortunately the
> bootstrapping ignores fractional weights, and only wants frequency weights.
> (It makes perfect sense when you imagine the mechanics of what bootstrapping
> actually does, but I was hoping anyway!)
>
> If anyone has any other suggestions, I'd appreciate them.
--
For miscellaneous SPSS related statistical stuff, visit:
http://gjyp.nl/marta/

Marta Garcia-Granero

Re: CI for median on weighted data

Hi Kylie

I think I have something that "works" (gives correct results), but it is
AWFULLY slow. Do you have a powerful computer? Mine has a single 2 GHz
processor and only 1 GB of RAM, and the running time is several minutes,
and that is with only 100 bootstrap samples.

Perhaps it is time to call for help. I will explain how the code works
after lunch.

Best regards,
Marta GG

* Preparing dataset *.
DATASET NAME OriginalData.
SORT CASES BY Iodine (A) .
COMPUTE CumWeights = wtclncS3 .
DO IF $casenum GT 1.
. COMPUTE CumWeights=CumWeights+LAG(CumWeights).
END IF.
EXE. /* Needed *.

TEMPORARY.
SET MXLOOPS=100000.
MATRIX.
GET Data /VAR=Iodine /NAMES=vname.
GET CWeight /VAR=CumWeights.
* Get number of different values *.
COMPUTE K=NROW(Data).
* Get real sample size (last cumulative weight) *.
COMPUTE N=CWeight(K).
* Round it (samples must have integer sample sizes) *.
COMPUTE IntN=RND(N).
********* Start random sampling *********.
* Number of bootstrap samples, increase it if you want more precision
(but with higher running time) *.
COMPUTE NReps=100.
* Initialize empty sample vector *.
COMPUTE BootSamp=MAKE(IntN,1,0).
* Initialize vector of bootstrapped medians *.
COMPUTE BootMed=MAKE(NReps,1,0).
* Start bootstrap sampling with non integer weights *.
LOOP Iter=1 TO NReps.
. LOOP i=1 TO IntN.
. COMPUTE RndValue=N*UNIFORM(1,1).
. LOOP j=1 TO K.
. COMPUTE flag=0.
. DO IF RndValue LE CWeight(j).
. COMPUTE Bootsamp(i)=Data(j).
. COMPUTE flag=1.
. END IF.
. END LOOP IF flag=1.
* Solve particular case when RndValue is GT last cumulative weight (rare
but...) *.
. DO IF flag=0.
. COMPUTE Bootsamp(i)=Data(K).
. END IF.
. END LOOP.
* Compute boot sample median (sorting algorithm by R Ristow & J Peck) *.
. COMPUTE SortBoot=BootSamp.
. COMPUTE SortBoot(GRADE(BootSamp))=BootSamp.
* Check if IntN is odd (0) or even (1) *.
. COMPUTE pair=(TRUNC(IntN/2) EQ IntN/2).
. DO IF pair EQ 0.
* Median formula for odd samples *.
. COMPUTE BootMed(Iter)=SortBoot((IntN+1)/2).
. ELSE.
* Median formula for even samples *.
. COMPUTE BootMed(Iter)=(SortBoot(IntN/2)+SortBoot(1+(IntN/2)))/2.
. END IF.
END LOOP.
* Save bootsample medians to SPSS file *.
SAVE bootmed /OUTFILE='C:\Temp\BootStrappedMedians.sav' /NAMES=vname.
PRINT NReps
/FORMAT='F8'
/RLABEL='K='
/TITLE='K bootsampled medians saved to C:\Temp\BootStrappedMedians.sav'.
END MATRIX.

* Bootstrap Statistics *.
GET FILE='C:\TEMP\BootStrappedMedians.sav'.
DATASET NAME Bootdata WINDOW=FRONT.
FREQUENCIES
VARIABLES=Iodine
/FORMAT=NOTABLE
/PERCENTILES= 2.5 97.5
/STATISTICS=STDDEV MEAN MEDIAN
/ORDER= ANALYSIS .

Percentiles 2.5 and 97.5 of the bootstrapped medians form a nonparametric 95% CI.
The standard deviation of the bootstrapped medians is an estimate of SE(median).
You can use it to compute median +/- 1.96*SE(median).
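
For readers outside SPSS, the procedure above can be sketched in Python. This is only an illustration under stated assumptions, not Marta's code: the function names (weighted_bootstrap_medians, percentile, summarize) are made up for this example, the percentile uses simple linear interpolation (which may differ slightly from the method FREQUENCIES uses), and the lookup uses the standard-library bisect module instead of a linear scan.

```python
import bisect
import random
import statistics
from itertools import accumulate


def weighted_bootstrap_medians(values, weights, n_reps=100, seed=0):
    """Bootstrap medians, drawing each case with probability proportional to its weight."""
    rng = random.Random(seed)
    pairs = sorted(zip(values, weights))           # sort by value, like SORT CASES
    data = [v for v, _ in pairs]
    cum = list(accumulate(w for _, w in pairs))    # cumulative weights
    total = cum[-1]                                # weighted sample size (N)
    size = int(total + 0.5)                        # integer bootstrap sample size (IntN)
    medians = []
    for _ in range(n_reps):
        sample = []
        for _ in range(size):
            r = total * rng.random()
            # First case whose cumulative weight is >= r; the min() guards
            # against r landing beyond the last cumulative weight.
            j = min(bisect.bisect_left(cum, r), len(data) - 1)
            sample.append(data[j])
        medians.append(statistics.median(sample))
    return medians


def percentile(sorted_xs, p):
    """Linear-interpolation percentile (p in 0..100) of an already sorted list."""
    pos = (len(sorted_xs) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (sorted_xs[hi] - sorted_xs[lo]) * (pos - lo)


def summarize(medians):
    """Return (2.5th percentile, 97.5th percentile, SD) of the bootstrapped medians."""
    xs = sorted(medians)
    return percentile(xs, 2.5), percentile(xs, 97.5), statistics.stdev(xs)


if __name__ == "__main__":
    # Hypothetical values and fractional weights, for illustration only.
    iodine = [80.0, 95.0, 110.0, 140.0, 200.0]
    wts = [0.8, 1.3, 1.1, 0.6, 1.2]
    low, high, se = summarize(weighted_bootstrap_medians(iodine, wts, n_reps=200))
    print("95% CI:", low, high, "SE(median):", se)
```

As in the MATRIX code, 100 replicates is on the low side for percentile limits; raising n_reps gives more stable 2.5 and 97.5 percentiles at the cost of run time.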

Kylie Lange wrote:
> That's very generous of you Marta, I hope you enjoy it. :)
>
> The datafile that I have attached contains 1072 cases. 'Iodine' is the
> measure that I wish to calculate the median of with its standard error or
> confidence interval, and 'wtclncS3' are the weights.
>

--
For miscellaneous SPSS related statistical stuff, visit:
http://gjyp.nl/marta/

Kylie

Re: CI for median on weighted data

Hi Marta,

Thanks! I have run it on a dozen different subsets of the data, and the
results agree with the median from FREQ (which does use fractional weights,
but doesn't give the CI). My computer has dual 2.66 GHz processors and 2GB
RAM, and the macro completes in approx 2.5 minutes for the complete dataset
(so I plan to increase the number of bootstrap samples).

If anyone else is interested in this, I have also discovered that Stata can
do this too using Roger Newson's programs as described at
http://www.imperial.ac.uk/nhli/r.newson/papers/somdext.pdf. I do not have
access to Stata myself to compare the results to Marta's macro.

Thanks again Marta - I appreciate you spending time on this.

Cheers,
Kylie.

Marta Garcia-Granero

Re: CI for median on weighted data

Hi Kylie:

I'm glad it worked. Anyway, I'm trying to shorten the overall runtime
and improve performance. I have located the bottleneck, the step
responsible for most of the running time. It's the inner loop, the one
that randomly selects data with a probability proportional to
their weight:

. LOOP j=1 TO K.
. COMPUTE flag=0.
. DO IF RndValue LE CWeight(j).
. COMPUTE Bootsamp(i)=Data(j).
. COMPUTE flag=1.
. END IF.
. END LOOP IF flag=1.

If the random value is really high, then the loop has to cycle through a
lot of values until the "do if" condition is met, flag is set to 1, and
the loop ends. I think I can split the loop in two (lower half of the K
values and upper half of the K values) and make the code more efficient.
Once I have polished it, I'll post it on the web page.
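
Taking the "split in two" idea to its limit gives a binary search over the cumulative weights: halve the range repeatedly instead of once. The Python sketch below is an illustration only, not the polished MATRIX code mentioned above; the function names are hypothetical, and it assumes the cumulative weights are non-decreasing (true when all weights are non-negative).

```python
def first_index_linear(cum_weights, r):
    """Linear scan, mirroring the inner LOOP: first j with r <= cum_weights[j]."""
    for j, cw in enumerate(cum_weights):
        if r <= cw:
            return j
    return len(cum_weights) - 1    # r is beyond the last cumulative weight


def first_index_bisect(cum_weights, r):
    """Same answer found by repeatedly halving the search range."""
    lo, hi = 0, len(cum_weights) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if r <= cum_weights[mid]:
            hi = mid               # the answer is at mid or earlier
        else:
            lo = mid + 1           # the answer is after mid
    return lo


if __name__ == "__main__":
    cum = [0.5, 1.7, 3.0, 4.2]     # hypothetical cumulative weights
    for r in (0.1, 0.5, 1.0, 2.9, 4.2, 5.0):
        assert first_index_linear(cum, r) == first_index_bisect(cum, r)
```

With K sorted values the linear scan needs up to K comparisons per draw, while the halving version needs roughly log2(K), about 10 or 11 comparisons for the 1072 cases in this dataset.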

Best regards,
Marta GG

Kylie Lange wrote:
> Hi Marta,
>
> Thanks! I have run it on a dozen different subsets of the data, and the
> results agree with the median from FREQ (which does use fractional weights,
> but doesn't give the CI). My computer has dual 2.66 GHz processors and 2GB
> RAM, and the macro completes in approx 2.5 minutes for the complete dataset
> (so I plan to increase the number of bootstrap samples).
>
--

For miscellaneous SPSS related statistical stuff, visit:
http://gjyp.nl/marta/
