General Jackknife Syntax

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

General Jackknife Syntax

Kim Barchard
I have been struggling to write syntax that will create jackknife samples
(in each sample, one person has been deleted), and will calculate the
value of a statistic for each sample.  This is my first attempt at looping
structures, and I can see I'm doing something fundamentally wrong.  Any
help would be appreciated.  My email is [hidden email]

Current attempt:

COMPUTE JackID=$casenum.
COMPUTE SampleWeight=1.
EXECUTE.

LOOP #other=1 TO 66.       /* I have 66 participants.
WEIGHT OFF.
FILTER OFF.
DO IF (#other=JackID).
- COMPUTE SampleWeight=0.
ELSE.
- COMPUTE SampleWeight=1.
END IF.
WEIGHT BY SampleWeight.
FILTER BY SampleWeight.
FREQUENCIES
  VARIABLES=Age  /FORMAT=NOTABLE
  /STATISTICS=MEAN
  /ORDER=  ANALYSIS .
END LOOP.
EXECUTE.
Reply | Threaded
Open this post in threaded view
|

Re: General Jackknife Syntax

Richard Ristow
This won't be a solution, but a remark on the crucial problem.

At 02:08 PM 12/4/2006, Kim Barchard wrote:

>I have been struggling to write syntax that will create jackknife
>samples. This is my first attempt at looping structures, and I can see
>I'm doing something fundamentally wrong.

>COMPUTE JackID=$casenum.
>COMPUTE SampleWeight=1.
>EXECUTE.
>
>LOOP #other=1 TO 66.       /* I have 66 participants.
>WEIGHT OFF.
>FILTER OFF.
>DO IF (#other=JackID).
>- COMPUTE SampleWeight=0.
>ELSE.
>- COMPUTE SampleWeight=1.
>END IF.
>WEIGHT BY SampleWeight.
>FILTER BY SampleWeight.
>FREQUENCIES           <--- This is the problem
>   VARIABLES=Age  /FORMAT=NOTABLE
>   /STATISTICS=MEAN
>   /ORDER=  ANALYSIS .
>END LOOP.
>EXECUTE.

Your problem is the scope of LOOP: LOOP works only within a
transformation program. (A transformation program begins with a
procedure, EXECUTE, DATA LIST, GET..., ADD FILES, MATCH FILES. It ends
with a procedure, EXECUTE, or SAVE.)

FREQUENCIES is a procedure, so it can't be within a LOOP.

That leaves, for a loop that *can* include a FREQUENCIES,
. A !LOOP construct in a macro
. An SPSS.submit within a loop in Python, if you have SPSS 14 or 15
. A cute trick involving XSAVE that writes all 66 subsamples to the
same file (so, yes, it's 66 times as big), then uses SPLIT FILE to run
the 66 FREQUENCIES commands.

This sort of thing is often on Raynald Levesque's web site
spsstools.net, but on a quick look, I don't find it there.

I don't mean to leave you dangling, but for the moment to wait for
other participants, who may well have macros or Python code ready.
Myself, I might try the XSAVE route, but I'm probably notorious as an
XSAVE fan.

-Good luck,
  Richard Ristow
Reply | Threaded
Open this post in threaded view
|

Re: General Jackknife Syntax

Marta García-Granero
In reply to this post by Kim Barchard
Hi

When I was out (on holidays) I missed this thread:

Adp> Date:    Mon, 4 Dec 2006 14:08:33 -0500
Adp> From:    Kim Barchard <[hidden email]>
Adp> Subject: General Jackknife Syntax

Adp> I have been struggling to write syntax that will create jackknife samples
Adp> (in each sample, one person has been deleted), and will calculate the
Adp> value of a statistic for each sample.  This is my first attempt at looping
Adp> structures, and I can see I'm doing something fundamentally wrong.  Any
Adp> help would be appreciated.  My email is [hidden email]

Adp> Current attempt:

Adp> COMPUTE JackID=$casenum.
Adp> COMPUTE SampleWeight=1.
Adp> EXECUTE.

Adp> LOOP #other=1 TO 66.       /* I have 66 participants.
Adp> WEIGHT OFF.
Adp> FILTER OFF.
Adp> DO IF (#other=JackID).
Adp> - COMPUTE SampleWeight=0.
Adp> ELSE.
Adp> - COMPUTE SampleWeight=1.
Adp> END IF.
Adp> WEIGHT BY SampleWeight.
Adp> FILTER BY SampleWeight.
Adp> FREQUENCIES
Adp>   VARIABLES=Age  /FORMAT=NOTABLE
Adp>   /STATISTICS=MEAN
Adp>   /ORDER=  ANALYSIS .
Adp> END LOOP.
Adp> EXECUTE.

Richard replied:

Adp> Date:    Mon, 4 Dec 2006 15:41:51 -0500
Adp> From:    Richard Ristow <[hidden email]>
Adp> Subject: Re: General Jackknife Syntax

Adp> This won't be a solution, but a remark on the crucial problem.

Adp> At 02:08 PM 12/4/2006, Kim Barchard wrote:

>>I have been struggling to write syntax that will create jackknife
>>samples. This is my first attempt at looping structures, and I can see
>>I'm doing something fundamentally wrong.

>>COMPUTE JackID=$casenum.
>>COMPUTE SampleWeight=1.
>>EXECUTE.
>>
>>LOOP #other=1 TO 66.       /* I have 66 participants.
>>WEIGHT OFF.
>>FILTER OFF.
>>DO IF (#other=JackID).
>>- COMPUTE SampleWeight=0.
>>ELSE.
>>- COMPUTE SampleWeight=1.
>>END IF.
>>WEIGHT BY SampleWeight.
>>FILTER BY SampleWeight.
>>FREQUENCIES           <--- This is the problem
>>   VARIABLES=Age  /FORMAT=NOTABLE
>>   /STATISTICS=MEAN
>>   /ORDER=  ANALYSIS .
>>END LOOP.
>>EXECUTE.

Adp> Your problem is the scope of LOOP: LOOP works only within a
Adp> transformation program. (A transformation program begins with a
Adp> procedure, EXECUTE, DATA LIST, GET..., ADD FILES, MATCH FILES. It ends
Adp> with a procedure, EXECUTE, or SAVE.)

Adp> FREQUENCIES is a procedure, so it can't be within a LOOP.

Adp> That leaves, for a loop that *can* include a FREQUENCIES,
Adp> . A !LOOP construct in a macro
Adp> . An SPSS.submit within a loop in Python, if you have SPSS 14 or 15
Adp> . A cute trick involving XSAVE that writes all 66 subsamples to the
Adp> same file (so, yes, it's 66 times as big), then uses SPLIT FILE to run
Adp> the 66 FREQUENCIES commands.

Adp> This sort of thing is often on Raynald Levesque's web site
Adp> spsstools.net, but on a quick look, I don't find it there.

Adp> I don't mean to leave you dangling, but for the moment to wait for
Adp> other participants, who may well have macros or Python code ready.
Adp> Myself, I might try the XSAVE route, but I'm probably notorious as an
Adp> XSAVE fan.

Now, my contribution:

Richard can be an XSAVE fan, but I'm also notorious as a MATRIX fan.
I've done some jack-knifing myself using MATRIX to handle the data
with loops. See below an example (just in case it inspires you). It
computes the jack-knifed means and SD, and then computes their
correlation (since in normal distributions mean and standard deviation
are uncorrelated, a way of verifying normality is testing wether both
statistics are correlated or not). The jack means and SD are exported
to the active dataset. This same approach can be used to jack-knife
medians, R-square, regression parameters...

Regards,
Marta

* Sample dataset (replace by your own) *.
DATA LIST FREE/copper(F8.2).
BEGIN DATA
0.70 0.45 0.72 0.30 1.16 0.69 0.83 0.74 1.24 0.77
0.65 0.76 0.42 0.94 0.36 0.98 0.64 0.90 0.63 0.55
0.78 0.10 0.52 0.42 0.58 0.62 1.12 0.86 0.74 1.04
0.65 0.66 0.81 0.48 0.85 0.75 0.73 0.50 0.34 0.88
END DATA.
VARIABLE LABEL copper'Urinary copper (µmol/24hr)'.

* Jack-knife normality test (with JK means and SD) *.
MATRIX.
GET data /VAR=ALL /NAMES=vname /MISSING=OMIT.
COMPUTE k=NROW(data).
COMPUTE mean=CSUM(data)/k.
COMPUTE variance=(CSSQ(data)-k&*(mean&**2))/(k-1).
PRINT {mean,variance}
 /FORMAT='F8.3'
 /RNAME=vname
 /CLABEL='Mean','Variance'
 /TITLE='Statistics for full sample'.
* Compute empty matrix to store JK statistics (means&SD) *.
COMPUTE jack=MAKE(k,2,0).
COMPUTE n=k-1.
* Handle 1st & last values *.
COMPUTE jack(k,1)=MSUM(sample)/n.
COMPUTE jack(k,2)=(MSSQ(sample)-n&*(jack(k,1)&**2))/(n-1).
* Cycle thru all values *.
LOOP i=1 TO k.
- DO IF (i EQ 1).                          /* JK sample for 1st position *.
-  COMPUTE sample=data(2:k).
- ELSE IF (i GT 1) AND (i LT k). /* JK samples for 2nd to last-1 postion *.
-  COMPUTE sample={data(1:(i-1));data((i+1):k)}.
- ELSE IF (i EQ k).                       /* JK sample for last position *.
-  COMPUTE sample=data(1:(k-1)).
- END IF.
- COMPUTE jack(i,1)=MSUM(sample)/n.
- COMPUTE jack(i,2)=(CSSQ(sample)-n&*(jack(i,1)&**2))/(n-1).
END LOOP.
RELEASE data, sample.
DO IF k LE 50. /* Limit output for big sample sizes *.
PRINT jack
 /FORMAT='F8.3'
 /RNAME=vname
 /CLABEL='Mean','Variance'
 /TITLE='Jack-knife statistics'.
END IF.
* JK normality test *.
COMPUTE n=NROW(jack).
COMPUTE mean=CSUM(jack)/n.
COMPUTE variance=(CSSQ(jack)-n&*(mean&**2))/(n-1).
COMPUTE x=jack(:,1).
COMPUTE y=jack(:,2).
COMPUTE covxy=((T(x)*y)-n*mean(1)*mean(2))/(n-1).
COMPUTE r=covxy/SQRT(variance(1)*variance(2)).
PRINT {covxy,r}
 /FORMAT='F8.3'
 /CLABEL='COVxy','r'
 /TITLE='Covariance & Pearson r'.
COMPUTE tvalue=r*SQRT((n-2)/(1-r**2)).
COMPUTE tsig=2*(1-TCDF(ABS(tvalue),n-2)).
PRINT {tvalue,tsig}
 /FORMAT='F4.2'
 /CLABEL='t value','t Sig'
 /TITLE='Significance test for r (df=n-2)'.
COMPUTE zr=.5*(LN((1+r)/(1-r))).
COMPUTE sezr=1/SQRT(n-3).
COMPUTE lowzr=zr-1.96*sezr.
COMPUTE uppzr=zr+1.96*sezr.
COMPUTE lowr=((exp(2*lowzr))-1)/((exp(2*lowzr))+1).
COMPUTE uppr=((exp(2*uppzr))-1)/((exp(2*uppzr))+1).
PRINT {lowr,uppr}
 /FORMAT='F8.2'
 /CLABEL="Lower" "Upper"
 /TITLE='95%CI for R:'.
* Export JK statistics *.
COMPUTE namevec={'jackmean','jackvar'}.
SAVE jack /OUTFILE='C:\Temp\extracols.sav' /NAMES=namevec.
PRINT /TITLE='Jack-knife statistics saved to disk'.
END MATRIX.
MATCH FILES /FILE=*
 /FILE='C:\Temp\extracols.sav'.
EXECUTE .
Reply | Threaded
Open this post in threaded view
|

Re: General Jackknife Syntax

Richard Ristow
At 12:37 PM 12/10/2006, Marta García-Granero wrote:

>Richard replied:
>
>>Your problem is the scope of LOOP: LOOP works
>>only within a transformation program.
>>
>>That leaves, for a loop that *can* include a FREQUENCIES,
>>. A !LOOP construct in a macro
>>. An SPSS.submit within a loop in Python, if you have SPSS 14 or 15
>>. A cute trick involving XSAVE that writes all
>>66 subsamples to the same file (so, yes, it's
>>66 times as big), then uses SPLIT FILE to run
>>the 66 FREQUENCIES commands.
>>
>>Myself, I might try the XSAVE route, but I'm
>>probably notorious as an XSAVE fan.
>
>Now, my contribution:
>
>Richard can be an XSAVE fan, but I'm also
>notorious as a MATRIX fan. I've done some
>jack-knifing myself using MATRIX to handle the
>data with loops. See below an example (just in case it inspires you).

Thanks, Marta! A whole approach I'd missed. (And,
yes; you are teaching the rest of us how much you
can do with MATRIX, though I admit I'm a slow learner.)

Hope it was a GREAT vacation.
Richard
Reply | Threaded
Open this post in threaded view
|

Re: General Jackknife Syntax

Marta García-Granero
In reply to this post by Marta García-Granero
Hi

After reviewing my last message, I realized there were some details to
mention:

1) My example works as it is because the sample dataset has 40 cases.
Kim mentioned 66 cases. Trying to run the code with datasets more than
40 cases long will trigger a MATRIX error concerning MAX LOOPS being
reached. It can be avoided using SET MXLOOPS=100 (or 1000... depending
on sample size) before the MATRIX...END MATRIX procedure.

2) I forgot to delete (I was changing slightly the structure of the
MATRIX code while I wrote the message) a couple of useless lines:

* Handle 1st & last values *.
COMPUTE jack(k,1)=MSUM(sample)/n.
COMPUTE jack(k,2)=(MSSQ(sample)-n&*(jack(k,1)&**2))/(n-1).

My first version handled the first and last cases as exceptions
outside the loop. I replaced that by the DO IF, ELSE IF, ELSE inside
the loop (cleaner code), but forgot to delete part of the redundant
lines (it doesn't affect the result, but it triggers a couple of error
messages - Error # 12492 and Error # 12419 - before going on with the
calculations and display of results). [Insert an embarrased smile
here: I forgot rule number one "Thou shalt never modify code inside a
message without testing it in SPSS before"]

3) A limitation of this MATRIX code:

Statements like: COMPUTE sample=data(2:k).
Or:              COMPUTE sample={data(1:(i-1));data((i+1):k)}.

Will fail if more than 100000 rows of data are being handled. This
is an undocument limitation of MATRIX (at least, it was undocumented
in SPSS 13, perhaps I should take a look at SPSS 15 manual). Jon Peck
said to me it might be removed in future versions (provided its removal
didn't cause other errors or malfunctioning in MATRIX).

The revised code (it handles points #1 & #2) is:

* Sample dataset (replace by your own) *.
DATA LIST FREE/copper(F8.2).
BEGIN DATA
0.70 0.45 0.72 0.30 1.16 0.69 0.83 0.74 1.24 0.77
0.65 0.76 0.42 0.94 0.36 0.98 0.64 0.90 0.63 0.55
0.78 0.10 0.52 0.42 0.58 0.62 1.12 0.86 0.74 1.04
0.65 0.66 0.81 0.48 0.85 0.75 0.73 0.50 0.34 0.88
END DATA.
VARIABLE LABEL copper'Urinary copper (µmol/24hr)'.

TEMPORARY.
SET MXLOOPS=100. /* Set it according to your sample size *.

* Jack-knife normality test (with JK means and SD) *.
MATRIX.
GET data /VAR=ALL /NAMES=vname /MISSING=OMIT.
COMPUTE k=NROW(data).
COMPUTE mean=CSUM(data)/k.
COMPUTE variance=(CSSQ(data)-k&*(mean&**2))/(k-1).
PRINT {mean,variance}
 /FORMAT='F8.3'
 /RNAME=vname
 /CLABEL='Mean','Variance'
 /TITLE='Statistics for full sample'.
* Compute empty matrix to store JK statistics (means&SD) *.
COMPUTE jack=MAKE(k,2,0).
COMPUTE n=k-1.
* Cycle thru all values *.
LOOP i=1 TO k.
- DO IF (i EQ 1).                          /* JK sample for 1st position *.
-  COMPUTE sample=data(2:k).
- ELSE IF (i GT 1) AND (i LT k). /* JK samples for 2nd to last-1 postion *.
-  COMPUTE sample={data(1:(i-1));data((i+1):k)}.
- ELSE IF (i EQ k).                       /* JK sample for last position *.
-  COMPUTE sample=data(1:(k-1)).
- END IF.
- COMPUTE jack(i,1)=MSUM(sample)/n.
- COMPUTE jack(i,2)=(CSSQ(sample)-n&*(jack(i,1)&**2))/(n-1).
END LOOP.
RELEASE data, sample.
* Limit output to 10 first values *.
PRINT {jack(1:10,:)}
 /FORMAT='F8.3'
 /RNAME=vname
 /CLABEL='Mean','Variance'
 /TITLE='Jack-knife statistics for rows 1-10'.
* JK normality test *.
COMPUTE n=NROW(jack).
COMPUTE mean=CSUM(jack)/n.
COMPUTE variance=(CSSQ(jack)-n&*(mean&**2))/(n-1).
COMPUTE x=jack(:,1).
COMPUTE y=jack(:,2).
COMPUTE covxy=((T(x)*y)-n*mean(1)*mean(2))/(n-1).
COMPUTE r=covxy/SQRT(variance(1)*variance(2)).
PRINT {covxy,r}
 /FORMAT='F8.3'
 /CLABEL='COVxy','r'
 /TITLE='Covariance & Pearson r'.
COMPUTE tvalue=r*SQRT((n-2)/(1-r**2)).
COMPUTE tsig=2*(1-TCDF(ABS(tvalue),n-2)).
PRINT {tvalue,tsig}
 /FORMAT='F4.2'
 /CLABEL='t value','t Sig'
 /TITLE='Significance test for r (df=n-2)'.
COMPUTE zr=.5*(LN((1+r)/(1-r))).
COMPUTE sezr=1/SQRT(n-3).
COMPUTE lowzr=zr-1.96*sezr.
COMPUTE uppzr=zr+1.96*sezr.
COMPUTE lowr=((exp(2*lowzr))-1)/((exp(2*lowzr))+1).
COMPUTE uppr=((exp(2*uppzr))-1)/((exp(2*uppzr))+1).
PRINT {lowr,uppr}
 /FORMAT='F8.2'
 /CLABEL="Lower" "Upper"
 /TITLE='95%CI for R:'.
* Export JK statistics *.
COMPUTE namevec={'jackmean','jackvar'}.
SAVE jack /OUTFILE='C:\Temp\extracols.sav' /NAMES=namevec.
PRINT /TITLE='Jack-knife statistics added to active dataset'.
END MATRIX.
MATCH FILES /FILE=*
 /FILE='C:\Temp\extracols.sav'.

* This one goes for Richard (replace EXE by a PROCEDURE) *.
LIST /CASES=FROM 1 TO 10.



--
Regards,
Dr. Marta García-Granero,PhD           mailto:[hidden email]
Statistician

---
"It is unwise to use a statistical procedure whose use one does
not understand. SPSS syntax guide cannot supply this knowledge, and it
is certainly no substitute for the basic understanding of statistics
and statistical thinking that is essential for the wise choice of
methods and the correct interpretation of their results".

(Adapted from WinPepi manual - I'm sure Joe Abrahmson will not mind)