I have been struggling to write syntax that will create jackknife samples
(in each sample, one person has been deleted), and will calculate the value of a statistic for each sample. This is my first attempt at looping structures, and I can see I'm doing something fundamentally wrong. Any help would be appreciated. My email is [hidden email] Current attempt: COMPUTE JackID=$casenum. COMPUTE SampleWeight=1. EXECUTE. LOOP #other=1 TO 66. /* I have 66 participants. WEIGHT OFF. FILTER OFF. DO IF (#other=JackID). - COMPUTE SampleWeight=0. ELSE. - COMPUTE SampleWeight=1. END IF. WEIGHT BY SampleWeight. FILTER BY SampleWeight. FREQUENCIES VARIABLES=Age /FORMAT=NOTABLE /STATISTICS=MEAN /ORDER= ANALYSIS . END LOOP. EXECUTE. |
This won't be a solution, but a remark on the crucial problem.
At 02:08 PM 12/4/2006, Kim Barchard wrote: >I have been struggling to write syntax that will create jackknife >samples. This is my first attempt at looping structures, and I can see >I'm doing something fundamentally wrong. >COMPUTE JackID=$casenum. >COMPUTE SampleWeight=1. >EXECUTE. > >LOOP #other=1 TO 66. /* I have 66 participants. >WEIGHT OFF. >FILTER OFF. >DO IF (#other=JackID). >- COMPUTE SampleWeight=0. >ELSE. >- COMPUTE SampleWeight=1. >END IF. >WEIGHT BY SampleWeight. >FILTER BY SampleWeight. >FREQUENCIES <--- This is the problem > VARIABLES=Age /FORMAT=NOTABLE > /STATISTICS=MEAN > /ORDER= ANALYSIS . >END LOOP. >EXECUTE. Your problem is the scope of LOOP: LOOP works only within a transformation program. (A transformation program begins with a procedure, EXECUTE, DATA LIST, GET..., ADD FILES, MATCH FILES. It ends with a procedure, EXECUTE, or SAVE.) FREQUENCIES is a procedure, so it can't be within a LOOP. That leaves, for a loop that *can* include a FREQUENCIES, . A !LOOP construct in a macro . An SPSS.submit within a loop in Python, if you have SPSS 14 or 15 . A cute trick involving XSAVE that writes all 66 subsamples to the same file (so, yes, it's 66 times as big), then uses SPLIT FILE to run the 66 FREQUENCIES commands. This sort of thing is often on Raynald Levesque's web site spsstools.net, but on a quick look, I don't find it there. I don't mean to leave you dangling, but for the moment to wait for other participants, who may well have macros or Python code ready. Myself, I might try the XSAVE route, but I'm probably notorious as an XSAVE fan. -Good luck, Richard Ristow |
In reply to this post by Kim Barchard
Hi
When I was out (on holidays) I missed this thread: Adp> Date: Mon, 4 Dec 2006 14:08:33 -0500 Adp> From: Kim Barchard <[hidden email]> Adp> Subject: General Jackknife Syntax Adp> I have been struggling to write syntax that will create jackknife samples Adp> (in each sample, one person has been deleted), and will calculate the Adp> value of a statistic for each sample. This is my first attempt at looping Adp> structures, and I can see I'm doing something fundamentally wrong. Any Adp> help would be appreciated. My email is [hidden email] Adp> Current attempt: Adp> COMPUTE JackID=$casenum. Adp> COMPUTE SampleWeight=1. Adp> EXECUTE. Adp> LOOP #other=1 TO 66. /* I have 66 participants. Adp> WEIGHT OFF. Adp> FILTER OFF. Adp> DO IF (#other=JackID). Adp> - COMPUTE SampleWeight=0. Adp> ELSE. Adp> - COMPUTE SampleWeight=1. Adp> END IF. Adp> WEIGHT BY SampleWeight. Adp> FILTER BY SampleWeight. Adp> FREQUENCIES Adp> VARIABLES=Age /FORMAT=NOTABLE Adp> /STATISTICS=MEAN Adp> /ORDER= ANALYSIS . Adp> END LOOP. Adp> EXECUTE. Richard replied: Adp> Date: Mon, 4 Dec 2006 15:41:51 -0500 Adp> From: Richard Ristow <[hidden email]> Adp> Subject: Re: General Jackknife Syntax Adp> This won't be a solution, but a remark on the crucial problem. Adp> At 02:08 PM 12/4/2006, Kim Barchard wrote: >>I have been struggling to write syntax that will create jackknife >>samples. This is my first attempt at looping structures, and I can see >>I'm doing something fundamentally wrong. >>COMPUTE JackID=$casenum. >>COMPUTE SampleWeight=1. >>EXECUTE. >> >>LOOP #other=1 TO 66. /* I have 66 participants. >>WEIGHT OFF. >>FILTER OFF. >>DO IF (#other=JackID). >>- COMPUTE SampleWeight=0. >>ELSE. >>- COMPUTE SampleWeight=1. >>END IF. >>WEIGHT BY SampleWeight. >>FILTER BY SampleWeight. >>FREQUENCIES <--- This is the problem >> VARIABLES=Age /FORMAT=NOTABLE >> /STATISTICS=MEAN >> /ORDER= ANALYSIS . >>END LOOP. >>EXECUTE. Adp> Your problem is the scope of LOOP: LOOP works only within a Adp> transformation program. (A transformation program begins with a Adp> procedure, EXECUTE, DATA LIST, GET..., ADD FILES, MATCH FILES. It ends Adp> with a procedure, EXECUTE, or SAVE.) Adp> FREQUENCIES is a procedure, so it can't be within a LOOP. Adp> That leaves, for a loop that *can* include a FREQUENCIES, Adp> . A !LOOP construct in a macro Adp> . An SPSS.submit within a loop in Python, if you have SPSS 14 or 15 Adp> . A cute trick involving XSAVE that writes all 66 subsamples to the Adp> same file (so, yes, it's 66 times as big), then uses SPLIT FILE to run Adp> the 66 FREQUENCIES commands. Adp> This sort of thing is often on Raynald Levesque's web site Adp> spsstools.net, but on a quick look, I don't find it there. Adp> I don't mean to leave you dangling, but for the moment to wait for Adp> other participants, who may well have macros or Python code ready. Adp> Myself, I might try the XSAVE route, but I'm probably notorious as an Adp> XSAVE fan. Now, my contribution: Richard can be an XSAVE fan, but I'm also notorious as a MATRIX fan. I've done some jack-knifing myself using MATRIX to handle the data with loops. See below an example (just in case it inspires you). It computes the jack-knifed means and SD, and then computes their correlation (since in normal distributions mean and standard deviation are uncorrelated, a way of verifying normality is testing wether both statistics are correlated or not). The jack means and SD are exported to the active dataset. This same approach can be used to jack-knife medians, R-square, regression parameters... Regards, Marta * Sample dataset (replace by your own) *. DATA LIST FREE/copper(F8.2). BEGIN DATA 0.70 0.45 0.72 0.30 1.16 0.69 0.83 0.74 1.24 0.77 0.65 0.76 0.42 0.94 0.36 0.98 0.64 0.90 0.63 0.55 0.78 0.10 0.52 0.42 0.58 0.62 1.12 0.86 0.74 1.04 0.65 0.66 0.81 0.48 0.85 0.75 0.73 0.50 0.34 0.88 END DATA. VARIABLE LABEL copper'Urinary copper (µmol/24hr)'. * Jack-knife normality test (with JK means and SD) *. MATRIX. GET data /VAR=ALL /NAMES=vname /MISSING=OMIT. COMPUTE k=NROW(data). COMPUTE mean=CSUM(data)/k. COMPUTE variance=(CSSQ(data)-k&*(mean&**2))/(k-1). PRINT {mean,variance} /FORMAT='F8.3' /RNAME=vname /CLABEL='Mean','Variance' /TITLE='Statistics for full sample'. * Compute empty matrix to store JK statistics (means&SD) *. COMPUTE jack=MAKE(k,2,0). COMPUTE n=k-1. * Handle 1st & last values *. COMPUTE jack(k,1)=MSUM(sample)/n. COMPUTE jack(k,2)=(MSSQ(sample)-n&*(jack(k,1)&**2))/(n-1). * Cycle thru all values *. LOOP i=1 TO k. - DO IF (i EQ 1). /* JK sample for 1st position *. - COMPUTE sample=data(2:k). - ELSE IF (i GT 1) AND (i LT k). /* JK samples for 2nd to last-1 postion *. - COMPUTE sample={data(1:(i-1));data((i+1):k)}. - ELSE IF (i EQ k). /* JK sample for last position *. - COMPUTE sample=data(1:(k-1)). - END IF. - COMPUTE jack(i,1)=MSUM(sample)/n. - COMPUTE jack(i,2)=(CSSQ(sample)-n&*(jack(i,1)&**2))/(n-1). END LOOP. RELEASE data, sample. DO IF k LE 50. /* Limit output for big sample sizes *. PRINT jack /FORMAT='F8.3' /RNAME=vname /CLABEL='Mean','Variance' /TITLE='Jack-knife statistics'. END IF. * JK normality test *. COMPUTE n=NROW(jack). COMPUTE mean=CSUM(jack)/n. COMPUTE variance=(CSSQ(jack)-n&*(mean&**2))/(n-1). COMPUTE x=jack(:,1). COMPUTE y=jack(:,2). COMPUTE covxy=((T(x)*y)-n*mean(1)*mean(2))/(n-1). COMPUTE r=covxy/SQRT(variance(1)*variance(2)). PRINT {covxy,r} /FORMAT='F8.3' /CLABEL='COVxy','r' /TITLE='Covariance & Pearson r'. COMPUTE tvalue=r*SQRT((n-2)/(1-r**2)). COMPUTE tsig=2*(1-TCDF(ABS(tvalue),n-2)). PRINT {tvalue,tsig} /FORMAT='F4.2' /CLABEL='t value','t Sig' /TITLE='Significance test for r (df=n-2)'. COMPUTE zr=.5*(LN((1+r)/(1-r))). COMPUTE sezr=1/SQRT(n-3). COMPUTE lowzr=zr-1.96*sezr. COMPUTE uppzr=zr+1.96*sezr. COMPUTE lowr=((exp(2*lowzr))-1)/((exp(2*lowzr))+1). COMPUTE uppr=((exp(2*uppzr))-1)/((exp(2*uppzr))+1). PRINT {lowr,uppr} /FORMAT='F8.2' /CLABEL="Lower" "Upper" /TITLE='95%CI for R:'. * Export JK statistics *. COMPUTE namevec={'jackmean','jackvar'}. SAVE jack /OUTFILE='C:\Temp\extracols.sav' /NAMES=namevec. PRINT /TITLE='Jack-knife statistics saved to disk'. END MATRIX. MATCH FILES /FILE=* /FILE='C:\Temp\extracols.sav'. EXECUTE . |
At 12:37 PM 12/10/2006, Marta García-Granero wrote:
>Richard replied: > >>Your problem is the scope of LOOP: LOOP works >>only within a transformation program. >> >>That leaves, for a loop that *can* include a FREQUENCIES, >>. A !LOOP construct in a macro >>. An SPSS.submit within a loop in Python, if you have SPSS 14 or 15 >>. A cute trick involving XSAVE that writes all >>66 subsamples to the same file (so, yes, it's >>66 times as big), then uses SPLIT FILE to run >>the 66 FREQUENCIES commands. >> >>Myself, I might try the XSAVE route, but I'm >>probably notorious as an XSAVE fan. > >Now, my contribution: > >Richard can be an XSAVE fan, but I'm also >notorious as a MATRIX fan. I've done some >jack-knifing myself using MATRIX to handle the >data with loops. See below an example (just in case it inspires you). Thanks, Marta! A whole approach I'd missed. (And, yes; you are teaching the rest of us how much you can do with MATRIX, though I admit I'm a slow learner.) Hope it was a GREAT vacation. Richard |
In reply to this post by Marta García-Granero
Hi
After reviewing my last message, I realized there were some details to mention: 1) My example works as it is because the sample dataset has 40 cases. Kim mentioned 66 cases. Trying to run the code with datasets more than 40 cases long will trigger a MATRIX error concerning MAX LOOPS being reached. It can be avoided using SET MXLOOPS=100 (or 1000... depending on sample size) before the MATRIX...END MATRIX procedure. 2) I forgot to delete (I was changing slightly the structure of the MATRIX code while I wrote the message) a couple of useless lines: * Handle 1st & last values *. COMPUTE jack(k,1)=MSUM(sample)/n. COMPUTE jack(k,2)=(MSSQ(sample)-n&*(jack(k,1)&**2))/(n-1). My first version handled the first and last cases as exceptions outside the loop. I replaced that by the DO IF, ELSE IF, ELSE inside the loop (cleaner code), but forgot to delete part of the redundant lines (it doesn't affect the result, but it triggers a couple of error messages - Error # 12492 and Error # 12419 - before going on with the calculations and display of results). [Insert an embarrased smile here: I forgot rule number one "Thou shalt never modify code inside a message without testing it in SPSS before"] 3) A limitation of this MATRIX code: Statements like: COMPUTE sample=data(2:k). Or: COMPUTE sample={data(1:(i-1));data((i+1):k)}. Will fail if more than 100000 rows of data are being handled. This is an undocument limitation of MATRIX (at least, it was undocumented in SPSS 13, perhaps I should take a look at SPSS 15 manual). Jon Peck said to me it might be removed in future versions (provided its removal didn't cause other errors or malfunctioning in MATRIX). The revised code (it handles points #1 & #2) is: * Sample dataset (replace by your own) *. DATA LIST FREE/copper(F8.2). BEGIN DATA 0.70 0.45 0.72 0.30 1.16 0.69 0.83 0.74 1.24 0.77 0.65 0.76 0.42 0.94 0.36 0.98 0.64 0.90 0.63 0.55 0.78 0.10 0.52 0.42 0.58 0.62 1.12 0.86 0.74 1.04 0.65 0.66 0.81 0.48 0.85 0.75 0.73 0.50 0.34 0.88 END DATA. VARIABLE LABEL copper'Urinary copper (µmol/24hr)'. TEMPORARY. SET MXLOOPS=100. /* Set it according to your sample size *. * Jack-knife normality test (with JK means and SD) *. MATRIX. GET data /VAR=ALL /NAMES=vname /MISSING=OMIT. COMPUTE k=NROW(data). COMPUTE mean=CSUM(data)/k. COMPUTE variance=(CSSQ(data)-k&*(mean&**2))/(k-1). PRINT {mean,variance} /FORMAT='F8.3' /RNAME=vname /CLABEL='Mean','Variance' /TITLE='Statistics for full sample'. * Compute empty matrix to store JK statistics (means&SD) *. COMPUTE jack=MAKE(k,2,0). COMPUTE n=k-1. * Cycle thru all values *. LOOP i=1 TO k. - DO IF (i EQ 1). /* JK sample for 1st position *. - COMPUTE sample=data(2:k). - ELSE IF (i GT 1) AND (i LT k). /* JK samples for 2nd to last-1 postion *. - COMPUTE sample={data(1:(i-1));data((i+1):k)}. - ELSE IF (i EQ k). /* JK sample for last position *. - COMPUTE sample=data(1:(k-1)). - END IF. - COMPUTE jack(i,1)=MSUM(sample)/n. - COMPUTE jack(i,2)=(CSSQ(sample)-n&*(jack(i,1)&**2))/(n-1). END LOOP. RELEASE data, sample. * Limit output to 10 first values *. PRINT {jack(1:10,:)} /FORMAT='F8.3' /RNAME=vname /CLABEL='Mean','Variance' /TITLE='Jack-knife statistics for rows 1-10'. * JK normality test *. COMPUTE n=NROW(jack). COMPUTE mean=CSUM(jack)/n. COMPUTE variance=(CSSQ(jack)-n&*(mean&**2))/(n-1). COMPUTE x=jack(:,1). COMPUTE y=jack(:,2). COMPUTE covxy=((T(x)*y)-n*mean(1)*mean(2))/(n-1). COMPUTE r=covxy/SQRT(variance(1)*variance(2)). PRINT {covxy,r} /FORMAT='F8.3' /CLABEL='COVxy','r' /TITLE='Covariance & Pearson r'. COMPUTE tvalue=r*SQRT((n-2)/(1-r**2)). COMPUTE tsig=2*(1-TCDF(ABS(tvalue),n-2)). PRINT {tvalue,tsig} /FORMAT='F4.2' /CLABEL='t value','t Sig' /TITLE='Significance test for r (df=n-2)'. COMPUTE zr=.5*(LN((1+r)/(1-r))). COMPUTE sezr=1/SQRT(n-3). COMPUTE lowzr=zr-1.96*sezr. COMPUTE uppzr=zr+1.96*sezr. COMPUTE lowr=((exp(2*lowzr))-1)/((exp(2*lowzr))+1). COMPUTE uppr=((exp(2*uppzr))-1)/((exp(2*uppzr))+1). PRINT {lowr,uppr} /FORMAT='F8.2' /CLABEL="Lower" "Upper" /TITLE='95%CI for R:'. * Export JK statistics *. COMPUTE namevec={'jackmean','jackvar'}. SAVE jack /OUTFILE='C:\Temp\extracols.sav' /NAMES=namevec. PRINT /TITLE='Jack-knife statistics added to active dataset'. END MATRIX. MATCH FILES /FILE=* /FILE='C:\Temp\extracols.sav'. * This one goes for Richard (replace EXE by a PROCEDURE) *. LIST /CASES=FROM 1 TO 10. -- Regards, Dr. Marta García-Granero,PhD mailto:[hidden email] Statistician --- "It is unwise to use a statistical procedure whose use one does not understand. SPSS syntax guide cannot supply this knowledge, and it is certainly no substitute for the basic understanding of statistics and statistical thinking that is essential for the wise choice of methods and the correct interpretation of their results". (Adapted from WinPepi manual - I'm sure Joe Abrahmson will not mind) |
Free forum by Nabble | Edit this page |