Hello,
I wonder how I should use SPSS syntax for the following problem.

I have 100 cases (countries) and would like to do a regression of y on x1, x2 and x3.

To check the robustness of the results, I was asked to replicate the analysis 100 times. So, in one run I might exclude Uganda but include the remaining 99 countries; in another analysis I might exclude Angola but include the remaining 99 countries, and so on.

I wonder how I might program this. My variables are country (for identification of the cases), y (dependent var) and x1, x2, x3 (independent vars).

Any help (references, code segments, syntax examples) would be highly appreciated! Many thanks to all of you!

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Hi Tino.
You are asking for a jackknife (leave-one-out) analysis. See if you can adapt the following code for your purposes:

*** JACKKNIFE STATISTICS FOR MULTIPLE LINEAR REGRESSION ***
* For the whole set of equations used, see Set #1, #2 & #5 at
  http://www.anthony-vba.kefra.com/vba/vba9.htm *.
* Uses SPSS 14/Newer *.

DEFINE JACKMLR(y=!TOKENS(1) / x=!CMDEND).
* This step uses SPSS 14/Newer *.
DATASET NAME OriginalData.
PRESERVE.
SET MXLOOPS=10000. /* Should be at least equal to sample size *.
MATRIX.
PRINT /TITLE='************** JACKKNIFE MLR STATISTICS **************'.
* Input data (DV goes first) with listwise deletion *.
GET data /VAR=!y !x /MISSING=OMIT /NAME=vnames.
* Full sample statistics *.
COMPUTE n=NROW(data).
COMPUTE p=NCOL(data). /* Nr. of parameters (with constant) *.
COMPUTE x={MAKE(n,1,1),data(:,(2:p))}. /* Split data into "y" and "x" matrices *.
COMPUTE y=data(:,1).
COMPUTE meany=CSUM(y)/n.
COMPUTE b=GINV(x)*y. /* Note: GINV(x)=INV[T(x)*x]*T(x) *.
COMPUTE ESS=T(b)*T(x)*y-n*meany&**2. /* Effect Sum of Squares *.
COMPUTE TSS=T(Y)*y-n*meany&**2. /* Total Sum of Squares *.
COMPUTE sigma2=(TSS-ESS)/(n-p). /* Residual variance *.
COMPUTE SEb=SQRT(DIAG(sigma2*INV(T(x)*x))). /* SE of coefficients *.
COMPUTE tvalue=b/SEb.
COMPUTE tsig=2*(1-TCDF(ABS(tvalue),n-p)).
COMPUTE Rsquare=ESS/TSS.
COMPUTE Psquare=1-(1-rsquare)*(n-1)/(n-p).
COMPUTE Fvalue=(rsquare/(p-1))/((1-rsquare)/(n-p)).
COMPUTE Fsig=1-FCDF(Fvalue,(p-1),(n-p)).
* Reports *.
COMPUTE vnames={'Constant',vnames(2:p)}.
PRINT /TITLE='Full sample statistics'.
PRINT {b,SEb,tvalue,tsig}
 /FORMAT='F8.3'
 /RNAMES=vnames
 /CLABEL='Coeff.','SE','T','Sig.'
 /TITLE='Unstandardized coefficients'.
PRINT {rsquare,psquare,SQRT(sigma2)}
 /FORMAT='F8.3'
 /CLABEL='RSq.','AdjRSq.','Res(SD)'
 /TITLE='Model Summary'.
PRINT {Fvalue,Fsig}
 /FORMAT='F8.4'
 /CLABEL='F value','Sig.'
 /TITLE='R-square test (F value & Significance)'.
* JACKKNIFE STATISTICS *.
COMPUTE nj=n-1. /* New sample size *.
* Compute empty matrix to store JK statistics *.
COMPUTE JackStat=MAKE(n,(p+2),0).
* Cycle thru all values *.
LOOP i=1 TO n.
. DO IF (i EQ 1). /* Extract JK sample for first case *.
. COMPUTE sample=data(2:n,:).
. ELSE IF (i GT 1) AND (i LT n). /* JK samples for cases from 2 to n-1 *.
. COMPUTE sample={data(1:(i-1),:); data((i+1):n,:)}.
. ELSE IF (i EQ n). /* Extract JK sample for last case *.
. COMPUTE sample=data(1:(n-1),:).
. END IF.
. * Statistics for every JK sample *.
. COMPUTE x={MAKE(nj,1,1),sample(:,(2:p))}.
. COMPUTE y=sample(:,1).
. COMPUTE meany=CSUM(y)/nj.
. COMPUTE b=GINV(x)*y.
. COMPUTE ESS=T(b)*T(x)*y-nj*meany&**2.
. COMPUTE TSS=T(Y)*y-nj*meany&**2.
. COMPUTE Rsquare=ESS/TSS.
. COMPUTE Psquare=1-(1-rsquare)*(nj-1)/(nj-p).
. * Store all statistics in JackStat(i) *.
. COMPUTE JackStat(i,1) =Rsquare.
. COMPUTE JackStat(i,2) =Psquare.
. COMPUTE Jackstat(i,3:(p+2))=T(b).
END LOOP.
* Report first 10 values *.
COMPUTE AllNames={'R2','P2',vnames}.
COMPUTE CasesID={' 1',' 2',' 3',' 4',' 5',' 6',' 7',' 8',' 9','10'}.
PRINT {JackStat(1:10,:)}
 /FORMAT='F8.3'
 /CNAMES=AllNames
 /RNAMES=CasesID
 /TITLE='Jackknife statistics for rows 1-10'.
* Export JK statistics to active dataset *.
SAVE JackStat /OUTFILE='C:\Temp\Coefficients.sav' /NAMES=AllNames.
PRINT /TITLE='Jackknife statistics exported to Coefficients Dataset'.
END MATRIX.
RESTORE.
* This part uses SPSS 14/Newer *.
GET FILE'C:\Temp\Coefficients.sav'.
DATASET NAME Coefficients.
FORMAT ALL (F8.3).
VAR LABEL R2 'R Square'/P2 'Adj.R Square'.
FREQUENCIES VAR=ALL
 /FORMAT=NOTABLE
 /PERCENTILES= 2.5 25 50 75 97.5
 /STATISTICS=STDDEV MINIMUM MAXIMUM MEAN.
DATASET ACTIVATE OriginalData.
DATASET CLOSE Coefficients.
!ENDDEFINE.

* Sample dataset (50 random cases from Rosner's dataset FEV.sav) *.
DATA LIST FREE/fev(F8.3) age(F8.0) hgt(F8.1) gender smoke (2 F8.0).
BEGIN DATA
1.415 6 56.0 0 0
2.646 10 60.0 1 0
3.519 19 66.0 0 1
3.000 9 65.5 1 0
3.428 14 64.0 0 1
1.694 11 60.0 1 1
3.957 14 72.0 1 1
1.962 8 57.0 1 0
2.384 12 63.5 0 1
2.679 15 66.0 0 1
2.387 10 66.0 0 1
1.794 8 54.5 1 0
2.646 13 61.5 0 0
2.198 15 62.0 0 1
3.345 19 65.5 0 1
2.599 13 62.5 0 1
3.082 17 67.0 1 1
2.903 16 63.0 0 1
3.004 15 64.0 0 1
1.603 7 51.0 0 0
1.196 5 46.5 0 0
1.697 8 59.0 0 0
2.813 10 61.5 0 0
3.985 15 71.0 1 0
4.309 14 69.0 1 1
1.947 9 56.5 0 0
3.169 11 62.5 0 1
3.406 17 69.0 1 1
2.358 10 59.0 0 0
1.933 9 58.0 0 0
3.297 13 65.0 0 1
3.680 14 67.0 1 0
1.953 9 58.0 1 1
3.247 11 65.5 1 0
4.086 18 67.0 1 1
3.585 14 70.0 1 0
3.498 10 68.0 1 1
2.953 11 67.0 0 1
3.127 10 62.0 1 0
1.338 6 51.5 0 0
2.569 12 63.0 0 0
3.320 11 65.5 1 0
3.780 14 70.0 1 0
4.404 18 70.5 1 1
4.637 11 72.0 1 1
3.727 15 68.0 1 1
4.203 12 71.0 1 0
2.564 7 58.0 0 0
3.152 13 62.0 0 1
2.391 10 59.5 1 0
END DATA.
VAR LABEL fev 'FEV (liters)' /age 'Age (years)' /hgt 'Height (inches)'.
VALUE LABEL gender 0 'Female' 1 'Male' /smoke 0'Non Smoker' 1'Smoker'.
VAR LEV gender smoke (NOMINAL).

* Adding an interaction term to the model *.
COMPUTE smokehgt=smoke*hgt.
FORMAT smokehgt (F8).

REGRESSION
 /STATISTICS COEFF OUTS R ANOVA
 /DEPENDENT fev
 /METHOD=ENTER age hgt gender smoke smokehgt.

JACKMLR y=fev x=age TO smokehgt.

On 24/02/2012 9:31, Tino Nsenene wrote:
> <SNIP>
Administrator
|
See RESIDUALS SUBCOMMAND in REGRESSION!!!
-----
DRESID  Deleted residuals.
DFBETA  Change in the regression coefficient that results from the deletion of the ith case. A DFBETA value is computed for each case for each regression coefficient generated by a model. (Belsley, Kuh, and Welsch, 1980)
SDBETA  Standardized DFBETA. An SDBETA value is computed for each case for each regression coefficient generated by a model. (Belsley et al., 1980)
DFFIT   Change in the predicted value when the ith case is deleted. (Belsley et al., 1980)
SDFIT   Standardized DFFIT. (Belsley et al., 1980)
-------------------------------------------------------------------------------------
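As a minimal sketch of how these case-deletion statistics might be requested for Tino's model: the variable names y, x1, x2 and x3 come from the original question, and the keywords are written here on the SAVE subcommand of REGRESSION, which is where temporary variables are saved to the active dataset. The default names of the new variables mentioned in the comments (DRE_1, DFB0_1, and so on) are what SPSS typically generates and should be treated as an assumption; check the "variables created" table in the output.

```spss
* Sketch only: y, x1, x2, x3 are the variable names from the original post.
* SAVE adds one deleted residual per case, plus one DFBETA per case
* for each coefficient (constant included).
REGRESSION
  /STATISTICS COEFF R ANOVA
  /DEPENDENT y
  /METHOD=ENTER x1 x2 x3
  /SAVE DRESID DFBETA SDBETA DFFIT SDFIT.

* Assumed default names of the saved variables: DRE_1, DFB0_1, DFB1_1, ...
* Confirm the actual names in the output before using them.
```

Each saved DFBETA value is the full-sample coefficient minus the coefficient obtained with that country left out, so one REGRESSION run carries the information of all 100 leave-one-out fits.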
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
A jackknife approach also gives you the chance to estimate bias and to get estimates of the standard errors of, for instance, R-square. You can also see the prediction error for the excluded case when it is predicted by the model fitted to the other n-1 cases, which provides an internal validation of the model and a check of its robustness (as Tino asked for). It gives you the same information the RESIDUALS subcommand gives... plus more.

So, why the fuss about the piece of code?

Marta

On 24/02/2012 14:28, David Marso wrote:
> See RESIDUALS SUBCOMMAND in REGRESSION!!!
> <SNIP>
>
> Marta García-Granero-2 wrote
>> Hi Tino.
>>
>> You are asking for a jackknife (leave-one-out) analysis.
>> <SNIP>
>
> --
> View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Syntax-question-Replication-analysis-tp5512075p5512784.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
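For reference, the bias and standard-error estimates that a jackknife makes possible can be written as follows. This is the standard textbook form, not something spelled out in the thread; the notation is an assumption: \hat{\theta} is any statistic computed on all n cases (a coefficient or R-square, say) and \hat{\theta}_{(i)} is the same statistic with case i left out.

```latex
\bar{\theta}_{(\cdot)} = \frac{1}{n}\sum_{i=1}^{n}\hat{\theta}_{(i)}

\widehat{\mathrm{bias}}_{\mathrm{jack}} = (n-1)\left(\bar{\theta}_{(\cdot)} - \hat{\theta}\right)

\widehat{\mathrm{se}}_{\mathrm{jack}} = \sqrt{\frac{n-1}{n}\sum_{i=1}^{n}\left(\hat{\theta}_{(i)} - \bar{\theta}_{(\cdot)}\right)^{2}}
```

With a table of leave-one-out statistics, one row per excluded case, these reduce to a mean and a rescaled standard deviation of each column.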
Administrator
|
No fuss, Marta. I just didn't think it necessary to quote your entire program.
Most people simply don't know about the D*-residuals available in REGRESSION, GLM, etc., and they provide the essential information available from a JK.
Administrator
|
Note that the most expensive calculations in the JK loop,

. COMPUTE meany=CSUM(y)/nj.
. COMPUTE b=GINV(x)*y.

can be made more efficient by using the provisional-means approach to calculate the mean, and the DFBETA values from REGRESSION to obtain the coefficients:

Meany(i) = (OverallMean*n - Y(i))/(n-1)
JKBeta(i,:) = Beta - DFBeta(i,:)

where Beta holds the full-sample coefficients. So we can avoid all of that expensive matrix inversion by running REGRESSION first, saving the DFBETA values, and using simple subtraction to obtain the JK coefficients. With a large N I imagine the saving can be considerable.
-------
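A rough sketch of that subtraction in MATRIX, under several assumptions: the model is Tino's (y on x1, x2, x3), REGRESSION has already been run once with DFBETA saved, and the saved variables carry the names DFB0_1 to DFB3_1 with DFB0_1 belonging to the constant. Those names are what SPSS typically assigns on a first run and are not taken from the thread; substitute whatever names appear in your dataset. The matrix names d, yv, xm, dfb and jkb are made up for this illustration.

```spss
* Sketch only: assumes DFB0_1 ... DFB3_1 were saved by a prior REGRESSION run.
MATRIX.
* One GET, so listwise deletion is applied to all columns together.
GET d /VAR=y x1 x2 x3 DFB0_1 DFB1_1 DFB2_1 DFB3_1 /MISSING=OMIT.
COMPUTE n=NROW(d).
COMPUTE yv=d(:,1).
COMPUTE xm={MAKE(n,1,1),d(:,2:4)}.
* Full-sample coefficients: a single inversion instead of n of them.
COMPUTE b=GINV(xm)*yv.
COMPUTE dfb=d(:,5:8).
* Row i of jkb holds the coefficients estimated without case i.
COMPUTE jkb=MAKE(n,1,1)*T(b)-dfb.
PRINT {jkb(1:5,:)}
 /FORMAT='F8.3'
 /CLABELS='Const','x1','x2','x3'
 /TITLE='Leave-one-out coefficients, first 5 cases'.
END MATRIX.
```

Comparing a few rows of jkb against a regression run with the corresponding case actually excluded is a quick way to confirm that the column order of the saved DFBETA variables matches the assumption.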
Administrator
|
Additionally, there is no need to create N JK samples and repeat a complete analysis for each sample.
Note that

[X'X](i) = X'X - X(i)'X(i)

using [X'X](i) to denote the model crossproducts without row i. X(i) is the ith row of X, and X'X is the full-sample crossproducts matrix. Similarly for X'Y. So all of the stats for the JK can be calculated by updating rather than by complete calculation from scratch.

COMPUTE TXY=T(X)*Y.
COMPUTE TYY=T(Y)*Y.
. COMPUTE ESS=T(b)*TXY - nj*meany&**2.
. COMPUTE TSS=T(Y)*y-nj*meany&**2.

In the loop...

COMPUTE Bi=B-T(DFBeta(i,:)).
COMPUTE MeanYi=(MeanY*N-Y(i))/nj.
COMPUTE MC=nj*MeanYi&**2.
COMPUTE TXYi=TXY-T(X(i,:))*Y(i,:).
COMPUTE ESSi=T(Bi)*TXYi-MC.
COMPUTE TSSi=TYY-Y(i,:)*Y(i,:)-MC.

I might find the time and juice to implement this using Marta's code as a baseline. Basic outline:

Run REGRESSION and save DFBETA.
COMPUTE TXX=T(X)*X.
COMPUTE TXY=T(X)*Y.
COMPUTE Beta=GINV(TXX)*TXY.
.....
LOOP....
COMPUTE Betai=Beta-T(DFBETA(i,:)).
etc...

HTH, David
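The updating steps above rest on standard leave-one-out identities for least squares, collected here as a sketch. The notation is an assumption introduced for this block: x_i is the ith row of X written as a column vector, b the full-sample coefficients, b_{(i)} the coefficients without case i, e_i = y_i - x_i' b the ordinary residual, and h_{ii} the leverage.

```latex
(X'X)_{(i)} = X'X - x_i x_i' , \qquad (X'y)_{(i)} = X'y - x_i\, y_i

\bar{y}_{(i)} = \frac{n\,\bar{y} - y_i}{n-1}

h_{ii} = x_i'\,(X'X)^{-1} x_i

b - b_{(i)} = \frac{(X'X)^{-1} x_i\, e_i}{1 - h_{ii}}

y_i - x_i'\, b_{(i)} = \frac{e_i}{1 - h_{ii}}
```

The fourth line is the DFBETA quantity and the last is the deleted residual, so both can be obtained from a single full-sample fit without refitting the model n times.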