SPSSX Discussion

Vuong Test example for comparing non-nested linear mixed models employed in SPSS

Classic

List

Threaded

1 message

Ryan

Vuong Test example for comparing non-nested linear mixed models employed in SPSS

Dear SPSS-L,

I know I have mentioned a few times that one can employ a Vuong test to compare non-nested models, but I do not believe I have ever provided an actual example. Below is a simulated example that provides the unadjusted Vuong z-test, the AIC-adjusted Vuong z-test, and the BIC adjusted-Vuong z-test.

Let me make a few comments before providing the code:

1. The code for the Vuong tests was developed from the following wikipedia page (which uses the BIC correction) and the original article:

http://en.wikipedia.org/wiki/Vuong's_closeness_test

Vuong, Quang H. (1989). "Likelihood Ratio Tests for Model Selection and non-nested Hypotheses". Econometrica 57 (2): 307–333.

2. I also examined the way one of the brilliant posters on SAS-L constructed the test using the NLMIXED procedure:

http://listserv.uga.edu/cgi-bin/wa?A2=ind0506a&L=sas-l&P=30791

although this was constructed to compare more complex models.

3. I usually provide comments throughout the code, but I have decided to leave out all comments. My assumption is that by examining the links above, along with the actual article, one should be able to figure out the test construction.

4. For validation purposes, I compared my results to those produced by the SAS macro available on the SAS website. Even though the results from this example are VERY similar to those produced by the SAS macro (p-values are identical to the 3rd decimal place, with a VERY slight divergence at the 4th decimal place), I wonder if there is an error in my code somewhere along the way. Chances are it's just a rounding issue, but not knowing the reason for the slight discrepancy at the 4th decimal place is concerning. I'm not sure when I'll have time to investigate this any further. Certainly, if someone finds an error, please do write back to the list. Again, the difference is so small that it's likely not due to an error.

5. I adapted the data generation code from the IBM website.

Ryan

set seed 9875394.
input program.
loop ID=1 to 1000.
compute x1 = rv.norm(0,1).
compute x2 = rv.norm(0,1).
compute x3 = rv.norm(0,1).
compute x4 = rv.norm(0,1).
end case.
end loop.
end file.
end input program.
execute.

matrix.
get x / variables=x1 to x4.
compute cor={1.000,.441,.301,.400;
                        .441,1.000,.001,.001;
                        .301,.001,1.000,.001;
                        .400,.001,.001,1.000}.

print cor /format=f5.3.
compute deter=det(cor).
print deter / title "determinant of corr matrix" / format=f10.7 .
print sval(cor) / title "singular value decomposition of corr".
print eval(cor) / title "eigenvalues of input corr".

compute condnum=mmax(sval(cor))/mmin(sval(cor)).
print condnum / title "condition number of corr matrix" / format=f10.2 .
compute cho=chol(cor).
print cho / title "cholesky factor of corr matrix" .
compute chochek=t(cho)*cho.
print chochek / title "chol factor premult by its transpose " /format=f10.3 .
compute newx=x*cho.
save newx /outfile=* /variables= X1 to X4.
end matrix.

RENAME VARIABLES (X1 X2 X3 X4 = y x1 x2 x3).

MIXED y WITH x1 x2
/FIXED=x1 x2 | SSTYPE(3)
/METHOD=ML
/SAVE=PRED.

MIXED y WITH x3
/FIXED=x3 | SSTYPE(3)
/METHOD=ML
/SAVE=PRED.

compute #PI = 3.1415926535897932.
compute #model1_resid_var = .7317415469065396.
compute #model2_resid_var = .8034443848961190.
compute model1_ll_i = -0.5 * (ln(2*#PI*#model1_resid_var) + ((y-PRED_1)**2 / #model1_resid_var)).
compute model2_ll_i = -0.5 * (ln(2*#PI*#model2_resid_var) + ((y-PRED_2)**2 / #model2_resid_var)).
compute modeldiff_ll_i = model1_ll_i - model2_ll_i.
execute.

AGGREGATE
/OUTFILE=* MODE=ADDVARIABLES
/BREAK=
/model1_ll=sum(model1_ll_i)
/model2_ll=sum(model2_ll_i)
/model_diff_sd = sd(modeldiff_ll_i)
/N_BREAK=N.

compute #Vuong_Statistic_numerator = model1_ll - model2_ll.
compute #Vuong_Statistic_denominator = sqrt(N_BREAK*((model_diff_sd)**2)).
compute Vuong_Statistic = #Vuong_Statistic_numerator / #Vuong_Statistic_denominator.

compute #parm1=4.
compute #parm2=3.

compute Vuong_Statistic_AIC_adj = (#Vuong_Statistic_numerator - ((#parm1) - (#parm2))) / (#Vuong_Statistic_denominator).
compute Vuong_Statistic_BIC_adj = (#Vuong_Statistic_numerator - ((((#parm1)/2)*ln(N_BREAK))-(((#parm2)/2)*ln(N_BREAK)))) / (#Vuong_Statistic_denominator).

compute p_value_unadj = 2*(1 - cdfnorm(abs(Vuong_Statistic))).
compute p_value_AIC_adj = 2*(1 - cdfnorm(abs(Vuong_Statistic_AIC_adj))).
compute p_value_BIC_adj = 2*(1 - cdfnorm(abs(Vuong_Statistic_BIC_adj))).
execute.

delete variables PRED_1 PRED_2 model1_ll_i model2_ll_i modeldiff_ll_i model_diff_sd N_BREAK.

compute neg2LL_model1=-2*model1_ll.
compute neg2LL_model2=-2*model2_ll.
execute.