Dear list members,
I am confused about the interpretation of the NPAR TESTS output. From the (tax) register covering my target population, I know the distribution of a variable coded into twelve categories:

VALUE LABELS myvar
  1 '-1800'
  2 '1800-3600'
  3 '3600-6000'
  4 '6000-12000'
  5 '12000-18000'
  6 '18000-24000'
  7 '24000-30000'
  8 '30000-36000'
  9 '36000-60000'
  10 '60000-90000'
  11 '90000-120000'
  12 '120000+'.

In my sample I observe the same variable with the same twelve categories. I want to compare the two distributions, expecting the sample to have the same distribution as the population. Therefore I ran the chi-square goodness-of-fit test and supplied the expected proportion (in percent) for each category:

NPAR TESTS
  /CHISQUARE=myvar (1,12)
  /EXPECTED=0.04 0.08 0.04 1.16 6.31 11.58 12.45 10.88 33.88 17.48 4.48 1.57
  /STATISTICS ALL
  /MISSING ANALYSIS.

The results are the following.

Chi-Square Test: myvar

Category           Observed N  Expected N  Residual
 1  -1800                   2          .4       1.6
 2  1800-3600              11          .8      10.2
 3  3600-6000               9          .4       8.6
 4  6000-12000             12        11.3        .7
 5  12000-18000            55        61.4      -6.4
 6  18000-24000           117       112.6       4.4
 7  24000-30000           187       121.1      65.9
 8  30000-36000           138       105.8      32.2
 9  36000-60000           254       329.5     -75.5
10  60000-90000           130       170.0     -40.0
11  90000-120000           45        43.6       1.4
12  120000+                12        15.3      -3.3
Total                     972

Test Statistics: myvar

Chi-Square(a)   405.611
df                   11
Asymp. Sig.        .000

a. 3 cells (25.0%) have expected frequencies less than 5. The minimum expected cell frequency is .4.

Finally, my question: is the test based on the hypothesis that the observed distribution is EQUAL to the expected distribution, or that it is UNEQUAL to the expected distribution? In other words, I am confused about H0 and H1, i.e. which hypothesis the NPAR test actually tests.

Thank you in advance for your kind clarifications.

Uwe Warner
CEPS/INSTEAD
B.P. 48
L-4501 Differdange
Luxembourg
Phone: +352 585855 1
Fax: +352 585560
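To see what H0 means in numbers, the Expected N column and the statistic can be reproduced outside SPSS. The following is a minimal Python sketch, not SPSS's own implementation: it assumes (consistent with the table above) that the /EXPECTED values are rescaled by their sum, which is 99.95 here rather than exactly 100, and then multiplied by the total N of 972. Pearson's statistic, the sum of (O - E)^2 / E, is 0 when every observed count equals its expected count, so H0 is "observed = expected" and only large values count against it. The function names `chi2_sf` and `chi_square_gof` are hypothetical, and the p-value uses the closed-form chi-square upper tail for integer degrees of freedom to avoid a SciPy dependency.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term = math.exp(-x / 2)
        total = term
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return total
    # Odd df: 2*(1 - Phi(c)) + 2*phi(c) * sum_{r=1}^{(df-1)/2} c^(2r-1) / (1*3*...*(2r-1)), c = sqrt(x)
    c = math.sqrt(x)
    tail = math.erfc(c / math.sqrt(2))  # equals 2*(1 - Phi(c))
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)  # standard normal density at c
    term, series = c, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        series += term
    return tail + 2 * density * series


def chi_square_gof(observed, expected_weights):
    """Pearson goodness-of-fit test of H0: the sample follows the expected distribution."""
    n = sum(observed)
    # Weights are rescaled by their sum, so they need not add up to exactly 1 or 100.
    scale = n / sum(expected_weights)
    expected = [w * scale for w in expected_weights]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df, chi2_sf(stat, df), expected


# Observed counts and /EXPECTED values taken from the output above.
observed = [2, 11, 9, 12, 55, 117, 187, 138, 254, 130, 45, 12]
weights = [0.04, 0.08, 0.04, 1.16, 6.31, 11.58, 12.45, 10.88, 33.88, 17.48, 4.48, 1.57]

stat, df, p, expected = chi_square_gof(observed, weights)
small = sum(e < 5 for e in expected)  # cells that trigger the SPSS footnote
print(f"Chi-square = {stat:.3f}, df = {df}, p = {p:.3g}")
print(f"{small} of {len(expected)} cells have expected N < 5; minimum = {min(expected):.1f}")
```

Run as is, it should report a statistic of about 405.6 with df = 11 and three cells with expected N below 5 (minimum about 0.4), in line with the SPSS tables.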
Hi Uwe
As a general rule in hypothesis testing, H0 states that there are NO differences, while H1 opposes H0 (there are differences). Therefore, the null hypothesis for the goodness-of-fit test you ran is "the observed distribution is equal to the expected one", and H1 is that the two distributions differ. A significant result (a small Asymp. Sig.) thus means that your sample distribution departs from the register distribution.

Another comment (you didn't ask for it, but I'm offering it anyway): check the warning in the footnote of the "Test Statistics" table. You have very low expected frequencies (below 5, even below 1, which is unacceptable), and in more than 20% of the cells (again unacceptable). The chi-square statistic is unreliable under these conditions.

Do you have the EXACT TESTS module installed? If you do, ask for an exact test; if not, you have to collapse the categories with low frequencies into an adjacent category (RECODE your variable) and rerun the test. In your case, you should collapse categories 1 to 3 into a single one (call it 1 '-6000'). This way, the lowest expected frequency rises to 1.6 and only one cell in 10 (10%) has an expected frequency below 5. Under these conditions, the asymptotic significance provided by the test will be reliable enough.
--
Regards,
Dr. Marta García-Granero, PhD
Statistician

---
"It is unwise to use a statistical procedure whose use one does not understand. The SPSS syntax guide cannot supply this knowledge, and it is certainly no substitute for the basic understanding of statistics and statistical thinking that is essential for the wise choice of methods and the correct interpretation of their results." (Adapted from the WinPepi manual - I'm sure Joe Abramson will not mind)
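The suggestion to collapse categories 1 to 3 can be checked before touching the data. A minimal Python sketch follows; the helper names `collapse_first` and `expected_counts` are hypothetical, and it assumes, as the SPSS output in the question suggests, that expected counts are the /EXPECTED values rescaled to the total N. It merges the first three categories in both the observed counts and the expected percentages (in the spirit of a RECODE of 1 THRU 3 into one code), then recounts the sparse cells.

```python
def collapse_first(values, k):
    """Merge the first k categories into a single one, keeping the rest unchanged."""
    return [sum(values[:k])] + list(values[k:])


def expected_counts(observed, weights):
    """Expected N per cell: weights rescaled by their sum, times the total N."""
    n = sum(observed)
    total = sum(weights)
    return [n * w / total for w in weights]


# Observed counts and /EXPECTED values from the original question.
observed = [2, 11, 9, 12, 55, 117, 187, 138, 254, 130, 45, 12]
weights = [0.04, 0.08, 0.04, 1.16, 6.31, 11.58, 12.45, 10.88, 33.88, 17.48, 4.48, 1.57]

obs_c = collapse_first(observed, 3)  # categories 1-3 become one '-6000' cell
w_c = collapse_first(weights, 3)
exp_c = expected_counts(obs_c, w_c)

stat = sum((o - e) ** 2 / e for o, e in zip(obs_c, exp_c))
low = sum(e < 5 for e in exp_c)  # cells still below the usual threshold of 5
print(f"{len(exp_c)} cells; {low} with expected N < 5; minimum = {min(exp_c):.1f}")
print(f"Chi-square = {stat:.3f}, df = {len(obs_c) - 1}")
```

It should report 10 cells, only one of them (the merged cell, with an expected N of about 1.6) below 5, matching the figures above. The statistic stays far above the usual critical values for 9 df, so collapsing mainly makes the asymptotic p-value trustworthy rather than changing the verdict on H0.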