Npar interpretation

Npar interpretation

Uwe Warner-2
Dear list members,

I am confused about the interpretation of the npar outcomes.

I know from the (tax-)register covering my target population the
distribution of a variable coded with twelve categories:

VAR LAB myvar
        1       '-1800'
        2       '1800-3600'
        3       '3600-6000'
        4       '6000-12000'
        5       '12000-18000'
        6       '18000-24000'
        7       '24000-30000'
        8       '30000-36000'
        9       '36000-60000'
        10      '60000-90000'
        11      '90000-120000'
        12      '120000+'.

In my sample I observe the same variable with the same twelve
categories.

I am interested in comparing the two distributions, expecting the
sample to have the same distribution as the population.
Therefore I ran the chi-square goodness-of-fit test and supplied the
expected proportion for each category.

NPAR TEST
  /CHISQUARE=myvar  (1,12)
  /EXPECTED=0.04 0.08 0.04 1.16 6.31 11.58 12.45 10.88 33.88 17.48 4.48 1.57
  /STATISTICS  all
  /MISSING ANALYSIS.

The results are the following.

Chi-Square Test
        myvar
        Category            Observed N   Expected N   Residual
         1  -1800                    2           .4        1.6
         2  1800-3600               11           .8       10.2
         3  3600-6000                9           .4        8.6
         4  6000-12000              12         11.3         .7
         5  12000-18000             55         61.4       -6.4
         6  18000-24000            117        112.6        4.4
         7  24000-30000            187        121.1       65.9
         8  30000-36000            138        105.8       32.2
         9  36000-60000            254        329.5      -75.5
        10  60000-90000            130        170.0      -40.0
        11  90000-120000            45         43.6        1.4
        12  120000+                 12         15.3       -3.3
        Total                      972


Test Statistics
        myvar
        Chi-Square(a)   405.611
        df              11
        Asymp. Sig.     .000
        a  3 cells (25.0%) have expected frequencies less than 5.
           The minimum expected cell frequency is .4.


Finally my question: Is the hypothesis being tested that
        the observed distribution is EQUAL to the expected distribution
or that
        the observed distribution is UNEQUAL to the expected
distribution?

In other words, I am confused about the H0 and H1 and the tested
hypothesis in the npar test.

Thank you in advance for your friendly clarifications

Uwe Warner
CEPS/INSTEAD
B.P. 48

L-4501 Differdange
Luxembourg

Email: [hidden email]
phone: +352 585855 1
fax: +352 585560


Re: Npar interpretation

Marta García-Granero
Hi Uwe

A general rule for hypothesis testing is that H0 states that there are
NO differences, while H1 opposes H0 (there are differences).
Therefore, the null hypothesis for the goodness-of-fit test you ran
is: "the observed distribution is equal to the expected".
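
To make the direction of the test concrete, here is a minimal sketch in plain Python (not SPSS) that recomputes the statistic from the posted figures. It assumes the /EXPECTED values act as relative weights that are rescaled to sum to 1, which is consistent with the Expected N column in the output. The 5% significance level and the tabulated critical value 19.675 for df = 11 are choices made for this illustration, not something stated in the thread.

```python
# Observed counts copied from the "Observed N" column of the posted output.
observed = [2, 11, 9, 12, 55, 117, 187, 138, 254, 130, 45, 12]
# Expected percentages copied from the /EXPECTED subcommand.
expected_pct = [0.04, 0.08, 0.04, 1.16, 6.31, 11.58,
                12.45, 10.88, 33.88, 17.48, 4.48, 1.57]

n = sum(observed)
total_pct = sum(expected_pct)  # the posted weights sum to 99.95, so rescale
expected = [n * p / total_pct for p in expected_pct]

# Pearson chi-square: sum of (observed - expected)^2 / expected over all cells.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# Assumption for this sketch: 5% level; 19.675 is the tabulated
# chi-square critical value for 11 degrees of freedom.
CRITICAL_05_DF11 = 19.675
reject_h0 = chi_square > CRITICAL_05_DF11

print(f"N = {n}, chi-square = {chi_square:.3f}, df = {df}")
print("reject H0 (distributions differ)" if reject_h0
      else "do not reject H0 (no evidence of a difference)")
```

The statistic grows as the observed counts move away from the expected ones, so a large value (equivalently, a small Asymp. Sig.) is evidence against H0. With the posted data the result should agree with the reported 405.611 up to rounding, which leads to rejecting H0, subject to the caveat about small expected frequencies.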

Another comment (you didn't ask for it, but I'm adding it anyway):
check the warning in the footnote of the "Test Statistics" table. You
have very low expected frequencies (below 5, and even below 1, which is
unacceptable) in more than 20% of the cells (again unacceptable).
The chi-square statistic is unreliable under these conditions. Do you
have the EXACT TESTS module installed? If you do, ask for an
exact test. If not, you have to collapse the categories with low
expected frequencies into an adjacent category (RECODE your variable)
and rerun the test. In your case, you should collapse categories 1 to 3
into a single one (call it 1 '-6000'). That way, the lowest expected
frequency will rise to 1.6 and only one cell in 10 (10%) will have an
expected frequency below 5. Under these conditions, the
asymptotic significance provided by the test will be reliable enough.
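
The effect of collapsing can be checked by hand before rerunning anything in SPSS. The following is a rough sketch in plain Python, again assuming the /EXPECTED values are relative weights rescaled to sum to 1. The helper `collapse_first` is a hypothetical name introduced only for this illustration.

```python
# Observed counts and expected percentages copied from the original post.
observed = [2, 11, 9, 12, 55, 117, 187, 138, 254, 130, 45, 12]
expected_pct = [0.04, 0.08, 0.04, 1.16, 6.31, 11.58,
                12.45, 10.88, 33.88, 17.48, 4.48, 1.57]


def collapse_first(values, k):
    """Merge the first k entries into a single entry by summing them."""
    return [sum(values[:k])] + list(values[k:])


# Categories 1 to 3 become one category ('-6000'), leaving 10 categories.
obs10 = collapse_first(observed, 3)
pct10 = collapse_first(expected_pct, 3)

n = sum(obs10)
total_pct = sum(pct10)
exp10 = [n * p / total_pct for p in pct10]

# Cells whose expected frequency is still below 5 after collapsing.
small = [e for e in exp10 if e < 5]
share_small = len(small) / len(exp10)

chi_square = sum((o - e) ** 2 / e for o, e in zip(obs10, exp10))
df = len(obs10) - 1

print(f"categories = {len(obs10)}, minimum expected = {min(exp10):.1f}")
print(f"cells with expected < 5: {len(small)} ({share_small:.0%})")
print(f"chi-square = {chi_square:.3f}, df = {df}")
```

Summing both the observed counts and the expected weights keeps the two distributions aligned, so the test is run on the same 972 cases with 9 degrees of freedom instead of 11.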

--
Regards,
Dr. Marta García-Granero,PhD           mailto:[hidden email]
Statistician

---
"It is unwise to use a statistical procedure whose use one does
not understand. SPSS syntax guide cannot supply this knowledge, and it
is certainly no substitute for the basic understanding of statistics
and statistical thinking that is essential for the wise choice of
methods and the correct interpretation of their results".

(Adapted from WinPepi manual - I'm sure Joe Abramson will not mind)