SPSSX Discussion

Npar interpretation

Uwe Warner-2

Npar interpretation

Dear list members,

I am confused about the interpretation of the npar outcomes.

I know from the (tax-)register covering my target population the
distribution of a variable coded with twelve categories:

VALUE LABELS myvar
1 '-1800'
2 '1800-3600'
3 '3600-6000'
4 '6000-12000'
5 '12000-18000'
6 '18000-24000'
7 '24000-30000'
8 '30000-36000'
9 '36000-60000'
10 '60000-90000'
11 '90000-120000'
12 '120000+'.

In my sample I observe the same variable with the same twelve
categories.

I would like to compare the two distributions, expecting the
sample to have the same distribution as the population.
I therefore ran the chi-square test as a goodness-of-fit test and
supplied the expected proportion for each category.

NPAR TEST
/CHISQUARE=myvar (1,12)
/EXPECTED=0.04 0.08 0.04 1.16 6.31 11.58 12.45 10.88 33.88 17.48 4.48
1.57
/STATISTICS all
/MISSING ANALYSIS.

The results are the following.

Chi-Square Test
myvar
Category           Observed N  Expected N  Residual
 1  -1800                   2          .4       1.6
 2  1800-3600              11          .8      10.2
 3  3600-6000               9          .4       8.6
 4  6000-12000             12        11.3        .7
 5  12000-18000            55        61.4      -6.4
 6  18000-24000           117       112.6       4.4
 7  24000-30000           187       121.1      65.9
 8  30000-36000           138       105.8      32.2
 9  36000-60000           254       329.5     -75.5
10  60000-90000           130       170.0     -40.0
11  90000-120000           45        43.6       1.4
12  120000+                12        15.3      -3.3
Total                     972

Test Statistics
myvar
Chi-Square(a)   405.611
df                   11
Asymp. Sig.        .000
a. 3 cells (25.0%) have expected frequencies less than 5.
   The minimum expected cell frequency is .4.
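
For readers who want to check this output by hand, here is a minimal Python sketch (standard library only, run outside SPSS). It assumes that NPAR TESTS rescales the /EXPECTED values to proportions of the valid N, which is consistent with the Expected N column above, since the twelve values sum to 99.95 rather than 100. The variable names are introduced here for illustration and are not part of the original post.

```python
# Observed counts and /EXPECTED values copied from the post above.
observed = [2, 11, 9, 12, 55, 117, 187, 138, 254, 130, 45, 12]
weights = [0.04, 0.08, 0.04, 1.16, 6.31, 11.58, 12.45, 10.88,
           33.88, 17.48, 4.48, 1.57]

n = sum(observed)            # number of valid cases
total_w = sum(weights)       # the weights need not sum to exactly 100

# Assumption: expected counts are the weights rescaled to the valid N.
expected = [n * w / total_w for w in weights]

# Pearson chi-square: sum over cells of (O - E)^2 / E.
contrib = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(contrib)
df = len(observed) - 1

print(f"N = {n}, chi2 = {chi2:.3f}, df = {df}")
for cat, (o, e, c) in enumerate(zip(observed, expected, contrib), start=1):
    print(f"{cat:>2}  O = {o:>3}  E = {e:>6.1f}  contribution = {c:>7.2f}")

# Share of the statistic that comes from the three sparse cells.
print(f"share from categories 1-3: {sum(contrib[:3]) / chi2:.0%}")
```

Printing the per-cell contributions shows how much of the statistic is driven by the cells whose expected frequency is below 1.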

Finally, my question: is the hypothesis being tested that
the observed distribution is EQUAL to the expected distribution,
or that
the observed distribution is UNEQUAL to the expected
distribution?

In other words, I am confused about the H0 and H1 and the tested
hypothesis in the npar test.

Thank you in advance for your friendly clarifications

Uwe Warner
CEPS/INSTEAD
B.P. 48

L-4501 Differdange
Luxembourg

Email: [hidden email]
phone: +352 585855 1
fax: +352 585560

Marta García-Granero

Re: Npar interpretation

Hi Uwe

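A general rule for hypothesis testing is that H0 states that there are
NO differences, while H1 opposes H0 (there are differences).
Therefore, the null hypothesis for the goodness-of-fit test you ran
is: "the observed distribution is equal to the expected". A small
significance value (here Asymp. Sig. = .000, i.e. p < .0005) leads to
rejecting H0: the sample distribution differs from the expected one.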
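
Written out, and assuming k categories with register proportions p_{i0}, true sample proportions p_i, observed counts O_i and N valid cases (notation introduced here, not used in the original posts), the two hypotheses and the test statistic are:

```latex
H_0:\; p_i = p_{i0} \quad \text{for all } i = 1,\dots,k
\qquad\text{vs.}\qquad
H_1:\; p_i \neq p_{i0} \quad \text{for at least one } i

\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},
\qquad E_i = N\,p_{i0},
\qquad df = k - 1
```

With k = 12 categories this gives df = 11, as in the output.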

Another comment (you didn't ask for it, but I'm adding it anyway):
check the warning in the footnote of the "Test Statistics" table. You
have very low expected frequencies (below 5, and even below 1, which is
unacceptable) in more than 20% of the cells (again unacceptable).
The chi-square statistic is unreliable under these conditions. Do you
have the EXACT TESTS module installed? If you do, ask for an
exact test. If not, you will have to collapse the categories with low
expected frequencies into an adjacent category (RECODE your variable)
and rerun the test. In your case, you should collapse categories 1 to 3
into a single one (call it 1 '-6000'). This way, the lowest expected
frequency will rise to 1.6 and you will have only one cell in 10 with
an expected frequency below 5 (only 10%). Under these conditions, the
asymptotic significance provided by the test will be reliable enough.
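
The effect of collapsing categories 1 to 3 can be checked before touching the data. The following is a rough Python sketch (standard library only, run outside SPSS), not the SPSS rerun itself. It assumes the expected values are rescaled to the valid N, as the Expected N column in the posted output suggests. The helper `collapse_first` and the 5% critical value 16.919 for 9 df (a standard table value) are introduced here for illustration.

```python
# Observed counts and /EXPECTED values copied from the original post.
observed = [2, 11, 9, 12, 55, 117, 187, 138, 254, 130, 45, 12]
weights = [0.04, 0.08, 0.04, 1.16, 6.31, 11.58, 12.45, 10.88,
           33.88, 17.48, 4.48, 1.57]


def collapse_first(values, k):
    """Merge the first k cells into a single cell by summing them."""
    return [sum(values[:k])] + list(values[k:])


obs10 = collapse_first(observed, 3)   # new category 1 = '-6000'
w10 = collapse_first(weights, 3)      # its expected weight is 0.04+0.08+0.04

n = sum(obs10)
expected10 = [n * w / sum(w10) for w in w10]

# Check the two rules of thumb mentioned in the reply.
small = [e for e in expected10 if e < 5]
print(f"cells: {len(obs10)}, minimum expected: {min(expected10):.1f}, "
      f"cells with expected < 5: {len(small)}")

# Pearson chi-square on the collapsed table.
chi2 = sum((o - e) ** 2 / e for o, e in zip(obs10, expected10))
df = len(obs10) - 1

CRIT_05_DF9 = 16.919   # upper 5% point of chi-square with 9 df
print(f"chi2 = {chi2:.3f}, df = {df}, "
      f"reject H0 at the 5% level: {chi2 > CRIT_05_DF9}")
```

In SPSS terms this corresponds to recoding values 1 through 3 into one category and supplying ten /EXPECTED values, the first being 0.16.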

--
Regards,
Dr. Marta García-Granero,PhD mailto:[hidden email]
Statistician

---
"It is unwise to use a statistical procedure whose use one does
not understand. SPSS syntax guide cannot supply this knowledge, and it
is certainly no substitute for the basic understanding of statistics
and statistical thinking that is essential for the wise choice of
methods and the correct interpretation of their results".

(Adapted from WinPepi manual - I'm sure Joe Abramson will not mind)