SPSSX Discussion

ordinal regression - no. cases/independent variables

SoS Statistical Services

ordinal regression - no. cases/independent variables

I am doing ordinal regression using SPSS and get the following warning:

There are 1467 (71.5%) cells (i.e., dependent variable levels by combinations of predictor variable values) with zero frequencies.

I have 750 cases, 6 categorical independent variables (from 3-5 groups with 100+ per group, often 200+) and my dependent variable has 4 categories with frequencies of 328, 62, 141, 219 respectively.
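A quick back-of-envelope reading of the warning, as a sketch only: dividing the number of empty cells by the reported percentage gives the total number of cells SPSS appears to be counting. The variable names are illustrative, and the interpretation that the total is "DV levels times the distinct predictor combinations being counted" is an assumption, not something the warning itself states.

```python
# Figures quoted in the SPSS warning and in the post above.
empty_cells = 1467        # cells with zero frequency
empty_share = 0.715       # 71.5%, as rounded in the warning
dv_levels = 4             # categories of the dependent variable
n_cases = 750

# Total cells implied by the warning (approximate, since 71.5% is rounded).
total_cells = empty_cells / empty_share

# Assumption: total cells = DV levels x number of predictor combinations counted.
predictor_patterns = round(total_cells / dv_levels)

# Consistency check: recompute the empty share from the rounded pattern count.
implied_share = empty_cells / (predictor_patterns * dv_levels)

print(f"implied total cells: {total_cells:.0f}")
print(f"implied predictor patterns: {predictor_patterns}")
print(f"recomputed empty share: {implied_share:.1%}")
print(f"cases per predictor pattern: {n_cases / predictor_patterns:.2f}")
```

On these figures the implied total is roughly 2052 cells, or about 513 predictor combinations, which works out to fewer than 1.5 cases per combination.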

Does anyone know of a rule of thumb or recommendation regarding the number of cases, number of independent variables, etc. for ordinal regression?

Thank you for your help.

Evie

SoS Statistical Services

Re: ordinal regression - no. cases/independent variables

Thanks, Art. Below is some additional info about my variables: the dependent variable is INHALER2, with the 6 independent (ordinal) variables shown below. The original INHALER variable is coded 0 to 9 with the intention of using multiple regression, but I decided to categorise it into 4 groups because it was heavily skewed, and hence to use ordinal regression instead.

Case Processing Summary

		N	Marginal Percentage
INHALER2	0	328	43.7%
	1-3	62	8.3%
	4-6	141	18.8%
	7-9	219	29.2%
BREATH2	1.00	220	29.3%
	2.00	124	16.5%
	3.00	140	18.7%
	4.00	109	14.5%
	5.00	157	20.9%
COUGH2	1.00	286	38.1%
	2.00	229	30.5%
	3.00	235	31.3%
ACTIV2	1.00	233	31.1%
	2.00	168	22.4%
	3.00	165	22.0%
	4.00	184	24.5%
SPUCOL2	1.00	30	4.0%
	2.00	218	29.1%
	3.00	156	20.8%
	4.00	171	22.8%
	5.00	175	23.3%
SPUVOL2	1.00	227	30.3%
	2.00	207	27.6%
	3.00	136	18.1%
	4.00	180	24.0%
WHEZ2	1.00	382	50.9%
	2.00	368	49.1%
Valid		750	100.0%
Missing		0
Total		750

Evie Gardner

--- On Tue, 3/3/09, Art Kendall <[hidden email]> wrote:

From: Art Kendall <[hidden email]>
Subject: Re: ordinal regression - no. cases/independent variables
To: [hidden email]
Date: Tuesday, 3 March, 2009, 1:52 PM

If you had 6 categorical variables each with exactly 3 values, that would mean that you have a minimum of 3^6 = 729 cells for each value of the DV. Even if you treated the DV as not very discrepant from interval, you would still have at least 729 cells on the independent side.

Are you sure that none of your variables are somewhere near interval level?

Please describe your study in more detail. Perhaps if you posted the data definition (variable names, variable labels, values and value labels) and what level of measurement assumptions you are willing to make, members of the list could offer suggestions.

Think of the situation as a 7 way crosstab.

Art Kendall
Social Research Consultants

Art Kendall

Re: ordinal regression - no. cases/independent variables

It is not very important what the distribution of the raw scores on INHALER is. Regression assumes that the residuals are not severely discrepant from normal.

If you treat the IVs and DV as nominal you have a 7 way crosstab as the basis of your analysis. Necessarily most cells are empty.
If you treat the IVs as nominal and the DV as continuous, you have a 6-way crosstab on your IV side with 5 x 3 x 4 x 5 x 4 x 2 = 2400 cells (a 6-way ANOVA). The ANOVA approach necessarily has empty cells, so higher-order interactions need to be pooled.
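The cell counts can be worked out directly from the number of levels listed in the Case Processing Summary. A minimal sketch; the dictionary and variable names are only illustrative:

```python
from math import prod

# Number of levels per predictor, read from the Case Processing Summary.
iv_levels = {
    "BREATH2": 5, "COUGH2": 3, "ACTIV2": 4,
    "SPUCOL2": 5, "SPUVOL2": 4, "WHEZ2": 2,
}
dv_levels = 4     # INHALER2 categories
n_cases = 750

iv_cells = prod(iv_levels.values())   # 6-way crosstab on the IV side
all_cells = iv_cells * dv_levels      # 7-way crosstab including the DV

# Each case falls in exactly one cell, so at most n_cases cells can be occupied.
min_empty = all_cells - min(n_cases, all_cells)

print(f"IV-side cells: {iv_cells}")
print(f"cells including the DV: {all_cells}")
print(f"minimum empty cells: {min_empty} ({min_empty / all_cells:.1%})")
```

The product of the level counts comes to 2400 cells on the IV side and 9600 once the four DV categories are crossed in, so with 750 cases at least 8850 cells (about 92%) must be empty however the cases are distributed.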

If the values of all of the IVs are at least ordinal, I would suggest starting with the most straightforward analysis, i.e., treat them as not very discrepant from interval.
Do ordinary regression first. Examine the residuals to see if they are very far from normal looking.
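One way to put a rough number on "very far from normal looking", sketched under the assumption that the residuals from the ordinary regression have already been saved and exported as a plain list of numbers. The function name and the example values are hypothetical; both measures are close to zero for normally distributed residuals, and a histogram or normal probability plot remains the more usual check.

```python
from statistics import fmean

def skewness_and_excess_kurtosis(residuals):
    """Return (skewness, excess kurtosis) based on population moments."""
    n = len(residuals)
    mean = fmean(residuals)
    m2 = sum((r - mean) ** 2 for r in residuals) / n   # variance
    m3 = sum((r - mean) ** 3 for r in residuals) / n   # third central moment
    m4 = sum((r - mean) ** 4 for r in residuals) / n   # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0                   # 0 for a normal distribution
    return skew, excess_kurt

# Hypothetical residuals for illustration only (not taken from the study data).
example_residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]
skew, kurt = skewness_and_excess_kurtosis(example_residuals)
print(f"skewness={skew:.2f}, excess kurtosis={kurt:.2f}")
```

Marked skewness or heavy tails in the real residuals would be a reason to move on to the approaches that relax the level-of-measurement assumptions.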

Then use CATREG and see how much difference it makes in the substantive conclusions if you relax the level of measurement.

If level of measurement assumptions seem to be important, try an ANOVA via GLM. There should be info in the archives on ANOVA with empty design cells.

Art Kendall
Social Research Consultants
