SPSSX Discussion

Analysis with Categorical Independent and Dependent Variables

bdates

Analysis with Categorical Independent and Dependent Variables

I’ve had a question from a former student who has been asked, as a legislative aide, to analyze data from a legislative district. All variables are categorical. For example, he would like to know what effect Religion (Roman Catholic, Maronite, Chaldean, Melkite, Other), Gender (Male, Female), Education (Less than High School, High School, Some College, Completed College, Advanced Degree), and Age Group (18-25, 26-54, 55 or older) had on voting behavior (Obama, Romney, Other). Factorial logistic regression would be in order if the dependent variable were dichotomous, or at least that’s my understanding. But it’s not, and there will be other analyses in which the dependent variable will have more than three categories. Any ideas would be welcome. Thanks.

Brian G. Dates

Director of Evaluation and Research

Southwest Counseling Solutions

1700 Waterman

Detroit, MI 48209

313-841-7442

[hidden email]

Leading the Way in Building a Healthy Community

Maguin, Eugene

Re: Analysis with Categorical Independent and Dependent Variables

Two commands: NOMREG, and either PLUM or GENLIN. SPSS has a tutorial on multinomial regression under Tutorials -> Case Studies -> Regression Options. A book: J. Scott Long, Regression Models for Categorical and Limited Dependent Variables.

Your Former Student (FS) has a three-category DV. After FS has been through crosstabs, start with NOMREG, which fits a logit equation for each non-reference category against the reference category. The output will show an intercept and predictor slopes for each of those categories. If the DV categories can be treated as ordered and the predictor slopes are all the ‘same’ across equations, FS can switch to PLUM (where that assumption can be explicitly tested, unlike GENLIN, where it can’t). The predictors have lots of categories, and when looking for interactions the number of cells kind of explodes. But why not treat the IVs as just IVs rather than as factorial IVs? To do that, compute the IV contrast terms (and the contrast-term interactions) and enter them in the regressions (use the WITH keyword, not the BY keyword). Harder to interpret? Yes. But FS won’t be faced with messages about cells with few or no cases.
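To make the "one regression per category against the reference category" idea concrete, here is a minimal Python sketch of how a baseline-category logit model turns an intercept and slopes into predicted probabilities. It is not SPSS output or SPSS syntax: the helper names (`dummy_code`, `multinomial_probs`), the single-predictor setup, and every coefficient value are invented for illustration only.

```python
import math

def dummy_code(value, levels):
    """Return k-1 indicator (0/1) codes; the last level is the reference."""
    return [1.0 if value == level else 0.0 for level in levels[:-1]]

def multinomial_probs(x, coefs):
    """Baseline-category logit probabilities.

    x     -- list of predictor codes for one respondent
    coefs -- {category: (intercept, [slopes])} for each non-reference category
    """
    # One linear predictor per non-reference category; the reference is fixed at 0.
    etas = {cat: b0 + sum(b * xi for b, xi in zip(slopes, x))
            for cat, (b0, slopes) in coefs.items()}
    denom = 1.0 + sum(math.exp(eta) for eta in etas.values())
    probs = {cat: math.exp(eta) / denom for cat, eta in etas.items()}
    probs["(reference)"] = 1.0 / denom
    return probs

# Hypothetical example: one predictor (gender), DV reference category = "Other".
GENDER_LEVELS = ["Male", "Female"]          # Female is the reference level
made_up_coefs = {
    "Obama":  (1.2, [-0.3]),                # invented numbers, not estimates
    "Romney": (1.0, [0.2]),
}
x = dummy_code("Male", GENDER_LEVELS)
probs = multinomial_probs(x, made_up_coefs)
print({k: round(v, 3) for k, v in probs.items()})
```

Entering the indicator codes as ordinary numeric predictors, as `dummy_code` does here, corresponds to the WITH-style approach described above; the probabilities always sum to 1 across the DV categories.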

Gene Maguin

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Dates, Brian
Sent: Wednesday, November 14, 2012 10:34 AM
To: [hidden email]
Subject: Analysis with Categorical Independent and Dependent Variables

bdates

Re: Analysis with Categorical Independent and Dependent Variables

Thanks, Gene.

Brian

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Maguin, Eugene
Sent: Wednesday, November 14, 2012 11:20 AM
To: [hidden email]
Subject: Re: Analysis with Categorical Independent and Dependent Variables

John F Hall

Re: Analysis with Categorical Independent and Dependent Variables

In reply to this post by Maguin, Eugene

Brian

Yes, definitely do the crosstabs first, even if you have to dichotomise some of the variables. I’ve been doing something similar using CTABLES (much easier for a lay audience to understand).

John F Hall (Mr)

Email: [hidden email]

Website: www.surveyresearch.weebly.com

Jon Peck sent me some complex syntax to produce a format a bit like the one below; Raynald Levesque sent me something years ago which produced exactly what I wanted.

Sexism by Sex, controlling for Race

Sexism Percent (n)	All	White	Other
All	10.8 (86)	11.0 (52)	10.6 (34)
Boys	12.7 (42)	13.4 (22)	11.9 (20)
Girls	9.0 (44)	9.2 (30)	8.6 (14)

The main point about such tables is that the zero-order sample statistic appears top left, first-order statistics in the 1st column or row, and second-order statistics in the 2nd and 3rd columns or rows; right-to-left language users may prefer the table to be flipped horizontally.

These are the tables I started with, but a format like the above is much easier to read and interpret. This output was a while back, but I’ll look for the syntax and see if it can be modified for your data.

1st order tables

		Grouped happy
		Low 0-4	Medium 5-8	High 9-10	Total
		Row N %	Row N %	Row N %	Count
Sex	Male	10.4%	58.4%	31.2%	71561
	Female	11.2%	54.9%	33.9%	91658
	Total	10.9%	56.4%	32.7%	163219

		Grouped happy
		Low 0-4	Medium 5-8	High 9-10	Total
		Row N %	Row N %	Row N %	Count
Age in two groups	Under 50	11.6%	59.6%	28.7%	74710
	50 and over	10.2%	53.8%	36.0%	88509
	Total	10.9%	56.4%	32.7%	163219

		Grouped happy
		Low 0-4	Medium 5-8	High 9-10	Total
		Row N %	Row N %	Row N %	Count
Married/co-habiting/Civil Partners	Married/Cohabiting/Civil Partner	8.4%	56.0%	35.6%	99916
	Non married	14.7%	57.1%	28.2%	63303
	Total	10.9%	56.4%	32.7%	163219

2nd order tables

				Grouped happy
				Low 0-4	Medium 5-8	High 9-10	Total
				Row N %	Row N %	Row N %	Count
Married/co-habiting/Civil Partners	Married/Cohabiting/Civil Partner	Sex	Male	8.2%	57.8%	34.0%	46911
			Female	8.6%	54.4%	37.0%	53005
			Total	8.4%	56.0%	35.6%	99916
	Non married	Sex	Male	14.6%	59.6%	25.8%	24650
			Female	14.8%	55.5%	29.7%	38653
			Total	14.7%	57.1%	28.2%	63303
	Total	Sex	Male	10.4%	58.4%	31.2%	71561
			Female	11.2%	54.9%	33.9%	91658
			Total	10.9%	56.4%	32.7%	163219

				Grouped happy
				Low 0-4	Medium 5-8	High 9-10	Total
				Row N %	Row N %	Row N %	Count
Age in two groups	Under 50	Sex	Male	11.5%	62.1%	26.4%	30657
			Female	11.7%	57.9%	30.4%	44053
			Total	11.6%	59.6%	28.7%	74710
	50 and over	Sex	Male	9.6%	55.7%	34.7%	40904
			Female	10.7%	52.1%	37.2%	47605
			Total	10.2%	53.8%	36.0%	88509
	Total	Sex	Male	10.4%	58.4%	31.2%	71561
			Female	11.2%	54.9%	33.9%	91658
			Total	10.9%	56.4%	32.7%	163219

				Grouped happy
				Low 0-4	Medium 5-8	High 9-10	Total
				Row N %	Row N %	Row N %	Count
Age in two groups	Under 50	Married/co-habiting/Civil Partners	Married/Cohabiting/Civil Partner	9.1%	59.8%	31.1%	45282
			Non married	15.5%	59.4%	25.1%	29428
			Total	11.6%	59.6%	28.7%	74710
	50 and over	Married/co-habiting/Civil Partners	Married/Cohabiting/Civil Partner	7.8%	52.9%	39.3%	54634
			Non married	14.0%	55.1%	30.9%	33875
			Total	10.2%	53.8%	36.0%	88509
	Total	Married/co-habiting/Civil Partners	Married/Cohabiting/Civil Partner	8.4%	56.0%	35.6%	99916
			Non married	14.7%	57.1%	28.2%	63303
			Total	10.9%	56.4%	32.7%	163219

3rd order table

						Grouped happy
						Low 0-4	Medium 5-8	High 9-10	Total
						Row N %	Row N %	Row N %	Count
Sex	Male	Married/co-habiting/Civil Partners	Married/Cohabiting/Civil Partner	Age in two groups	Under 50	9.4%	62.3%	28.3%	18648
					50 and over	7.4%	54.9%	37.7%	28263
					Total	8.2%	57.8%	34.0%	46911
			Non married	Age in two groups	Under 50	14.7%	61.8%	23.5%	12009
					50 and over	14.5%	57.6%	28.0%	12641
					Total	14.6%	59.6%	25.8%	24650
			Total	Age in two groups	Under 50	11.5%	62.1%	26.4%	30657
					50 and over	9.6%	55.7%	34.7%	40904
					Total	10.4%	58.4%	31.2%	71561
	Female	Married/co-habiting/Civil Partners	Married/Cohabiting/Civil Partner	Age in two groups	Under 50	8.9%	58.0%	33.1%	26634
					50 and over	8.3%	50.8%	40.9%	26371
					Total	8.6%	54.4%	37.0%	53005
			Non married	Age in two groups	Under 50	16.1%	57.7%	26.2%	17419
					50 and over	13.7%	53.7%	32.6%	21234
					Total	14.8%	55.5%	29.7%	38653
			Total	Age in two groups	Under 50	11.7%	57.9%	30.4%	44053
					50 and over	10.7%	52.1%	37.2%	47605
					Total	11.2%	54.9%	33.9%	91658
	Total	Married/co-habiting/Civil Partners	Married/Cohabiting/Civil Partner	Age in two groups	Under 50	9.1%	59.8%	31.1%	45282
					50 and over	7.8%	52.9%	39.3%	54634
					Total	8.4%	56.0%	35.6%	99916
			Non married	Age in two groups	Under 50	15.5%	59.4%	25.1%	29428
					50 and over	14.0%	55.1%	30.9%	33875
					Total	14.7%	57.1%	28.2%	63303
			Total	Age in two groups	Under 50	11.6%	59.6%	28.7%	74710
					50 and over	10.2%	53.8%	36.0%	88509
					Total	10.9%	56.4%	32.7%	163219
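One way to read these nested tables is that each "Total" row is the count-weighted average of the rows it pools. The short Python sketch below checks that for the males in the Married/Sex table, using only counts and "Low 0-4" percentages copied from the tables above; the function name `pooled_percent` is made up for this illustration.

```python
# Counts and "Low 0-4" row percentages for males, copied from the
# Married/Sex table above.
groups = [
    {"label": "Married/Cohabiting/Civil Partner", "n": 46911, "pct_low": 8.2},
    {"label": "Non married", "n": 24650, "pct_low": 14.6},
]

def pooled_percent(groups, key):
    """Count-weighted average of subgroup percentages."""
    total_n = sum(g["n"] for g in groups)
    weighted = sum(g["n"] * g[key] for g in groups)
    return weighted / total_n

total_n = sum(g["n"] for g in groups)
pooled = pooled_percent(groups, "pct_low")
print(total_n, round(pooled, 1))   # 71561 10.4
```

The result agrees with the Male row of the first-order Sex table (10.4% of 71561), which is a quick consistency check worth running before any modelling.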

bdates

Re: Analysis with Categorical Independent and Dependent Variables

John,

This is great! Thanks.

Brian

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of John F Hall
Sent: Wednesday, November 14, 2012 1:15 PM
To: [hidden email]
Subject: Re: Analysis with Categorical Independent and Dependent Variables

Art Kendall

Re: Analysis with Categorical Independent and Dependent Variables

In reply to this post by bdates

See the CATREG procedure; you can mix nominal, ordinal, and interval variables. Are you stuck with age as a grouped variable? (For example, the cell with age 18-25 would have a small n for advanced degrees.)

It would be a good idea to first check cell sizes etc. via crosstabs. Treating age and education as ordinal may offset potential problems of empty or small cells. You might or might not have to ignore higher-order interactions.

You might also consider clustering some of the categorical variables.
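As a rough illustration of the cell-size check, the following Python sketch counts cases in every combination of two factors and reports the empty or sparse ones. The records are invented, the function name `small_cells` is hypothetical, and the threshold of 5 cases is only an assumed rule of thumb, not something stated in this thread.

```python
from collections import Counter
from itertools import product

def small_cells(records, factors, levels, min_n=5):
    """Return {cell: count} for every factor combination with fewer than min_n cases.

    records -- list of dicts, one per respondent
    factors -- names of the categorical variables to cross
    levels  -- {factor: [all possible categories]}
    """
    counts = Counter(tuple(r[f] for f in factors) for r in records)
    # Enumerate every possible cell so that empty cells are reported too.
    all_cells = product(*(levels[f] for f in factors))
    return {cell: counts[cell] for cell in all_cells if counts[cell] < min_n}

# Tiny invented data set, for illustration only.
levels = {"age": ["18-25", "26-54", "55 or older"],
          "educ": ["High School", "Advanced Degree"]}
records = (
    [{"age": "18-25", "educ": "High School"}] * 8
    + [{"age": "18-25", "educ": "Advanced Degree"}] * 1
    + [{"age": "26-54", "educ": "High School"}] * 12
    + [{"age": "26-54", "educ": "Advanced Degree"}] * 6
    + [{"age": "55 or older", "educ": "High School"}] * 9
)
flagged = small_cells(records, ["age", "educ"], levels)
for cell, n in flagged.items():
    print(cell, n)
```

Enumerating all possible cells, rather than only the observed ones, is what lets the empty cells show up in the report.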

Art Kendall
Social Research Consultants

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Maguin, Eugene

Re: Analysis with Categorical Independent and Dependent Variables

Art,

If you’ve used the CatReg proc, what is your experience with the concordance of coefficients and standard errors between using CatReg and GenLin, NomReg, Plum, or Logistic to analyze the same datasets? It seems as if there are different assumptions in the underlying model between CatReg and the other procedures, and certainly in the estimation method. Given a categorical DV, when would you choose CatReg over the other categorical methods?

Thanks, Gene Maguin

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Art Kendall
Sent: Wednesday, November 14, 2012 2:44 PM
To: [hidden email]
Subject: Re: Analysis with Categorical Independent and Dependent Variables

bdates

Re: Analysis with Categorical Independent and Dependent Variables

In reply to this post by Art Kendall

Thanks, Art.

Art Kendall

Re: Analysis with Categorical Independent and Dependent Variables

In reply to this post by Maguin, Eugene

I do not have a great deal of experience with these procedures.
If I understand correctly, PLUM and LOGISTIC are special cases of GENLIN and/or NOMREG.

CATREG has built-in ways to test the degree to which level of measurement makes much of a difference in fit. Depending how small N's are, treating education and age as not too discrepant from interval level of measurement means that fewer cases are needed. (I.e., 1 predictor per variable vs # of categories minus 1 per variable).
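The "1 predictor per variable vs # of categories minus 1" arithmetic can be spelled out for the variables in the original question. This is a small Python sketch that counts main-effect terms per equation only; the function name `main_effect_terms` and the choice to treat education and age group as the numeric variables are assumptions made for illustration.

```python
# Category counts taken from the original question.
n_categories = {"religion": 5, "gender": 2, "education": 5, "age_group": 3}

def main_effect_terms(n_categories, numeric=()):
    """Main-effect predictor terms per equation.

    Variables listed in `numeric` contribute 1 term (treated as interval);
    all others contribute (number of categories - 1) indicator terms.
    """
    return sum(1 if name in numeric else k - 1
               for name, k in n_categories.items())

all_nominal = main_effect_terms(n_categories)
ordered_as_numeric = main_effect_terms(n_categories,
                                       numeric=("education", "age_group"))
print(all_nominal, ordered_as_numeric)   # 11 7
```

So treating the two ordered variables as roughly interval cuts the main-effect terms from 11 to 7 per equation, before any interactions are considered.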

If this were a problem I was working on, I would use it as an opportunity to compare the detailed results and the substantive conclusions from CATREG, GENLIN, and NOMREG. If I had some time I would see how correspondence analysis compares.

It would be interesting to hear from the Leiden people who designed the CATEGORIES module, or from people who have experience with a pair or more of these procedures, about comparing CATREG, correspondence analysis, GENLIN, and NOMREG.

Without a more detailed understanding of the context, the present problem looks like a discriminant function type of question, except that the predictors are at mixed levels of measurement: nominal (religion), ordinal or interval (education, age), and interval (gender).
In other words, the question might be "what distinguishes people who voted Obama/Romney/other?"

It makes intuitive sense that just as interaction terms can be used in regression, there is most likely a way to use interaction terms in CATREG.

From a practical point of view: 1) I would be surprised if there were enough "other" votes to make very fine distinctions. 2) It would be an unusual context where there were enough cases in each value of religion to do much. Of course, a lot depends on how large the total population is in the legislative district.

I would be very hesitant about generalizing from a district with such an unusual representation of religious subgroups.

Art Kendall
Social Research Consultants

On 11/14/2012 2:55 PM, Maguin, Eugene wrote:

Christine Eastman

Re: Analysis with Categorical Independent and Dependent Variables

Out of curiosity, would a loglinear analysis be appropriate for these data? (I think it's GENLOG / HILOGLINEAR in SPSS.)

On 15 November 2012 07:44, Art Kendall <[hidden email]> wrote:

Ryan

Re: Analysis with Categorical Independent and Dependent Variables

In reply to this post by bdates

A generalized logit model would be an appropriate place to start. Incorporate interaction terms judiciously. Be aware of cell sample sizes, as others have pointed out.
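To see why interaction terms need to be added judiciously, it helps to count them. The Python sketch below uses the category counts from the original question and assumes ordinary indicator coding (categories minus 1 per factor); the function name `interaction_terms` is made up for this example, and the counts are per logit equation.

```python
from itertools import combinations
from math import prod

n_categories = {"religion": 5, "gender": 2, "education": 5, "age_group": 3}

def interaction_terms(n_categories, order):
    """Number of indicator terms needed for all interactions of a given order."""
    return sum(prod(k - 1 for k in combo)
               for combo in combinations(n_categories.values(), order))

main = interaction_terms(n_categories, 1)      # main effects
two_way = interaction_terms(n_categories, 2)   # all two-way interactions
cells = prod(n_categories.values())            # cells in the full predictor cross-classification
print(main, two_way, cells)   # 11 42 150
```

With 150 predictor cells to fill before the three voting categories are even considered, sparse cells are likely unless the sample is large.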

Ryan
