Analysis with Categorical Independent and Dependent Variables


bdates

I’ve had a question from a former student who has been asked, as a legislative aide, to analyze data from a legislative district. All variables are categorical. For example, he would like to know what effect Religion (Roman Catholic, Maronite, Chaldean, Melkite, Other), Gender (Male, Female), Education (Less than High School, High School, Some College, Completed College, Advanced Degree), and Age Group (18-25, 26-54, 55 or older) had on voting behavior (Obama, Romney, Other). Factorial logistic regression would be in order if the dependent variable were dichotomous, or at least that’s my understanding. But it’s not, and there will be other analyses in which the dependent variable will have more than three categories. Any ideas would be welcome. Thanks.

 

 

Brian G. Dates

Director of Evaluation and Research

Southwest Counseling Solutions

1700 Waterman

Detroit, MI     48209

313-841-7442

[hidden email]

 

Leading the Way in Building a Healthy Community

 


Re: Analysis with Categorical Independent and Dependent Variables

Maguin, Eugene

Two commands: Nomreg, and Plum or GenLin. SPSS has a tutorial on multinomial regression under Tutorials -> Case Studies -> Regression Options. A book: J. Scott Long, Regression Models for Categorical and Limited Dependent Variables.

Your Former Student (FS) has a three-category DV. After FS has been through crosstabs, start with Nomreg/GenLin, which computes a regression for each category against the reference category. The output will show an intercept and predictor slopes for each non-reference category. If the predictor slopes are all the ‘same’ across categories, FS can switch to Plum (where that assumption can be explicitly tested, unlike GenLin, where it can’t). The predictors have lots of categories, and when looking for interactions the number of cells kind of explodes. But why not treat the IVs as just IVs rather than as factorial IVs? To do that, compute the IV contrast terms (and the contrast-term interactions) and enter them in the regressions (use the WITH keyword, not the BY keyword). Harder to interpret? Yes. But FS won’t be faced with messages about cells with few or no cases.

 

Gene Maguin
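
The suggestion to compute contrast terms by hand can be illustrated outside SPSS. What follows is a minimal Python sketch, not SPSS syntax and not necessarily the coding Gene had in mind: it assumes simple indicator (dummy) coding with the last category as the reference. The variable names and the helpers `indicator_terms` and `interaction_terms` are hypothetical. It also counts the cells of the full factorial layout, which is where the "explosion" comes from.

```python
from itertools import product

# Category lists copied from the original question; the variable names are
# illustrative assumptions, not names from the student's data file.
LEVELS = {
    "religion": ["Roman Catholic", "Maronite", "Chaldean", "Melkite", "Other"],
    "gender": ["Male", "Female"],
    "education": ["Less than High School", "High School", "Some College",
                  "Completed College", "Advanced Degree"],
    "age_group": ["18-25", "26-54", "55 or older"],
}
VOTE = ["Obama", "Romney", "Other"]


def indicator_terms(variable, value, reference=None):
    """Return 0/1 indicator (dummy) contrast terms for one categorical value.

    The reference category (default: the last level) gets all zeros, so a
    k-level predictor yields k - 1 terms.
    """
    levels = LEVELS[variable]
    reference = levels[-1] if reference is None else reference
    return {f"{variable}={lvl}": int(value == lvl)
            for lvl in levels if lvl != reference}


def interaction_terms(terms_a, terms_b):
    """Products of two sets of contrast terms give the interaction contrasts."""
    return {f"{a} * {b}": va * vb
            for (a, va), (b, vb) in product(terms_a.items(), terms_b.items())}


# Number of predictor cells in the full factorial layout, and the number of
# cells once the three-category vote outcome is crossed in.
design_cells = 1
for levels in LEVELS.values():
    design_cells *= len(levels)
table_cells = design_cells * len(VOTE)

gender = indicator_terms("gender", "Male")
age = indicator_terms("age_group", "18-25")
print(design_cells, table_cells)  # 150 450
print(gender)
print(interaction_terms(gender, age))
```

Each respondent then carries a row of 0/1 terms that can be entered as ordinary covariates, which is the role the WITH keyword plays in the advice above.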

 


Re: Analysis with Categorical Independent and Dependent Variables

bdates

Thanks, Gene.

 

Brian

 



Re: Analysis with Categorical Independent and Dependent Variables

John F Hall
In reply to this post by Maguin, Eugene

Brian

 

Yes, definitely do the crosstabs first, even if you have to dichotomise some of the variables.  I’ve been doing something similar using CTABLES (much easier for a lay audience to understand). 

 

John F Hall (Mr)

 

Email:     [hidden email]

Website: www.surveyresearch.weebly.com

 

Jon Peck sent me some complex syntax to produce a format a bit like this (Raynald Levesque sent me something years ago which produced exactly what I wanted):

 

 

Sexism by Sex, controlling for Race

Sexism: Percent (n)

             All          White        Other
All       10.8 (86)    11.0 (52)    10.6 (34)
Boys      12.7 (42)    13.4 (22)    11.9 (20)
Girls      9.0 (44)     9.2 (30)     8.6 (14)

 

The main point about such tables is that the zero-order sample statistic appears top left, first-order statistics in the 1st column or row, and second-order statistics in the 2nd and 3rd columns or rows. Right-to-left language users may prefer the table to be flipped horizontally.

The tables below are the ones I started with, but a format like the above is much easier to read and interpret. This output is from a while back, but I’ll look for the syntax and see if it can be modified for your data.

  

1st order tables

Grouped happy by Sex
                                   Low 0-4   Medium 5-8   High 9-10     Total
                                   Row N %      Row N %     Row N %     Count
Sex
  Male                               10.4%        58.4%       31.2%     71561
  Female                             11.2%        54.9%       33.9%     91658
  Total                              10.9%        56.4%       32.7%    163219

Grouped happy by Age in two groups
                                   Low 0-4   Medium 5-8   High 9-10     Total
                                   Row N %      Row N %     Row N %     Count
Age in two groups
  Under 50                           11.6%        59.6%       28.7%     74710
  50 and over                        10.2%        53.8%       36.0%     88509
  Total                              10.9%        56.4%       32.7%    163219

Grouped happy by Married/co-habiting/Civil Partners
                                   Low 0-4   Medium 5-8   High 9-10     Total
                                   Row N %      Row N %     Row N %     Count
Married/co-habiting/Civil Partners
  Married/Cohabiting/Civil Partner    8.4%        56.0%       35.6%     99916
  Non married                        14.7%        57.1%       28.2%     63303
  Total                              10.9%        56.4%       32.7%    163219


2nd order tables

Grouped happy by Married/co-habiting/Civil Partners, then Sex
                                   Low 0-4   Medium 5-8   High 9-10     Total
                                   Row N %      Row N %     Row N %     Count
Married/Cohabiting/Civil Partner
  Male                                8.2%        57.8%       34.0%     46911
  Female                              8.6%        54.4%       37.0%     53005
  Total                               8.4%        56.0%       35.6%     99916
Non married
  Male                               14.6%        59.6%       25.8%     24650
  Female                             14.8%        55.5%       29.7%     38653
  Total                              14.7%        57.1%       28.2%     63303
Total
  Male                               10.4%        58.4%       31.2%     71561
  Female                             11.2%        54.9%       33.9%     91658
  Total                              10.9%        56.4%       32.7%    163219

Grouped happy by Age in two groups, then Sex
                                   Low 0-4   Medium 5-8   High 9-10     Total
                                   Row N %      Row N %     Row N %     Count
Under 50
  Male                               11.5%        62.1%       26.4%     30657
  Female                             11.7%        57.9%       30.4%     44053
  Total                              11.6%        59.6%       28.7%     74710
50 and over
  Male                                9.6%        55.7%       34.7%     40904
  Female                             10.7%        52.1%       37.2%     47605
  Total                              10.2%        53.8%       36.0%     88509
Total
  Male                               10.4%        58.4%       31.2%     71561
  Female                             11.2%        54.9%       33.9%     91658
  Total                              10.9%        56.4%       32.7%    163219

Grouped happy by Age in two groups, then Married/co-habiting/Civil Partners
                                   Low 0-4   Medium 5-8   High 9-10     Total
                                   Row N %      Row N %     Row N %     Count
Under 50
  Married/Cohabiting/Civil Partner    9.1%        59.8%       31.1%     45282
  Non married                        15.5%        59.4%       25.1%     29428
  Total                              11.6%        59.6%       28.7%     74710
50 and over
  Married/Cohabiting/Civil Partner    7.8%        52.9%       39.3%     54634
  Non married                        14.0%        55.1%       30.9%     33875
  Total                              10.2%        53.8%       36.0%     88509
Total
  Married/Cohabiting/Civil Partner    8.4%        56.0%       35.6%     99916
  Non married                        14.7%        57.1%       28.2%     63303
  Total                              10.9%        56.4%       32.7%    163219

 

3rd order table

Grouped happy by Sex, then Married/co-habiting/Civil Partners, then Age in two groups
                                   Low 0-4   Medium 5-8   High 9-10     Total
                                   Row N %      Row N %     Row N %     Count
Sex: Male
  Married/Cohabiting/Civil Partner
    Under 50                          9.4%        62.3%       28.3%     18648
    50 and over                       7.4%        54.9%       37.7%     28263
    Total                             8.2%        57.8%       34.0%     46911
  Non married
    Under 50                         14.7%        61.8%       23.5%     12009
    50 and over                      14.5%        57.6%       28.0%     12641
    Total                            14.6%        59.6%       25.8%     24650
  Total
    Under 50                         11.5%        62.1%       26.4%     30657
    50 and over                       9.6%        55.7%       34.7%     40904
    Total                            10.4%        58.4%       31.2%     71561
Sex: Female
  Married/Cohabiting/Civil Partner
    Under 50                          8.9%        58.0%       33.1%     26634
    50 and over                       8.3%        50.8%       40.9%     26371
    Total                             8.6%        54.4%       37.0%     53005
  Non married
    Under 50                         16.1%        57.7%       26.2%     17419
    50 and over                      13.7%        53.7%       32.6%     21234
    Total                            14.8%        55.5%       29.7%     38653
  Total
    Under 50                         11.7%        57.9%       30.4%     44053
    50 and over                      10.7%        52.1%       37.2%     47605
    Total                            11.2%        54.9%       33.9%     91658
Sex: Total
  Married/Cohabiting/Civil Partner
    Under 50                          9.1%        59.8%       31.1%     45282
    50 and over                       7.8%        52.9%       39.3%     54634
    Total                             8.4%        56.0%       35.6%     99916
  Non married
    Under 50                         15.5%        59.4%       25.1%     29428
    50 and over                      14.0%        55.1%       30.9%     33875
    Total                            14.7%        57.1%       28.2%     63303
  Total
    Under 50                         11.6%        59.6%       28.7%     74710
    50 and over                      10.2%        53.8%       36.0%     88509
    Total                            10.9%        56.4%       32.7%    163219
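
The way the higher-order tables fold back into the lower-order ones can be checked with a little arithmetic. This is a minimal Python sketch, not the CTABLES syntax that produced the output; the helper name `collapse` is hypothetical. It takes the two age rows for married/cohabiting men from the 3rd order table and recovers the corresponding 2nd order row as a count-weighted average.

```python
# (low %, medium %, high %, count) rows copied from the 3rd order table above:
# married/cohabiting men, split by age group.
under_50 = (9.4, 62.3, 28.3, 18648)
over_50 = (7.4, 54.9, 37.7, 28263)


def collapse(*rows):
    """Combine sub-group rows into their marginal row.

    Each row is (pct_1, ..., pct_k, count). The marginal percentage is the
    count-weighted average of the sub-group percentages, rounded to one
    decimal place as in the published tables.
    """
    total = sum(row[-1] for row in rows)
    n_pcts = len(rows[0]) - 1
    pcts = [
        round(sum(row[i] * row[-1] for row in rows) / total, 1)
        for i in range(n_pcts)
    ]
    return (*pcts, total)


married_men = collapse(under_50, over_50)
print(married_men)  # (8.2, 57.8, 34.0, 46911)
```

The result matches the Male row under Married/Cohabiting/Civil Partner in the 2nd order table (8.2%, 57.8%, 34.0%, n = 46911). Because the inputs are already rounded percentages, other rows may differ from the published margins by 0.1.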

 

 

 

 

 

 


Re: Analysis with Categorical Independent and Dependent Variables

bdates

John,

 

This is great!  Thanks.

 

Brian

 



Re: Analysis with Categorical Independent and Dependent Variables

Art Kendall
In reply to this post by bdates
See the CATREG procedure; you can mix nominal, ordinal, and interval variables. Are you stuck with age as a grouped variable? (For example, the cell with age 18-25 would have a small n for advanced degrees.)


It would be a good idea to first check cell sizes etc. via crosstabs. Treating age and education as ordinal may offset potential problems of empty or small cells. You might or might not have to ignore higher-way interactions.

You might also consider clustering some of the categorical variables.
Art Kendall
Social Research Consultants
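
The cell-size check recommended above would normally be done with CROSSTABS in SPSS. As a language-neutral illustration, here is a minimal Python sketch under stated assumptions: the respondent records are invented purely for the example, the helper `sparse_cells` is hypothetical, and the default threshold of 5 is only a common rule of thumb, not something taken from this thread.

```python
from collections import Counter
from itertools import product

# Hypothetical respondent records: (age_group, education). Real data would
# come from the survey file; these few rows only illustrate the check.
records = [
    ("18-25", "High School"), ("18-25", "Some College"), ("18-25", "Some College"),
    ("26-54", "Completed College"), ("26-54", "Advanced Degree"),
    ("26-54", "High School"), ("55 or older", "Less than High School"),
    ("55 or older", "High School"), ("55 or older", "Advanced Degree"),
]
AGE = ["18-25", "26-54", "55 or older"]
EDUCATION = ["Less than High School", "High School", "Some College",
             "Completed College", "Advanced Degree"]


def sparse_cells(rows, row_levels, col_levels, minimum=5):
    """Return {(row, col): count} for every crosstab cell below `minimum`.

    Cells with no cases at all are included with a count of 0.
    """
    counts = Counter(rows)
    return {cell: counts[cell]
            for cell in product(row_levels, col_levels)
            if counts[cell] < minimum}


# With minimum=1 only the completely empty cells are flagged.
flagged = sparse_cells(records, AGE, EDUCATION, minimum=1)
print(len(flagged), ("18-25", "Advanced Degree") in flagged)
```

Listing every combination with `product` matters here: a plain frequency count would silently omit empty cells, which are exactly the ones that cause trouble.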

Re: Analysis with Categorical Independent and Dependent Variables

Maguin, Eugene

Art,

 

If you’ve used the CatReg proc, what is your experience with the concordance of coefficients and standard errors between CatReg and GenLin, NomReg, Plum, or Logistic when analyzing the same datasets? It seems that CatReg and the other procedures make different assumptions in the underlying model and certainly differ in the estimation method. Given a categorical DV, when would you choose CatReg over the other categorical methods?

 

Thanks, Gene Maguin

 


Re: Analysis with Categorical Independent and Dependent Variables

bdates
In reply to this post by Art Kendall

Thanks, Art.

 



Re: Analysis with Categorical Independent and Dependent Variables

Art Kendall
In reply to this post by Maguin, Eugene
I do not have a great deal of experience with these procedures.
If I understand correctly, Plum and Logistic are special cases of GenLin and/or NomReg.

CATREG has built-in ways to test the degree to which level of measurement makes much of a difference in fit. Depending on how small the N's are, treating education and age as not too discrepant from interval level of measurement means that fewer cases are needed (i.e., 1 predictor per variable vs. number of categories minus 1 per variable).
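
The "1 predictor per variable vs. number of categories minus 1" point can be made concrete with the category counts from the original question. This is a rough Python sketch that counts main-effect terms only; the helper `predictor_terms` is hypothetical. The final multiplication by the number of outcome equations is an assumption that applies to a NomReg-style multinomial logit as described earlier in the thread, not necessarily to CATREG.

```python
# Category counts taken from the original question.
CATEGORIES = {"religion": 5, "gender": 2, "education": 5, "age_group": 3}
OUTCOME_CATEGORIES = 3  # Obama, Romney, Other


def predictor_terms(categories, as_interval=()):
    """Count main-effect predictor terms.

    A variable treated as nominal needs (categories - 1) terms; a variable
    treated as interval needs one.
    """
    return sum(1 if name in as_interval else k - 1
               for name, k in categories.items())


all_nominal = predictor_terms(CATEGORIES)
some_interval = predictor_terms(CATEGORIES, as_interval=("education", "age_group"))

# In a baseline-category multinomial logit, each non-reference outcome
# category gets its own set of slopes.
equations = OUTCOME_CATEGORIES - 1
print(all_nominal, some_interval)                          # 11 7
print(all_nominal * equations, some_interval * equations)  # 22 14
```

Interactions would widen the gap further, since each interaction multiplies the term counts of the variables involved.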

If this were a problem I were working on I would use it as an opportunity to compare the detailed results and the substantive conclusions from CATREG, Genlin, and NomReg. If I had some time I would see how correspondence analysis

It would be interesting to hear from the Leiden people who designed the CATEGORIES module, or from people who have experience with a pair or more of these procedures, about comparing CATREG, correspondence analysis, GenLin, and NomReg.

Without a more detailed understanding of the context, the present problem looks like a discriminant function type of question, except that the predictors are at mixed levels of measurement: nominal (religion), ordinal or interval (education, age), and interval (gender).
In other words, the question might be "what distinguishes people who voted Obama/Romney/other?"

It makes intuitive sense that just as interaction terms can be used in regression, there is most likely a way to use interaction terms in CATREG.

From a practical point of view: 1) I would be surprised if there were enough "other" votes to make very fine distinctions. 2) It would be an unusual context where there were enough cases in each value of religion to do much. Of course, a lot depends on how large the total population is in the legislative district.

I would be very hesitant about generalizing from a district with such an unusual representation of religious subgroups.
Art Kendall
Social Research Consultants
On 11/14/2012 2:55 PM, Maguin, Eugene wrote:

Art,

 

If you’ve used the CatReg proc, what is your experience with the concordance of coefficients and standard errors between using CatReg and GenLin, NomReg, Plum, or Logistic to analyze the same datasets? It seems as if there are different assumptions in the underlying model between CatReg and the other procedures and, certainly in the estimation method. Given a categorical DV, when would you choose CatReg over the other categorical methods?

 

Thanks, Gene Maguin

 

From: SPSSX(r) Discussion [[hidden email]] On Behalf Of Art Kendall
Sent: Wednesday, November 14, 2012 2:44 PM
To: [hidden email]
Subject: Re: Analysis with Categorical Independent and Dependent Variables

 

See the CATREG procedure,  you can mix nominal, ordinal,  and interval variables.  Are you stuck with age as a grouped variable? (for example the cell with age 18-25 would have a small n for advanced degrees.


It would be a good idea to first check cell sizes etc. via crosstabs. Treating age and education as ordinal may offset potential problems of empty or small cells. You might or might not have to ignore higher-order interactions.
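
To make the cell-size check concrete, here is a minimal sketch in Python rather than SPSS. The records are made-up toy data, the function name `sparse_cells` is hypothetical, and the cutoff of 5 cases per cell is only an assumed rule of thumb, not something from this thread. It counts every age-by-education cell, including cells that never occur in the data, and reports the ones that fall below the cutoff.

```python
from collections import Counter
from itertools import product

# Hypothetical toy records of (age group, education); real data would come
# from the district survey file.
records = [
    ("18-25", "High School"),
    ("18-25", "Some College"),
    ("26-54", "Completed College"),
    ("26-54", "Advanced Degree"),
    ("26-54", "Advanced Degree"),
    ("55 or older", "High School"),
]

age_levels = ["18-25", "26-54", "55 or older"]
edu_levels = [
    "Less than High School",
    "High School",
    "Some College",
    "Completed College",
    "Advanced Degree",
]


def sparse_cells(rows, row_levels, col_levels, min_n=5):
    """Return {cell: count} for every cell with fewer than min_n cases.

    min_n=5 is an assumed rule of thumb. Cells absent from the data are
    included with a count of 0, which is the point of crossing the levels.
    """
    counts = Counter(rows)
    return {
        cell: counts[cell]
        for cell in product(row_levels, col_levels)
        if counts[cell] < min_n
    }


sparse = sparse_cells(records, age_levels, edu_levels)
total_cells = len(age_levels) * len(edu_levels)
print(f"{len(sparse)} of {total_cells} cells have fewer than 5 cases")
```

The same idea extends to three-way or four-way tables by passing longer tuples and crossing more level lists, which is where empty cells usually start to appear.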

You might also consider clustering some of the categorical variables.

Art Kendall
Social Research Consultants


Art Kendall
Social Research Consultants

Re: Analysis with Categorical Independent and Dependent Variables

Christine Eastman
Out of curiosity, would a loglinear analysis be appropriate for these data? (I think it's GENLOG / HILOGLINEAR in SPSS.)

On 15 November 2012 07:44, Art Kendall <[hidden email]> wrote:
I do not have a great deal of experience with these procedures.
If I understand correctly, PLUM and LOGISTIC are special cases of GENLIN and/or NOMREG.

CATREG has built-in ways to test the degree to which level of measurement makes much of a difference in fit. Depending on how small the Ns are, treating education and age as not too discrepant from interval level of measurement means that fewer cases are needed (i.e., 1 predictor per variable vs. number of categories minus 1 per variable).
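
The arithmetic behind that parenthetical can be sketched as follows. This is only an illustration under stated assumptions: it counts slopes for a main-effects multinomial logit on the variables from the original question (not for CATREG's optimal-scaling model), and the helper name `slopes_per_logit` is hypothetical.

```python
# Number of categories per predictor, taken from the original question.
levels = {"religion": 5, "gender": 2, "education": 5, "age_group": 3}
outcome_categories = 3  # Obama, Romney, Other


def slopes_per_logit(levels, interval_vars=()):
    """Count main-effect slopes in one logit equation.

    A nominal predictor with k categories needs k - 1 indicator terms;
    a predictor treated as interval needs a single slope.
    """
    return sum(
        1 if name in interval_vars else k - 1
        for name, k in levels.items()
    )


logits = outcome_categories - 1  # one equation per non-reference outcome

all_nominal = slopes_per_logit(levels)
ordered_as_interval = slopes_per_logit(
    levels, interval_vars=("education", "age_group")
)

print("slopes, all predictors nominal:", all_nominal * logits)
print("slopes, education and age as interval:", ordered_as_interval * logits)
```

With everything nominal there are 4 + 1 + 4 + 2 = 11 slopes per equation; treating education and age as interval drops that to 4 + 1 + 1 + 1 = 7, before any interaction terms are added.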

If this were a problem I were working on I would use it as an opportunity to compare the detailed results and the substantive conclusions from CATREG, Genlin, and NomReg. If I had some time I would see how correspondence analysis



Re: Analysis with Categorical Independent and Dependent Variables

Ryan
In reply to this post by bdates
A generalized logit model would be an appropriate place to start. Incorporate interaction terms judiciously. Be aware of cell sample sizes, as others have pointed out.
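
For readers unfamiliar with the generalized (baseline-category) logit, here is a minimal sketch of how it turns one linear predictor per non-reference outcome into probabilities for all outcomes. The linear-predictor values are invented for illustration, choosing Romney as the reference category is an arbitrary assumption, and the function name is hypothetical; nothing here is estimated from data.

```python
import math


def generalized_logit_probs(linear_predictors, reference):
    """Convert baseline-category logits into outcome probabilities.

    linear_predictors maps each non-reference category to its linear
    predictor (the log-odds of that category versus the reference).
    The reference category's linear predictor is fixed at 0.
    """
    exps = {cat: math.exp(eta) for cat, eta in linear_predictors.items()}
    denom = 1.0 + sum(exps.values())  # the 1.0 is exp(0) for the reference
    probs = {cat: value / denom for cat, value in exps.items()}
    probs[reference] = 1.0 / denom
    return probs


# Made-up log-odds for one hypothetical voter profile, versus Romney.
etas = {"Obama": 0.4, "Other": -2.0}
probs = generalized_logit_probs(etas, reference="Romney")

for category, p in probs.items():
    print(f"{category}: {p:.3f}")
```

Each interaction term added to the model adds a coefficient to every one of these equations, which is one reason to add them judiciously when cells are small.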

Ryan
