Regression and chi-square questions

Brian J. Hall
Dear list,
I have a couple of questions about testing for moderation with a
three-level categorical IV. I have typically used rather large samples and
conducted between-group or moderation analyses with two groups. The
three-group situation appears to be more complex.

First, the IV is ethnicity. The three levels contain 200, 150, and 38 people.
Question 1: can I include the group of 38 in the interaction term, or is
this sub-sample too small?
Question 2: what is the best strategy to construct the IV and interaction
terms? Dummy coding ethnicity into 3 separate variables and multiplying
each by the centered continuous predictor is an approach I am familiar
with, but I have not seen a source that makes this unambiguous.
If anyone has syntax to share, that would be most helpful.
Question 3: how can one calculate power (and sample size) for testing
interaction terms?

We also conducted Chi-square tests on ethnicity and other categorical
variables. When the test is significant, it might not be entirely clear
where the significance lies. Conducting a 3-group chi-square and following
this up with pairwise chi-square analyses does not appear efficient.
Question 4: Is there a way of identifying the source of the significant
differences between three groups without follow-up tests?

I hope these questions are clear to the list. Any help in answering these
questions would be greatly appreciated.

Best regards,
Brian

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Re: Regression and chi-square questions

Trejtowicz, Mariusz
Dear Brian,

Is your model DV = categorical_IV + scale_predictor +
categorical_IV*scale_predictor (ANCOVA with a factor-by-covariate
interaction)?
Why not use UNIANOVA instead of REGRESSION? It does the dummy coding
for you (and you can still test user-defined contrasts), it reports
observed power (if statistical power is a concern), and EMMEANS helps
you understand the results. Of course, if you prefer the REGRESSION
procedure, e.g. because of its diagnostic features, you can obtain the
same results there by using dummy variables for your categorical
predictor.
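
For the REGRESSION route, a minimal sketch might look like the one below. The variable names (ethnicity coded 1, 2, 3; x for the continuous predictor; y for the DV) are hypothetical, and group 1 is assumed to be the majority group used as the reference. Note that a three-level factor needs only two dummy variables, not three; the reference group is the one with no dummy of its own.

```
* Hypothetical variables: ethnicity (1, 2, 3), x (continuous predictor), y (DV).
* Group 1 is assumed to be the majority group and acts as the reference.
COMPUTE d2 = (ethnicity = 2).
COMPUTE d3 = (ethnicity = 3).

* Add the grand mean of x to every case, then center x.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=
  /xmean = MEAN(x).
COMPUTE xc = x - xmean.

* Product terms carrying the ethnicity-by-x interaction.
COMPUTE d2xc = d2 * xc.
COMPUTE d3xc = d3 * xc.
EXECUTE.

* Block 1 holds the main effects, block 2 adds the two product terms.
REGRESSION
  /STATISTICS=COEFF R ANOVA CHANGE
  /DEPENDENT=y
  /METHOD=ENTER d2 d3 xc
  /METHOD=ENTER d2xc d3xc.
```

With this setup, the F test for the R-square change in the second block tests both product terms together, that is, the interaction as a whole.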

I wouldn't worry about the category with 38 people as long as it is not
used as the reference category. It probably makes more sense to compare
the minority groups to the majority group anyway, so I would only recode
the IV so that the majority category comes last (UNIANOVA treats the last
category as the reference in its parameter estimates).


Below is syntax for a structurally similar analysis using the
"Employee data.sav" data file. Note that this example is methodologically
problematic, because the strong factor-covariate correlation violates
the assumptions of GLM.

**** Example **** .
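COMPUTE lnsalary = LN(salary).
COMPUTE lnsalbeg = LN(salbegin).
* Obtain the mean of lnsalbeg for centering.
DESCRIPTIVES VARIABLES=lnsalbeg.
* Centering (9.6694 is the mean taken from the DESCRIPTIVES output).
COMPUTE lnsalbegcen = lnsalbeg - 9.6694.

FREQUENCIES VARIABLES=jobcat.
* Recode the 'Clerical' category so that it is the last one.
RECODE jobcat (1=4) (ELSE=COPY).
ADD VALUE LABELS jobcat 4 'Clerical'.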

UNIANOVA lnsalary BY jobcat WITH lnsalbegcen
  /CONTRAST(jobcat)=Simple(3)
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /SAVE=PRED(salpred)
  /EMMEANS=TABLES(jobcat) WITH(lnsalbegcen=MEAN) COMPARE ADJ(LSD)
  /PRINT=ETASQ PARAMETER OPOWER
  /CRITERIA=ALPHA(.05)
  /DESIGN=jobcat lnsalbegcen jobcat*lnsalbegcen.

DESCRIPTIVES VARIABLES=lnsalbegcen salpred
  /SAVE
  /STATISTICS=MEAN STDDEV MIN MAX.

GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=Zlnsalbegcen Zsalpred
    jobcat MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: Zlnsalbegcen=col(source(s), name("Zlnsalbegcen"))
  DATA: Zsalpred=col(source(s), name("Zsalpred"))
  DATA: jobcat=col(source(s), name("jobcat"), unit.category())
  GUIDE: axis(dim(1), label("lnsalbegcens"))
  GUIDE: axis(dim(2), label("lnsalarys"))
  GUIDE: legend(aesthetic(aesthetic.color.exterior), label("Employment Category"))
  SCALE: cat(aesthetic(aesthetic.color.exterior), include("2", "3", "4"))
  ELEMENT: line(position(smooth.linear(Zlnsalbegcen*Zsalpred)), color(jobcat))
END GPL.

**** end of the example **** .



Re: Question 3.
I recommend G*Power:
http://www.psycho.uni-duesseldorf.de/aap/projects/gpower/
For your own data, UNIANOVA can compute observed power for the
model parameters.
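
As a rough cross-check on a G*Power result, the a priori power of the F test for the interaction block can also be sketched in SPSS syntax. Everything numeric here is an assumption for illustration: the f-squared values are hypothetical effect sizes, N = 388 is the sum of the three group sizes, the full model is taken to have five predictors (two dummies, the covariate, two product terms), and the noncentrality parameter follows the convention lambda = f2 * N.

```
* Hypothetical effect sizes (Cohen's f-squared) for the interaction block.
DATA LIST FREE / f2.
BEGIN DATA
0.02 0.05 0.10
END DATA.

* Assumed design: N = 388, 2 product terms tested, 5 predictors in the full model.
COMPUTE n = 388.
COMPUTE df1 = 2.
COMPUTE df2 = n - 5 - 1.
* Noncentrality parameter under the convention lambda = f2 * N.
COMPUTE lambda = f2 * n.
* Critical F at alpha = .05, then power from the noncentral F distribution.
COMPUTE fcrit = IDF.F(0.95, df1, df2).
COMPUTE power = 1 - NCDF.F(fcrit, df1, df2, lambda).
FORMATS f2 power (F6.3).
LIST VARIABLES=f2 power.
```

Each row of the listing gives the power to detect an interaction of the assumed size with the full sample. To approximate a required sample size, change n and rerun until the power reaches the target.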


Re: Question 4.
I think the best way to understand the relation between categorical
variables is to analyse the adjusted standardized residuals in the
crosstabulation. If a cell's adjusted standardized residual falls outside
the range -1.96 to 1.96, its observed count differs significantly
(p < .05) from its expected count.

An example (again, "Employee data.sav" datafile):

CROSSTABS
  /TABLES=gender BY jobcat
  /FORMAT=AVALUE TABLES
  /STATISTICS=CHISQ PHI
  /CELLS=COUNT EXPECTED COLUMN ASRESID
  /COUNT ROUND CELL.
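
For reference, the quantity requested by ASRESID is the usual adjusted standardized residual. The notation is introduced here for illustration only: O_ij is the observed count in row i and column j, r_i and c_j are the row and column totals, and N is the table total.

```latex
E_{ij} = \frac{r_i \, c_j}{N}, \qquad
z_{ij} = \frac{O_{ij} - E_{ij}}
              {\sqrt{E_{ij}\,\left(1 - \frac{r_i}{N}\right)\left(1 - \frac{c_j}{N}\right)}}
```

Under independence each z_ij is approximately standard normal, which is where the 1.96 cutoff comes from.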



HTH,
Mariusz



-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Brian Hall
Sent: Thursday, March 19, 2009 8:46 PM
To: [hidden email]
Subject: Regression and chi-square questions

