Hi all,
This is not SPSS-specific, but I wanted to pose a general statistical
question, if I may.
I am working with large-scale test data. The purpose is to
compute mean scale scores for a number of Districts, adjusted for certain
demographic variables (such as race/ethnicity, SES, etc.).
To make the scenario simpler, say we have 3 Districts in the entire
data set and one control variable (X1). The sample sizes of the Districts are
NOT equal.
Here is my approach with a couple of questions at the end.
Step 1: Create unweighted effect codes
for the three Districts (D1, D2, D3) as follows:
      District1   District2   District3
E1        1           0          -1
E2        0           1          -1
E1 compares the mean of D1 to the grand mean and E2 compares
the mean of D2 to the grand mean.
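
To make the coding in Step 1 concrete, here is a minimal Python sketch. NumPy is my choice for illustration and is not part of the setup above; the function name effect_codes and the district labels 1, 2, 3 are hypothetical.

```python
import numpy as np

def effect_codes(district):
    """Return unweighted effect codes (E1, E2) for district labels 1, 2, 3.

    District 3 is the reference group and is coded -1 on both columns.
    """
    district = np.asarray(district)
    e1 = np.where(district == 1, 1, np.where(district == 3, -1, 0))
    e2 = np.where(district == 2, 1, np.where(district == 3, -1, 0))
    return e1, e2

# Six hypothetical students: two in D1, one in D2, three in D3.
e1, e2 = effect_codes([1, 1, 2, 3, 3, 3])
```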
Step 2: Grand-mean center X1: compute X1 - Xbar, where
Xbar is the mean of X1 across all students in the 3 Districts.
Step 3: Run the following regression
Y = a + b1(X1 - Xbar) + b2(E1) + b3(E2)
Y is the scale score, a is the intercept, and the b's are
the slopes.
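
The centering and regression steps can be sketched as follows. This is only an illustration under stated assumptions: the data are made up, the outcome is built without noise from known coefficients so the fit can be checked, and plain least squares via NumPy stands in for whatever regression procedure is actually used.

```python
import numpy as np

# Made-up data for illustration only: unequal district sizes (2, 3, 5 students).
district = np.array([1, 1, 2, 2, 2, 3, 3, 3, 3, 3])
x1 = np.array([3.0, 5.0, 6.0, 7.0, 8.0, 1.0, 2.0, 3.0, 4.0, 5.0])

# Unweighted effect codes, District 3 as the reference (-1 on both columns).
e1 = np.where(district == 1, 1.0, np.where(district == 3, -1.0, 0.0))
e2 = np.where(district == 2, 1.0, np.where(district == 3, -1.0, 0.0))

# Grand-mean center the covariate.
x1c = x1 - x1.mean()

# Noise-free outcome built from known (hypothetical) coefficients.
y = 500.0 + 2.0 * x1c + 10.0 * e1 - 4.0 * e2

# Ordinary least squares for Y = a + b1*(X1 - Xbar) + b2*E1 + b3*E2.
design = np.column_stack([np.ones_like(x1c), x1c, e1, e2])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
a, b1, b2, b3 = coef
```

Because Y here is noise-free, the fit should return the coefficients used to build it (500, 2, 10, -4); that is only a sanity check on the setup, not a statement about real data.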
Step 4: Here is where I am stuck. To get the adjusted
District means I have 2 options, and I am not sure which one is right:
a) Plug in the values of X1, E1 and E2 for each student in
each District and obtain predicted Y. The mean of this predicted Y for each
District is the adjusted mean for that District; or is it?
b) The intercept (a) is the grand mean adjusted for X1. The
slope of E1 corresponds to the difference between the adjusted grand mean and
the adjusted District 1 mean. Hence for District 1, the adjusted mean =
a + b2. For District 2, the adjusted mean = a + b3; or is it? :)
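
For what it is worth, the two options can be compared numerically. The sketch below is a toy example under assumptions of my own: made-up data, a fixed made-up residual vector called noise, and NumPy least squares in place of the real regression run.

```python
import numpy as np

# Made-up data for illustration only (unequal district sizes: 2, 3, 5).
district = np.array([1, 1, 2, 2, 2, 3, 3, 3, 3, 3])
x1 = np.array([3.0, 5.0, 6.0, 7.0, 8.0, 1.0, 2.0, 3.0, 4.0, 5.0])
noise = np.array([1.0, -1.0, 2.0, -1.0, -1.0, 0.5, -0.5, 1.0, -1.0, 0.0])

e1 = np.where(district == 1, 1.0, np.where(district == 3, -1.0, 0.0))
e2 = np.where(district == 2, 1.0, np.where(district == 3, -1.0, 0.0))
x1c = x1 - x1.mean()
y = 500.0 + 2.0 * x1c + 10.0 * e1 - 4.0 * e2 + noise

design = np.column_stack([np.ones_like(x1c), x1c, e1, e2])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
a, b1, b2, b3 = coef
predicted = design @ coef

groups = (1, 2, 3)
# Option a: average the student-level predictions within each district.
option_a = np.array([predicted[district == d].mean() for d in groups])
# Option b: evaluate the model at X1 = Xbar (centered value 0) per district;
# District 3 is coded -1 on both columns, hence a - b2 - b3.
option_b = np.array([a + b2, a + b3, a - b2 - b3])
# Raw, unadjusted district means for comparison.
observed = np.array([y[district == d].mean() for d in groups])
```

In this toy example option_a matches the raw district means, because least-squares residuals average to zero within each district once the district codes are in the model. option_b gives each district's predicted score at the grand mean of X1. Whether that is the adjustment wanted here is exactly the open question.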
Questions:
- Am I right to include the effect codes in the regression in Step 3?
- Can we use unweighted effect codes in Step 1, given that the sample
sizes of the Districts are NOT equal? Should I center the effect codes too,
or should I use weighted effect codes instead?
- How do we obtain the adjusted means in the final step: option a or
option b?
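
On the unweighted versus weighted question, a small sketch may help show what the intercept means under each coding when no covariate is in the model. The data are made up, and the weighted scheme shown (reference district coded -n_g/n_3) is the usual textbook definition as I understand it, so treat it as an assumption rather than a recommendation.

```python
import numpy as np

# Made-up scores for illustration only (district sizes 2, 3, 5).
district = np.array([1, 1, 2, 2, 2, 3, 3, 3, 3, 3])
y = np.array([512.0, 508.0, 499.0, 495.0, 497.0,
              490.0, 492.0, 494.0, 496.0, 498.0])

n1, n2, n3 = [(district == d).sum() for d in (1, 2, 3)]

# Unweighted effect codes: reference district coded -1.
u1 = np.where(district == 1, 1.0, np.where(district == 3, -1.0, 0.0))
u2 = np.where(district == 2, 1.0, np.where(district == 3, -1.0, 0.0))

# Weighted effect codes: reference district coded -n_g / n_3,
# so each code column sums to zero across students.
w1 = np.where(district == 1, 1.0, np.where(district == 3, -n1 / n3, 0.0))
w2 = np.where(district == 2, 1.0, np.where(district == 3, -n2 / n3, 0.0))

def intercept(c1, c2):
    """Fit Y = a + b*c1 + c*c2 by least squares and return a."""
    design = np.column_stack([np.ones(len(y)), c1, c2])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0]

a_unweighted = intercept(u1, u2)
a_weighted = intercept(w1, w2)
district_means = np.array([y[district == d].mean() for d in (1, 2, 3)])
```

Here a_unweighted equals the simple average of the three district means, while a_weighted equals the mean over all students, so with unequal sample sizes the two codings define "grand mean" differently.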
Thank you
Enis