simple transformation quandary

tseifert
Hello,
I'm struggling with a simple transformation and any help you could
provide would be most appreciated.

I have a categorical variable (HS_FamInc) on which 10% of the data are
missing.  I have decided to give the ethnic group's mean (after
setting the categorical variable to the midpoint of its category) to
those who are missing on the HS_FamInc variable.  The problem is that
I don't know how to write the syntax so as not to lose the other 90%
of the sample who really do have data on the HS_FamIncMid variable.  I
can't think of how to write the else=copy syntax.  The syntax I have
so far is below.

I would greatly appreciate any help that the list could provide.
Thanks so much.  -Tricia

Do if (SYSMIS(HS_FamInc) AND Ethnicity EQ 1) .
Compute HS_FamIncMid = 43102.24 .
else if (SYSMIS(HS_FamInc) AND Ethnicity EQ 2 ) .
Compute HS_FamIncMid = 48656.86 .
else if (SYSMIS(HS_FamInc) AND Ethnicity EQ 3 ) .
Compute HS_FamIncMid = 60901.16 .
else if (SYSMIS(HS_FamInc) AND Ethnicity EQ 4 ) .
Compute HS_FamIncMid = 56993.13 .
else if (SYSMIS(HS_FamInc) AND Ethnicity EQ 5 ) .
Compute HS_FamIncMid = 55588.24 .
else if (SYSMIS(HS_FamInc) AND Ethnicity EQ 6 ) .
Compute HS_FamIncMid = 56993.13 .
else if (SYSMIS(HS_FamInc) AND Ethnicity EQ 7 ) .
Compute HS_FamIncMid = 57712.33 .
else if (SYSMIS(HS_FamInc) AND Ethnicity EQ 8 ) .
Compute HS_FamIncMid = 57927.54 .
end if .
exe .

Tricia Seifert
Graduate Student and Research Assistant
Student Affairs Administration and Research
University of Iowa
N440 Lindquist Center
Iowa City, IA 52242

Re: simple transformation quandary

Richard Ristow
At 09:49 AM 10/4/2006, [hidden email] wrote:

This isn't mainly about the particular logic. You write,

>I have a categorical variable (HS_FamInc) on which 10% of the data are
>missing.  I have decided to give the ethnic group's mean (after
>setting the categorical variable to the midpoint of its category) to
>those who are missing on the HS_FamInc variable.

What you're describing, and seem to be doing, is a very bad idea: using
logic to impute values for missing data in an EXISTING variable.
Imputing missing values is fine (with the appropriate techniques and
precautions), but the variable whose values are to be imputed should
always be copied to a NEW variable first. Impute into the existing
variable, and you're very likely to lose track of which values are
measured and which imputed; and that way, madness and major confusion
lie.
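
A minimal sketch of that principle in SPSS syntax. It assumes HS_FamIncMid already holds the midpoint dollar value for measured cases and is system-missing for the rest; HS_FamIncMid_Imp and FamInc_Imputed are hypothetical names, not from the original post. The measured variable is never overwritten, and a 0/1 flag records which cases receive an imputed value.

```spss
* Hypothetical names: HS_FamIncMid_Imp, FamInc_Imputed.
* Keep the measured values untouched and impute into a copy.
COMPUTE HS_FamIncMid_Imp = HS_FamIncMid.
* Flag cases that will receive an imputed value: 1 = imputed, 0 = measured.
COMPUTE FamInc_Imputed = SYSMIS(HS_FamIncMid).
VALUE LABELS FamInc_Imputed 0 'Measured' 1 'Imputed'.
* Any imputation step then assigns only to HS_FamIncMid_Imp.
EXECUTE.
```

With the flag in place, later analyses can be run with and without the imputed cases simply by filtering on FamInc_Imputed.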

But since I'm writing anyway,

>I have a categorical variable (HS_FamInc) on which 10% of the data are
>missing.  I have decided to give the ethnic group's mean (after
>setting the categorical variable to the midpoint of its category) to
>those who are missing on the HS_FamInc variable.  The problem is that
>I don't know how to write the syntax so as not to lose the other 90%
>of the sample who really do have data on the HS_FamIncMid variable.  I
>can't think of how to write the else=copy syntax.  The syntax I have
>so far is below.

Or are you computing the existing variable HS_FamInc into a new variable
HS_FamIncMid, which is
. The (dollar) value corresponding to the HS_FamInc category, if
HS_FamInc is not missing;
. The ethnicity-mean income, if HS_FamInc is missing?

If so, something like this, maybe. It's not tested, even as it stands;
and I haven't confirmed that what I'm trying to do is what you want.

The other doubtful practice in your code (and in mine, below) is that
the ethnicity-mean values typed into the syntax are, effectively, 'data
in code', a fertile source of confusion and errors. (Why? Because you
can forget they're there, so you don't look for them and change them
when the values need to be changed. Worse, if you copy the code into
other programs, it becomes even easier to miss some of the copies when
the values change. Finally, if your analysis is to be replicated, even
by you, you need to know WHAT all the data is, and WHERE it is. You're
likely to forget you had data in code; anybody else won't know it was
there in the first place. I've been bitten...)

Do if (SYSMIS(HS_FamInc)).
.  RECODE ETHNICITY
    ( 1 = 43102.24 )
    ( 2 = 48656.86 )
    ( 3 = 60901.16 )
    ( 4 = 56993.13 )
    ( 5 = 55588.24 )
    ( 6 = 56993.13 )
    ( 7 = 57712.33 )
    ( 8 = 57927.54 )
      INTO HS_FamIncMid.
ELSE.
.  RECODE HS_FamInc
    <put in the recodes from codes to values>
         INTO HS_FamIncMid.
END IF.
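
One way around the 'data in code' problem raised above, offered as a substitute rather than as what either poster did: let AGGREGATE compute each ethnic group's mean from the file itself, so no dollar amounts are typed into the syntax. This sketch assumes an SPSS release whose AGGREGATE supports MODE=ADDVARIABLES, and that HS_FamIncMid holds the midpoint values for measured cases and is system-missing for the rest. EthMeanInc and HS_FamIncMid_Imp are hypothetical names. The resulting means match the hard-coded ones only if those were computed the same way, on the same cases.

```spss
* Assumes HS_FamIncMid holds midpoint dollar values for measured cases
* and is system-missing otherwise.
* EthMeanInc and HS_FamIncMid_Imp are hypothetical names.
* Add the ethnic-group mean of the measured values to every case.
* MEAN uses only valid values, so only measured cases contribute.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=Ethnicity
  /EthMeanInc = MEAN(HS_FamIncMid).
* Impute into a copy so the measured values pass through unchanged.
COMPUTE HS_FamIncMid_Imp = HS_FamIncMid.
IF (SYSMIS(HS_FamIncMid)) HS_FamIncMid_Imp = EthMeanInc.
EXECUTE.
```

Because the means are recomputed each time the syntax runs, they stay in step with the data if cases are added or corrected.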