SPSSX Discussion

Recoding variables


Mariko Kato

Recoding variables

Hello!

I'd like to recode a variable in the following way:
The codes of a categorical variable should be ranked in descending order of their frequency. Categories like "miscellaneous" should
not be ranked; instead they should be put at the end of the ranking.
So far we have tried the syntax below, but it does not distinguish between categories that have the same frequency.

AGGREGATE
/OUTFILE=*
MODE=ADDVARIABLES
/BREAK=v1
/v1_fre 'Frequency v1' = NU(v1).

if (v1>999 & not(sysmis(v1))) v1_fre=0.
exe.

RANK
VARIABLES=v1_fre (D) /RANK
/PRINT=no
/TIES=CONDENSE.

Do you have an idea? Or is this the completely wrong way?


=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Maguin, Eugene

Re: Recoding variables

Mariko,

If you have one or just a few variables, the simplest way is going to be to do
it by hand.

If you have 'lots of' variables to do this for, then other methods are
needed. This part of your problem
>>Categories like "miscellaneous" should not be ranked, instead they should
be put at the end of the ranking.
poses a problem because you have to decide which values are miscellaneous.
Putting them at the end of the ranking is a problem also. You also have a
problem with missing values. Lastly, you are going to have a problem with
value labels because values of the new variable will be completely
disconnected from the original variable.

I'd do this.

Missing values v1().
Recode v1(sysmis <missing value code list>=999).
* These two commands gather up all the missing value codes, including sysmis,
into a single new code, which will make things easier later.

AGGREGATE OUTFILE=* MODE=ADDVARIABLES/BREAK=v1
/v1_fre 'Frequency v1' = NU(v1).

Crosstabs v1 by v1_fre.

* This shows how values of the new variable correspond to values of the old
variable. This matters for your miscellaneous codes and for the missing
value code. No escaping this. It's a by-hand operation. Suppose that your
original variable had 15 valid values, three of which were miscellaneous,
plus a couple of missing value codes and sysmis. Your new variable will have
12 primary codes, numbered 1-12, three miscellaneous codes, numbered 13-15,
and one missing value code=999. Of course, there is the possibility that two
or more of your valid data codes will have the same frequency. This is where
rank creates the problem.

Recode v1_fre(<nu for miscellaneous code 1>=-13)
  (<nu for miscellaneous code 2>=-14)
  (<nu for miscellaneous code 3>=-15)
  (<nu for missing data code>=-999).

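Sort cases by v1_fre(d) v1.
* v1 is a second sort key so that categories with the same frequency stay
together; otherwise the lag() comparison below can count one category twice.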

Do if ($casenum eq 1).
+ compute v1_fre=1.
Else if (v1_fre gt 0).
+ if (v1 eq lag(v1)) v1_fre=lag(v1_fre).
+ if (v1 ne lag(v1)) v1_fre=lag(v1_fre)+1.
End if.
Compute v1_fre=abs(v1_fre).
Missing values v1_fre(999).

* So basically, I'm doing a ranking but with a different procedure for
handling ties and excluding cases with specific values. I don't know that
there is any way to automate this so that lots of variables could be worked
on. Given what you want, I think the crosstabs and recode steps have to be
in there. AND, don't forget value labels. Perhaps someone else will see a
better way.
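
For readers who want to check the intended result outside SPSS, the ranking logic described above can be sketched in plain Python. This is only an illustration under assumptions: the function name frequency_ranks, the sample data, and the rule that ties are broken by the smaller original code are hypothetical and not part of the syntax above.

```python
from collections import Counter

def frequency_ranks(values, misc_codes=(), missing_codes=(), missing_rank=999):
    """Map each original code to a rank by descending frequency.

    Assumptions (not from the thread): tied codes get distinct consecutive
    ranks, smaller original code first; misc codes follow the ranked codes
    in the order given; missing codes and None map to missing_rank.
    """
    missing = set(missing_codes)
    misc = list(misc_codes)
    # Count only non-missing values.
    counts = Counter(v for v in values if v is not None and v not in missing)
    # Regular categories: highest frequency first, ties by original code.
    regular = [c for c in counts if c not in misc]
    regular.sort(key=lambda c: (-counts[c], c))
    mapping = {code: rank for rank, code in enumerate(regular, start=1)}
    # Miscellaneous categories go after every ranked category.
    next_rank = len(regular) + 1
    for code in misc:
        if code in counts:
            mapping[code] = next_rank
            next_rank += 1
    for code in missing:
        mapping[code] = missing_rank
    return mapping

# Hypothetical data: 1000 is "miscellaneous", 99 is a missing value code.
data = [3, 3, 3, 1, 1, 2, 2, 1000, 1000, 1000, 1000, None, 99]
ranks = frequency_ranks(data, misc_codes=[1000], missing_codes=[99])
recoded = [ranks.get(v, 999) for v in data]
```

Codes 1 and 2 both occur twice here, yet they receive different ranks, and code 1000 lands last even though it is the most frequent value.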

Gene Maguin


Peck, Jon

Re: Recoding variables

Note that there are programmability tools that handle this problem in the context of Ctables (in the spssaux2.py module on SPSS Developer Central www.spss.com/devcentral). Ctables can sort by statistics in the table, and the genCategoryList function handles cases where there is an exception list of things that should be moved to the bottom (or top).

genCategoryList is normally used to generate a macro that is fed to the Ctables syntax, but it returns a sorted category list, with exceptions handled, that could easily be plugged into a generated recode command.
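
As a rough sketch of the "generated recode command" idea: the plain-Python helper below turns an already sorted category list into RECODE syntax. It does not call genCategoryList, whose signature is not shown in this thread; the name build_recode, its parameters, and the example codes are all hypothetical.

```python
def build_recode(varname, newname, ordered_codes, missing_codes=(), missing_value=999):
    """Return SPSS RECODE syntax that maps ordered_codes to 1..n.

    ordered_codes is assumed to be sorted already (most frequent first,
    exceptions such as "miscellaneous" at the end).
    """
    specs = ["({}={})".format(code, rank)
             for rank, code in enumerate(ordered_codes, start=1)]
    # Collapse user-missing codes and system-missing into one code.
    specs.extend("({}={})".format(code, missing_value) for code in missing_codes)
    specs.append("(SYSMIS={})".format(missing_value))
    return "RECODE {} {} INTO {}.".format(varname, " ".join(specs), newname)

# Hypothetical input: 1000 is the exception moved to the bottom.
syntax = build_recode("v1", "v1_rank", [3, 1, 2, 1000], missing_codes=[99])
print(syntax)
```

With the programmability plugin, a string like this could then be submitted to SPSS; how it is submitted is left out here because it depends on the installed version.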

Furthermore, Dev Central has another module called specialtransforms.py that has a Recode class that will do the recodes and preserve the original value labels in the new variables.
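
To illustrate what preserving the value labels amounts to, here is a minimal plain-Python sketch that carries the original labels over to the new rank codes and emits a VALUE LABELS command. It is not the Recode class from specialtransforms.py; the function name build_value_labels and the label dictionary are assumptions made for the example.

```python
def build_value_labels(newname, ordered_codes, labels):
    """Return SPSS VALUE LABELS syntax for the rank-coded variable.

    labels maps original codes to their labels; codes without a label
    are skipped. Apostrophes are doubled, as SPSS string literals require.
    """
    lines = ["VALUE LABELS {}".format(newname)]
    for rank, code in enumerate(ordered_codes, start=1):
        label = labels.get(code)
        if label is not None:
            lines.append("  {} '{}'".format(rank, label.replace("'", "''")))
    return "\n".join(lines) + "."

# Hypothetical labels of the original variable.
old_labels = {3: "Car", 1: "Bike", 2: "Train", 1000: "Miscellaneous"}
label_syntax = build_value_labels("v1_rank", [3, 1, 2, 1000], old_labels)
print(label_syntax)
```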

So using these two things together, you can pretty easily automate the entire task. These require the programmability plugin but should work with SPSS versions at least back to 15 and probably 14 as well.

HTH,
Jon Peck

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Gene Maguin
Sent: Wednesday, July 30, 2008 9:05 AM
To: [hidden email]
Subject: Re: [SPSSX-L] Recoding variables
