|
Hello!
I'd like to recode a variable as follows: the codes of a categorical variable should be ranked in descending order of their frequency. Categories like "miscellaneous" should not be ranked; instead they should be put at the end of the ranking. So far we have tried the syntax below, but it does not distinguish between categories that have the same frequency.

AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=v1
  /v1_fre 'Frequency v1' = NU(v1).
if (v1>999 & not(sysmis(v1))) v1_fre=0.
exe.
RANK VARIABLES=v1_fre (D)
  /RANK
  /PRINT=no
  /TIES=CONDENSE.

Do you have an idea? Or is this completely the wrong way to go about it?

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command.
To leave the list, send the command SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
Mariko,
If you have one or just a few variables, the simplest way is going to be to do it by hand. If you have lots of variables to do this for, then other methods are needed.

This part of your problem

>>Categories like "miscellaneous" should not be ranked, instead they should be put at the end of the ranking.

poses a problem because you have to decide which values are miscellaneous. Putting them at the end of the ranking is a problem also. You also have a problem with missing values. Lastly, you are going to have a problem with value labels, because values of the new variable will be completely disconnected from the original variable. I'd do this:

Missing values v1().
Recode v1(sysmis <missing value code list>=999).

* These two commands gather up all the missing value codes, including sysmis, into a single new code, which will make things easier later.

AGGREGATE OUTFILE=* MODE=ADDVARIABLES /BREAK=v1 /v1_fre 'Frequency v1' = NU(v1).
Crosstabs v1 by v1_fre.

* This shows how values of the new variable correspond to values of the old variable. This matters for your miscellaneous codes and for the missing value code. No escaping this. It's a by-hand operation.

Suppose that your original variable had 15 valid values, three of which were miscellaneous, plus a couple of missing value codes and sysmis. Your new variable will have 12 primary codes, numbered 1-12, three miscellaneous codes, numbered 13-15, and one missing value code=999. Of course, there is the possibility that two or more of your valid data codes will have the same frequency. This is where rank creates the problem.

Recode v1_fre(<nu for miscellaneous code 1>=-13)(<nu for miscellaneous code 2>=-14)
   (<nu for miscellaneous code 3>=-15)(<nu for missing data code>=-999).
* v1 is added as a second sort key so that categories with tied frequencies stay in contiguous blocks.
Sort cases by v1_fre(d) v1.
Do if ($casenum eq 1).
+  compute v1_fre=1.
Else if (v1_fre gt 0).
+  if (v1 eq lag(v1)) v1_fre=lag(v1_fre).
+  if (v1 ne lag(v1)) v1_fre=lag(v1_fre)+1.
End if.
Compute v1_fre=abs(v1_fre).
Missing values v1_fre(999).

* So basically, I'm doing a ranking but with a different procedure for handling ties and excluding cases with specific values.

I don't know that there is any way to automate this so that lots of variables could be worked on. Given what you want, I think the crosstabs and recode steps have to be in there. AND, don't forget value labels.

Perhaps someone else will see a better way.

Gene Maguin |
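For readers who want to check the intended result outside SPSS, the ranking logic described above can be sketched in plain Python. This is only an illustration under assumptions: the function name `frequency_rank_map`, its parameters, and the sample data are hypothetical and not part of the original posts. Ties between equally frequent categories are broken by code value, and miscellaneous codes are numbered after the ranked ones in the order the caller lists them.

```python
from collections import Counter


def frequency_rank_map(values, misc_codes=(), missing_codes=(), missing_rank=999):
    """Map each original code to its position in a descending-frequency ordering.

    Hypothetical helper: regular codes are numbered 1..k by descending
    frequency, miscellaneous codes follow in the order given, and every
    missing code is mapped to missing_rank.
    """
    missing = set(missing_codes)
    misc = set(misc_codes)
    # None stands in for system-missing and is left out of the counts.
    counts = Counter(v for v in values if v is not None and v not in missing)

    # Descending frequency; ties are broken by code value so that equally
    # frequent categories still receive distinct, reproducible ranks.
    regular = sorted((c for c in counts if c not in misc),
                     key=lambda c: (-counts[c], c))
    # Miscellaneous codes are not ranked: they keep the caller's order.
    trailing = [c for c in misc_codes if c in counts]

    mapping = {code: rank
               for rank, code in enumerate(regular + trailing, start=1)}
    for code in missing:
        mapping[code] = missing_rank
    return mapping


# Made-up data: codes 20 and 30 are tied, 1000 is the miscellaneous code.
v1 = [10, 10, 10, 20, 20, 30, 30, 1000, 1000, 1000, 1000, None]
print(frequency_rank_map(v1, misc_codes=[1000]))
```

In this sketch the tied codes 20 and 30 get ranks 2 and 3, and the miscellaneous code goes last even though it is the most frequent.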
|
Note that there are programmability tools that handle this problem in the context of Ctables (in the spssaux2.py module on SPSS Developer Central www.spss.com/devcentral). Ctables can sort by statistics in the table, and the genCategoryList function handles cases where there is an exception list of things that should be moved to the bottom (or top).
genCategoryList is normally used to generate a macro that would be fed to the Ctables syntax, but it returns a sorted category list, with exceptions handled, that could easily be plugged into a generated recode command. Furthermore, Dev Central has another module called specialtransforms.py that has a Recode class that will do the recodes and preserve the original value labels in the new variables. So using these two things together, you can pretty easily automate the entire task.

These require the programmability plugin but should work with SPSS versions at least back to 15 and probably 14 as well.

HTH,
Jon Peck

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Gene Maguin
Sent: Wednesday, July 30, 2008 9:05 AM
To: [hidden email]
Subject: Re: [SPSSX-L] Recoding variables |
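As a rough illustration of plugging a sorted category list into a generated recode command, here is a plain Python sketch. It does not use spssaux2.py or specialtransforms.py, and it makes no claim about their interfaces; the function `build_recode_syntax`, the target variable name `v1_rank`, and the labels are all hypothetical. It takes a category list that is already in the desired order and writes out RECODE and VALUE LABELS syntax as text, carrying each original label over to the new code.

```python
def build_recode_syntax(source, target, ordered_codes, labels):
    """Return SPSS RECODE and VALUE LABELS commands as one string.

    Hypothetical helper: ordered_codes is the category list in its final
    order (exceptions already moved to the end); labels maps original
    codes to their value labels.
    """
    numbered = list(enumerate(ordered_codes, start=1))

    # One (old=new) pair per category, in rank order.
    pairs = " ".join(f"({code}={rank})" for rank, code in numbered)
    lines = [f"RECODE {source} {pairs} INTO {target}."]

    # Carry the original labels over to the new codes; a single quote
    # inside a label is doubled, as SPSS string literals require.
    label_parts = []
    for rank, code in numbered:
        if code in labels:
            text = labels[code].replace("'", "''")
            label_parts.append(f"{rank} '{text}'")
    if label_parts:
        lines.append(f"VALUE LABELS {target} " + " ".join(label_parts) + ".")

    return "\n".join(lines)


# Made-up example: code 1000 is the miscellaneous category, placed last.
ordered = [10, 20, 30, 1000]
labels = {10: "Cinema", 20: "TV", 30: "Radio", 1000: "Miscellaneous"}
syntax = build_recode_syntax("v1", "v1_rank", ordered, labels)
print(syntax)
```

The generated text could be pasted into a syntax window or, with the programmability plugin installed, submitted from Python; looping over a list of variable names would then cover the many-variables case.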
