Dear all,
I have 5 different variables (var1, var2, var3, var4, var 5) which all have scores from 0 to 10. Now I want to compute a new variable (biggest_new) which says: 1 if var1 has the highest score, 2 if var2 has the highest score, 3 if var3 has the highest score, 4 is var4 has the highest score, and 5 if var5 has the highest score. (if 1 and 2 have the highest score biggest_new = 6 if 1 and 3 have the highest score biggest_new =7 .. .. ..) I tried: compute biggest = MAX( var1, var2, var3, var4, var5). EXECUTE. And then in the next step: if (biggest eq var1) biggest_new = 1. if (biggest eq var2) beggest_new = 2. and so on.... but this doesn't work when for example var1 and var4 both have 10. Then the new variabel biggest just says 10. and then the computing of biggest_new doesn't work because it says 1 or 4 because both variabels 1 and 4 have score 10. Hope someone can help me out. Regards, Wall |
Administrator
|
Too early to write code ;-)
Hint:See VARSTOCASES/SORT/LAG/CASESTOVARS. HTH, David
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by wall
Wall,
In what follows, I'm going to assume that your data set consists of an id and five variables, v1 to v5. I understand that it probably doesn't but suppose it did. So, restructure the file using varstocases, sort cases by id and newv, which is the result of the restructure operation, aggregate by id with mode=addvariables and new variables of varnum, which is the variable number of the last record in the set of records with the same id and which also, because of the sort operation, has the maximum value of the five records in the id set. Then restructure the file back. Syntax wise: Vartocases make newv=v1 to v5/index=varnum Sort cases by id newv. Aggregate file=* mode=addvariables/presorted/break=id/varnum=last(varnum). Casestovars id=id. If this isn't feasible given your dataset, you'll need to use a vector structure and loop through the variables or do a repeat structure. Like this: Compute maxval=max(v1 to v5). Do repeat x=v1 to v5/y=1 to 5. + if (x eq maxval) varnum=y. End repeat. Gene Maguin -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of wall Sent: Wednesday, October 19, 2011 5:44 AM To: [hidden email] Subject: compute new variable with scores from different variables Dear all, I have 5 different variables (var1, var2, var3, var4, var 5) which all have scores from 0 to 10. Now I want to compute a new variable (biggest_new) which says: 1 if var1 has the highest score, 2 if var2 has the highest score, 3 if var3 has the highest score, 4 is var4 has the highest score, and 5 if var5 has the highest score. (if 1 and 2 have the highest score biggest_new = 6 if 1 and 3 have the highest score biggest_new =7 .. .. ..) I tried: compute biggest = MAX( var1, var2, var3, var4, var5). EXECUTE. And then in the next step: if (biggest eq var1) biggest_new = 1. if (biggest eq var2) beggest_new = 2. and so on.... but this doesn't work when for example var1 and var4 both have 10. Then the new variabel biggest just says 10. and then the computing of biggest_new doesn't work because it says 1 or 4 because both variabels 1 and 4 have score 10. Hope someone can help me out. Regards, Wall -- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/compute-new-variable-with-scor es-from-different-variables-tp4917091p4917091.html Sent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Administrator
|
In reply to this post by wall
Here is something which will prove to be useful (it yields the max value and tied indexes as a comma delimited string).
---- data list free / id var1 to var5 morejunk. begin data 1 4 1 4 2 4 99999 2 4 2 4 5 1 123456 3 2 1 2 1 5 345622 4 2 5 2 5 6 123456 5 4 2 5 4 2 67883 6 5 4 5 3 3 56637 7 9 3 4 9 8 234556 8 5 5 5 5 5 2345 end data. FORMAT id var1 TO var5 (F2.0) SORT CASES BY ID. SAVE OUTFILE 'C:\TEMP\masterfile.sav'. MATCH FILES / FILE * / KEEP ID var1 to var5. VARSTOCASES MAKE maxval FROM var1 to var5 / INDEX=Index. SORT CASES BY ID (A) maxval (D) INDEX (A). STRING TOPS(A9). IF ANY(1,$CASENUM, ID NE LAG(ID)) TOPS=STRING(INDEX,F1.0). IF ID EQ LAG(ID) AND MAXVAL EQ LAG(MAXVAL) AND LAG(TOPS) NE " " TOPS=CONCAT(RTRIM(LAG(TOPS)),",",STRING(INDEX,F1.0)). SELECT IF TOPS NE " ". MATCH FILES / FILE * / BY ID / LAST=bottom. SELECT IF BOTTOM. MATCH FILES / FILE 'C:\TEMP\masterfile.sav' / TABLE=* / BY ID / DROP INDEX BOTTOM. LIST. ID VAR1 VAR2 VAR3 VAR4 VAR5 MOREJUNK MAXVAL TOPS 1 4 1 4 2 4 99999.00 4 1,3,5 2 4 2 4 5 1 123456.0 5 4 3 2 1 2 1 5 345622.0 5 5 4 2 5 2 5 6 123456.0 6 5 5 4 2 5 4 2 67883.00 5 3 6 5 4 5 3 3 56637.00 5 1,3 7 9 3 4 9 8 234556.0 9 1,4 8 5 5 5 5 5 2345.00 5 1,2,3,4,5 Number of cases read: 8 Number of cases listed: 8 --- Getting what you specifically requested. "Now I want to compute a new variable (biggest_new) which says: 1 if var1 has the highest score, .... 5 if var5 has the highest score. (if 1 and 2 have the highest score biggest_new = 6 if 1 and 3 have the highest score biggest_new =7 ... etc ". Is not doable without jumping through a *bunch* of hoops and IMNSHO is something which will prove to be utterly useless from any analytical perspective. You are seeking to enumerate *ALL* possible combinations and represent them as strings (utterly useless, happy typing if you opt to go that route)??? ---- HTH, David
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by wall
My first thought is that this *seems* like a damnfool thing to
do. Whatever will you do with the results? ... especially if there are 31 different categories with "highest"? But it is not too difficult - see below. > Date: Wed, 19 Oct 2011 02:43:37 -0700 > From: [hidden email] > Subject: compute new variable with scores from different variables > To: [hidden email] > > Dear all, > > I have 5 different variables (var1, var2, var3, var4, var 5) which all have > scores from 0 to 10. > > Now I want to compute a new variable (biggest_new) which says: > 1 if var1 has the highest score, > 2 if var2 has the highest score, > 3 if var3 has the highest score, > 4 is var4 has the highest score, and > 5 if var5 has the highest score. > > (if 1 and 2 have the highest score biggest_new = 6 > if 1 and 3 have the highest score biggest_new =7 > .. > .. > > I tried: > compute biggest = MAX( > var1, var2, var3, var4, var5). > EXECUTE. > > And then in the next step: > > if (biggest eq var1) biggest_new = 1. > if (biggest eq var2) beggest_new = 2. > and so on.... > > but this doesn't work when for example var1 and var4 both have 10. Then the > new variabel biggest just says 10. and then the computing of biggest_new > doesn't work because it says 1 or 4 because both variabels 1 and 4 have > score 10. Consider, instead of 6 and 7, that you can think of 5 separate binary indicators, for whether var1 to var5 are equal to the max. This could be constructed as 5 binary digits, each 0 and 1, for the 5 variables; at least one will equal the Max, so there are 31 possibilities. But it is safer to use decimal digits, to make the initial results readable. First, you create your 31 categories, and then you may recode for 1-to-whatever. Instead of what you use above after finding "biggest", use COMPUTE biggest_new= 0. DO REPEAT dum= var1 to var5/ vnum= 1 2 3 4 5 IF (biggest eq dum) biggest_new = 10*biggest_new+ vnum. END REPEAT. You can use FREQ to see what values exist. These will consist of digits matching the vars that equaled biggest, such as 1, 13, 1345. The "10* " shifts the values. 1-5 will be there for 1-5 alone. You can use AUTORECODE to create a new variable that has whatever codes you want, to save whichever combinations seem important, with the encoded decimal values preserved as the Value Labels. -- Rich Ulrich |
In reply to this post by David Marso
Excellent improvement!
-----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of David Marso Sent: Wednesday, October 19, 2011 1:50 PM To: [hidden email] Subject: Re: compute new variable with scores from different variables Here is something which will prove to be useful (it yields the max value and tied indexes as a comma delimited string). ---- data list free / id var1 to var5 morejunk. begin data 1 4 1 4 2 4 99999 2 4 2 4 5 1 123456 3 2 1 2 1 5 345622 4 2 5 2 5 6 123456 5 4 2 5 4 2 67883 6 5 4 5 3 3 56637 7 9 3 4 9 8 234556 8 5 5 5 5 5 2345 end data. FORMAT id var1 TO var5 (F2.0) SORT CASES BY ID. SAVE OUTFILE 'C:\TEMP\masterfile.sav'. MATCH FILES / FILE * / KEEP ID var1 to var5. VARSTOCASES MAKE maxval FROM var1 to var5 / INDEX=Index. SORT CASES BY ID (A) maxval (D) INDEX (A). STRING TOPS(A9). IF ANY(1,$CASENUM, ID NE LAG(ID)) TOPS=STRING(INDEX,F1.0). IF ID EQ LAG(ID) AND MAXVAL EQ LAG(MAXVAL) AND LAG(TOPS) NE " " TOPS=CONCAT(RTRIM(LAG(TOPS)),",",STRING(INDEX,F1.0)). SELECT IF TOPS NE " ". MATCH FILES / FILE * / BY ID / LAST=bottom. SELECT IF BOTTOM. MATCH FILES / FILE 'C:\TEMP\masterfile.sav' / TABLE=* / BY ID / DROP INDEX BOTTOM. LIST. ID VAR1 VAR2 VAR3 VAR4 VAR5 MOREJUNK MAXVAL TOPS 1 4 1 4 2 4 99999.00 4 1,3,5 2 4 2 4 5 1 123456.0 5 4 3 2 1 2 1 5 345622.0 5 5 4 2 5 2 5 6 123456.0 6 5 5 4 2 5 4 2 67883.00 5 3 6 5 4 5 3 3 56637.00 5 1,3 7 9 3 4 9 8 234556.0 9 1,4 8 5 5 5 5 5 2345.00 5 1,2,3,4,5 Number of cases read: 8 Number of cases listed: 8 --- Getting what you specifically requested. "Now I want to compute a new variable (biggest_new) which says: 1 if var1 has the highest score, .... 5 if var5 has the highest score. (if 1 and 2 have the highest score biggest_new = 6 if 1 and 3 have the highest score biggest_new =7 ... etc ". Is not doable without jumping through a *bunch* of hoops and IMNSHO is something which will prove to be utterly useless from any analytical perspective. You are seeking to enumerate *ALL* possible combinations and represent them as strings (utterly useless, happy typing if you opt to go that route)??? ---- HTH, David wall wrote: > > Dear all, > > I have 5 different variables (var1, var2, var3, var4, var 5) which all > have scores from 0 to 10. > > Now I want to compute a new variable (biggest_new) which says: > 1 if var1 has the highest score, > 2 if var2 has the highest score, > 3 if var3 has the highest score, > 4 is var4 has the highest score, and > 5 if var5 has the highest score. > > (if 1 and 2 have the highest score biggest_new = 6 > if 1 and 3 have the highest score biggest_new =7 > .. > .. > ..) > > I tried: > compute biggest = MAX( > var1, var2, var3, var4, var5). > EXECUTE. > > And then in the next step: > > if (biggest eq var1) biggest_new = 1. > if (biggest eq var2) beggest_new = 2. > and so on.... > > but this doesn't work when for example var1 and var4 both have 10. Then > the new variabel biggest just says 10. and then the computing of > biggest_new doesn't work because it says 1 or 4 because both variabels 1 > and 4 have score 10. > > Hope someone can help me out. > > Regards, > Wall > -- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/compute-new-variable-with-scor es-from-different-variables-tp4917091p4918797.html Sent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Administrator
|
In reply to this post by Rich Ulrich
I think I like this better than my tortured butchery and resurrection of the data ;-)
I think I started digging the hole before I had any coffee and just kept on digging the same hole later. ---
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Administrator
|
OTOH: Rich's solution will get weird if there are more than 9 variables.
Here is a fusion of my earlier 'core' sans 'data bloodbath' ... ;-) data list free / id var1 to var5 morejunk. begin data 1 4 1 4 2 4 99999 2 4 2 4 5 1 123456 3 2 1 2 1 5 345622 4 2 5 2 5 6 123456 5 4 2 5 4 2 67883 6 5 4 5 3 3 56637 7 9 3 4 9 8 234556 8 5 5 5 5 5 2345 end data. STRING ties (A10). COMPUTE maxval=MAX(var1 TO var5). DO REPEAT v=var1 to var5 / index= 1 TO 5. + IF (maxval EQ v) ties=CONCAT(RTRIM(ties),",",STRING(index,F1)). END REPEAT. COMPUTE ties=SUBSTR(ties,2). LIST.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by wall
Shalom
Here is another way of doing it . DATASET CLOSE all. input program. numeric var1 to var5 . vector var = var1 to var5 . loop case=1 to 12. leave case . loop ii=1 to 5 . compute var( ii) =trunc(RV.UNIFORM(1,5) ) . end loop. end case . end loop . end file . END INPUT PROGRAM . execute . DATASET COPY orig_file . VARSTOCASES /ID=id /MAKE score FROM var1 var2 var3 var4 var5 /INDEX=Index1(5) /KEEP=case /NULL=KEEP. AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK=case /score_max=MAX(score). if score eq score_max ismax=10 ** (index1-1 ) . recode ismax(10000=5)(1000=40)(100=300)(10=2000)(1=10000) . execute . DATASET DECLARE agg. AGGREGATE /OUTFILE=agg /BREAK=case /ismax=sum(ismax) . match files file=orig_file / file=agg/ by case /drop=ii . execute . Hillel Vardi BGU On 19/10/2011 11:43, wall wrote: > Dear all, > > I have 5 different variables (var1, var2, var3, var4, var 5) which all have > scores from 0 to 10. > > Now I want to compute a new variable (biggest_new) which says: > 1 if var1 has the highest score, > 2 if var2 has the highest score, > 3 if var3 has the highest score, > 4 is var4 has the highest score, and > 5 if var5 has the highest score. > > (if 1 and 2 have the highest score biggest_new = 6 > if 1 and 3 have the highest score biggest_new =7 > .. > .. > ..) > > I tried: > compute biggest = MAX( > var1, var2, var3, var4, var5). > EXECUTE. > > And then in the next step: > > if (biggest eq var1) biggest_new = 1. > if (biggest eq var2) beggest_new = 2. > and so on.... > > but this doesn't work when for example var1 and var4 both have 10. Then the > new variabel biggest just says 10. and then the computing of biggest_new > doesn't work because it says 1 or 4 because both variabels 1 and 4 have > score 10. > > Hope someone can help me out. > > Regards, > Wall > > -- > View this message in context: http://spssx-discussion.1045642.n5.nabble.com/compute-new-variable-with-scores-from-different-variables-tp4917091p4917091.html > Sent from the SPSSX Discussion mailing list archive at Nabble.com. > > ===================== > To manage your subscription to SPSSX-L, send a message to > [hidden email] (not to SPSSX-L), with no body text except the > command. To leave the list, send the command > SIGNOFF SPSSX-L > For a list of commands to manage subscriptions, send the command > INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Free forum by Nabble | Edit this page |