I would like to select the variable name for each case based on the highest value such that:
id v1 v2 v3 newVar 1 1 5 6 v3 2 2 10 9 v2 3 6 7 5 v2
4 5 2 1 v1 I am beginning with VECTOR v =v1 to v3. LOOP #I = v1 TO v3. COMPUTE MX=MAX(v1 to v3). END IF.
END LOOP. |
COMPUTE MX=MAX(v1 to v3). DO REPEAT test= v1 to v3/literal= 'v1', 'v2', 'v3' . IF ( MX= test ) newVar= literal . END REPEAT. -- Rich Ulrich Date: Tue, 10 Sep 2013 13:47:13 -0700 From: [hidden email] Subject: select variable name with greatest value To: [hidden email] I would like to select the variable name for each case based on the highest value such that: id v1 v2 v3 newVar 1 1 5 6 v3 2 2 10 9 v2 3 6 7 5 v2
4 5 2 1 v1 I am beginning with VECTOR v =v1 to v3. LOOP #I = v1 TO v3. COMPUTE MX=MAX(v1 to v3). END IF.
END LOOP. |
Administrator
|
While Rich was posting that, I was also thinking about using DO-REPEAT. Here's my version:
STRING NewVar(a2). DO REPEAT v = v1 to v3 / New = "V1" "V2" "V3". - IF (Max(v1 to v3) EQ v) NewVar = New. END REPEAT. LIST. Output: ID V1 V2 V3 NewVar 1 1 5 6 V3 2 2 10 9 V2 3 6 7 5 V2 4 5 2 1 V1 However, if this was just a simplified example and your actual data file has a lot more than 3 variables to look at, there's probably a better way. I.e., you don't really want to list "V1" "V2" ... "V946" on your DO REPEAT command. ;-) HTH.
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
Administrator
|
p.s. - You haven't said what you want to do if there are ties.
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
Just getting back. 1) Yes, this is a simplified version with more than 3 vars :) 2) Ties to other variables? For example, v1a to v3a and v1b to v3b? On Tue, Sep 10, 2013 at 2:59 PM, Bruce Weaver <[hidden email]> wrote: p.s. - You haven't said what you want to do if there are ties. |
Administrator
|
Ties as in tied scores -- as in the new case I've added to your original data below.
id v1 v2 v3 newVar 1 1 5 6 v3 2 2 10 9 v2 3 6 7 5 v2 4 5 2 1 v1 5 4 2 4 v1 or v3? Or maybe "v1, v3"? Not clear what you want here!
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
These values are more distinct. They are transnational data associated with associated with the category (v1 to v3). So not likely to have issue with ties. Say I had 946 variables, would I get rid of DO REPEAT for another command?
On Tue, Sep 10, 2013 at 3:14 PM, Bruce Weaver <[hidden email]> wrote: Ties as in tied scores -- as in the new case I've added to your original data |
Administrator
|
In reply to this post by Rich Ulrich
Of course if two variables are tied it will flag the LAST one.
If this is a problem then you will need to adapt the code. ---
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by Peter Spangler
At 04:47 PM 9/10/2013, Peter Spangler wrote:
>I would like to select the variable name for each case based on the >highest value such that: > >id v1 v2 v3 newVar >1 1 5 6 v3 >2 2 10 9 v2 >3 6 7 5 v2 >4 5 2 1 v1 And you've written elsewhere that tied largest values are not a problem; but that you have nearer 900 variables than 3, so the code should be scaleable. Here's code that works by unrolling the file into one record per ID per variable. It doesn't need to be changed when more variables are added, except to change the variable list in VARSTOCASES. |-----------------------------|---------------------------| |Output Created |10-SEP-2013 21:18:32 | |-----------------------------|---------------------------| [Input] id v1 v2 v3 1 1 5 6 2 2 10 9 3 6 7 5 4 5 2 1 Number of cases read: 4 Number of cases listed: 4 DATASET COPY Unroll. DATASET ACTIVATE Unroll WINDOW=FRONT. VARSTOCASES /MAKE Value FROM v1 TO v3 /INDEX = VName(Value) /KEEP = id /NULL = KEEP. SORT CASES BY ID (A) Value (D). AGGREGATE OUTFILE=* /BREAK=ID /HiName 'Variable with largest value' = FIRST(VName) /HiValue 'Highest value encountered' = FIRST(Value). DATASET NAME Biggest WINDOW=FRONT. LIST. List |-----------------------------|---------------------------| |Output Created |10-SEP-2013 21:18:33 | |-----------------------------|---------------------------| [Biggest] id HiName HiValue 1 v3 6 2 v2 10 3 v2 7 4 v1 5 Number of cases read: 4 Number of cases listed: 4 * Now it's simple to add these values to the original file:. MATCH FILES /FILE =Input /TABLE=Biggest /BY ID. DATASET NAME Final WINDOW=FRONT. LIST. List |-----------------------------|---------------------------| |Output Created |10-SEP-2013 21:18:33 | |-----------------------------|---------------------------| [Final] id v1 v2 v3 HiName HiValue 1 1 5 6 v3 6 2 2 10 9 v2 10 3 6 7 5 v2 7 4 5 2 1 v1 5 Number of cases read: 4 Number of cases listed: 4 ======================================================== APPENDIX: All code, and test data (not saved separately) ======================================================== NEW FILE. DATA LIST LIST/ id v1 v2 v3 (4F2). BEGIN DATA 1 1 5 6 v3 2 2 10 9 v2 3 6 7 5 v2 4 5 2 1 v1 END DATA. DATASET NAME Input WINDOW=FRONT. LIST. DATASET COPY Unroll. DATASET ACTIVATE Unroll WINDOW=FRONT. VARSTOCASES /MAKE Value FROM v1 TO v3 /INDEX = VName(Value) /KEEP = id /NULL = KEEP. LIST. SORT CASES BY ID (A) Value (D). AGGREGATE OUTFILE=* /BREAK=ID /HiName 'Variable with largest value' = FIRST(VName) /HiValue 'Highest value encountered' = FIRST(Value). DATASET NAME Biggest WINDOW=FRONT. LIST. * Now it's simple to add these values to the original file:. MATCH FILES /FILE =Input /TABLE=Biggest /BY ID. DATASET NAME Final WINDOW=FRONT. LIST. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
In reply to this post by Peter Spangler
If I had 946 variables, I think I would drop the luxury of hard-coding
the number with a V in front of it as a string variable, and just find the number. Getting that literal string could be an extra step where you compute a string with the number after a prefix. To get a number more efficiently, you could use a LOOP -- Loop until V(#) eq MaxVal; then save the # and BREAK out of the loop. -- Rich Ulrich Date: Tue, 10 Sep 2013 15:24:45 -0700 From: [hidden email] Subject: Re: select variable name with greatest value To: [hidden email] These values are more distinct. They are transnational data associated with associated with the category (v1 to v3). So not likely to have issue with ties. ... [snip, previous]Say I had 946 variables, would I get rid of DO REPEAT for another command?
|
Administrator
|
In reply to this post by Richard Ristow
Very nice, Richard. I cobbled together a macro that also seems to work, but this approach is much tidier.
Cheers, Bruce
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
In reply to this post by Richard Ristow
Simply fantastic, Richard! Many thanks. On Tue, Sep 10, 2013 at 6:30 PM, Richard Ristow <[hidden email]> wrote:
|
Adding to the heap of solutions, here is
Python solution using the SPSSINC TRANS extension command that does not
require the varstocases transformation. One would change the two
instances of v1 to v3 below as appropriate.
begin program. import spss, spssaux vnames = spssaux.VariableDict().expand("v1 to v3") def nameOfMax(*args): return vnames[args.index(max(args))] end program. spssinc trans result=vname type=64 /variables v1 to v3 /formula "nameOfMax(<>)". Jon Peck (no "h") aka Kim Senior Software Engineer, IBM [hidden email] phone: 720-342-5621 From: Peter Spangler <[hidden email]> To: [hidden email], Date: 09/11/2013 10:03 AM Subject: Re: [SPSSX-L] select variable name with greatest value Sent by: "SPSSX(r) Discussion" <[hidden email]> Simply fantastic, Richard! Many thanks. On Tue, Sep 10, 2013 at 6:30 PM, Richard Ristow <wrristow@...> wrote: At 04:47 PM 9/10/2013, Peter Spangler wrote: I would like to select the variable name for each case based on the highest value such that: id v1 v2 v3 newVar 1 1 5 6 v3 2 2 10 9 v2 3 6 7 5 v2 4 5 2 1 v1 And you've written elsewhere that tied largest values are not a problem; but that you have nearer 900 variables than 3, so the code should be scaleable. Here's code that works by unrolling the file into one record per ID per variable. It doesn't need to be changed when more variables are added, except to change the variable list in VARSTOCASES. |-----------------------------|---------------------------| |Output Created |10-SEP-2013 21:18:32 | |-----------------------------|---------------------------| [Input] id v1 v2 v3 1 1 5 6 2 2 10 9 3 6 7 5 4 5 2 1 Number of cases read: 4 Number of cases listed: 4 DATASET COPY Unroll. DATASET ACTIVATE Unroll WINDOW=FRONT. VARSTOCASES /MAKE Value FROM v1 TO v3 /INDEX = VName(Value) /KEEP = id /NULL = KEEP. SORT CASES BY ID (A) Value (D). AGGREGATE OUTFILE=* /BREAK=ID /HiName 'Variable with largest value' = FIRST(VName) /HiValue 'Highest value encountered' = FIRST(Value). DATASET NAME Biggest WINDOW=FRONT. LIST. List |-----------------------------|---------------------------| |Output Created |10-SEP-2013 21:18:33 | |-----------------------------|---------------------------| [Biggest] id HiName HiValue 1 v3 6 2 v2 10 3 v2 7 4 v1 5 Number of cases read: 4 Number of cases listed: 4 * Now it's simple to add these values to the original <a href=file:///>file:. MATCH FILES /FILE =Input /TABLE=Biggest /BY ID. DATASET NAME Final WINDOW=FRONT. LIST. List |-----------------------------|---------------------------| |Output Created |10-SEP-2013 21:18:33 | |-----------------------------|---------------------------| [Final] id v1 v2 v3 HiName HiValue 1 1 5 6 v3 6 2 2 10 9 v2 10 3 6 7 5 v2 7 4 5 2 1 v1 5 Number of cases read: 4 Number of cases listed: 4 ======================================================== APPENDIX: All code, and test data (not saved separately) ======================================================== NEW FILE. DATA LIST LIST/ id v1 v2 v3 (4F2). BEGIN DATA 1 1 5 6 v3 2 2 10 9 v2 3 6 7 5 v2 4 5 2 1 v1 END DATA. DATASET NAME Input WINDOW=FRONT. LIST. DATASET COPY Unroll. DATASET ACTIVATE Unroll WINDOW=FRONT. VARSTOCASES /MAKE Value FROM v1 TO v3 /INDEX = VName(Value) /KEEP = id /NULL = KEEP. LIST. SORT CASES BY ID (A) Value (D). AGGREGATE OUTFILE=* /BREAK=ID /HiName 'Variable with largest value' = FIRST(VName) /HiValue 'Highest value encountered' = FIRST(Value). DATASET NAME Biggest WINDOW=FRONT. LIST. * Now it's simple to add these values to the original <a href=file:///>file:. MATCH FILES /FILE =Input /TABLE=Biggest /BY ID. DATASET NAME Final WINDOW=FRONT. LIST. |
Dear Peter,
I wrote a solution (http://www.pythonforspss.org/find-within-subjects-favorite-over-several-variables/) for this a while ago. In case of ties, one of the maximum values is chosen at random and one of the new variables will hold the number of maximum values. So you can easily check afterwards whether there really weren't any ties. It also generates clearly formulated variable and value labels for the newly created variables. Obviously, all this requires a bit more code but after copying-pasting, you need to adjust only one line (which variables to take into account). HTH, Ruben |
Free forum by Nabble | Edit this page |