Hi
I have a row of variables (x1 to x7) and would like to keep only the unique values for each case. If a value appears a second time within a case, it should be replaced by the value 99 (except for missings).

x1 x2 x3 x4 x5 x6 x7
 8  8  8  8  8  6  8
 8  4  2  4  8  4  8
 4  4  2  0  2  0  2
 4  5  0  5  .  0  0
 8  8  8  8  8  8  8
 7  4  2  .  2  .  8
 8  7  2  3  4  5  8
 2  0  5  0  3  5  .

The above should look like this:

x1 x2 x3 x4 x5 x6 x7
 8 99 99 99 99  6 99
 8  4  2 99 99 99 99
 4 99  2  0 99 99 99
 4  5  0 99  . 99 99
 8 99 99 99 99 99 99
 7  4  2  . 99  .  8
 8  7  2  3  4  5 99
 2  0  5 99  3 99  .

I tried the following but did not succeed:

DATA LIST FREE / x1 x2 x3 x4 x5 x6 x7 .
begin data
8 8 8 8 8 6 8
8 4 2 4 8 4 8
4 4 2 0 2 0 2
4 5 0 5 . 0 0
8 8 8 8 8 8 8
7 4 2 . 2 . 8
8 7 2 3 4 5 8
2 0 5 0 3 5 .
end data .
Format X1 to X7 (F8.0).
List.

vector n = x1 to x7.
COMPUTE #CUR=999.
loop #i = 1 to 7.
- compute #CUR = n(#i).
- loop #k = 2 to 7.
- Do if #CUR = n(#k).
- Compute n(#k) = 99.
- End if.
- end loop.
end loop.
execute.

LIST.

Any help appreciated.
Thanks, Christian

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
|
My first thought was that you could use the ANY function to look for matches with any of the prior variables. But the way I tried to use it did not work. The following approach is probably not the most efficient way to do it, but I believe it gives the result you want (for the sample data you posted).

VECTOR n = x1 to x7.
LOOP #i = 2 to 7.
+ LOOP #j = 1 to #i-1.
+ IF n(#i) EQ n(#j) n(#i) = 99.
+ END LOOP.
END LOOP.
LIST.

Output:

x1 x2 x3 x4 x5 x6 x7
 8 99 99 99 99  6 99
 8  4  2 99 99 99 99
 4 99  2  0 99 99 99
 4  5  0 99  . 99 99
 8 99 99 99 99 99 99
 7  4  2  . 99  .  8
 8  7  2  3  4  5 99
 2  0  5 99  3 99  .

Number of cases read: 8 Number of cases listed: 8
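For readers who want to trace the logic outside SPSS, here is a minimal Python sketch of the same pairwise comparison. It is not part of the original post. It assumes that None stands in for system-missing and that a comparison involving a missing value never matches, which is how the SPSS IF behaves here. The names replace_repeats_pairwise, SAMPLE and EXPECTED are made up for illustration.

```python
# Sketch only: mimics the VECTOR/LOOP logic above in plain Python.
# Assumption: None plays the role of SPSS system-missing.
SAMPLE = [
    [8, 8, 8, 8, 8, 6, 8],
    [8, 4, 2, 4, 8, 4, 8],
    [4, 4, 2, 0, 2, 0, 2],
    [4, 5, 0, 5, None, 0, 0],
    [8, 8, 8, 8, 8, 8, 8],
    [7, 4, 2, None, 2, None, 8],
    [8, 7, 2, 3, 4, 5, 8],
    [2, 0, 5, 0, 3, 5, None],
]

# Desired result as posted in the question.
EXPECTED = [
    [8, 99, 99, 99, 99, 6, 99],
    [8, 4, 2, 99, 99, 99, 99],
    [4, 99, 2, 0, 99, 99, 99],
    [4, 5, 0, 99, None, 99, 99],
    [8, 99, 99, 99, 99, 99, 99],
    [7, 4, 2, None, 99, None, 8],
    [8, 7, 2, 3, 4, 5, 99],
    [2, 0, 5, 99, 3, 99, None],
]


def replace_repeats_pairwise(row, marker=99):
    """Compare each value with every earlier one; mark it if it matches."""
    out = list(row)
    for i in range(1, len(out)):        # LOOP #i = 2 to 7
        for j in range(i):              # LOOP #j = 1 to #i-1
            # A comparison with a missing value is never true.
            if out[i] is not None and out[j] is not None and out[i] == out[j]:
                out[i] = marker
    return out


result = [replace_repeats_pairwise(row) for row in SAMPLE]
print(result == EXPECTED)  # prints True
```

The first occurrence of a value is never overwritten, so later repeats always find it among the earlier positions, even after other repeats have already been turned into 99.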
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
|
In reply to this post by Christian Schmidhauser
Tricky - here is one way to tackle it.
*****************************************************.
DATA LIST FREE / x1 x2 x3 x4 x5 x6 x7 .
begin data
8 8 8 8 8 6 8
8 4 2 4 8 4 8
4 4 2 0 2 0 2
4 5 0 5 . 0 0
8 8 8 8 8 8 8
7 4 2 . 2 . 8
8 7 2 3 4 5 8
2 0 5 0 3 5 .
end data .
Formats X1 to X7 (F8.0).

*Compute original order.
COMPUTE CaseOrd = $casenum.

*Reshape wide to long.
VARSTOCASES
  /MAKE x FROM x1 TO x7
  /NULL = Keep
  /INDEX VarOrd (x).

*Mark orig values, change others to 99 except for missing.
SORT CASES BY CaseOrd x VarOrd.
MATCH FILES FILE = *
  /FIRST = Orig
  /BY CaseOrd X.
IF Orig = 0 AND NOT MISSING(x) x = 99.

*Reshape back to wide format, drop extra created variables.
CASESTOVARS
  /ID = CaseOrd
  /INDEX = VarOrd
  /DROP Orig CaseOrd.
*****************************************************.
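The reshape idea can also be traced in plain Python. The sketch below is only a rough analogue, not a translation of the SPSS commands: it builds (value, position) records for one case, sorts them, keeps the first record in each value group and marks the rest. None is assumed to stand in for system-missing, and replace_repeats_long is a hypothetical name.

```python
from itertools import groupby


def replace_repeats_long(row, marker=99):
    """Keep the first occurrence of each value in a case; mark later ones."""
    # "Long" records (value, position). Missing values (None) are left out,
    # so they are never marked.
    long_records = sorted(
        (value, pos) for pos, value in enumerate(row) if value is not None
    )
    out = list(row)
    # Within each value group the records are ordered by position, so the
    # first record is the original occurrence and the rest are repeats.
    for _, group in groupby(long_records, key=lambda rec: rec[0]):
        for _, pos in list(group)[1:]:
            out[pos] = marker
    return out


print(replace_repeats_long([4, 5, 0, 5, None, 0, 0]))
# prints [4, 5, 0, 99, None, 99, 99]
```

Sorting by value and then by position does the same job as SORT CASES BY CaseOrd x VarOrd followed by the FIRST flag: the earliest position in each group is the one that survives.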
|
In reply to this post by Bruce Weaver
If I may: Bruce's solution looks good to me.
I say so because the same approach is used in my old macro !REGMRC (in the collection "Multiple response tools" at http://www.spsstools.net/en/KO-spssmacros), which, among other things, removes duplicate codes from the variables of a categorical multiple-response set. The excerpt from the macro body that does this is very similar to Bruce's code:

vector mrc= !vars.
...
loop #i= 1 to #nvars-1.
-loop #i2= #i+1 to #nvars.
- if mrc(#i)=mrc(#i2) mrc(#i2)= $sysmis.
-end loop.
end loop.
|
In reply to this post by Christian Schmidhauser
Here is a solution that makes use of sets via Python and the SPSSINC TRANS extension command.

begin program.
def fixrepeats(*args):
    found = set()
    args = list(args)
    for i, v in enumerate(args):
        if v in found:
            args[i] = 99
        if v is not None:
            found.add(v)
    return args
end program.

spssinc trans result=x1 to x7
  /variables x1 to x7
  /formula "fixrepeats(<>)".
|
Sorry Jon, I can't resist! ;-)
My VECTOR-LOOP code: 129 characters (with spaces)
Python + SPSSINC TRANS: 302 characters (with spaces)

Obviously, terseness is not the be-all and end-all. As Art K often points out, readability and ease of maintenance are also extremely important. But frankly, I don't see the Python + SPSSINC TRANS approach being any better on that score either. It might even be worse, at least for folks who are not Python programmers.

VECTOR n = x1 to x7.
LOOP #i = 2 to 7.
+ LOOP #j = 1 to #i-1.
+ IF n(#i) EQ n(#j) n(#i) = 99.
+ END LOOP.
END LOOP.
LIST.

129 characters (with spaces)

begin program.
def fixrepeats(*args):
    found = set()
    args = list(args)
    for i, v in enumerate(args):
        if v in found:
            args[i] = 99
        if v is not None:
            found.add(v)
    return args
end program.
spssinc trans result=x1 to x7
  /variables x1 to x7
  /formula "fixrepeats(<>)".

302 characters (with spaces)

Character counts are from MS Word.
|
The Python solution takes a few more keystrokes, but I think the use of the set construct makes it clearer what the code is doing. In this case, the problem is pretty simple anyway, but using set logic in more complicated settings can really clarify things (as well as generally being more efficient). Python is one of the few languages in which sets are a first-class construct, including set operations such as membership, intersection, union, and set difference. And since they are represented internally as hash tables, with average access time independent of the size of the set, they scale very well to large problems.
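As a small illustration of the set operations mentioned above (not from the original post), the sketch below applies them to the distinct values of two cases from the sample data. The variable names are made up, and sorted() is used only to get a stable print order.

```python
# Distinct values of two cases from the sample data (illustrative names).
case_a = set([8, 4, 2, 4, 8, 4, 8])   # -> {8, 4, 2}
case_b = set([4, 4, 2, 0, 2, 0, 2])   # -> {4, 2, 0}

# Membership test: average cost does not grow with the size of the set.
has_four = 4 in case_a

shared = case_a & case_b      # intersection: values present in both cases
combined = case_a | case_b    # union: values present in either case
only_a = case_a - case_b      # difference: values in case_a but not in case_b

print(has_four)               # prints True
print(sorted(shared))         # prints [2, 4]
print(sorted(combined))       # prints [0, 2, 4, 8]
print(sorted(only_a))         # prints [8]
```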
|
In reply to this post by Bruce Weaver
Let me chime in. I support Bruce. Not because a native SPSS syntax
solution is usually shorter (longer code can sometimes be faster
than a terser, nicer-looking one), but because, to me, it is a shame
to use a foreign scripting language for core tasks that
syntax is meant for.
|
In reply to this post by Christian Schmidhauser
- loop #k = 2 to 7.
should be
- loop #k = #i+1 to 7.
You were VERY close to a solution ;-)
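To see why the starting point of the inner loop matters, here is a hedged Python sketch (not from the original post) that runs both variants on one case. With the inner loop starting at position 2, every value from x2 onward is eventually compared with itself and overwritten; starting just after the current position avoids that. None is assumed to stand in for system-missing, and the function and parameter names are hypothetical.

```python
def replace_repeats(row, start_after_current, marker=99):
    """Simulate the posted loop with either inner-loop starting point."""
    out = list(row)
    n = len(out)
    for i in range(n):                       # loop #i = 1 to 7
        cur = out[i]                         # compute #CUR = n(#i)
        # Original: #k = 2 to 7 (index 1). Corrected: #k = #i+1 to 7.
        first_k = i + 1 if start_after_current else 1
        for k in range(first_k, n):
            # A comparison involving a missing value is never true.
            if cur is not None and out[k] is not None and cur == out[k]:
                out[k] = marker
    return out


row = [8, 7, 2, 3, 4, 5, 8]
print(replace_repeats(row, start_after_current=False))
# prints [8, 99, 99, 99, 99, 99, 99]
print(replace_repeats(row, start_after_current=True))
# prints [8, 7, 2, 3, 4, 5, 99]
```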
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
