I'm checking for duplicated values between pairs of adjacent records and the following code doesn't seem to be working. The 'real' problem is that value(lag(x)) doesn't work when variables have user missing values because lag(x) returns sysmis when x is missing. It seems like lag ought to work differently but that's beside the point. One way around the problem is to take off user missing values. But, I'd like to keep them defined. I could well be missing or not understanding something but if I'm not, is there a syntax method for this problem that doesn't involve stripping off user missing values.
DO IF (EPISODEID NE LAG(EPISODEID)). + COMPUTE DUPROW=-1. ELSE. + COMPUTE DUPROW=0. + DO REPEAT X=DischargeLevelofCare TO SpecEd8 DxA DxB1 TO DxB4 PsychMeds. + COMPUTE #V=LAG(X). + IF (VALUE(X) EQ VALUE(#V)) DUPROW=DUPROW+1. + END REPEAT. END IF. FREQUENCIES DUPROW. Thanks, Gene Maguin ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Administrator
|
Gene,
Please mock up and post a simple data set definition to illustrate the problem. Are you attempting to identify adjacent records which are complete duplicates? IF so: MATCH FILES / FILE * / KEEP <VarlistForCaseComparisonDeletingAllOthers>. SORT CASES BY ALL. MATCH FILES /FILE * / BY ALL/ FIRST=@TOP@/LAST=@BOT@. David
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Ok.
Id x1 x2 x3 x4 x5 x6 Result 10 5 8 0 9 1 0 -1 11 1 3 9 4 8 2 -1 11 2 3 9 4 8 5 4 12 3 8 8 9 8 0 -1 13 2 5 8 9 9 9 -1 13 2 5 8 9 9 9 6 Missing values x1 to x6(8 9). Re: your code. Depending on how user missing are handled in the match files that seems like it would work. Is, 'all' a valid value for the BY subcommand or do you mean all to reference the case comparison varlist? (I think no but it's always seemed like there are undocumented things in spss.) Gene Maguin -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of David Marso Sent: Thursday, May 03, 2012 11:55 AM To: [hidden email] Subject: Re: check adjacent rows with duplicate values Gene, Please mock up and post a simple data set definition to illustrate the problem. Are you attempting to identify adjacent records which are complete duplicates? IF so: MATCH FILES / FILE * / KEEP <VarlistForCaseComparisonDeletingAllOthers>. SORT CASES BY ALL. MATCH FILES /FILE * / BY ALL/ FIRST=@TOP@/LAST=@BOT@. David Maguin, Eugene wrote > > I'm checking for duplicated values between pairs of adjacent records and > the following code doesn't seem to be working. The 'real' problem is that > value(lag(x)) doesn't work when variables have user missing values because > lag(x) returns sysmis when x is missing. It seems like lag ought to work > differently but that's beside the point. One way around the problem is to > take off user missing values. But, I'd like to keep them defined. I could > well be missing or not understanding something but if I'm not, is there a > syntax method for this problem that doesn't involve stripping off user > missing values. > > > DO IF (EPISODEID NE LAG(EPISODEID)). > + COMPUTE DUPROW=-1. > ELSE. > + COMPUTE DUPROW=0. > + DO REPEAT X=DischargeLevelofCare TO SpecEd8 DxA DxB1 TO DxB4 > PsychMeds. > + COMPUTE #V=LAG(X). > + IF (VALUE(X) EQ VALUE(#V)) DUPROW=DUPROW+1. > + END REPEAT. > END IF. > FREQUENCIES DUPROW. > > Thanks, Gene Maguin > > ===================== > To manage your subscription to SPSSX-L, send a message to > LISTSERV@.UGA (not to SPSSX-L), with no body text except the > command. To leave the list, send the command > SIGNOFF SPSSX-L > For a list of commands to manage subscriptions, send the command > INFO REFCARD > -- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/check-adjacent-rows-with-duplicate-values-tp5683555p5683588.html Sent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Administrator
|
Gene,
Simplest construction I could devise as follows: -- DATA LIST FREE /Id x1 x2 x3 x4 x5 x6 Result. BEGIN DATA 10 5 8 0 9 1 0 -1 11 1 3 9 4 8 2 -1 11 2 3 9 4 8 5 4 12 3 8 8 9 8 0 -1 13 2 5 8 9 9 9 -1 13 2 5 8 9 9 9 6 END DATA. MISSING VALUES x1 TO x6(8 9). COMPUTE seq=$CASENUM. SAVE OUTFILE 'tmp'. MISSING VALUES X1 TO x6 (). DO REPEAT v=x1 TO x6. + COMPUTE res=sum(res,(ID EQ LAG(ID))*(v EQ LAG(v))). END REPEAT. UPDATE / FILE * / FILE 'tmp' / BY SEQ. RECODE res (0,SYSMIS=-1). EXE. I know, it is *FUGLY* but it gets the job done. I had thought of TEMP but then LAG won't work (same with CREATE under TEMP). I had my initial doubts about UPDATE wrt User Missing but in fact it works. FYI: ALL is valid on both SORT and MATCH (conditional on not exceeding the number of keys ). OTOH, your desire was to enumerate the number of data matches so the code would only end up flagging on mismatches but not give you the required result. *** it's always seemed like there are undocumented things in spss ***. Yeah, users call them bugs.. When I used to work in tech support we called them 'features'. How's this for undocumented (one of my own special requests for implementation about 15 years ago): PRESERVE. SET ERRORS OFF RESULTS OFF. *-----------*. CORR VAR <huge_list_of_variables_sufficient_to_build_MONSTER_unwanted_pivot_table> / MATRIX OUT (*). RESTORE. MATRIX. party hardy on huge correlation matrix..... END MATRIX. ---
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Using SHIFT VALUES would get around the
missing value problem.
Jon Peck (no "h") aka Kim Senior Software Engineer, IBM [hidden email] new phone: 720-342-5621 From: David Marso <[hidden email]> To: [hidden email] Date: 05/03/2012 01:58 PM Subject: Re: [SPSSX-L] check adjacent rows with duplicate values Sent by: "SPSSX(r) Discussion" <[hidden email]> Gene, Simplest construction I could devise as follows: -- DATA LIST FREE /Id x1 x2 x3 x4 x5 x6 Result. BEGIN DATA 10 5 8 0 9 1 0 -1 11 1 3 9 4 8 2 -1 11 2 3 9 4 8 5 4 12 3 8 8 9 8 0 -1 13 2 5 8 9 9 9 -1 13 2 5 8 9 9 9 6 END DATA. MISSING VALUES x1 TO x6(8 9). COMPUTE seq=$CASENUM. SAVE OUTFILE 'tmp'. MISSING VALUES X1 TO x6 (). DO REPEAT v=x1 TO x6. + COMPUTE res=sum(res,(ID EQ LAG(ID))*(v EQ LAG(v))). END REPEAT. UPDATE / FILE * / FILE 'tmp' / BY SEQ. RECODE res (0,SYSMIS=-1). EXE. I know, it is *FUGLY* but it gets the job done. I had thought of TEMP but then LAG won't work (same with CREATE under TEMP). I had my initial doubts about UPDATE wrt User Missing but in fact it works. FYI: ALL is valid on both SORT and MATCH (conditional on not exceeding the number of keys ). OTOH, your desire was to enumerate the number of data matches so the code would only end up flagging on mismatches but not give you the required result. *** it's always seemed like there are undocumented things in spss ***. Yeah, users call them bugs.. When I used to work in tech support we called them 'features'. How's this for undocumented (one of my own special requests for implementation about 15 years ago): PRESERVE. SET ERRORS OFF RESULTS OFF. *-----------*. CORR VAR <huge_list_of_variables_sufficient_to_build_MONSTER_unwanted_pivot_table> / MATRIX OUT (*). RESTORE. MATRIX. party hardy on huge correlation matrix..... END MATRIX. --- Maguin, Eugene wrote > > Ok. > > Id x1 x2 x3 x4 x5 x6 Result > 10 5 8 0 9 1 0 -1 > 11 1 3 9 4 8 2 -1 > 11 2 3 9 4 8 5 4 > 12 3 8 8 9 8 0 -1 > 13 2 5 8 9 9 9 -1 > 13 2 5 8 9 9 9 6 > > Missing values x1 to x6(8 9). > > Re: your code. Depending on how user missing are handled in the match > files that seems like it would work. Is, 'all' a valid value for the BY > subcommand or do you mean all to reference the case comparison varlist? (I > think no but it's always seemed like there are undocumented things in > spss.) > > Gene Maguin > > > > -----Original Message----- > From: SPSSX(r) Discussion [[hidden email]] On Behalf Of David Marso > Sent: Thursday, May 03, 2012 11:55 AM > To: SPSSX-L@.UGA > Subject: Re: check adjacent rows with duplicate values > > Gene, > Please mock up and post a simple data set definition to illustrate the > problem. > Are you attempting to identify adjacent records which are complete > duplicates? > IF so: > MATCH FILES / FILE * / KEEP <VarlistForCaseComparisonDeletingAllOthers>. > SORT CASES BY ALL. > MATCH FILES /FILE * / BY ALL/ FIRST=@TOP@/LAST=@BOT@. > David > > > Maguin, Eugene wrote >> >> I'm checking for duplicated values between pairs of adjacent records and >> the following code doesn't seem to be working. The 'real' problem is that >> value(lag(x)) doesn't work when variables have user missing values >> because >> lag(x) returns sysmis when x is missing. It seems like lag ought to work >> differently but that's beside the point. One way around the problem is to >> take off user missing values. But, I'd like to keep them defined. I could >> well be missing or not understanding something but if I'm not, is there a >> syntax method for this problem that doesn't involve stripping off user >> missing values. >> >> >> DO IF (EPISODEID NE LAG(EPISODEID)). >> + COMPUTE DUPROW=-1. >> ELSE. >> + COMPUTE DUPROW=0. >> + DO REPEAT X=DischargeLevelofCare TO SpecEd8 DxA DxB1 TO DxB4 >> PsychMeds. >> + COMPUTE #V=LAG(X). >> + IF (VALUE(X) EQ VALUE(#V)) DUPROW=DUPROW+1. >> + END REPEAT. >> END IF. >> FREQUENCIES DUPROW. >> >> Thanks, Gene Maguin >> >> ===================== >> To manage your subscription to SPSSX-L, send a message to >> LISTSERV@.UGA (not to SPSSX-L), with no body text except the >> command. To leave the list, send the command >> SIGNOFF SPSSX-L >> For a list of commands to manage subscriptions, send the command >> INFO REFCARD >> > > > -- > View this message in context: > http://spssx-discussion.1045642.n5.nabble.com/check-adjacent-rows-with-duplicate-values-tp5683555p5683588.html > Sent from the SPSSX Discussion mailing list archive at Nabble.com. > > ===================== > To manage your subscription to SPSSX-L, send a message to > LISTSERV@.UGA (not to SPSSX-L), with no body text except the > command. To leave the list, send the command > SIGNOFF SPSSX-L > For a list of commands to manage subscriptions, send the command > INFO REFCARD > > ===================== > To manage your subscription to SPSSX-L, send a message to > LISTSERV@.UGA (not to SPSSX-L), with no body text except the > command. To leave the list, send the command > SIGNOFF SPSSX-L > For a list of commands to manage subscriptions, send the command > INFO REFCARD > -- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/check-adjacent-rows-with-duplicate-values-tp5683555p5684168.html Sent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
In reply to this post by Maguin, Eugene
Late reply I know, but if all you are concerned is losing the missing values meta-data, you may want to look at the
I've always thought the behavior with user defined missing values was somewhat counter-intuitive, and the current agency I'm employed at avoids defining missing values for use cases like yours above (although always includes them in value labels). But the apply dictionary command seems like an easy way to get rid of the missing values and then re-apply them later on if that is what you need. Andy |
Administrator
|
Yes! Apply Dictionary ;-)!!!
----
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Free forum by Nabble | Edit this page |