I may be missing something in the documentation and skimming the output window.
The output nicely tells how many mismatches there are for a variable. Before I cobble something together I thought I would check with the discussion list. I am looking for

(1) a table with a column of IDs, a column with a count of mismatches for the ID, and possibly a column with the % of comparisons that were mismatches.
ID  #  %
1   1  33
2   2  66

(2) additional column(s) in the case-by-case comparison that give the count and percentage of mismatches for the ID. In the example syntax below they would be between "compare" and x1. The column headings would be
Id Active Compare Misses Misses x1 x2 x3

data list list /id (f2) x1 to x3 (3f1).
begin data
1 1 2 3
2 1 2 2
3 3 2 3
end data.
dataset name entry1.
data list list /id (f2) x1 to x3 (3f1).
begin data
1 1 2 1
2 2 2 1
3 3 2 3
end data.
dataset name entry2.
DATASET ACTIVATE entry2.
SORT CASES BY id.
DATASET ACTIVATE entry1 WINDOW=ASIS.
SORT CASES BY id.
COMPARE DATASETS
  /COMPDATASET = entry2
  /VARIABLES x1 x2 x3
  /CASEID id
  /SAVE FLAGMISMATCHES=YES VARNAME=CasesCompare MATCHDATASET=NO MISMATCHDATASET=NO
  /OUTPUT VARPROPERTIES=NONE CASETABLE=YES TABLELIMIT=100.
Art Kendall
Social Research Consultants |
You can get a case by case comparison table
that lists all the differences for each variable, but if you just want
a count, have the procedure save the mismatch count - /SAVE FLAGMISMATCHES=YES
VARNAME=CasesCompare - and list that variable.
Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
In reply to: Art Kendall, 09/21/2014 05:20 AM, Subject: [SPSSX-L] Dataset compare count of variables with differences |
Thanks, that should work. I thought I had obtained that info before but could not recall how.
If enough others need those additional columns in the case-by-case table, that would be a suggestion for future development.
Art Kendall
Social Research Consultants |
Oops, that was what I already had. It just flags whether or not there was a difference in the comparison.
It is not the number of comparisons that did not match for what is supposed to be the same case entered by two people. What I am looking for is a number that goes from zero to the number of variables being compared. For the example syntax I posted, I am looking for info like this (# is the number of mismatches):
ID  #  %
1   1  33
2   2  66
3   0   0
Art Kendall
Social Research Consultants |
Just compute the sum of the difference
dummies.
Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
In reply to: Art Kendall, 09/21/2014 09:33 AM, Subject: Re: [SPSSX-L] Dataset compare count of variables with differences |
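For anyone who wants to check the arithmetic outside SPSS, here is a minimal Python sketch of "sum the difference dummies" using the test data from the first post. The dictionary layout and the function name `mismatch_summary` are illustrative assumptions, not part of COMPARE DATASETS or any SPSS API, and missing values are not modeled.

```python
# Test data from the thread: id -> (x1, x2, x3) for each data-entry pass.
entry1 = {1: (1, 2, 3), 2: (1, 2, 2), 3: (3, 2, 3)}
entry2 = {1: (1, 2, 1), 2: (2, 2, 1), 3: (3, 2, 3)}


def mismatch_summary(first, second):
    """Return {id: (count, percent)} for ids present in both entries.

    Hypothetical helper for illustration only.
    """
    summary = {}
    for case_id in sorted(first.keys() & second.keys()):
        # One 0/1 "difference dummy" per compared variable.
        dummies = [int(a != b) for a, b in zip(first[case_id], second[case_id])]
        count = sum(dummies)
        percent = 100.0 * count / len(dummies)
        summary[case_id] = (count, percent)
    return summary


summary = mismatch_summary(entry1, entry2)
for case_id, (count, percent) in summary.items():
    print(f"{case_id:>2} {count:>2} {percent:5.1f}")
```

The count runs from zero to the number of variables compared, which is the quantity asked for. The percentage for ID 2 is 2/3, so it shows as 66.7 here rather than the 66 in the hand-typed table.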
In case anybody checks the archives in the future, here is the old-fashioned way of getting the count of mismatches between pairs of cases that are supposed to be entries of the same data.
This would augment the info from the COMPARE DATASETS in my first post.

data list list /id (f2) x1 to x3 (3f1).
begin data
1 1 2 3
2 1 2 2
3 3 2 3
end data.
dataset name entry1.
data list list /id (f2) x1 to x3 (3f1).
begin data
1 1 2 1
2 2 2 1
3 3 2 3
end data.
dataset name entry2.
add files file=entry1 /in=in1 /file=entry2 /in=in2.
dataset name BothEntries.
list.
* Sorting on in2 as well keeps each entry1 row directly ahead of its entry2 row.
sort cases by id (a) in2 (a).
compute mismatches = 0.
do repeat myvar=x1 to x3.
if in2 and myvar ne lag(myvar) mismatches = mismatches + 1.
end repeat.
list.
sort cases by id (d) in1.
do repeat myvar=x1 to x3.
if in1 mismatches = lag(mismatches).
end repeat.
list.
temporary.
select if in1.
frequencies variables = mismatches.
Art Kendall
Social Research Consultants |
At 04:36 PM 9/21/2014, Art Kendall wrote:
>Here is the old fashioned way of getting the count of mismatches
>between pairs of cases that are supposed to be entry of the same data.

Here, if you like, is a variation that also lists mismatch count by variable -- a quantity that is frequently useful. (It only works if all the variables compared are numeric.) Using your test data:

DATASET ACTIVATE entry1 WINDOW=FRONT.
DATASET COPY Long1.
DATASET ACTIVATE Long1 WINDOW=FRONT.
VARSTOCASES
  /MAKE value FROM x1 x2 x3
  /INDEX = VarName(value)
  /KEEP = id
  /NULL = KEEP.

Variables to Cases
|----------------------------|----------------------------|
|Output Created              |21-SEP-2014 20:13:52        |
|----------------------------|----------------------------|
[Long1]
Generated Variables
|-------|------|
|Name   |Label |
|-------|------|
|VarName|<none>|
|value  |<none>|
|-------|------|
Processing Statistics [suppressed]

DATASET ACTIVATE entry2 WINDOW=FRONT.
DATASET COPY Long2.
DATASET ACTIVATE Long2 WINDOW=FRONT.
VARSTOCASES
  /MAKE value FROM x1 x2 x3
  /INDEX = VarName(value)
  /KEEP = id
  /NULL = KEEP.

Variables to Cases
|----------------------------|----------------------------|
|Output Created              |21-SEP-2014 20:13:53        |
|----------------------------|----------------------------|
[Long2]
Generated Variables
|-------|------|
|Name   |Label |
|-------|------|
|VarName|<none>|
|value  |<none>|
|-------|------|
Processing Statistics [suppressed]

MATCH FILES
  /FILE =Long1 /RENAME=(value=value1)
  /FILE =Long2 /RENAME=(value=value2)
  /BY id Varname.
COMPUTE MisMatch = value1 NE value2.
FORMATS MisMatch (F2).
DATASET NAME Compare WINDOW=FRONT.

DATASET DECLARE ByID.
AGGREGATE OUTFILE=ByID
  /BREAK = id
  /MisCount = SUM(MisMatch).
DATASET DECLARE ByVar.
AGGREGATE OUTFILE=ByVar
  /BREAK = VarName
  /MisCount = SUM(MisMatch).

DATASET ACTIVATE ByID WINDOW=Front.
FORMATS MisCount (F4).
LIST.

List
|-----------------------------|----------------------------|
|Output Created               |21-SEP-2014 20:13:54        |
|-----------------------------|----------------------------|
[ByID]
id MisCount
 1     1
 2     2
 3     0
Number of cases read: 3    Number of cases listed: 3

DATASET ACTIVATE ByVar WINDOW=Front.
FORMATS MisCount (F4).
LIST.

List
|-----------------------------|----------------------------|
|Output Created               |21-SEP-2014 20:13:55        |
|-----------------------------|----------------------------|
[ByVar]
VarName MisCount
x1          1
x2          0
x3          2
Number of cases read: 3    Number of cases listed: 3

================================
APPENDIX: Test data and all code
================================
data list list /id (f2) x1 to x3 (3f1).
begin data
1 1 2 3
2 1 2 2
3 3 2 3
end data.
dataset name entry1.
data list list /id (f2) x1 to x3 (3f1).
begin data
1 1 2 1
2 2 2 1
3 3 2 3
end data.
dataset name entry2.

DATASET ACTIVATE entry1 WINDOW=FRONT.
DATASET COPY Long1.
DATASET ACTIVATE Long1 WINDOW=FRONT.
VARSTOCASES
  /MAKE value FROM x1 x2 x3
  /INDEX = VarName(value)
  /KEEP = id
  /NULL = KEEP.
. /**/ LIST /*-*/.

DATASET ACTIVATE entry2 WINDOW=FRONT.
DATASET COPY Long2.
DATASET ACTIVATE Long2 WINDOW=FRONT.
VARSTOCASES
  /MAKE value FROM x1 x2 x3
  /INDEX = VarName(value)
  /KEEP = id
  /NULL = KEEP.
. /**/ LIST /*-*/.

MATCH FILES
  /FILE =Long1 /RENAME=(value=value1)
  /FILE =Long2 /RENAME=(value=value2)
  /BY id Varname.
COMPUTE MisMatch = value1 NE value2.
FORMATS MisMatch (F2).
DATASET NAME Compare WINDOW=FRONT.

DATASET DECLARE ByID.
AGGREGATE OUTFILE=ByID
  /BREAK = id
  /MisCount = SUM(MisMatch).
DATASET DECLARE ByVar.
AGGREGATE OUTFILE=ByVar
  /BREAK = VarName
  /MisCount = SUM(MisMatch).

DATASET ACTIVATE ByID WINDOW=Front.
FORMATS MisCount (F4).
LIST.
DATASET ACTIVATE ByVar WINDOW=Front.
FORMATS MisCount (F4).
LIST.
|
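The reshape-then-aggregate idea in the post above (go to one row per id and variable, flag mismatches, then sum by id and by variable) can be sketched in Python roughly as follows. This is only an illustration of the logic on the thread's test data: the names `to_long`, `by_id`, and `by_var` are made up for the sketch, it is not a translation of VARSTOCASES or AGGREGATE, and it does not reproduce how SPSS treats missing values in `value1 NE value2`.

```python
from collections import Counter

VAR_NAMES = ("x1", "x2", "x3")

# Wide rows (id, x1, x2, x3) from the thread's test data.
entry1 = [(1, 1, 2, 3), (2, 1, 2, 2), (3, 3, 2, 3)]
entry2 = [(1, 1, 2, 1), (2, 2, 2, 1), (3, 3, 2, 3)]


def to_long(rows):
    """Reshape wide rows into {(id, var_name): value}, one entry per cell."""
    return {
        (row[0], name): value
        for row in rows
        for name, value in zip(VAR_NAMES, row[1:])
    }


long1, long2 = to_long(entry1), to_long(entry2)

by_id = Counter()   # mismatch count per case id
by_var = Counter()  # mismatch count per variable name
for key in long1.keys() & long2.keys():
    case_id, name = key
    mismatch = int(long1[key] != long2[key])
    # Adding 0 still creates the key, so ids and variables with no
    # mismatches appear in the result with a count of 0.
    by_id[case_id] += mismatch
    by_var[name] += mismatch

for case_id in sorted(by_id):
    print(case_id, by_id[case_id])
for name in sorted(by_var):
    print(name, by_var[name])
```

Keying on the (id, variable name) pair plays the role of the `/BY id Varname` match, so the same flagged cells feed both summaries.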
Thanks for your post.
I was looking to augment the information from COMPARE DATASETS. Your syntax will be very useful for the many people who are using versions of SPSS before that command was available.
Art Kendall
Social Research Consultants |
While we are at it, here's a MATRIX solution.
NEW FILE.
DATASET CLOSE ALL.
data list list /id (f2) x1 to x3 (3f1).
begin data
1 1 2 3
2 1 2 2
3 3 2 3
end data.
dataset name entry1.
SAVE OUTFILE="C:\TEMP\entry1.sav".
data list list /id (f2) x1 to x3 (3f1).
begin data
1 1 2 1
2 2 2 1
3 3 2 3
end data.
dataset name entry2.
SAVE OUTFILE="C:\TEMP\entry2.sav".

DATASET DECLARE iddiff.
DATASET DECLARE vardiff.
MATRIX.
GET data1 /FILE="C:\TEMP\entry1.sav" /VARIABLES=x1 TO x3 /NAMES=varnames.
GET data2 /FILE="C:\TEMP\entry2.sav" /VARIABLES=x1 TO x3.
GET id /FILE="C:\TEMP\entry1.sav" /VARIABLES=id.
COMPUTE nc=NCOL(data1).
COMPUTE diff=data1-data2.
LOOP r=1 TO NROW(diff).
LOOP c=1 TO NCOL(diff).
DO IF diff(r,c) NE 0.
COMPUTE diff(r,c)=1.
END IF.
END LOOP.
END LOOP.
COMPUTE vdiff=CSUM(diff).
COMPUTE idiff=RSUM(diff).
COMPUTE vdiff={vdiff;vdiff/NROW(data1)}.
COMPUTE idiff={id,idiff,idiff/nc}.
SAVE vdiff /OUTFILE=vardiff /NAMES=varnames.
SAVE idiff /OUTFILE=iddiff /VARIABLES=id ndiff pctdiff.
END MATRIX.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |