SPSSX Discussion

Dataset compare count of variables with differences

Art Kendall

Dataset compare count of variables with differences

Art Kendall
Social Research Consultants

Jon K Peck

Re: Dataset compare count of variables with differences

You can get a case by case comparison table that lists all the differences for each variable, but if you just want a count, have the procedure save the mismatch count - /SAVE FLAGMISMATCHES=YES VARNAME=CasesCompare - and list that variable.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

From: Art Kendall <[hidden email]>
To: [hidden email]
Date: 09/21/2014 05:20 AM
Subject: [SPSSX-L] Dataset compare count of variables with differences
Sent by: "SPSSX(r) Discussion" <[hidden email]>

I may be missing something in the documentation and skimming the output window. The output nicely tells how many mismatches for a variable. Before I cobble something together I thought I would check with the discussion list.

I am looking for
(1) a table with a column of IDs, a column with a count of mismatches for the ID, and possibly a column with the % of comparisons that were mismatches.

ID  #  %
 1  1  33
 2  2  66

(2) addition column(s) in the case by case comparison that give the count and percentage of mismatches for the ID. In the example syntax below they would be between "compare" and x1. the column headings would be
Id Active Compare Misses Misses x1 x2 x3

data list list /id (f2) x1 to x3 (3f1).
begin data
1 1 2 3
2 1 2 2
3 3 2 3
end data.
dataset name entry1.
data list list /id (f2) x1 to x3 (3f1).
begin data
1 1 2 1
2 2 2 1
3 3 2 3
end data.
dataset name entry2.
DATASET ACTIVATE entry2.
SORT CASES BY id .
DATASET ACTIVATE entry1 WINDOW=ASIS.
SORT CASES BY id .
COMPARE DATASETS
  /COMPDATASET = entry2
  /VARIABLES x1 x2 x3
  /CASEID id
  /SAVE FLAGMISMATCHES=YES VARNAME=CasesCompare
    MATCHDATASET=NO MISMATCHDATASET=NO
  /OUTPUT VARPROPERTIES=NONE CASETABLE=YES TABLELIMIT=100.

-----
Art Kendall
Social Research Consultants
--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Dataset-compare-count-of-variables-with-differences-tp5727333.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Art Kendall

Re: Dataset compare count of variables with differences

Thanks, that should work. I thought I had obtained that info before but could not recall how.

If enough others need those additional columns in the case-by-case table, that would be a suggestion for future development.

Art Kendall
Social Research Consultants

Art Kendall

Re: Dataset compare count of variables with differences

Art Kendall
Social Research Consultants

Jon K Peck

Re: Dataset compare count of variables with differences

Just compute the sum of the difference dummies.
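
One possible reading of this suggestion is sketched below. It is an illustration under assumptions, not necessarily what was meant: it assumes the entry1 and entry2 datasets from the first post are open and already sorted by id, and the names SideBySide, y1 to y3, d1 to d3, Misses and PctMiss are invented for the example. The two entries are merged side by side, one 0/1 dummy is computed per compared variable, and the dummies are summed per case.

```spss
* Sketch only: dataset and variable names introduced here are hypothetical.
* Merge the two entries side by side; entry2 values are renamed to y1 to y3.
MATCH FILES
  /FILE = entry1
  /FILE = entry2
  /RENAME = (x1 x2 x3 = y1 y2 y3)
  /BY id.
DATASET NAME SideBySide.

* Each dummy is 1 where the two entries differ and 0 where they agree.
* If either value is missing, the dummy is missing.
COMPUTE d1 = (x1 NE y1).
COMPUTE d2 = (x2 NE y2).
COMPUTE d3 = (x3 NE y3).

* SUM ignores missing dummies, so those comparisons are not counted.
COMPUTE Misses = SUM(d1 TO d3).
* The divisor 3 is the number of variables compared.
COMPUTE PctMiss = 100 * Misses / 3.
FORMATS d1 TO d3 Misses (F2) PctMiss (F5.1).
LIST id Misses PctMiss.
```

With the test data from the first post, Misses works out to 1, 2 and 0 for ids 1, 2 and 3, which is the count Art describes.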

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

From: Art Kendall <[hidden email]>
To: [hidden email]
Date: 09/21/2014 09:33 AM
Subject: Re: [SPSSX-L] Dataset compare count of variables with differences
Sent by: "SPSSX(r) Discussion" <[hidden email]>

OOPS that was what I already had. It just flags whether or not there was a difference in the comparison. It is not the number of comparisons that did not match with what is supposed to be the same case entered by two people. What I am looking for is a number that goes from zero to the number of variables that are being compared.

for the example syntax I posted I am looking for info like:
# is # of mismatches

ID  #  %
 1  1  33
 2  2  66
 3  0   0

-----
Art Kendall
Social Research Consultants
--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Dataset-compare-count-of-variables-with-differences-tp5727333p5727339.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

Art Kendall

Re: Dataset compare count of variables with differences

In case anybody checks the archives in the future, here is the old-fashioned way of getting the count of mismatches between pairs of cases that are supposed to be entries of the same data.

This would augment the info from the COMPARE DATASETS in my first post.
data list list /id (f2) x1 to x3 (3f1).
begin data
1 1 2 3
2 1 2 2
3 3 2 3
end data.
dataset name entry1.
data list list /id (f2) x1 to x3 (3f1).
begin data
1 1 2 1
2 2 2 1
3 3 2 3
end data.
dataset name entry2.

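* Stack the two entries; in1 and in2 flag which entry each case came from.
add files file=entry1 /in=in1
 /file = entry2 /in=in2.
dataset name BothEntries.
list.

* Put each entry1 case directly before its entry2 partner.
sort cases by id (a) in1 (d).
compute mismatches = 0.
do repeat myvar=x1 to x3.
if in2 and myvar ne lag(myvar) mismatches = mismatches +1.
end repeat.
list.
* Reverse the order so the count can be copied from the entry2 case to the entry1 case.
sort cases by id (d) in1 (a).
if in1 mismatches = lag(mismatches).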
list.
temporary.
select if in1.
frequencies variables = mismatches.

Art Kendall
Social Research Consultants

Richard Ristow

Re: Dataset compare count of variables with differences

At 04:36 PM 9/21/2014, Art Kendall wrote:

>Here is the old fashioned way of getting the count of mismatches
>between pairs of cases that are supposed to be entry of the same data.

Here, if you like, is a variation that also lists mismatch count by
variable -- a quantity that is frequently useful. (It only works if
all the variables compared are numeric.) Using your test data:

DATASET ACTIVATE entry1 WINDOW=FRONT.
DATASET COPY Long1.
DATASET ACTIVATE Long1 WINDOW=FRONT.
VARSTOCASES
/MAKE value FROM x1 x2 x3
/INDEX = VarName(value)
/KEEP = id
/NULL = KEEP.

Variables to Cases
|----------------------------|----------------------------|
|Output Created |21-SEP-2014 20:13:52 |
|----------------------------|----------------------------|
[Long1]

Generated Variables
|-------|------|
|Name |Label |
|-------|------|
|VarName|<none>|
|value |<none>|
|-------|------|

Processing Statistics [suppressed]

DATASET ACTIVATE entry2 WINDOW=FRONT.
DATASET COPY Long2.
DATASET ACTIVATE Long2 WINDOW=FRONT.
VARSTOCASES
/MAKE value FROM x1 x2 x3
/INDEX = VarName(value)
/KEEP = id
/NULL = KEEP.

Variables to Cases
|----------------------------|----------------------------|
|Output Created |21-SEP-2014 20:13:53 |
|----------------------------|----------------------------|
[Long2]

Generated Variables
|-------|------|
|Name |Label |
|-------|------|
|VarName|<none>|
|value |<none>|
|-------|------|

Processing Statistics [suppressed]

MATCH FILES
/FILE =Long1
/RENAME=(value=value1)
/FILE =Long2
/RENAME=(value=value2)
/BY id Varname.

COMPUTE MisMatch = value1 NE value2.
FORMATS MisMatch (F2).

DATASET NAME Compare WINDOW=FRONT.

DATASET DECLARE ByID.
AGGREGATE OUTFILE=ByID
/BREAK = id
/MisCount = SUM(MisMatch).

DATASET DECLARE ByVar.
AGGREGATE OUTFILE=ByVar
/BREAK = VarName
/MisCount = SUM(MisMatch).

DATASET ACTIVATE ByID WINDOW=Front.
FORMATS MisCount (F4).
LIST.

List
|-----------------------------|----------------------------|
|Output Created |21-SEP-2014 20:13:54 |
|-----------------------------|----------------------------|
[ByID]

id MisCount

1 1
2 2
3 0

Number of cases read: 3 Number of cases listed: 3

DATASET ACTIVATE ByVar WINDOW=Front.
FORMATS MisCount (F4).
LIST.

List
|-----------------------------|----------------------------|
|Output Created |21-SEP-2014 20:13:55 |
|-----------------------------|----------------------------|
[ByVar]

VarName MisCount

x1 1
x2 0
x3 2

Number of cases read: 3 Number of cases listed: 3
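
The original question also asked for the percentage of mismatched comparisons per ID. A minimal sketch of one way to add it to the approach above, assuming the Compare dataset built by the MATCH FILES step is still open; the names ByIDPct and MisPct are invented for this example. Because MisMatch is coded 0/1, its mean within an id is the proportion of comparisons that differ.

```spss
* Sketch only: ByIDPct and MisPct are hypothetical names.
DATASET ACTIVATE Compare WINDOW=FRONT.
DATASET DECLARE ByIDPct.
AGGREGATE OUTFILE=ByIDPct
  /BREAK    = id
  /MisCount = SUM(MisMatch)
  /MisPct   = MEAN(MisMatch).

DATASET ACTIVATE ByIDPct WINDOW=FRONT.
* Convert the proportion of mismatches to a percentage.
COMPUTE MisPct = 100 * MisPct.
FORMATS MisCount (F4) MisPct (F6.1).
LIST.
```

MEAN skips comparisons where MisMatch is missing, so the percentage is based on the comparisons that could actually be evaluated. The same two aggregate functions with /BREAK = VarName would give the per-variable version.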
================================
APPENDIX: Test data and all code
================================
data list list /id (f2) x1 to x3 (3f1).
begin data
1 1 2 3
2 1 2 2
3 3 2 3
end data.
dataset name entry1.
data list list /id (f2) x1 to x3 (3f1).
begin data
1 1 2 1
2 2 2 1
3 3 2 3
end data.
dataset name entry2.

DATASET ACTIVATE entry1 WINDOW=FRONT.

DATASET COPY Long1.
DATASET ACTIVATE Long1 WINDOW=FRONT.

VARSTOCASES
/MAKE value FROM x1 x2 x3
/INDEX = VarName(value)
/KEEP = id
/NULL = KEEP.

. /**/ LIST /*-*/.

DATASET ACTIVATE entry2 WINDOW=FRONT.

DATASET COPY Long2.
DATASET ACTIVATE Long2 WINDOW=FRONT.

VARSTOCASES
/MAKE value FROM x1 x2 x3
/INDEX = VarName(value)
/KEEP = id
/NULL = KEEP.

. /**/ LIST /*-*/.

MATCH FILES
/FILE =Long1
/RENAME=(value=value1)
/FILE =Long2
/RENAME=(value=value2)
/BY id Varname.

COMPUTE MisMatch = value1 NE value2.
FORMATS MisMatch (F2).

DATASET NAME Compare WINDOW=FRONT.

DATASET DECLARE ByID.

AGGREGATE OUTFILE=ByID
/BREAK = id
/MisCount = SUM(MisMatch).

DATASET DECLARE ByVar.

AGGREGATE OUTFILE=ByVar
/BREAK = VarName
/MisCount = SUM(MisMatch).

DATASET ACTIVATE ByID WINDOW=Front.
FORMATS MisCount (F4).
LIST.

DATASET ACTIVATE ByVar WINDOW=Front.
FORMATS MisCount (F4).
LIST.

Art Kendall

Re: Dataset compare count of variables with differences

Thanks for your post.

I was looking to augment the information from COMPARE DATASETS.

Your syntax will be very useful for the many people who are using versions of SPSS before that command was available.

Art Kendall
Social Research Consultants

David Marso

Re: Dataset compare count of variables with differences

While we are at it, here's a MATRIX solution.
NEW FILE.
DATASET CLOSE ALL.
data list list /id (f2) x1 to x3 (3f1).
begin data
1 1 2 3
2 1 2 2
3 3 2 3
end data.
dataset name entry1.
SAVE OUTFILE "C:\TEMP\entry1.sav".
data list list /id (f2) x1 to x3 (3f1).
begin data
1 1 2 1
2 2 2 1
3 3 2 3
end data.
dataset name entry2.
SAVE OUTFILE "C:\TEMP\entry2.sav".
DATASET DECLARE iddiff.
DATASET DECLARE vardiff.
MATRIX.
GET data1 /FILE "C:\TEMP\entry1.sav"/VARIABLES x1 TO x3/NAMES=varnames.
GET data2/FILE "C:\TEMP\entry2.sav"/VARIABLES x1 TO x3.
GET id /FILE "C:\TEMP\entry1.sav" /VARIABLES id.
COMPUTE nc=NCOL(data1).
COMPUTE diff=data1-data2.
LOOP r=1 TO NROW(diff).
LOOP c=1 TO NCOL(diff).
DO IF diff(r,c) NE 0 .
COMPUTE diff(r,c)=1.
END IF.
END LOOP.
END LOOP.
COMPUTE vdiff=CSUM(diff).
COMPUTE idiff=RSUM(diff).
COMPUTE vdiff={vdiff;vdiff/NROW(data1)}.
COMPUTE idiff={id,idiff,idiff/nc}.
SAVE vdiff /OUTFILE vardiff /NAMES=varnames.
SAVE idiff /OUTFILE iddiff /VARIABLES id ndiff pctdiff.
END MATRIX.

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"