Dataset compare count of variables with differences

Art Kendall
I may be missing something in the documentation or in skimming the output window.
The output nicely reports how many mismatches there are for each variable.
Before I cobble something together, I thought I would check with the discussion list.

 I am looking for
 (1) a table with a column of IDs, a column with a count of mismatches for the ID, and possibly a column with the  % of comparisons that were mismatches.
ID # %
1   1 33
2   2 66

(2) additional column(s) in the case-by-case comparison that give the count and percentage of mismatches for the ID.

In the example syntax below they would be between "compare" and x1.
The column headings would be
Id Active Compare Misses Misses x1 x2 x3

data list list /id (f2) x1 to x3 (3f1).
begin data
1 1 2 3
2 1 2 2
3 3 2 3
end data.
dataset name entry1.
data list list /id (f2) x1 to x3 (3f1).
begin data
1 1 2 1
2 2 2 1
3 3 2 3
end data.
dataset name entry2.

DATASET ACTIVATE entry2.

SORT CASES BY id .

DATASET ACTIVATE entry1 WINDOW=ASIS.

SORT CASES BY id .
 
COMPARE DATASETS  
  /COMPDATASET = entry2
  /VARIABLES  x1 x2 x3
  /CASEID  id
  /SAVE FLAGMISMATCHES=YES VARNAME=CasesCompare MATCHDATASET=NO MISMATCHDATASET=NO
  /OUTPUT VARPROPERTIES=NONE CASETABLE=YES TABLELIMIT=100.



Art Kendall
Social Research Consultants

Re: Dataset compare count of variables with differences

Jon K Peck
You can get a case by case comparison table that lists all the differences for each variable, but if you just want a count, have the procedure save the mismatch count - /SAVE FLAGMISMATCHES=YES VARNAME=CasesCompare - and list that variable.


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621




From:        Art Kendall <[hidden email]>
To:        [hidden email]
Date:        09/21/2014 05:20 AM
Subject:        [SPSSX-L] Dataset compare  count of variables with differences
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




--
View this message in context:
http://spssx-discussion.1045642.n5.nabble.com/Dataset-compare-count-of-variables-with-differences-tp5727333.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD



Re: Dataset compare count of variables with differences

Art Kendall
Thanks, that should work. I thought I had obtained that info before but could not recall how.

If enough others would need those additional columns in the case-by-case table, that would be a suggestion for future development.
Art Kendall
Social Research Consultants

Re: Dataset compare count of variables with differences

Art Kendall
Oops, that was what I already had. It just flags whether or not there was a difference in the comparison.

It is not the number of comparisons that did not match for what is supposed to be the same case entered by two people.
What I am looking for is a number that goes from zero to the number of variables that are being compared.


For the example syntax I posted, I am looking for info like this, where # is the number of mismatches:

ID # %
1   1 33
2   2 66
3   0   0
Art Kendall
Social Research Consultants

Re: Dataset compare count of variables with differences

Jon K Peck
Just compute the sum of the difference dummies.
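
One way to read this suggestion is sketched below. It assumes the per-variable difference dummies are built by hand from a side-by-side merge rather than saved by COMPARE DATASETS, and that entry1 and entry2 from the first post are already sorted by id. The names SideBySide, y1 to y3, d1 to d3, Misses and PctMiss are illustrative and do not come from the thread.

```spss
* Sketch only: SideBySide, y1-y3, d1-d3, Misses and PctMiss are assumed names.
* Put the two entries side by side, renaming the entry2 values.
MATCH FILES
   /FILE   = entry1
   /FILE   = entry2
   /RENAME = (x1 x2 x3 = y1 y2 y3)
   /BY id.
DATASET NAME SideBySide.

* One 0/1 dummy per compared variable, where 1 means the entries differ.
COMPUTE d1 = (x1 NE y1).
COMPUTE d2 = (x2 NE y2).
COMPUTE d3 = (x3 NE y3).

* Count of mismatches per id, and that count as a percentage of 3 comparisons.
COMPUTE Misses  = SUM(d1, d2, d3).
COMPUTE PctMiss = 100 * Misses / 3.
FORMATS d1 d2 d3 Misses (F2) PctMiss (F5.1).
LIST VARIABLES = id Misses PctMiss.
```

With the test data from the first post, Misses should be 1, 2 and 0 for ids 1, 2 and 3. A dummy is missing when either entry is missing, and SUM then ignores it, so missing values would need separate handling.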


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621




From:        Art Kendall <[hidden email]>
To:        [hidden email]
Date:        09/21/2014 09:33 AM
Subject:        Re: [SPSSX-L] Dataset compare  count of variables with differences
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




--
View this message in context:
http://spssx-discussion.1045642.n5.nabble.com/Dataset-compare-count-of-variables-with-differences-tp5727333p5727339.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.


Re: Dataset compare count of variables with differences

Art Kendall
In case anybody checks the archives in the future, here is the old-fashioned way of getting the count of mismatches between pairs of cases that are supposed to be entries of the same data.

This would augment the info from the COMPARE DATASETS command in my first post.
data list list /id (f2) x1 to x3 (3f1).
begin data
   1 1 2 3
   2 1 2 2
   3 3 2 3
end data.
dataset name entry1.
data list list /id (f2) x1 to x3 (3f1).
begin data
   1 1 2 1
   2 2 2 1
   3 3 2 3
end data.
dataset name entry2.

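* Stack the two entries. The in1 and in2 flags mark the source of each case.
add files file=entry1 /in=in1
   /file = entry2 /in=in2.
dataset name BothEntries.
list.

* Sort so that, within each id, the entry1 case precedes the entry2 case.
sort cases by id(a) in2(a).
compute mismatches = 0.
do repeat myvar=x1 to x3.
   if (in2 and myvar ne lag(myvar)) mismatches = mismatches + 1.
end repeat.
list.
* Re-sort so the entry2 case comes first, then copy its count to the entry1 case.
sort cases by id(d) in1(a).
if (in1) mismatches = lag(mismatches).
list.
* Tabulate the counts once per id, using only the entry1 cases.
temporary.
select if in1.
frequencies variables = mismatches.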

Art Kendall
Social Research Consultants

Re: Dataset compare count of variables with differences

Richard Ristow
At 04:36 PM 9/21/2014, Art Kendall wrote:

>Here is the old fashioned way of getting the count of mismatches
>between pairs of cases that are supposed to be entry of the same data.

Here, if you like, is a variation that also lists mismatch count by
variable -- a quantity that is frequently useful. (It only works if
all the variables compared are numeric.) Using your test data:

DATASET ACTIVATE  entry1 WINDOW=FRONT.
DATASET COPY      Long1.
DATASET ACTIVATE  Long1  WINDOW=FRONT.
VARSTOCASES
    /MAKE value FROM x1 x2 x3
    /INDEX = VarName(value)
    /KEEP  =  id
    /NULL  = KEEP.


Variables to Cases
|----------------------------|----------------------------|
|Output Created              |21-SEP-2014  20:13:52       |
|----------------------------|----------------------------|
  [Long1]

Generated Variables
|-------|------|
|Name   |Label |
|-------|------|
|VarName|<none>|
|value  |<none>|
|-------|------|

Processing Statistics [suppressed]


DATASET ACTIVATE  entry2 WINDOW=FRONT.
DATASET COPY      Long2.
DATASET ACTIVATE  Long2  WINDOW=FRONT.
VARSTOCASES
    /MAKE value FROM x1 x2 x3
    /INDEX = VarName(value)
    /KEEP  =  id
    /NULL  = KEEP.

Variables to Cases
|----------------------------|----------------------------|
|Output Created              |21-SEP-2014  20:13:53       |
|----------------------------|----------------------------|
  [Long2]

Generated Variables
|-------|------|
|Name   |Label |
|-------|------|
|VarName|<none>|
|value  |<none>|
|-------|------|

Processing Statistics [suppressed]


MATCH FILES
    /FILE  =Long1
    /RENAME=(value=value1)
    /FILE  =Long2
    /RENAME=(value=value2)
    /BY id Varname.

COMPUTE   MisMatch = value1 NE value2.
FORMATS   MisMatch (F2).

DATASET NAME      Compare WINDOW=FRONT.

DATASET DECLARE   ByID.
AGGREGATE OUTFILE=ByID
    /BREAK    = id
    /MisCount = SUM(MisMatch).


DATASET DECLARE   ByVar.
AGGREGATE OUTFILE=ByVar
    /BREAK    = VarName
    /MisCount = SUM(MisMatch).

DATASET ACTIVATE  ByID  WINDOW=Front.
FORMATS MisCount (F4).
LIST.

List
|-----------------------------|----------------------------|
|Output Created               |21-SEP-2014  20:13:54       |
|-----------------------------|----------------------------|
  [ByID]

id MisCount

  1      1
  2      2
  3      0

Number of cases read:  3    Number of cases listed:  3


DATASET ACTIVATE  ByVar WINDOW=Front.
FORMATS MisCount (F4).
LIST.

List
|-----------------------------|----------------------------|
|Output Created               |21-SEP-2014  20:13:55       |
|-----------------------------|----------------------------|
  [ByVar]


VarName MisCount

x1           1
x2           0
x3           2

Number of cases read:  3    Number of cases listed:  3
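
A percentage column by ID, as asked for in the first post, could be added along the following lines. This is a sketch that assumes the Compare dataset built above is still open. ByIDPct, NCompared and PctMis are names made up for the illustration.

```spss
* Sketch only: ByIDPct, NCompared and PctMis are assumed names.
DATASET ACTIVATE  Compare WINDOW=FRONT.
DATASET DECLARE   ByIDPct.
AGGREGATE OUTFILE=ByIDPct
    /BREAK     = id
    /MisCount  = SUM(MisMatch)
    /NCompared = N(MisMatch).

* Percentage of the comparisons for each id that were mismatches.
DATASET ACTIVATE  ByIDPct WINDOW=FRONT.
COMPUTE PctMis = 100 * MisCount / NCompared.
FORMATS MisCount NCompared (F4) PctMis (F5.1).
LIST.
```

Dividing by N(MisMatch) rather than by a fixed 3 keeps the percentage based on the comparisons that could actually be made, since MisMatch is missing whenever either value is missing.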
================================
APPENDIX: Test data and all code
================================
data list list /id (f2) x1 to x3 (3f1).
begin data
    1 1 2 3
    2 1 2 2
    3 3 2 3
end data.
dataset name entry1.
data list list /id (f2) x1 to x3 (3f1).
begin data
    1 1 2 1
    2 2 2 1
    3 3 2 3
end data.
dataset name entry2.

DATASET ACTIVATE  entry1 WINDOW=FRONT.

DATASET COPY      Long1.
DATASET ACTIVATE  Long1  WINDOW=FRONT.

VARSTOCASES
    /MAKE value FROM x1 x2 x3
    /INDEX = VarName(value)
    /KEEP  =  id
    /NULL  = KEEP.

.  /**/  LIST /*-*/.


DATASET ACTIVATE  entry2 WINDOW=FRONT.

DATASET COPY      Long2.
DATASET ACTIVATE  Long2  WINDOW=FRONT.

VARSTOCASES
    /MAKE value FROM x1 x2 x3
    /INDEX = VarName(value)
    /KEEP  =  id
    /NULL  = KEEP.

.  /**/  LIST /*-*/.

MATCH FILES
    /FILE  =Long1
    /RENAME=(value=value1)
    /FILE  =Long2
    /RENAME=(value=value2)
    /BY id Varname.

COMPUTE   MisMatch = value1 NE value2.
FORMATS   MisMatch (F2).

DATASET NAME      Compare WINDOW=FRONT.

DATASET DECLARE   ByID.

AGGREGATE OUTFILE=ByID
    /BREAK    = id
    /MisCount = SUM(MisMatch).


DATASET DECLARE   ByVar.

AGGREGATE OUTFILE=ByVar
    /BREAK    = VarName
    /MisCount = SUM(MisMatch).

DATASET ACTIVATE  ByID  WINDOW=Front.
FORMATS MisCount (F4).
LIST.


DATASET ACTIVATE  ByVar WINDOW=Front.
FORMATS MisCount (F4).
LIST.


Re: Dataset compare count of variables with differences

Art Kendall
Thanks for your post.

I was looking to augment the information from COMPARE DATASETS.

Your syntax will be very useful for the many people who are using versions of SPSS before that command was available.


Art Kendall
Social Research Consultants

Re: Dataset compare count of variables with differences

David Marso
Administrator
While we are at it, here's a MATRIX solution.
NEW FILE.
DATASET CLOSE ALL.
data list list /id (f2) x1 to x3 (3f1).
begin data
    1 1 2 3
    2 1 2 2
    3 3 2 3
end data.
dataset name entry1.
SAVE OUTFILE "C:\TEMP\entry1.sav".
data list list /id (f2) x1 to x3 (3f1).
begin data
    1 1 2 1
    2 2 2 1
    3 3 2 3
end data.
dataset name entry2.
SAVE OUTFILE  "C:\TEMP\entry2.sav".
DATASET DECLARE iddiff.
DATASET DECLARE vardiff.
MATRIX.
GET data1 /FILE  "C:\TEMP\entry1.sav"/VARIABLES x1 TO x3/NAMES=varnames.
GET data2/FILE  "C:\TEMP\entry2.sav"/VARIABLES x1 TO x3.
GET id /FILE  "C:\TEMP\entry1.sav"/VARIABLES id.
COMPUTE nc=NCOL(data1).
COMPUTE diff=data1-data2.
LOOP r=1 TO NROW(diff).
LOOP c=1 TO NCOL(diff).
DO IF diff(r,c) NE 0 .
COMPUTE diff(r,c)=1.
END IF.
END LOOP.
END LOOP.
COMPUTE vdiff=CSUM(diff).
COMPUTE idiff=RSUM(diff).
COMPUTE vdiff={vdiff;vdiff/NROW(data1)}.
COMPUTE idiff={id,idiff,idiff/nc}.
SAVE vdiff /OUTFILE vardiff /NAMES=varnames.
SAVE idiff /OUTFILE iddiff /VARIABLES id ndiff pctdiff.
END MATRIX.









Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"