Comparing two records

Comparing two records

Mark Antrobus
Hi All,

I want to compare two manually entered records to find discrepancies between them. There are about 180 variables, both string and numeric, and the strings are of various lengths. Any ideas on how to achieve this? A first step would be just a count of how many differences there are...

I wanted to use a VECTOR and loop through the complete variable list, but a VECTOR accepts only variables that are all string (of the same length) or all numeric.

Thanks,
Kent Bowers.
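
For readers who want to see the basic idea outside SPSS, here is a minimal sketch in plain Python of counting differences between two records that mix string and numeric fields. It is only an illustration: the field names, the sample values and the function name differing_fields are all made up, and it assumes both records carry the same fields.

```python
def differing_fields(record_a, record_b):
    """Return the names of the fields whose values differ between two records."""
    differing = []
    for name in record_a:
        a, b = record_a[name], record_b[name]
        # Strings are compared after stripping trailing blanks, because
        # string fields of different declared lengths are often blank-padded.
        if isinstance(a, str) and isinstance(b, str):
            a, b = a.rstrip(), b.rstrip()
        if a != b:
            differing.append(name)
    return differing


# Hypothetical double-entered record: same case keyed in by two people.
entry_1 = {"id": 101, "name": "Smith   ", "age": 34, "city": "Leeds"}
entry_2 = {"id": 101, "name": "Smith", "age": 43, "city": "Leads"}

diffs = differing_fields(entry_1, entry_2)
print(len(diffs), diffs)
```

Because each field is compared on its own, mixed types and varying string lengths are not a problem here, which is the restriction that makes VECTOR awkward for this job.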

Re: Comparing two records

Jon K Peck
The SPSSINC COMPARE DATASETS extension command can compare two datasets.  It can produce a summary table in the Viewer and add variables to the dataset with information on differences.  You can compare the metadata and/or the case values.  It respects any filters set on the data, so you can compare a subset of records.  The command has a dialog box interface (Data>Compare Datasets) as well as syntax.  Here's an example.

SPSSINC COMPARE DATASETS  DS2=DataSet2 VARIABLES=x y z
/DATA ID = id DIFFCOUNT=differences
/DICTIONARY ATTRIBUTES FORMAT INDEX MEASLEVEL MISSINGVALUES TYPE VARLABEL VALUELABELS.

This requires the Python Essentials/plugin, which is available from the SPSS Community (www.ibm.com/developerworks/spssdevcentral) or, for older versions, from DevCentral (www.spss.com/devcentral).

With Statistics 19, this command is automatically installed with the Essentials.  For older versions, it also needs to be downloaded from the SPSS Community and installed.

REMINDER: I have been getting a steady stream of "where did it go" questions about DevCentral content.  Most of the material has been moved to the SPSS Community and is no longer available on the old site.  It says this on the front page of DevCentral, but people are overlooking this.

HTH,

Jon Peck
Senior Software Engineer, IBM
[hidden email]
312-651-3435




From:        Mark Antrobus <[hidden email]>
To:        [hidden email]
Date:        01/27/2011 07:51 AM
Subject:        [SPSSX-L] Comparing two records
Sent by:        "SPSSX(r) Discussion" <[hidden email]>





Re: Comparing two records

Bruce Weaver
Administrator
In reply to this post by Mark Antrobus
How about Data --> Identify Duplicate Cases (in a stacked data file with two rows per case, one for each data entry person)?  The more common use of this is to find unwanted duplicates.  In your case, you're looking for problematic non-duplicates.  Also see examples 3 & 4 here:

  http://spsstools.net/SampleSyntax.htm#Matching

HTH.
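
The stacked-file idea can be sketched in plain Python as follows. This is not what the Identify Duplicate Cases dialog does internally; it is only a rough illustration of the logic, assuming a hypothetical layout with an "id" field and an "entered_by" field marking the data entry person.

```python
from collections import defaultdict


def mismatched_ids(rows, id_field="id", ignore=("entered_by",)):
    """Return the IDs whose stacked rows are not exact duplicates of each other."""
    versions_by_id = defaultdict(list)
    for row in rows:
        # Compare every field except the ID and bookkeeping fields.
        values = tuple(sorted(
            (key, value) for key, value in row.items()
            if key != id_field and key not in ignore
        ))
        versions_by_id[row[id_field]].append(values)
    # An ID is problematic when its rows do not all carry the same values.
    return sorted(
        case_id for case_id, versions in versions_by_id.items()
        if len(set(versions)) > 1
    )


# Hypothetical stacked file: two rows per case, one per data entry person.
stacked = [
    {"id": 1, "entered_by": "A", "age": 34, "city": "Leeds"},
    {"id": 1, "entered_by": "B", "age": 34, "city": "Leeds"},
    {"id": 2, "entered_by": "A", "age": 51, "city": "York"},
    {"id": 2, "entered_by": "B", "age": 15, "city": "York"},
]

print(mismatched_ids(stacked))
```

Cases whose two rows match exactly drop out, so what remains is the list of IDs that need checking against the source documents.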


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Comparing two records

Mark Antrobus
In reply to this post by Jon K Peck
Thanks Jon. Where can I find a detailed explanation of these commands? For instance, what do all the subcommands do? I've looked on the new IBM site, but there's nowhere obvious to look...

Thanks,
Antro.

(Replying to Jon K Peck's message of 27 January 2011, 09:13.)


Re: Comparing two records

Jon K Peck
1. You can use the help in the dialog box.
2. You can run any extension command created by SPSS with the /HELP subcommand to see the detailed syntax help in the Viewer, e.g.,
SPSSINC COMPARE DATASETS /HELP.

HTH,

Jon Peck
Senior Software Engineer, IBM
[hidden email]
312-651-3435




From:        Mark Antrobus <[hidden email]>
To:        [hidden email]
Date:        01/27/2011 09:55 AM
Subject:        Re: [SPSSX-L] Comparing two records
Sent by:        "SPSSX(r) Discussion" <[hidden email]>





Re: Comparing two records

Julie
In reply to this post by Jon K Peck
Hi Jon,

I am also trying to compare two datasets and I was wondering if you could help me.
Sorry for the very basic question, but I am not familiar with using syntax. The output keeps stating "invalid variable or TO usage". I am unsure which variable to write after DATA ID =.
I am not more successful when using the  command (I am using SPSS 19).
Many thanks,
Julie

Re: Comparing two records

Jon K Peck
To get a definitive answer, you need to post the syntax that is failing, but the error means that the variable name given for the ID variable does not match a variable in the dataset.  One common reason for this is a mismatch in case.  If your variable is named ID in SPSS, you must write it in capitals in the COMPARE DATASETS command.  This also applies to dataset names.
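
As a rough illustration of the case-mismatch problem, the plain-Python sketch below looks up the exact spelling of a variable name from a list of names. It does not call SPSS at all; the function name resolve_variable_name and the sample variable names are invented for the example, and in practice the list of names would come from your own dataset.

```python
def resolve_variable_name(requested, dataset_variables):
    """Return the exact-case spelling of a variable name, or raise KeyError."""
    if requested in dataset_variables:
        return requested
    # Fall back to a case-insensitive search to find the intended variable.
    lowered = requested.lower()
    matches = [name for name in dataset_variables if name.lower() == lowered]
    if len(matches) == 1:
        return matches[0]
    raise KeyError(f"no unique variable matching {requested!r}")


# Hypothetical variable list; "id" as typed does not match "ID" exactly.
variables = ["ID", "Age", "city"]
print(resolve_variable_name("id", variables))
```

Whatever spelling such a lookup returns is the one to type after ID = in the command.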

HTH,

Jon Peck (no "h")
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621




From:        Julie <[hidden email]>
To:        [hidden email]
Date:        11/03/2011 08:13 AM
Subject:        Re: [SPSSX-L] Comparing two records
Sent by:        "SPSSX(r) Discussion" <[hidden email]>






--
View this message in context:
http://spssx-discussion.1045642.n5.nabble.com/Comparing-two-records-tp3359824p4961117.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD



Re: Comparing two records

beny
I suspect I am missing some basic understanding of how the two versions of help work.

I have successfully downloaded, installed and used the Python extensions with SPSS 19.0.
Code like this works correctly.

BEGIN PROGRAM.
import spss, spssaux
help(spssaux.GetValuesFromXMLWorkspace)
END PROGRAM.

However, when I try to execute the following line from my syntax editor

SPSSINC COMPARE DATASETS /HELP.

I get the error that the first word "SPSSINC" is not recognized as an SPSS Statistics command.


Re: Comparing two records

Jon K Peck
This means that the SPSSINC COMPARE DATASETS command is not installed.  You need to get it from the SPSS Community and install it.  You can find it in the Extension Commands collection.


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621




From:        beny <[hidden email]>
To:        [hidden email],
Date:        10/23/2012 12:27 PM
Subject:        Re: [SPSSX-L] Comparing two records
Sent by:        "SPSSX(r) Discussion" <[hidden email]>









--
View this message in context:
http://spssx-discussion.1045642.n5.nabble.com/Comparing-two-records-tp3359824p5715803.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.


Re: Comparing two records

beny
Thanks. I mistakenly thought it was part of the Python extensions.

(Replying to Jon K Peck's message of Tue, Oct 23, 2012, 8:40 PM.)





--
Ben Yuhas
www.yuhasgroup.com
410-467-9387