Hi All,
I want to compare two records, both entered manually, for discrepancies between them. There are about 180 variables, both string and numeric, and the strings are of various lengths. Any ideas on how to achieve this? A first step would be just a count of how many differences there are. I wanted to use VECTOR and loop through the complete variable list, but VECTOR only accepts variables that are all string (of the same length) or all numeric.
Thanks,
Kent Bowers.
The SPSSINC COMPARE DATASETS extension
command can compare two datasets. It can produce a summary table
in the Viewer and add variables to the dataset with information on differences.
You can compare the metadata and/or the case values. It respects
any filters set on the data, so you can compare a subset of records. The
command has a dialog box interface (Data>Compare Datasets) as well as
syntax. Here's an example.
SPSSINC COMPARE DATASETS DS2=DataSet2 VARIABLES=x y z
  /DATA ID = id DIFFCOUNT=differences
  /DICTIONARY ATTRIBUTES FORMAT INDEX MEASLEVEL MISSINGVALUES TYPE VARLABEL VALUELABELS.

This requires the Python Essentials/plugin, which is available from the SPSS Community (www.ibm.com/developerworks/spssdevcentral) or, for older versions, from DevCentral (www.spss.com/devcentral). With Statistics 19, this command is installed automatically with the Essentials. For older versions, it must be downloaded separately from the SPSS Community and installed.

REMINDER: I have been getting a steady stream of "where did it go" questions about DevCentral content. Most of the material has been moved to the SPSS Community and is no longer available on the old site. The front page of DevCentral says this, but people are overlooking it.

HTH,
Jon Peck
Senior Software Engineer, IBM
312-651-3435

From: Mark Antrobus
Date: 01/27/2011 07:51 AM
Subject: [SPSSX-L] Comparing two records
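The original question asks for a count of differences across about 180 mixed string and numeric variables, which a single VECTOR loop cannot handle. As a rough illustration of what a per-case difference count (like the one DIFFCOUNT is described as adding) involves, here is a plain Python sketch that does not use SPSS at all. It is not the extension command's implementation. It assumes each dataset is a list of dicts keyed by variable name, that IDs are unique within each dataset, that missing values are represented as None, and that strings may carry trailing blanks from fixed-width padding. All function names are hypothetical.

```python
from typing import Any, Dict, List, Mapping, Optional

# Hypothetical record type: one case, keyed by variable name.
Record = Mapping[str, Any]


def normalize(value: Any) -> Any:
    """Strip trailing blanks from strings (assumed fixed-width padding); leave other values as-is."""
    if isinstance(value, str):
        return value.rstrip()
    return value


def count_differences(rec1: Record, rec2: Record, variables: List[str]) -> int:
    """Count the variables whose normalized values differ between two records.

    Strings and numbers are compared in the same loop, so there is no same-type restriction.
    """
    return sum(1 for v in variables if normalize(rec1.get(v)) != normalize(rec2.get(v)))


def compare_by_id(
    ds1: List[Record], ds2: List[Record], id_var: str, variables: List[str]
) -> Dict[Any, Optional[int]]:
    """Match cases on id_var and return a difference count per ID.

    IDs found in only one of the two datasets map to None.
    """
    index2 = {rec[id_var]: rec for rec in ds2}
    result: Dict[Any, Optional[int]] = {}
    for rec1 in ds1:
        key = rec1[id_var]
        rec2 = index2.get(key)
        result[key] = None if rec2 is None else count_differences(rec1, rec2, variables)
    for key in index2:
        result.setdefault(key, None)  # IDs present only in ds2
    return result


if __name__ == "__main__":
    entry1 = [{"id": 1, "age": 34, "name": "Smith   "}, {"id": 2, "age": 51, "name": "Jones"}]
    entry2 = [{"id": 1, "age": 34, "name": "Smith"}, {"id": 2, "age": 15, "name": "Jonse"}]
    print(compare_by_id(entry1, entry2, "id", ["age", "name"]))  # {1: 0, 2: 2}
```

The trailing-blank normalization is a design choice for padded strings; drop it if trailing blanks are meaningful in your data. Note also that a float NaN never equals itself, so NaN-coded missing values would be counted as differences unless converted to None first.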
In reply to this post by Mark Antrobus
How about Data --> Identify Duplicate Cases (in a stacked data file with two rows per case, one for each data entry person)? The more common use of this is to find unwanted duplicates. In your case, you're looking for problematic non-duplicates. Also see examples 3 & 4 here:
http://spsstools.net/SampleSyntax.htm#Matching

HTH.
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
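Bruce's suggestion relies on a stacked file with two rows per case. As a sketch of what checking for "problematic non-duplicates" involves, independent of the Identify Duplicate Cases dialog, here is a Python version. It assumes the stacked rows are dicts with an ID column plus a column recording who typed the row (called `enterer` here, a hypothetical name). That column must be left out of the compared variables, or every pair would count as different. IDs that do not have exactly two rows are reported separately, since they point to a missing or extra entry rather than a typing discrepancy.

```python
from collections import defaultdict
from typing import Any, Dict, List, Mapping, Tuple


def find_nonduplicates(
    rows: List[Mapping[str, Any]], id_var: str, variables: List[str]
) -> Tuple[Dict[Any, List[str]], Dict[Any, int]]:
    """Check a stacked file that should hold two rows (one per data entry person) per ID.

    Returns (mismatched, wrong_count):
      mismatched  -- {id: [variables whose two entries differ]}
      wrong_count -- {id: number of rows} for IDs without exactly two rows
    """
    groups: Dict[Any, List[Mapping[str, Any]]] = defaultdict(list)
    for row in rows:
        groups[row[id_var]].append(row)

    mismatched: Dict[Any, List[str]] = {}
    wrong_count: Dict[Any, int] = {}
    for key, members in groups.items():
        if len(members) != 2:
            wrong_count[key] = len(members)
            continue
        first, second = members
        diffs = [v for v in variables if first.get(v) != second.get(v)]
        if diffs:
            mismatched[key] = diffs
    return mismatched, wrong_count


if __name__ == "__main__":
    stacked = [
        {"id": 1, "enterer": "A", "age": 34, "sex": "F"},
        {"id": 1, "enterer": "B", "age": 34, "sex": "F"},
        {"id": 2, "enterer": "A", "age": 51, "sex": "M"},
        {"id": 2, "enterer": "B", "age": 15, "sex": "M"},
        {"id": 3, "enterer": "A", "age": 20, "sex": "F"},
    ]
    # "enterer" is deliberately excluded from the compared variables.
    print(find_nonduplicates(stacked, "id", ["age", "sex"]))  # ({2: ['age']}, {3: 1})
```

Listing the differing variables per ID, rather than only a yes/no duplicate flag, tells the person checking the forms exactly which fields to look up.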
In reply to this post by Jon K Peck
Thanks Jon. Where can I find the detailed explanation of these commands? For instance what do all the sub-commands do? I've looked on the new IBM site but there's nowhere obvious to look...
Thanks,
Antro.
1. You can use the help in the dialog box.
2. You can run any extension command created by SPSS with the /HELP subcommand to display its detailed syntax help in the Viewer, e.g.,
   SPSSINC COMPARE DATASETS /HELP.

HTH,
Jon Peck
Senior Software Engineer, IBM
312-651-3435

From: Mark Antrobus
Date: 01/27/2011 09:55 AM
Subject: Re: [SPSSX-L] Comparing two records
In reply to this post by Jon K Peck
Hi Jon,
I am also trying to compare two datasets and was wondering if you could help me. Sorry for the very basic question, but I am not familiar with using syntax. The output keeps stating "invalid variable or TO usage". I am unsure which variable to write after DATA ID =. I have had no more success when using the command (I am using SPSS 19).
Many thanks,
Julie
To get a definitive answer, you need to
post the syntax that is failing, but the error means that the variable
name given for the ID variable does not match a variable in the dataset.
One common reason for this is a mismatch in case. If your variable
is named ID in SPSS, you must write it in capitals in the COMPARE DATASETS
command. This also applies to dataset names.
HTH,
Jon Peck (no "h")
Senior Software Engineer, IBM
new phone: 720-342-5621

From: Julie
Date: 11/03/2011 08:13 AM
Subject: Re: [SPSSX-L] Comparing two records
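The error Jon describes comes down to a name being looked up case-sensitively. Below is a minimal Python illustration; the function names and the variable list are hypothetical, and this is not the extension command's own lookup code. It only shows why `id` fails to match a variable named `ID` under exact matching, and how a case-insensitive lookup can recover the spelling to type into the command.

```python
from typing import List, Optional


def resolve_exact(name: str, dataset_vars: List[str]) -> Optional[str]:
    """Case-sensitive lookup: 'id' does not find a variable named 'ID'."""
    return name if name in dataset_vars else None


def resolve_ignoring_case(name: str, dataset_vars: List[str]) -> Optional[str]:
    """Case-insensitive lookup that returns the dataset's own spelling of the name."""
    by_lower = {v.lower(): v for v in dataset_vars}
    return by_lower.get(name.lower())


if __name__ == "__main__":
    variables = ["ID", "age", "name"]  # hypothetical variable list
    print(resolve_exact("id", variables))          # None: the kind of mismatch behind the error
    print(resolve_ignoring_case("id", variables))  # ID: the spelling to use in the command
```

The practical fix is still the one given above: type the variable and dataset names exactly as they appear in the Data Editor.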
I suspect I am missing some basic understanding of how the two versions of help work.
I have successfully downloaded, installed and used the Python extensions with SPSS 19.0. Code like this works correctly:

BEGIN PROGRAM.
import spss, spssaux
help(spssaux.GetValuesFromXMLWorkspace)
END PROGRAM.

However, when I run the following line from my syntax editor

SPSSINC COMPARE DATASETS /HELP.

I get the error that the first word "SPSSINC" is not recognized as an SPSS Statistics command.
This means that the SPSSINC COMPARE DATASETS
command is not installed. You need to get it from the SPSS Community and
install it. You can find it in the Extension Commands collection.
Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
new phone: 720-342-5621

From: beny
Date: 10/23/2012 12:27 PM
Subject: Re: [SPSSX-L] Comparing two records
Thanks. I mistakenly thought it was part of the Python extensions.
Ben Yuhas
www.yuhasgroup.com
410-467-9387