Hello,

Context: Survey data are collected each year over multiple years, and each year minor changes are made to the survey (questions are added or dropped). Most of the variables (~80%) do not change. It would be very helpful to automate the process of comparing variable names between two (or more) data files so that others and I can more easily determine which variables are candidates for longitudinal analysis.

I have looked into the Compare Datasets command, but that procedure seems to focus on comparing records and data for specified variables, not the variables per se. I can always wrangle the data file info into Excel and make side-by-side comparisons, which would be fine, but with over 800 variables per file human error is inevitable.

Any assistance is greatly appreciated.

Thanks,
John
Administrator
NEW FILE.
DATASET CLOSE ALL.
DATA LIST FREE/a b c f g h k l m.
BEGIN DATA
1 2 3 4 5 6 7 8 9
END DATA.
DATASET NAME data1.
DATA LIST FREE/a b c d e g h k l .
BEGIN DATA
1 2 3 4 5 6 7 8 9
END DATA.
DATASET NAME data2.

DATASET ACTIVATE data1.
SELECT IF $CASENUM EQ 1.
FLIP.
SORT CASES BY CASE_LBL.
DATASET NAME flipped1.

DATASET ACTIVATE data2.
SELECT IF $CASENUM EQ 1.
FLIP.
SORT CASES BY CASE_LBL.
DATASET NAME flipped2.

MATCH FILES FILE=flipped1/IN=In1/ FILE=flipped2/IN=In2/BY CASE_LBL.
EXECUTE.
DATASET NAME compare.
LIST.

CASE_LBL   var001  In1  In2
a            1.00    1    1
b            2.00    1    1
c            3.00    1    1
d            4.00    0    1
e            5.00    0    1
f            4.00    1    0
g            5.00    1    1
h            6.00    1    1
k            7.00    1    1
l            8.00    1    1
m            9.00    1    0

Number of cases read:  11    Number of cases listed:  11
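For readers who want the same In1/In2 layout outside SPSS syntax, here is a minimal sketch in plain Python. It assumes the variable names have already been pulled out of each file into lists (reading the .sav files is not shown), and it compares names exactly as given. The function name presence_table is hypothetical, chosen only for this illustration.

```python
def presence_table(names1, names2):
    """Return sorted (name, in1, in2) rows, one per name found in either list."""
    set1, set2 = set(names1), set(names2)
    return [(name, int(name in set1), int(name in set2))
            for name in sorted(set1 | set2)]

# Variable names taken from the two example datasets above.
data1 = "a b c f g h k l m".split()
data2 = "a b c d e g h k l".split()

rows = presence_table(data1, data2)
for name, in1, in2 in rows:
    # One line per variable: name, flag for file 1, flag for file 2.
    print("%-8s %d %d" % (name, in1, in2))
```

Rows with a 0 in either flag column are the variables that were added or dropped between the two files.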
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
In reply to this post by J P
You can sort of get this information from Compare Datasets by
specifying that only one case be compared, but the following Python code
compares the variable names in the active dataset with the names in an
external sav file. It prints a sorted list of the names found in both
files and a second list of the names found in only one or the other.
begin program.
import spss, spssaux, textwrap

def comparenames(externalfile):
    # variable names in the active dataset
    activedsdict = set(spssaux.VariableDict().variables)
    # open the external file and collect its variable names
    spssaux.OpenDataFile(externalfile)
    externaldict = set(spssaux.VariableDict().variables)
    common = " ".join(sorted(activedsdict.intersection(externaldict)))
    disjoint = " ".join(sorted(activedsdict.symmetric_difference(externaldict)))
    # single-string print calls behave the same under Python 2 and Python 3
    print("\nVariables in Common\n" + "\n".join(textwrap.wrap(common, 100)))
    print("\nVariables in only one set\n" + "\n".join(textwrap.wrap(disjoint, 100)))

# invoke function with the name of the external file (using /, not backslash)
comparenames("c:/spss22/samples/english/employee data.sav")
end program.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

From: J P <[hidden email]>
To: [hidden email]
Date: 09/17/2014 09:29 AM
Subject: [SPSSX-L] Compare variable names between two files
Sent by: "SPSSX(r) Discussion" <[hidden email]>

=====================
To manage your subscription to SPSSX-L, send a message to LISTSERV@... (not to SPSSX-L), with no body text except the command.
To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
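Since the original question asked about two or more files, the set-based comparison can be extended along these lines. This is only a sketch in plain Python: it assumes the variable names for each file have already been collected into lists (for example, one list per survey year), and the function name compare_name_sets and the sample names are hypothetical.

```python
def compare_name_sets(files):
    """files maps a label (e.g. a survey year) to an iterable of variable names.

    Returns (common, partial): the sorted names present in every file, and a
    dict mapping each remaining name to the sorted labels of the files that
    contain it.
    """
    name_sets = {label: set(names) for label, names in files.items()}
    if not name_sets:
        return [], {}
    common = set.intersection(*name_sets.values())
    everything = set.union(*name_sets.values())
    partial = {
        name: sorted(label for label, s in name_sets.items() if name in s)
        for name in sorted(everything - common)
    }
    return sorted(common), partial

# Hypothetical example: three survey years with small changes between them.
surveys = {
    "2012": ["id", "age", "q1", "q2"],
    "2013": ["id", "age", "q1", "q3"],
    "2014": ["id", "age", "q3", "q4"],
}
common, partial = compare_name_sets(surveys)
print("In every file: " + " ".join(common))
for name, labels in partial.items():
    print(name + " only in " + ", ".join(labels))
```

The names in the first list are the candidates for longitudinal analysis across all years; the second listing shows which years each of the other variables appears in.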