I created a cumulative file for all waves of the British Social Attitudes Survey, 1983 to 2015. There was already a file for 1983-2014, but when I combined it with the 2015 wave the metadata from 2015 took precedence, resulting in erroneous measurement levels. A quick solution was to use APPLY DICTIONARY from the earlier 1983-2014 file, but there are some new variables unique to 2015. I’ve already found a couple at the beginning of the file, but is there a quick way to identify such variables, or do I have to eyeball all 10973 variables? COMPARE DATASETS doesn’t do what I want. Basically, I want to identify variable names in dataset2 which do not appear in dataset1.

Dataset1 “bsa1983-2014.sav”
Dataset2 “bsa1983-2015.sav”

Thanks in advance

John F Hall (Mr)
[Retired academic survey researcher]
Email: [hidden email]
Website: www.surveyresearch.weebly.com
SPSS start page: www.surveyresearch.weebly.com/1-survey-analysis-workshop
When you use ADD FILES to concatenate them include the subcommand MAP.
***************************************.
DATA LIST FREE / A B C.
BEGIN DATA
1 2 3
END DATA.
DATASET NAME A.
DATA LIST FREE / A B Z.
BEGIN DATA
1 2 3
END DATA.
DATASET NAME Z.
ADD FILES FILE = 'A' /FILE = 'Z' /MAP.
***************************************.

Also, won't all the new variables be at the end of the file? So if you do:

ADD FILES FILE = 'Old' /FILE = 'New'.

just figure out the last variable in Old, and then see which ones come after it in the concatenated file.
Two further comments:

- The dialog box for COMPARE DATASETS has a subdialog that shows you the variables in each dataset that are not in the other.
- If you want to print a list, this small Python program will do it, where dataset1 and dataset2 are the dataset names.

begin program.
import spss, spssaux
# Collect the variable names of each dataset as a set.
spss.Submit("dataset activate dataset1")
var1 = set(spssaux.VariableDict().variables)
spss.Submit("dataset activate dataset2")
var2 = set(spssaux.VariableDict().variables)
# Names that appear in one dataset but not in both.
diff = var1.symmetric_difference(var2)
print(diff)
end program.

If there are many such variables, change the print(diff) line to read

print("\n".join(diff))
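For the narrower question asked at the top of the thread (names in dataset2 that do not appear in dataset1), a one-sided difference that keeps file order may be easier to read than a symmetric difference. What follows is only a minimal sketch in plain Python, assuming the two name lists have already been obtained somehow (for instance from spssaux.VariableDict().variables as in the program above). The function name and the sample names are illustrative, not part of any SPSS API. Names are compared in lowercase because SPSS variable names are case-insensitive.

```python
def names_only_in_second(names1, names2):
    """Return names from names2 that are absent from names1, in names2 order.

    Comparison ignores case because SPSS variable names are case-insensitive.
    """
    seen = {name.lower() for name in names1}
    return [name for name in names2 if name.lower() not in seen]


# Illustrative name lists, not the real BSA dictionaries.
old_names = ["caseid", "year", "Serial"]
new_names = ["caseid", "year", "serial", "URINDEW", "URINDSC"]

# One name per line, which is easier to scan when there are many.
print("\n".join(names_only_in_second(old_names, new_names)))
```

Keeping the result as a list rather than a set preserves the order of the second file, which helps when the names are to be checked against the Data Editor.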
Jon, Andy

Thanks for these. There seem to be about 100 new names, so it will take me a while to check. The problem really lies in the fact that data sets for each new wave are deposited and distributed without being properly compiled and checked, so that levels, missing values and other metadata are not only incorrect, but also incompatible with earlier files which took me several months to clean and create (with a lot of Python code provided by Jon). If any of my students had submitted files like that, they would have been heavily penalised for that component, if not actually failed. I sometimes wonder why I bother, but my versions will save future teachers, researchers and students a lot of grief.

John F Hall
=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L. For a list of commands to manage subscriptions, send the command INFO REFCARD.
In reply to this post by Andy W
Andy
There are 99958 cases and 10973 variables in the combined file. I'm working with *.sav files, not raw data, so I don't see how your syntax can work, but I'll play with the /MAP idea.

John
Wow John,
I can't believe you actually posted this. Obviously the DATA LIST commands simply create test files. Andy is spot on with the following: "Also won't all the new variables be at the end of the file? So if you do: ADD FILES FILE = 'Old' /FILE = 'New'. Just figure out the last variable in Old, and then see which ones are after it in the concatenated file."
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
John, you said you have the following datasets:
Dataset1 “bsa1983-2014.sav”
Dataset2 “bsa1983-2015.sav”

Are the 1983-2014 records in Dataset2 the same as the records in Dataset1? If so, surely you want to delete all records prior to 2015 from Dataset2 before stacking the files, do you not? If not, please explain.

Here (just to make Jon's day) is a no-Python-required (NPR) approach that will give you a list of the variables that are unique to dataset 2.

* Open your two data files and name them
* Dataset1 and Dataset2.
DATASET ACTIVATE Dataset1.
NUMERIC @LastD1var@ (F1).
DATASET ACTIVATE Dataset2.
**********************************************************.
* Get rid of data for 2014 or earlier to avoid duplication.
SELECT IF Year > 2014.
**********************************************************.
ADD FILES FILE = Dataset1 / FILE = Dataset2 / MAP.
EXECUTE.
DATASET NAME d1d2.
* In the MAP output from ADD FILES, variables
* listed after @LastD1var@ are unique to Dataset2.
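With thousands of variables, reading the MAP output by eye is slow, so the same marker idea can be applied to a list of names instead. The following is a rough sketch only, assuming the variable names of the stacked file are already available as a Python list in file order (how they are obtained is left open). The function name is hypothetical; the marker name is the one used in the syntax above.

```python
def names_after_marker(result_names, marker="@LastD1var@"):
    """Return the names that follow `marker` in the stacked file's variable order.

    Raises ValueError if the marker variable is not present.
    """
    lowered = [name.lower() for name in result_names]
    position = lowered.index(marker.lower())
    return result_names[position + 1:]


# Illustrative variable order for a stacked file.
stacked = ["caseid", "year", "SPoint", "@LastD1var@", "URINDEW", "URINDSC"]
print(names_after_marker(stacked))
```

Because ADD FILES places the variables of the first file ahead of any that occur only in the second, everything after the marker should come from the second file alone.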
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
With 10,000 variables, this could be painful.
Bruce, Jon, David, Andy

I’ll have a shot with the solutions suggested, possibly with a tweak or two.

Another approach would be to display both Data Editors side by side with both Names columns visible, then systematically cut them halfway, quarter-way etc to see if the line numbers are the same (a variation on an EDT trick I used way back when to check for missing or duplicate records in raw data).

I should explain that the BSAS now has 32 waves with many variables replicated in several years using the same mnemonic varnames. A major problem arose from incompatible formats for the same variable in different waves, but others included 1) anything up to seven values to be treated as missing, 2) missing values labelled, but not declared, 3) other metadata incorrect or incomplete. I spent several months last year resolving these to create a cumulative “mother” *.sav file for the first 31 waves.

The waves were edited and added in reverse year order. For a detailed account see:
http://surveyresearch.weebly.com/british-social-attitudes-1983-onwards-cumulative-spss-file.html
and
http://surveyresearch.weebly.com/uploads/2/9/9/8/2998485/comments_on_the_distributed_spss_file_for_british_social_attitudes_2011.pdf

The *.sav file for 2015 (wave 32) has many of the same metadata problems and, in addition to identifying and dealing with unique variables, will need dozens of transformations of variables common to the 1983-2014 “mother” file. This needs to be done before merging with the mother file.

Users acknowledge this as a valuable resource for teachers, researchers and students: one senior Professor has already described the undertaking as Herculean, but even that is an understatement.

John F Hall
In reply to this post by John F Hall
Mid-part of Bruce’s syntax identified 201 variables unique to 2015:

ADD FILES FILE = Dataset2 / FILE = Dataset1 / MAP.
EXECUTE.
DATASET NAME d1d

When I’ve checked these I’ll rerun the transformations on the shared variables in 2015 to make them compatible with 1983 to 2014. Tedious, but I still have my SPSS and Jon Peck’s Python code.

To be honest, the files should have been correctly compiled and checked before they were deposited and distributed. However, I hope I’ve saved future users the frustration of finding and having to correct inconsistencies and errors in the metadata themselves.

John F Hall
John, after sleeping on it, I decided that you probably don't need the data from Dataset1, you just need the metadata. Therefore, before stacking the files, you could delete all of the data from Dataset1. (This worked with a couple of small datasets I generated for testing.)
* Open your two data files and name them
* Dataset1 and Dataset2.
DATASET ACTIVATE Dataset1.
NUMERIC @LastD1var@ (F1).
***********************************************************.
* Delete all data from Dataset1--we just need the metadata.
SELECT IF $CASENUM < 1.
EXECUTE.
***********************************************************.
DATASET ACTIVATE Dataset2.
ADD FILES FILE = Dataset1 / FILE = Dataset2 / MAP.
EXECUTE.
DATASET NAME d1d2.
* In the MAP output from ADD FILES, variables
* listed after @LastD1var@ are unique to Dataset2.

And yes, Jon, this will no doubt be a bit painful with so many variables.
--
Bruce Weaver
In reply to this post by John F Hall
OK John,
Here's what I would do. Use OMS with DISPLAY DICTIONARY to fetch the metadata from each file. SORT these files and MATCH them. It should be trivial to discern the discrepancies.

Would it be possible for you to post the 2 metadata files (without data) to this thread?

GET FILE xxxxx1.
SELECT IF $CASENUM = 1.
SAVE OUTFILE xxxx1META.sav.
GET FILE xxxxx2.
SELECT IF $CASENUM = 1.
SAVE OUTFILE xxxx2META.sav.

Attach xxxx1META.sav and xxxx2META.sav and I will see how feasible this is.

Template for the OMS. Repeat for each of the 2 files with appropriate substitutions for filenames.

GET FILE='C:\Program Files\IBM\SPSS\Statistics\22\Samples\English\customer_dbase.sav'.
DATASET DECLARE VarLabels1.
DATASET DECLARE VarInfo1.
OMS
  /SELECT TABLES
  /IF COMMANDS=['File Information'] SUBTYPES=['Variable Values']
  /DESTINATION FORMAT=SAV NUMBERED=TableNumber_
   OUTFILE='VarLabels1' VIEWER=NO.
* OMS.
OMS
  /SELECT TABLES
  /IF COMMANDS=['File Information'] SUBTYPES=['Variable Information']
  /DESTINATION FORMAT=SAV NUMBERED=TableNumber_
   OUTFILE='VarInfo1' VIEWER=NO.
DISPLAY DICTIONARY.
OMSEND.
OMSEND.
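The sort-and-match step amounts to lining up the two sets of metadata by variable name and reporting where a property differs. As a rough illustration of that logic only, here is a plain Python sketch, assuming each file's metadata has already been exported into a dictionary keyed by variable name. The property keys ("level", "missing"), the function name and the sample values are all invented for the example and do not correspond to the column names OMS produces.

```python
def metadata_mismatches(base_meta, wave_meta, properties=("level", "missing")):
    """List (name, property, base_value, wave_value) for shared variables that differ.

    Variables are matched on lowercase name; variables found only in
    wave_meta are skipped because there is nothing to compare them with.
    """
    base_by_name = {name.lower(): props for name, props in base_meta.items()}
    mismatches = []
    for name in sorted(wave_meta, key=str.lower):
        base_props = base_by_name.get(name.lower())
        if base_props is None:
            continue  # not a shared variable
        for prop in properties:
            base_value = base_props.get(prop)
            wave_value = wave_meta[name].get(prop)
            if base_value != wave_value:
                mismatches.append((name, prop, base_value, wave_value))
    return mismatches


# Invented metadata for illustration; not taken from the BSA files.
base = {
    "year": {"level": "scale", "missing": ()},
    "SPoint": {"level": "nominal", "missing": (-9, -8)},
}
wave = {
    "year": {"level": "scale", "missing": ()},
    "spoint": {"level": "scale", "missing": (8, 9)},
    "URINDEW": {"level": "nominal", "missing": ()},
}

for row in metadata_mismatches(base, wave):
    print(row)
```

The key point is that the variable name is the matching key, so that is also what the two metadata files would need to be sorted on before a MATCH.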
For existing variables, this seems to be a long way of doing what the COMPARE DATASETS command does.
I just tried COMPARE DATASETS and it does give a nice general overview of data discrepancies.
I'll bet that, combined with OMS, one could use the results to build syntax to modify the target file to conform to the base.
In reply to this post by Jon Peck
Jon, David

This is going to take me some time. I tried COMPARE DATASETS but it didn’t give me what I needed. I’ll try David’s suggestion of OMS with DISPLAY DICTIONARY, but SORT on which variable(s)? I’m also going to eyeball the 201 variables identified by ADD FILES ~~~/MAP as being unique to 2015.

In creating the existing cumulative mother file I recoded all positive missing values to negative, eg:

RECODE <varlist>
  (8=-8)(9=-9)(98=-98)(99=-99)(998=-998)(999=-999).
[etc. etc.]

Some admin/sampling/filter variables already had -3, -2, -1. Thus:

MISSING VALUES <varlist> (lo thru -1).

Some variables also had 7, 97 and 997.

RECODE <varlist>
  (7=-7)(8=-8)(9=-9)(97=-97)(98=-98)(99=-99)(997=-997)(998=-998)(999=-999).

For some variables 0 was also a missing value, thus:

MISSING VALUES <varlist> (lo thru 0).

ADD VALUE LABELS was used in combination with the RECODE commands, but this left ghost labels. Jon Peck provided some amazing Python code to semi-automate the above process which I used in combination with Excel: too complex to describe here, but it worked and saved me weeks of time. For many variables, measurement levels were incorrect, possibly as a result of automated processing, but they still had to be identified and corrected with VARIABLE LEVEL.

The 2015 file still has positive missing values for variables shared with the 1983-2014 mother file: these will have to be identified and recoded as above. Most of them will be 5-point Agree-Disagree items. For this and other tasks I can rerun all the syntax I used for:

RECODE
ADD VALUE LABELS
VARIABLE LEVEL

Meanwhile I also have over 320,000 lines of syntax produced from the mother file (by Stat/Transfer during a free trial period). It includes:

FILE HANDLE (line 8)
DATA LIST (lines 10-3599)
FORMATS (lines 3602-3612)
VARIABLE LABELS (lines 3614-36176)
VALUE LABELS (lines 36179-117393)
. .
and user-defined MISSING VALUES (lines 117395-121329)

John F Hall
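The recoding pattern described above is regular enough that the syntax can be generated rather than typed. This is not the Python code Jon Peck supplied (that is not shown in the thread); it is just a small hypothetical sketch of the idea, assuming a list of variable names and a list of positive codes to be flipped. The function name is made up, and whether the sample variables really use 8 and 9 as missing codes is an assumption for illustration.

```python
def recode_missing_syntax(varnames, codes):
    """Build RECODE and MISSING VALUES syntax that flips positive codes to negative.

    Only positive codes are used; the result is text to paste or submit,
    nothing is run against SPSS here.
    """
    positive = sorted(code for code in codes if code > 0)
    pairs = "".join("({0}=-{0})".format(code) for code in positive)
    varlist = " ".join(varnames)
    return "RECODE {0} {1}.\nMISSING VALUES {0} (LO THRU -1).".format(varlist, pairs)


# Variable names from the 2015 listing; the missing codes are assumed.
print(recode_missing_syntax(["chngwk", "proudwk"], [9, 8]))
```

Value labels attached to the old positive codes are not moved by this, which is where the ghost labels mentioned above come from, so those still need separate handling.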
" I tried COMPARE DATASETS but it didn’t give me what I needed."
Please elaborate. How is it NOT giving you what you needed? Perhaps you need to step back and make that very explicit. From what I can tell, it gives you explicit analyses of the 2 sets of metadata and reports ALL of the discrepancies. If you go with OMS and DISPLAY DICTIONARY, it doesn't really give you ANYTHING more than OMS with COMPARE DATASETS, aside from YOU having to write code to mop up after the MATCH. Sort on what variables? What would you imagine would be necessary if you want to align the two sets of metadata? Please inspect the OMS results and scratch your head a bit. OTOH, I think COMPARE DATASETS is likely the better way to go.
Just tried COMPARE DATASETS again, but it looks at cases not variables. Also I can't get it to run or PASTE. The first time I ran it I got an enormous list of variables which I found completely meaningless. I got what I need from the following syntax based on Bruce's suggestion.

GET FILE='C:\data\4_Research\4 Surveys\British Social Attitudes\bsa15c.sav'.
DATASET NAME DataSet1 WINDOW=FRONT.
GET FILE='C:\data\4_Research\4 Surveys\British Social Attitudes\mdlast\bsa1983-2014mother16.sav'.
DATASET NAME DataSet2 WINDOW=FRONT.
NUMERIC @LastD1var@ (F1).
ADD FILES FILE = Dataset2 / FILE = Dataset1 / MAP.

Map of the result file

Result       Input1       Input2
------       ------       ------
caseid       caseid       caseid
year         year         year
yearorder    yearorder
waveorder    waveorder    waveorder
Serial       Serial
SSerial      SSerial      SSerial
SPoint       SPoint       SPoint
~            ~            ~
~            ~                       [Gaps in this column because of my derived vars]
zDcldrnam    zDcldrnam
zxnamdcbc    zxnamdcbc
@LastD1var@  @LastD1var@             [Bruce's marker variable]
                                     [Variables after this point are unique to 2015]
URINDEW                   URINDEW
URINDSC                   URINDSC
RAgecat4                  RAgecat4
RInEduc                   RInEduc
~                         ~
~
qsimd                     qsimd
dsimd                     dsimd

EXECUTE.
DATASET NAME d1d2.

This is what gave me the unique variable names.

DISP LAB var urindew to dsimd.

Variable   Position  Label
URINDEW    10776     Urban/Rural Indicator 2011 (England and Wales)
URINDSC    10777     Urban/Rural Indicator 2011 (Scotland)
RAgecat4   10778     Age of respondent (grouped) (7 categories) dv
~ ~ ~
chngwk     10934     Agree/disagree: given the chance I would change present type of work: Versions B, D
proudwk    10935     Agree/disagree: I am proud of the type of work I do: Versions B, D
avunemp5   10936     Agree/disagree: Willing to move within Britain to avoid unemployment: Versions B, D
avunemp6   10937     Agree/disagree: Willing to move abroad to avoid unemployment: Versions B, D
~ ~ ~
WhyDis1    10825     Reason for NHS dissatisfaction: quality of NHS care: Version B
WhyDis2    10826     Reason for NHS dissatisfaction: long wait for appointment: Version B
WhyDis3    10827     Reason for NHS dissatisfaction: attitudes/behaviour of staff: Version B
~ ~ ~
qwimd      10971     Wales: IMD 2011 - Quintiles
dwimd      10972     Wales: WIMD 2011 - Deciles
qsimd      10973     Scottish Index of Multiple Deprivation quintiles
dsimd      10974     Scottish Index of Multiple Deprivation deciles

On preliminary inspection of the var labels, some of them look suspiciously like the same variables as in the mother file, but with different names. I sincerely hope not, but I'll check. As well as 1-5 or 1-7 rating scales, a lot of them have values which are binary codes for multiple responses, so even with 201 variables it's not an enormous task to set levels, missing values and recode, add labels etc. substituting varnames in my original syntax.

Let me crack on with this and I'll post short accounts to the list of how I eventually fare.

John F Hall
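One way to screen for variables that may already exist in the mother file under another name is to compare variable labels rather than names. The sketch below is a crude first pass only, assuming the labels of both files are available as dictionaries of name to label. The function name is hypothetical, and the mother-file name and label in the example are invented, not taken from the real file. It only catches labels that are identical after ignoring case and extra spaces, so anything it misses still needs checking by eye.

```python
def possible_renames(new_labels, old_labels):
    """Pair new variables with old ones whose labels match after normalising.

    Normalising here means lowercasing and collapsing runs of whitespace.
    Returns a list of (new_name, old_name) tuples.
    """
    def normalise(label):
        return " ".join(label.lower().split())

    old_by_label = {}
    for name, label in old_labels.items():
        old_by_label.setdefault(normalise(label), []).append(name)

    matches = []
    for name, label in new_labels.items():
        for old_name in old_by_label.get(normalise(label), []):
            matches.append((name, old_name))
    return matches


# New-variable labels shortened for the example.
new_labels = {
    "RAgecat4": "Age of respondent (grouped)",
    "qsimd": "Scottish Index of Multiple Deprivation quintiles",
}
# Invented mother-file entries, for illustration only.
old_labels = {
    "ragecat": "Age of respondent  (Grouped)",
    "year": "Survey year",
}

print(possible_renames(new_labels, old_labels))
```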
John,
Please attach the metadata for the 2 files and I'll take a look at them. D.
In reply to this post by John F Hall
"Just tried COMPARE DATASETS again, but it looks at cases not variables."

This is not correct. You specify what variable properties in the two files to compare, and you can omit the case comparisons. For example,

COMPARE DATASETS
  /COMPDATASET = 'c:\temp\empdate.sav'
  /VARIABLES ALL
  /SAVE FLAGMISMATCHES=NO MATCHDATASET=NO MISMATCHDATASET=NO
  /OUTPUT VARPROPERTIES=VALUELABELS MISSING MEASURE ROLE CASETABLE=NO TABLELIMIT=1.