Unique variable names

Unique variable names

John F Hall

I created a cumulative file for all waves of the British Social Attitudes Survey, 1983 to 2015. There was already a file for 1983-2014, but when I combined it with the 2015 wave, the metadata from 2015 took precedence, resulting in erroneous measurement levels. A quick solution was to use APPLY DICTIONARY from the earlier 1983-2014 file, but there are some new variables unique to 2015. I’ve already found a couple at the beginning of the file, but is there a quick way to identify such variables, or do I have to eyeball all 10973 variables? COMPARE DATASETS doesn’t do what I want.

 

Basically I want to identify variable names in dataset2 which do not appear in dataset1.

 

Dataset1 “bsa1983-2014.sav”

Dataset2 “bsa1983-2015.sav”

 

Thanks in advance

 

John F Hall (Mr)

[Retired academic survey researcher]

 

Email:   [hidden email] 

Website: www.surveyresearch.weebly.com

SPSS start page:  www.surveyresearch.weebly.com/1-survey-analysis-workshop

 

 

 

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Re: Unique variable names

Andy W
When you use ADD FILES to concatenate them include the subcommand MAP.

***************************************.
DATA LIST FREE / A B C.
BEGIN DATA
1 2 3
END DATA.
DATASET NAME A.

DATA LIST FREE / A B Z.
BEGIN DATA
1 2 3
END DATA.
DATASET NAME Z.


ADD FILES FILE = 'A'
  /FILE = 'Z'
  /MAP.
***************************************.

Also won't all the new variables be at the end of the file? So if you do:

ADD FILES FILE = 'Old'
  /FILE = 'New'.

Just figure out the last variable in Old, and then see which ones are after it in the concatenated file.
Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Re: Unique variable names

Jon Peck
Two further comments

- the dialog box for COMPARE DATASETS has a subdialog that shows you the variables in each dataset that are not in the other.

- If you want to print a list, this small Python program will do it, where dataset1 and dataset2 are the dataset names.


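begin program.
import spss, spssaux

# Collect the variable names of each open dataset as a set.
spss.Submit("dataset activate dataset1")
var1 = set(spssaux.VariableDict().variables)
spss.Submit("dataset activate dataset2")
var2 = set(spssaux.VariableDict().variables)

# Names that appear in exactly one of the two datasets.
# print() with parentheses runs under both Python 2 and Python 3.
diff = var1.symmetric_difference(var2)
print(diff)
end program.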

If there are many such variables, change the line that prints diff to read
print("\n".join(diff))
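
The original question asked only for names in dataset2 that are missing from dataset1, which is a one-sided difference rather than a symmetric one. The following is a minimal sketch in plain Python, assuming the two name lists have already been obtained (inside SPSS they could come from spssaux.VariableDict().variables as above). The function name names_only_in_second and the sample variable names are hypothetical. Names are compared case-insensitively on the assumption that SPSS treats variable names that way.

```python
def names_only_in_second(first_names, second_names):
    """Return names from second_names that are absent from first_names.

    The result keeps the order of second_names, so it follows file order.
    """
    seen = {name.lower() for name in first_names}
    return [name for name in second_names if name.lower() not in seen]


# Hypothetical variable names, for illustration only.
dataset1_names = ["Serial", "Year", "RSex", "RAge"]
dataset2_names = ["serial", "Year", "RSex", "RAge", "NewVar1", "NewVar2"]

only_in_2 = names_only_in_second(dataset1_names, dataset2_names)
print("\n".join(only_in_2))
```

Returning a list rather than a set keeps the new variables in the order they occur in the second file, which may make a list of a hundred or more names easier to check.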



On Mon, Mar 6, 2017 at 7:07 AM, Andy W <[hidden email]> wrote:
When you use ADD FILES to concatenate them include the subcommand MAP.

--
Jon K Peck
[hidden email]


Re: Unique variable names

John F Hall

Jon, Andy

 

Thanks for these.  There seem to be about 100 new names, so it will take me a while to check.  The problem really lies in the fact that data sets for each new wave are deposited and distributed without being properly compiled and checked, so that levels, missing values and other metadata are not only incorrect, but also incompatible with earlier files which took me several months to clean and create (with a lot of Python code provided by Jon).  If any of my students had submitted files like that, they would have been heavily penalised for that component, if not actually failed.  I sometimes wonder why I bother, but my versions will save future teachers, researchers and students a lot of grief.

 


From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Jon Peck
Sent: 06 March 2017 15:37
To: [hidden email]
Subject: Re: Unique variable names

 

Two further comments

 


Re: Unique variable names

John F Hall
In reply to this post by Andy W
Andy

There are 99958 cases and 10973 variables in the combined file.  I'm
working with *.sav files, not raw data, so I don't see how your syntax
can work, but I'll play with the /MAP idea.

John


Re: Unique variable names

David Marso
Administrator
Wow John,
  I can't believe you actually posted this.

Obviously the DATA LIST commands simply create test files.

Andy is spot on with the following:

"Also won't all the new variables be at the end of the file? So if you do:

ADD FILES FILE = 'Old'
  /FILE = 'New'.

Just figure out the last variable in Old, and then see which ones are after it in the concatenated file."


Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Unique variable names

Bruce Weaver
Administrator
John, you said you have the following datasets:

Dataset1 “bsa1983-2014.sav”
Dataset2 “bsa1983-2015.sav”

Are the 1983-2014 records in Dataset2 the same as the records in Dataset1?  If so, surely you want to delete all records prior to 2015 from Dataset2 before stacking the files, do you not?  If not, please explain.

Here (just to make Jon's day) is a no-Python-required (NPR) approach that will give you a list of the variables that are unique to dataset 2.  

* Open your two data files and name them
* Dataset1 and Dataset2.

DATASET ACTIVATE Dataset1.
NUMERIC @LastD1var@ (F1).
DATASET ACTIVATE Dataset2.
**********************************************************.
* Get rid of data for 2014 or earlier to avoid duplication.
SELECT IF Year > 2014.
**********************************************************.
ADD FILES
 FILE = Dataset1 /
 FILE = Dataset2 /
 MAP.
EXECUTE.
DATASET NAME d1d2.
* In the MAP output from ADD FILES, variables
* listed after @LastD1var@ are unique to Dataset2.
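
The logic of the marker-variable trick can be sketched outside SPSS. The example below is only an illustration in plain Python, assuming (as the thread describes) that the stacked file lists the first file's variables in order, followed by any variables found only in the second file. The helper names merged_order and names_after_marker and the sample names are hypothetical.

```python
def merged_order(first_names, second_names):
    """Mimic the variable order the thread describes for the stacked file."""
    order = list(first_names)
    present = set(first_names)
    for name in second_names:
        if name not in present:
            order.append(name)
            present.add(name)
    return order


def names_after_marker(order, marker):
    """Return every name listed after the marker variable."""
    position = order.index(marker)  # raises ValueError if the marker is absent
    return order[position + 1:]


MARKER = "@LastD1var@"
dataset1_names = ["A", "B", "C", MARKER]  # marker added as the last variable
dataset2_names = ["A", "B", "Z", "Q"]

stacked = merged_order(dataset1_names, dataset2_names)
unique_to_2 = names_after_marker(stacked, MARKER)
print(unique_to_2)
```

Because the marker is the last variable of the first file, everything after it can only have come from the second file.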



--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Unique variable names

Jon Peck
With 10,000 variables, this could be painful.

On Mon, Mar 6, 2017 at 4:31 PM, Bruce Weaver <[hidden email]> wrote:
John, you said you have the following datasets:

Dataset1 “bsa1983-2014.sav”
Dataset2 “bsa1983-2015.sav”


Re: Unique variable names

John F Hall

Bruce, Jon, David, Andy

 

I’ll have a shot with the solutions suggested, possibly with a tweak or two. 

 

Another approach would be to display both Data Editors side by side with both Names columns visible, then systematically cut them halfway, quarter-way etc to see if the line numbers are the same (a variation on an EDT trick I used way back when to check for missing or duplicate records in raw data).
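
The halving idea can be expressed as a bisection over the two lists of variable names. This is a rough sketch in plain Python, not something posted in the thread; it assumes the names of each file are available as lists in file order, and the function name first_divergence and the sample names are hypothetical. It compares whole prefixes rather than single rows, so that the search stays correct even when later rows happen to line up again.

```python
def first_divergence(names_a, names_b):
    """Return the first index where the two name lists differ, or None."""
    limit = min(len(names_a), len(names_b))
    low, high = 0, limit  # invariant: the first `low` names are identical
    while low < high:
        mid = (low + high + 1) // 2
        if names_a[:mid] == names_b[:mid]:
            low = mid       # prefixes agree, so the divergence lies further on
        else:
            high = mid - 1  # prefixes differ, so the divergence lies earlier
    if low == limit and len(names_a) == len(names_b):
        return None         # the lists are identical
    return low


# Hypothetical variable names, for illustration only.
old_names = ["Serial", "Year", "RSex", "RAge"]
new_names = ["Serial", "Year", "NewVar", "RSex", "RAge"]

print(first_divergence(old_names, new_names))
```

This finds only the first point of difference. To find them all, the search would have to be repeated after removing each new name, which is why the set-based and MAP-based approaches are likely to be quicker for thousands of variables.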

 

I should explain that the BSAS now has 32 waves with many variables replicated in several years using the same mnemonic varnames.  A major problem arose from incompatible formats for the same variable in different waves, but others included 1) anything up to seven values to be treated as missing, 2) missing values labelled, but not declared, 3) other metadata incorrect or incomplete.  I spent several months last year resolving these to create a cumulative “mother” *.sav file for the first 31 waves. 

 

The waves were edited and added in reverse year order.  For a detailed account see: http://surveyresearch.weebly.com/british-social-attitudes-1983-onwards-cumulative-spss-file.html  and http://surveyresearch.weebly.com/uploads/2/9/9/8/2998485/comments_on_the_distributed_spss_file_for_british_social_attitudes_2011.pdf

The *.sav file for 2015 (wave 32) has many of the same metadata problems and, in addition to identifying and dealing with unique variables, will need dozens of transformations of variables common to the 1983-2014 “mother” file.  This needs to be done before merging with the mother file.

 

Users acknowledge this as a valuable resource for teachers, researchers and students: one senior Professor has already described the undertaking as Herculean, but even that is an understatement.

 


Re: Unique variable names

John F Hall
In reply to this post by John F Hall

Mid-part of Bruce’s syntax identified 201 variables unique to 2015:

 

ADD FILES
FILE = Dataset2 /
FILE = Dataset1 /
MAP.
EXECUTE.
DATASET NAME d1d.

 

When I’ve checked these I’ll rerun the transformations on the shared variables in 2015 to make them compatible with 1983 to 2014.  Tedious, but I still have my SPSS and Jon Peck’s Python code.  To be honest, the files should have been correctly compiled and checked before they were deposited and distributed.  However, I hope I’ve saved future users the frustration of finding and having to correct inconsistencies and errors in the metadata themselves.

 


Re: Unique variable names

Bruce Weaver
Administrator
John, after sleeping on it, I decided that you probably don't need the data from Dataset1, you just need the metadata.  Therefore, before stacking the files, you could delete all of the data from Dataset1.  (This worked with a couple of small datasets I generated for testing.)  

* Open your two data files and name them
* Dataset1 and Dataset2.

DATASET ACTIVATE Dataset1.
NUMERIC @LastD1var@ (F1).
***********************************************************.
* Delete all data from Dataset1--we just need the metadata.
SELECT IF $CASENUM < 1.
EXECUTE.
***********************************************************.
DATASET ACTIVATE Dataset2.
ADD FILES
 FILE = Dataset1 /
 FILE = Dataset2 /
 MAP.
EXECUTE.
DATASET NAME d1d2.
* In the MAP output from ADD FILES, variables
* listed after @LastD1var@ are unique to Dataset2.


And yes, Jon, this will no doubt be a bit painful with so many variables.




Re: Unique variable names

David Marso
Administrator
In reply to this post by John F Hall
OK John,
Here's what I would do.

Use OMS with DISPLAY DICTIONARY to fetch the metadata from each file.
SORT these files and MATCH them.
It should be trivial to discern the discrepancies.
Would it be possible for you to post the two metadata files (just the metadata, without data) to this thread?

GET FILE='xxxxx1'.
SELECT IF $CASENUM = 1.
SAVE OUTFILE='xxxx1META.sav'.
GET FILE='xxxxx2'.
SELECT IF $CASENUM = 1.
SAVE OUTFILE='xxxx2META.sav'.

Attach xxxx1META.sav and xxxx2META.sav, and we will see how feasible this is.

Template for the OMS.
Repeat for each of the 2 files with appropriate substitutions for filenames.

GET
  FILE='C:\Program Files\IBM\SPSS\Statistics\22\Samples\English\customer_dbase.sav'.
DATASET DECLARE  VarLabels1.
DATASET DECLARE  VarInfo1.
OMS
  /SELECT TABLES
  /IF COMMANDS=['File Information'] SUBTYPES=['Variable Values']
  /DESTINATION FORMAT=SAV NUMBERED=TableNumber_
   OUTFILE='VarLabels1' VIEWER=NO.
* OMS.
OMS
  /SELECT TABLES
  /IF COMMANDS=['File Information'] SUBTYPES=['Variable Information']
  /DESTINATION FORMAT=SAV NUMBERED=TableNumber_
   OUTFILE='VarInfo1' VIEWER=NO.
DISPLAY DICTIONARY.
OMSEND.
OMSEND.
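
Once the variable information from both files is available as tables, the sort-and-match step amounts to comparing attributes for the variables the files share. The sketch below shows that comparison in plain Python. It assumes the metadata has already been read into dictionaries keyed by variable name; the function name metadata_discrepancies, the attribute keys "level" and "format", and the sample values are all hypothetical and are not the column names OMS actually writes.

```python
def metadata_discrepancies(base_meta, target_meta, attributes=("level", "format")):
    """List (name, attribute, base value, target value) for shared variables
    whose attribute values differ between the two files."""
    rows = []
    for name in sorted(set(base_meta) & set(target_meta)):
        for attr in attributes:
            base_value = base_meta[name].get(attr)
            target_value = target_meta[name].get(attr)
            if base_value != target_value:
                rows.append((name, attr, base_value, target_value))
    return rows


# Hypothetical metadata, for illustration only.
base = {
    "RSex": {"level": "Nominal", "format": "F1"},
    "RAge": {"level": "Scale", "format": "F3"},
}
target = {
    "RSex": {"level": "Scale", "format": "F1"},
    "RAge": {"level": "Scale", "format": "F2"},
    "NewVar": {"level": "Nominal", "format": "F1"},
}

for row in metadata_discrepancies(base, target):
    print(row)
```

Variables present in only one file (NewVar here) are skipped by this comparison; they are the subject of the name-difference approaches earlier in the thread.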



Re: Unique variable names

Jon Peck
For existing variables, this seems to be a long way of doing what the COMPARE DATASETS command does.

On Tue, Mar 7, 2017 at 7:09 AM, David Marso <[hidden email]> wrote:
OK John,
Here's what I would do.

Use OMS with DISPLAY DICTIONARY to fetch the metadata from each file.

Re: Unique variable names

David Marso
Administrator
I just tried COMPARE DATASETS and it does give a nice general overview of data discrepancies.
I'll bet combined with OMS one could use the results to build syntax to modify the target file to conform to the base.
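
As a rough illustration of building syntax from such results, the sketch below generates VARIABLE LEVEL commands for shared variables whose measurement level in the target file differs from the base file. It is plain Python under stated assumptions: the levels are assumed to have been extracted already into dictionaries keyed by variable name, and the function name variable_level_syntax and the sample names are hypothetical. Only measurement level is handled; formats, missing values and labels would need their own commands.

```python
VALID_LEVELS = {"NOMINAL", "ORDINAL", "SCALE"}


def variable_level_syntax(base_levels, target_levels):
    """Return SPSS VARIABLE LEVEL commands that make the target file's
    measurement levels match the base file for shared variables."""
    commands = []
    for name, level in base_levels.items():
        current = target_levels.get(name)
        if current is None or current.upper() == level.upper():
            continue  # variable not shared, or level already matches
        if level.upper() not in VALID_LEVELS:
            raise ValueError("unexpected level %r for %s" % (level, name))
        commands.append("VARIABLE LEVEL {0} ({1}).".format(name, level.upper()))
    return commands


# Hypothetical levels, for illustration only.
base_levels = {"RSex": "Nominal", "RAge": "Scale", "Party": "Nominal"}
target_levels = {"RSex": "Scale", "RAge": "Scale", "NewVar": "Nominal"}

print("\n".join(variable_level_syntax(base_levels, target_levels)))
```

The generated lines could be pasted into a syntax window, or passed to spss.Submit from within a program block, and run against the target file.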
---
Jon Peck wrote
For existing variables, this seems to be a long way of doing what the
COMPARE DATASETS command does.

On Tue, Mar 7, 2017 at 7:09 AM, David Marso <[hidden email]> wrote:

> OK John,
> Here's what I would do.
>
> Use OMS with DISPLAY DICTIONARY to fetch the metadata from each file.
> SORT these files and MATCH them.
> It should be trivial to discern the discrepancies.
> Would it be possible for you to post the 2 metadata files (just the
> metadata, without data) to this thread?
>
> GET FILE='xxxxx1.sav'.
> SELECT IF $CASENUM = 1.
> SAVE OUTFILE='xxxx1META.sav'.
> GET FILE='xxxxx2.sav'.
> SELECT IF $CASENUM = 1.
> SAVE OUTFILE='xxxx2META.sav'.
>
> Attach xxxx1META.sav and xxxx2META.sav and I will see how feasible this is.
>
> Template for the OMS.
> Repeat for each of the 2 files with appropriate substitutions for
> filenames.
>
> GET
>   FILE='C:\Program Files\IBM\SPSS\Statistics\22\Samples\English\customer_dbase.sav'.
> DATASET DECLARE  VarLabels1.
> DATASET DECLARE  VarInfo1.
> OMS
>   /SELECT TABLES
>   /IF COMMANDS=['File Information'] SUBTYPES=['Variable Values']
>   /DESTINATION FORMAT=SAV NUMBERED=TableNumber_
>    OUTFILE='VarLabels1' VIEWER=NO.
> * OMS.
> OMS
>   /SELECT TABLES
>   /IF COMMANDS=['File Information'] SUBTYPES=['Variable Information']
>   /DESTINATION FORMAT=SAV NUMBERED=TableNumber_
>    OUTFILE='VarInfo1' VIEWER=NO.
> DISPLAY DICTIONARY.
> OMSEND.
> OMSEND.
>
>
>
> John F Hall wrote
> > Bruce, Jon, David, Andy
> >
> > I’ll have a shot with the solutions suggested, possibly with a tweak or
> > two.
> >
> > Another approach would be to display both Data Editors side by side with
> > both Names columns visible, then systematically cut them halfway,
> > quarter-way etc to see if the line numbers are the same (a variation on
> > an EDT trick I used way back when to check for missing or duplicate
> > records in raw data).
> >
> > I should explain that the BSAS now has 32 waves with many variables
> > replicated in several years using the same mnemonic varnames.  A major
> > problem arose from incompatible formats for the same variable in
> > different waves, but others included 1) anything up to seven values to
> > be treated as missing, 2) missing values labelled, but not declared,
> > 3) other metadata incorrect or incomplete.  I spent several months last
> > year resolving these to create a cumulative “mother” *.sav file for the
> > first 31 waves.
> >
> > The waves were edited and added in reverse year order.  For a detailed
> > account see:
> > http://surveyresearch.weebly.com/british-social-attitudes-1983-onwards-cumulative-spss-file.html
> > and
> > http://surveyresearch.weebly.com/uploads/2/9/9/8/2998485/comments_on_the_distributed_spss_file_for_british_social_attitudes_2011.pdf
> > The *.sav file for 2015 (wave 32) has many of the same metadata problems
> > and, in addition to identifying and dealing with unique variables, will
> > need dozens of transformations of variables common to the 1983-2014
> > “mother” file.  This needs to be done before merging with the mother
> > file.
> >
> > Users acknowledge this as a valuable resource for teachers, researchers
> > and students: one senior Professor has already described the undertaking
> > as Herculean, but even that is an understatement.
> >
> > John F Hall (Mr)
> > [Retired academic survey researcher]
> >
> > Email: [hidden email]
> > Website: www.surveyresearch.weebly.com
> > <http://www.surveyresearch.weebly.com/>
> > SPSS start page:  www.surveyresearch.weebly.com/1-survey-analysis-workshop
> > <http://surveyresearch.weebly.com/1-survey-analysis-workshop.html>


--
Jon K Peck
[hidden email]

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Unique variable names

John F Hall
In reply to this post by Jon Peck

Jon, David

This is going to take me some time.  I tried COMPARE DATASETS but it didn’t give me what I needed.  I’ll try David’s suggestion of OMS with DISPLAY DICTIONARY, but SORT on which variable(s)?  I’m also going to eyeball the 201 variables identified by ADD FILES ~~~/MAP as being unique to 2015.

In creating the existing cumulative mother file I recoded all positive missing values to negative, e.g.:

RECODE <varlist> (8=-8)(9=-9)(98=-98)(99=-99)(998=-998)(999=-999).     [etc. etc.]

Some admin/sampling/filter variables already had -3, -2, -1. Thus:

MISSING VALUES <varlist> (lo thru -1).

Some variables also had 7, 97 and 997.

RECODE <varlist> (7=-7)(8=-8)(9=-9)(97=-97)(98=-98)(99=-99)(997=-997)(998=-998)(999=-999).

For some variables 0 was also a missing value, thus:

MISSING VALUES <varlist> (lo thru 0).

ADD VALUE LABELS was used in combination with the RECODE commands, but this left ghost labels.  Jon Peck provided some amazing Python code to semi-automate the above process which I used in combination with Excel: too complex to describe here, but it worked and saved me weeks of time.  For many variables, measurement levels were incorrect, possibly as a result of automated processing, but they still had to be identified and corrected with VARIABLE LEVEL.
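
Jon's actual code is not reproduced in this thread, so the following is not his method. It is only a minimal sketch of how RECODE and MISSING VALUES commands of the kind shown above could be generated from a list of positive missing codes. The function name `recode_missing_syntax` and the variable names `var1` and `var2` are hypothetical.

```python
def recode_missing_syntax(varlist, codes, upper=-1):
    """Build a RECODE that flips each positive missing code to its negative,
    plus a MISSING VALUES range declaring everything up to `upper` missing."""
    for code in codes:
        if code <= 0:
            raise ValueError("expected positive codes, got {}".format(code))
    pairs = "".join("({0}=-{0})".format(code) for code in codes)
    names = " ".join(varlist)
    return [
        "RECODE {} {}.".format(names, pairs),
        "MISSING VALUES {} (LO THRU {}).".format(names, upper),
    ]


# Hypothetical variable names; the codes follow the pattern described above.
for line in recode_missing_syntax(["var1", "var2"], [8, 9, 98, 99]):
    print(line)
```

Passing `upper=0` gives the variant in which 0 is also treated as missing. Value labels for the new negative codes are not handled here.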

 

The 2015 file still has positive missing values for variables shared with the 1983-2014 mother file: these will have to be identified and recoded as above.  Most of them will be 5-point Agree-Disagree items.  For this and other tasks I can rerun all the syntax I used for:

RECODE
ADD VALUE LABELS
VARIABLE LEVEL

Meanwhile I also have over 320,000 lines of syntax produced from the mother file (by Stat/Transfer during a free trial period).  It includes:

FILE HANDLE       (line 8)
DATA LIST         (lines 10-3599)
FORMATS           (lines 3602-3612)
VARIABLE LABELS   (lines 3614-36176)
VALUE LABELS      (lines 36179-117393)
. . and user-defined
MISSING VALUES    (lines 117395-121329)

John F Hall (Mr)
[Retired academic survey researcher]

Email:   [hidden email]
Website: www.surveyresearch.weebly.com
SPSS start page:  www.surveyresearch.weebly.com/1-survey-analysis-workshop


Re: Unique variable names

David Marso
Administrator
" I tried COMPARE DATASETS but it didn’t give me what I needed."

Please elaborate.  How is it NOT giving you what you needed?  Perhaps you need to step back and make that very explicit.

From what I can tell it gives you explicit analyses of the 2 sets of metadata and reports ALL of the discrepancies.

If you go with OMS and DISPLAY DICTIONARY it doesn't really give you ANYTHING more than OMS with COMPARE DATASETS, aside from YOU having to write code to mop up after the MATCH.

Sort on what variables?  What would you imagine would be necessary if you want to align the two sets of metadata?

Please inspect the OMS results and scratch your head a bit.  OTOH I think COMPARE DATASETS is likely the better way to go.
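
One reading of the sort-key question is that the two sets of metadata presumably have to be keyed on the variable name. Under that assumption, the alignment amounts to an outer join on name, sketched here in Python rather than with SORT CASES and MATCH FILES. The function name `align_metadata` and the example attributes are hypothetical.

```python
def align_metadata(meta1, meta2):
    """Outer-join two {variable name: attributes} mappings on the
    lower-cased name and report how each variable compares."""
    left = {name.lower(): (name, attrs) for name, attrs in meta1.items()}
    right = {name.lower(): (name, attrs) for name, attrs in meta2.items()}
    rows = []
    for key in sorted(set(left) | set(right)):
        in1 = left.get(key)
        in2 = right.get(key)
        if in1 is None:
            status = "only in file 2"
        elif in2 is None:
            status = "only in file 1"
        elif in1[1] != in2[1]:
            status = "differs"
        else:
            status = "same"
        rows.append(((in1 or in2)[0], status))
    return rows


# Hypothetical (format, level) attributes for illustration only.
meta1 = {"caseid": ("F8.0", "SCALE"), "Serial": ("F6.0", "NOMINAL")}
meta2 = {"caseid": ("F8.0", "NOMINAL"), "URINDEW": ("F1.0", "NOMINAL")}

for name, status in align_metadata(meta1, meta2):
    print(name, status)
```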


Re: Unique variable names

John F Hall
Just tried COMPARE DATASETS again, but it looks at cases not variables.  Also I can't get it to run or PASTE.  The first time I ran it I got an enormous list of variables which I found completely meaningless.  I got what I need from the following syntax based on Bruce's suggestion.

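* Open the 2015 wave.
GET
  FILE='C:\data\4_Research\4 Surveys\British Social Attitudes\bsa15c.sav'.
DATASET NAME DataSet1 WINDOW=FRONT.
* Open the 1983-2014 mother file, which becomes the active dataset.
GET
  FILE='C:\data\4_Research\4 Surveys\British Social Attitudes\mdlast\bsa1983-2014mother16.sav'.
DATASET NAME DataSet2 WINDOW=FRONT.
* Marker variable appended to the end of the mother file.
NUMERIC @LastD1var@ (F1).
* Mother file first, so variables unique to 2015 follow the marker.
ADD FILES
  /FILE = DataSet2
  /FILE = DataSet1
  /MAP.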

Map of the result file

Result                           Input1                           Input2
------                           ------                           ------

caseid                           caseid                           caseid
year                             year                             year
yearorder                        yearorder
waveorder                        waveorder                        waveorder
Serial                           Serial
SSerial                          SSerial                          SSerial
SPoint                           SPoint                           SPoint

~ ~ ~ ~ ~ [Gaps in this column because of my derived vars]
zDcldrnam                        zDcldrnam
zxnamdcbc                        zxnamdcbc

@LastD1var@                      @LastD1var@ [Bruce's marker variable]

[Variables after this point are unique to 2015]
URINDEW                                                           URINDEW
URINDSC                                                           URINDSC
RAgecat4                                                          RAgecat4
RInEduc                                                           RInEduc
~ ~ ~
qsimd                                                             qsimd
dsimd                                                             dsimd

EXECUTE.
DATASET NAME d1d2.

This is what gave me the unique variable names.  

DISP LAB  var urindew to dsimd.

Variable  Position  Label
URINDEW   10776     Urban/Rural Indicator 2011 (England and Wales)
URINDSC   10777     Urban/Rural Indicator 2011 (Scotland)
RAgecat4  10778     Age of respondent (grouped) (7 categories) dv
~ ~ ~
chngwk    10934     Agree/disagree: given the chance I would change present type of work: Versions B, D
proudwk   10935     Agree/disagree: I am proud of the type of work I do: Versions B, D
avunemp5  10936     Agree/disagree: Willing to move within Britain to avoid unemployment: Versions B, D
avunemp6  10937     Agree/disagree: Willing to move abroad to avoid unemployment: Versions B, D
~ ~ ~
WhyDis1   10825     Reason for NHS dissatisfaction: quality of NHS care: Version B
WhyDis2   10826     Reason for NHS dissatisfaction: long wait for appointment: Version B
WhyDis3   10827     Reason for NHS dissatisfaction: attitudes/behaviour of staff: Version B
~ ~ ~
qwimd     10971     Wales: IMD 2011  - Quintiles
dwimd     10972     Wales: WIMD 2011  - Deciles
qsimd     10973     Scottish Index of Multiple Deprivation quintiles
dsimd     10974     Scottish Index of Multiple Deprivation deciles
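
For comparison, the same question (which names occur in the 2015 file but not in the mother file) reduces to an order-preserving set difference once the two name lists are available outside SPSS. The sketch below assumes the lists have already been exported somehow, which is not shown. The function name `names_only_in_second` is made up, and the short lists are just the names from the map excerpt above.

```python
def names_only_in_second(names1, names2):
    """Return the names in names2 that do not occur in names1, keeping the
    file order of names2. Comparison ignores case, as SPSS does."""
    seen = {name.lower() for name in names1}
    return [name for name in names2 if name.lower() not in seen]


# Names copied from the map excerpt; the real files have about 11,000 variables.
mother = ["caseid", "year", "yearorder", "waveorder",
          "Serial", "SSerial", "SPoint"]
wave2015 = ["caseid", "year", "waveorder", "SSerial", "SPoint",
            "URINDEW", "URINDSC", "RAgecat4"]

print(names_only_in_second(mother, wave2015))
```

Swapping the arguments gives the names that exist only in the mother file, such as derived variables.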
       

On preliminary inspection of the var labels, some of them look suspiciously like the same variables as in the mother file, but with different names.  I sincerely hope not, but I'll check.  As well as 1-5 or 1-7 rating scales,  a lot of them have values which are binary codes for multiple responses, so even with 201 variables it's not an enormous task to set levels, missing values and recode, add labels etc. substituting varnames in my original syntax.  Let me crack on with this and I'll post short accounts to the list of how I eventually fare.
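
A possible first pass at that check is to look for variables whose labels match after normalising case and spacing but whose names differ. The sketch below assumes the labels have been exported to two name-to-label mappings. The function name `same_label_candidates`, the mother-file name `ProudJob` and both label texts are invented for illustration, and exact label matching will miss labels that are worded differently.

```python
def same_label_candidates(mother_labels, new_labels):
    """Return (new name, mother name) pairs whose variable labels are
    identical after lower-casing and collapsing whitespace."""
    def norm(label):
        return " ".join(label.lower().split())

    by_label = {}
    for name, label in mother_labels.items():
        by_label.setdefault(norm(label), []).append(name)

    matches = []
    for name, label in new_labels.items():
        for old in by_label.get(norm(label), []):
            if old.lower() != name.lower():
                matches.append((name, old))
    return matches


# Hypothetical labels; "ProudJob" is an invented mother-file name.
mother_labels = {"ProudJob": "Agree/disagree: I am proud of the type of work I do"}
new_labels = {
    "proudwk": "Agree/disagree:  I am proud of the type of work I do",
    "qsimd": "Scottish Index of Multiple Deprivation quintiles",
}

print(same_label_candidates(mother_labels, new_labels))
```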

John F Hall (Mr)
[Retired academic survey researcher]

Email:   [hidden email]  
Website: www.surveyresearch.weebly.com
SPSS start page:  www.surveyresearch.weebly.com/1-survey-analysis-workshop





-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of David Marso
Sent: 07 March 2017 21:33
To: [hidden email]
Subject: Re: Unique variable names

" I tried COMPARE DATASETS but it didn’t give me what I needed."

Please elaborate.  How is it NOT giviung youy what you needed?  Perhaps you need to step back and make that very explicit.

From what I can tell it gives you explicit analyses of the 2 sets of metadata and reports ALL of the discrepencies.
 
If you go with OMS DISPLAY Dictionary it doesn't really give you ANYTHING more than OMS with COMPARE DATASETS  aside from YOU having to write code to mop up after the MATCH.

Sort on what variables?  What would imagine would be necessary if you want to align the two sets of metadata?  

Please inspect the OMS results and scratch your head abit.  OTOH I think COMPARE DATASETS  is likely the better way to go.


John F Hall wrote

> Jon, David
>  
> This is going to take me some time.  I tried COMPARE DATASETS but it
> didn’t give me what I needed.  I’ll try David’s suggestion of OMS with
> DISPLAY DICTIONARY, but SORT on which variable(s)?   I’m also going to
> eyeball the 201 variables identified by ADD FILES ~~~/MAP as being
> unique to 2015.
> In creating the existing cumulative mother file I recoded all positive
> missing values to negative, eg:
> RECODE
> <varlist>
>  (8=-8)(9=-9) (98=-98) (99=-99)(998=-998)( (999=-999).     [etc. etc.]
>  
> Some admin/sampling/filter variables already had -3, -2,-1. Thus:
> MISSING VALUES
> <varlist>
>  (lo thru -1).
>  
> Some variables also had 7, 97 and 997.
> RECODE
> <varlist>
>  (7=-7)(8=-8)(9=-9)(97=-97)
> (98=-98)(99=-99)(997=-997)(998=-998)(999=-999).
>  
> For some variables 0 was also a missing value, thus:          
> MISSING VALUES
> <varlist>
>  (lo thru 0).
>  
> ADD VALUE LABELS was used in combination with the RECODE commands, but
> this left ghost labels.  Jon Peck provided some amazing Python code to
> semi-automate the above process which I used in combination with Excel:
> too complex to describe here, but it worked and saved me weeks of time.
> For many variables, measurement levels were incorrect, possibly as a
> result of automated processing, but they still had to be identified
> and corrected with VARIABLE LEVELS.
>  
> The 2015 file still has positive missing values for variables shared
> with the 1983-2014 mother file: these will have to be identified and
> recoded as above.  Most of them will be 5-point Agree-Disagree items.  
> For this and other tasks I can rerun all the syntax I used for:
>  
> RECODE
> ADD VALUE LABELS
> VARIABLE LEVELS
>  
> Meanwhile I also have over 320,000 lines of syntax produced from the
> mother file (by Stats/Transfer during a free trial period).  It includes:
>  
> FILE HANDLE                     (line 8)
> DATA LIST                           (lines 10-3599)  
> FORMATS                           (lines 3602-3612)
> VARIABLE LABELS             (lines 3614-36176)
> VALUE LABELS                   (lines 36179-117393)
> . . and user-defined
> MISSING VALUES              (lines 117395-121329)
>  
> John F Hall (Mr)
> [Retired academic survey researcher]
>  
> Email:  

> johnfhall@

>  &lt;mailto:

> johnfhall@

> &gt;  
> Website: www.surveyresearch.weebly.com
> &lt;http://www.surveyresearch.weebly.com/&gt;
> SPSS start page:  
> www.surveyresearch.weebly.com/1-survey-analysis-workshop
> &lt;http://surveyresearch.weebly.com/1-survey-analysis-workshop.html&g
> t;
>  
>  
>  
> From: SPSSX(r) Discussion [mailto:

> SPSSX-L@.UGA

> ] On Behalf Of Jon Peck
> Sent: 07 March 2017 15:28
> To:

> SPSSX-L@.UGA

> Subject: Re: Unique variable names
>  
> For existing variables, this seems to be a long way of doing what the
> COMPARE DATASETS command does.
>  
> On Tue, Mar 7, 2017 at 7:09 AM, David Marso &lt;

> david.marso@

>  &lt;mailto:

> david.marso@

> &gt; > wrote:
> OK John,
> Here's what I would do.
>
> Use OMS with DISPLAY DICTIONARY to fetch the metadata from each file.
> SORT these files and MATCH them .
> Should be trivial to discern the discrepencies.
> Would it be possible for you to post the 2 metadata files (without
> data) to this thread (just the metadata).
>
> GET FLE xxxxx1.
> SELECT IF $CASENUM= 1.
> SAVE OUTFILE xxxx1META.sav.
> GET FLE xxxxx2.
> SELECT IF $CASENUM= 1.
> SAVE OUTFILE xxxx2META.sav.
>
> Attach xxxx1META.sav.
> and xxxx2META.sav.
> will see how feasible this is.
>
> Template for the OMS.
> Repeat for each of the 2 files with appropriate substitutions for
> filenames.
>
> GET
>   FILE='C:\Program
> Files\IBM\SPSS\Statistics\22\Samples\English\customer_dbase.sav'.
> DATASET DECLARE  VarLabels1.
> DATASET DECLARE  VarInfo1.
> OMS
>   /SELECT TABLES
>   /IF COMMANDS=['File Information'] SUBTYPES=['Variable Values']
>   /DESTINATION FORMAT=SAV NUMBERED=TableNumber_
>    OUTFILE='VarLabels1' VIEWER=NO.
> * OMS.
> OMS
>   /SELECT TABLES
>   /IF COMMANDS=['File Information'] SUBTYPES=['Variable Information']
>   /DESTINATION FORMAT=SAV NUMBERED=TableNumber_
>    OUTFILE='VarInfo1' VIEWER=NO.
> DISPLAY DICTIONARY.
> OMSEND.
> OMSEND.
>
>
>
> John F Hall wrote
>> Bruce, Jon, David, Andy
>>
>> I’ll have a shot with the solutions suggested, possibly with a tweak
>> or two.
>>
>> Another approach would be to display both Data Editors side by side
>> with both Names columns visible, then systematically cut them
>> halfway, quarter-way etc to see if the line numbers are the same (a
>> variation on an EDT trick I used way back when to check for missing
>> or duplicate records in raw data).
>>
>> I should explain that the BSAS now has 32 waves with many variables
>> replicated in several years using the same mnemonic varnames.  A
>> major problem arose from incompatible formats for the same variable
>> in different waves, but others included 1) anything up to seven
>> values to be treated as missing, 2) missing values labelled, but not
>> declared, 3) other metadata incorrect or incomplete.  I spent several
>> months last year resolving these to create a cumulative “mother”
>> *.sav file for the first 31 waves.
>>
>> The waves were edited and added in reverse year order.  For a
>> detailed account see:
>> http://surveyresearch.weebly.com/british-social-attitudes-1983-onwards-cumulative-spss-file.html
>> and
>> http://surveyresearch.weebly.com/uploads/2/9/9/8/2998485/comments_on_the_distributed_spss_file_for_british_social_attitudes_2011.pdf
>>
>> The *.sav file for 2015 (wave 32) has many of the same metadata
>> problems and, in addition to identifying and dealing with unique
>> variables, will need dozens of transformations of variables common to
>> the 1983-2014 “mother” file.  This needs to be done before merging
>> with the mother file.
>>
>> Users acknowledge this as a valuable resource for teachers,
>> researchers and students: one senior Professor has already described
>> the undertaking as Herculean, but even that is an understatement.
>>
>> John F Hall (Mr)
>> [Retired academic survey researcher]
>>





-----
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Unique-variable-names-tp5733946p5733969.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.


Re: Unique variable names

David Marso
Administrator
John,
Please attach the metadata for the 2 files and I'll take a look at them.
D.

Re: Unique variable names

Jon Peck
In reply to this post by John F Hall
"Just tried COMPARE DATASETS again, but it looks at cases not variables. "

This is not correct. You specify which variable properties to compare between the two files, and you can omit the case comparisons. For example:
COMPARE DATASETS
  /COMPDATASET = 'c:\temp\empdate.sav'
  /VARIABLES  ALL
  /SAVE FLAGMISMATCHES=NO MATCHDATASET=NO MISMATCHDATASET=NO
  /OUTPUT VARPROPERTIES=VALUELABELS MISSING MEASURE ROLE  CASETABLE=NO TABLELIMIT=1.
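
As a rough illustration of what a property-by-property comparison amounts to (this is not how COMPARE DATASETS is implemented), the Python sketch below compares two hand-made dictionaries of variable properties and reports the mismatches for variables present in both. The function name `compare_properties`, the property keys, and the example metadata are all hypothetical; real metadata would have to be read from the .sav files themselves.

```python
def compare_properties(dict1, dict2, properties=("measure", "missing")):
    """Return {variable: {property: (value1, value2)}} for mismatches.

    Only variables present in both dictionaries are compared. Names are
    matched case-insensitively, because SPSS variable names are not
    case-sensitive.
    """
    lower1 = {name.lower(): props for name, props in dict1.items()}
    mismatches = {}
    for name, props2 in dict2.items():
        props1 = lower1.get(name.lower())
        if props1 is None:
            continue  # variable exists only in the second file
        diffs = {p: (props1.get(p), props2.get(p))
                 for p in properties
                 if props1.get(p) != props2.get(p)}
        if diffs:
            mismatches[name] = diffs
    return mismatches


# Hypothetical metadata, for illustration only.
mother = {"year": {"measure": "scale", "missing": ()},
          "SPoint": {"measure": "nominal", "missing": (-9, -8)}}
wave2015 = {"year": {"measure": "scale", "missing": ()},
            "SPoint": {"measure": "scale", "missing": (8, 9)},
            "URINDEW": {"measure": "nominal", "missing": ()}}

print(compare_properties(mother, wave2015))
```

Variables that exist in only one file are skipped here on purpose; listing those is a separate question from checking whether shared variables carry the same metadata.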

On Tue, Mar 7, 2017 at 11:51 PM, John F Hall <[hidden email]> wrote:
Just tried COMPARE DATASETS again, but it looks at cases not variables.  Also I can't get it to run or PASTE.  The first time I ran it I got an enormous list of variables which I found completely meaningless.  I got what I need from the following syntax based on Bruce's suggestion.

GET
  FILE='C:\data\4_Research\4 Surveys\British Social Attitudes\bsa15c.sav'.
DATASET NAME DataSet1 WINDOW=FRONT.
GET
  FILE='C:\data\4_Research\4 Surveys\British Social Attitudes\mdlast\bsa1983-2014mother16.sav'.
DATASET NAME DataSet2 WINDOW=FRONT.
NUMERIC @LastD1var@ (F1).
ADD FILES
 FILE = Dataset2 /
 FILE = Dataset1 /
 MAP.

Map of the result file

Result        Input1        Input2
------        ------        ------
caseid        caseid        caseid
year          year          year
yearorder     yearorder
waveorder     waveorder     waveorder
Serial        Serial
SSerial       SSerial       SSerial
SPoint        SPoint        SPoint
~             ~             ~             [Gaps in this column because of my derived vars]
zDcldrnam     zDcldrnam
zxnamdcbc     zxnamdcbc
@LastD1var@   @LastD1var@                 [Bruce's marker variable]

[Variables after this point are unique to 2015]
URINDEW                     URINDEW
URINDSC                     URINDSC
RAgecat4                    RAgecat4
RInEduc                     RInEduc
~                           ~
qsimd                       qsimd
dsimd                       dsimd

EXECUTE.
DATASET NAME d1d2.

This is what gave me the unique variable names.

DISP LAB  var urindew to dsimd.

Variable   Position   Label
URINDEW    10776      Urban/Rural Indicator 2011 (England and Wales)
URINDSC    10777      Urban/Rural Indicator 2011 (Scotland)
RAgecat4   10778      Age of respondent (grouped) (7 categories) dv
~          ~          ~
chngwk     10934      Agree/disagree: given the chance I would change present type of work: Versions B, D
proudwk    10935      Agree/disagree: I am proud of the type of work I do: Versions B, D
avunemp5   10936      Agree/disagree: Willing to move within Britain to avoid unemployment: Versions B, D
avunemp6   10937      Agree/disagree: Willing to move abroad to avoid unemployment: Versions B, D
~          ~          ~
WhyDis1    10825      Reason for NHS dissatisfaction: quality of NHS care: Version B
WhyDis2    10826      Reason for NHS dissatisfaction: long wait for appointment: Version B
WhyDis3    10827      Reason for NHS dissatisfaction: attitudes/behaviour of staff: Version B
~          ~          ~
qwimd      10971      Wales: IMD 2011  - Quintiles
dwimd      10972      Wales: WIMD 2011  - Deciles
qsimd      10973      Scottish Index of Multiple Deprivation quintiles
dsimd      10974      Scottish Index of Multiple Deprivation deciles
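
The underlying question (which variable names appear in the second file but not in the first) is an ordered set difference. The Python sketch below shows the idea on a handful of names copied from the map above; the function name `names_only_in_second` is hypothetical, and the name lists are typed in by hand rather than read from the .sav files.

```python
def names_only_in_second(names1, names2):
    """Return names from names2 that do not occur in names1.

    The order of names2 is preserved. Matching is case-insensitive,
    because SPSS variable names are not case-sensitive.
    """
    seen = {name.lower() for name in names1}
    return [name for name in names2 if name.lower() not in seen]


# A few names from the ADD FILES map (abbreviated for illustration).
mother_names = ["caseid", "year", "yearorder", "waveorder",
                "Serial", "SSerial", "SPoint"]
names_2015 = ["caseid", "year", "waveorder", "SSerial", "SPoint",
              "URINDEW", "URINDSC", "RAgecat4"]

print(names_only_in_second(mother_names, names_2015))
# → ['URINDEW', 'URINDSC', 'RAgecat4']
```

Unlike the marker-variable approach, this does not depend on where the new variables sit in the merged file, but it does require getting both name lists out of SPSS first.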


On preliminary inspection of the var labels, some of them look suspiciously like the same variables as in the mother file, but with different names.  I sincerely hope not, but I'll check.  As well as 1-5 or 1-7 rating scales,  a lot of them have values which are binary codes for multiple responses, so even with 201 variables it's not an enormous task to set levels, missing values and recode, add labels etc. substituting varnames in my original syntax.  Let me crack on with this and I'll post short accounts to the list of how I eventually fare.
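
One way to check for the same variable hiding under a different name is to pair variables by label. The Python sketch below is only an illustration under stated assumptions: the function `shared_labels` is hypothetical, the mother-file entry `wkproud_old` is invented for the example, and labels are treated as equal when they match after lower-casing and collapsing whitespace, which will miss rewordings.

```python
from collections import defaultdict


def shared_labels(labels1, labels2):
    """Return (name1, name2, label) where labels match but names differ."""
    def normalise(label):
        # Ignore case and runs of whitespace when comparing labels.
        return " ".join(label.lower().split())

    by_label = defaultdict(list)
    for name, label in labels1.items():
        by_label[normalise(label)].append(name)

    pairs = []
    for name2, label in labels2.items():
        for name1 in by_label.get(normalise(label), []):
            if name1.lower() != name2.lower():
                pairs.append((name1, name2, label))
    return pairs


# The mother-file entry is invented; the 2015 labels come from the listing.
mother_labels = {
    "wkproud_old": "Agree/disagree:  I am proud of the type of work I do: versions B, D",
}
labels_2015 = {
    "proudwk": "Agree/disagree: I am proud of the type of work I do: Versions B, D",
    "URINDSC": "Urban/Rural Indicator 2011 (Scotland)",
}

for old, new, label in shared_labels(mother_labels, labels_2015):
    print(old, new, label)
```

Any pair this turns up is only a candidate for manual inspection; identical labels do not guarantee identical coding.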

John F Hall (Mr)
[Retired academic survey researcher]

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of David Marso
Sent: 07 March 2017 21:33
To: [hidden email]
Subject: Re: Unique variable names

" I tried COMPARE DATASETS but it didn’t give me what I needed."

Please elaborate. How is it NOT giving you what you needed? Perhaps you need to step back and make that very explicit.

From what I can tell, it gives you explicit analyses of the 2 sets of metadata and reports ALL of the discrepancies.

If you go with OMS DISPLAY DICTIONARY, it doesn't really give you ANYTHING more than OMS with COMPARE DATASETS, aside from YOU having to write code to mop up after the MATCH.

Sort on what variables? What would you imagine would be necessary if you want to align the two sets of metadata?

Please inspect the OMS results and scratch your head a bit. OTOH, I think COMPARE DATASETS is likely the better way to go.


John F Hall wrote
> Jon, David
>
> This is going to take me some time.  I tried COMPARE DATASETS but it
> didn’t give me what I needed.  I’ll try David’s suggestion of OMS with
> DISPLAY DICTIONARY, but SORT on which variable(s)?   I’m also going to
> eyeball the 201 variables identified by ADD FILES ~~~/MAP as being
> unique to 2015.
> In creating the existing cumulative mother file I recoded all positive
> missing values to negative, eg:
> RECODE
> <varlist>
>  (8=-8)(9=-9)(98=-98)(99=-99)(998=-998)(999=-999).     [etc. etc.]
>
> Some admin/sampling/filter variables already had -3, -2,-1. Thus:
> MISSING VALUES
> <varlist>
>  (lo thru -1).
>
> Some variables also had 7, 97 and 997.
> RECODE
> <varlist>
>  (7=-7)(8=-8)(9=-9)(97=-97)
> (98=-98)(99=-99)(997=-997)(998=-998)(999=-999).
>
> For some variables 0 was also a missing value, thus:
> MISSING VALUES
> <varlist>
>  (lo thru 0).
>
> ADD VALUE LABELS was used in combination with the RECODE commands, but
> this left ghost labels.  Jon Peck provided some amazing Python code to
> semi-automate the above process which I used in combination with Excel:
> too complex to describe here, but it worked and saved me weeks of time.
> For many variables, measurement levels were incorrect, possibly as a
> result of automated processing, but they still had to be identified
> and corrected with VARIABLE LEVELS.
>
> The 2015 file still has positive missing values for variables shared
> with the 1983-2014 mother file: these will have to be identified and
> recoded as above.  Most of them will be 5-point Agree-Disagree items.
> For this and other tasks I can rerun all the syntax I used for:
>
> RECODE
> ADD VALUE LABELS
> VARIABLE LEVELS
>
> Meanwhile I also have over 320,000 lines of syntax produced from the
> mother file (by Stats/Transfer during a free trial period).  It includes:
>
> FILE HANDLE                     (line 8)
> DATA LIST                           (lines 10-3599)
> FORMATS                           (lines 3602-3612)
> VARIABLE LABELS             (lines 3614-36176)
> VALUE LABELS                   (lines 36179-117393)
> . . and user-defined
> MISSING VALUES              (lines 117395-121329)
>
--
Jon K Peck
[hidden email]
