SPSSX Discussion

Merging databases by renaming variables


Bob Schacht-3

Merging databases by renaming variables

I have two largish databases (250 fields, 80-300 records) that both have
data from the same questionnaire-- one is an electronic database collected
via Zoomerang, the other built from the pencil & paper version. Most (240?)
of the fields should be identical, but some are different. Assume for
present purposes that there is no overlap of cases.

My problem is this: the field names are all different between the two
databases because they were assigned by different people at different
times. Furthermore, both databases are still 'alive'-- cases are being
added, although one will be closed as of November 30.

I guess I can use the RENAME VARIABLES command in a syntax file that will
change the variable names from one file's set of names to the other, and if
more cases come in, I can use the same syntax file with those.

I have already generated a list of the variable names, copying from the
"variable view" window to columns in an Excel spreadsheet, and from there
to a table in Word. Then I mainly have to be sure that the variable names
align properly (I already know that in some cases, they don't), so that the
RENAME lists are correctly set up. Then I can use Data/Merge Files/Add
cases, right?

Any other suggestions, comments, or warnings?

Thanks,
Bob in HI

Robert M. Schacht, Ph.D. <[hidden email]>
Pacific Basin Rehabilitation Research & Training Center
1268 Young Street, Suite #204
Research Center, University of Hawaii
Honolulu, HI 96814

Hal 9000

Re: Merging databases by renaming variables

Bob,

It sounds like you've got the right idea. Once you've got the rename vars
command set up, you can re-use it later as long as everything stays the
same. To echo the wise words of others, 'protect the original files' before
you add them/change them. Whenever possible/practical, I like to set
permissions on the original file to read-only and make all the changes using
syntax, saving out to a separate file. Doing it this way has the benefit of
being able to easily undo changes, or scrutinize prior decisions about
what/how data was altered. The syntax file is a sort of 'RNA' for producing
current, accurate data.
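
A minimal sketch of that workflow, assuming hypothetical file and variable names (zoomerang_original.sav, Q1 to Q3, intro1 to intro3 are placeholders, not names from this thread). The original is marked read-only, and every change is made in syntax and saved to a new file:

```spss
* Hypothetical file and variable names, so adjust to your own setup.
PERMISSIONS FILE='zoomerang_original.sav' /PERMISSIONS READONLY.
GET FILE='zoomerang_original.sav'.
RENAME VARIABLES (Q1 Q2 Q3 = intro1 intro2 intro3).
SAVE OUTFILE='zoomerang_renamed.sav'.
```

Rerunning the same syntax file after new cases arrive should regenerate the renamed file from scratch, provided the variable names in the original have not changed.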

-Gary

Richard Ristow

Re: Merging databases by renaming variables

In reply to this post by Bob Schacht-3

At 06:53 PM 11/17/2006, Bob Schacht asked:

>I have two largish databases (250 fields, 80-300 records) that
>have data from the same questionnaire
>
>My problem is this: the 'field' [i.e., variable] names are all
>different between the two databases. Furthermore, both databases are
>still 'alive'-- cases are being added, although one will be closed as
>of November 30.
>
>I guess I can use the RENAME VARIABLES command in a syntax file that
>will change the variable names from one file's set of names to the
>other, and if more cases come in, I can use the same syntax file with
>those.
>
>Any other suggestions, comments, or warnings?

Your way will work; see also the advice from Hal 9000, especially the
note about protecting the original data.

My taste would be to use the /RENAME subcommand on my ADD FILES, rather
than a separate RENAME VARIABLES. That does mean writing the ADD FILES
in syntax, rather than using the menu version "Data/Merge Files/Add
cases".

Bob Schacht-3

Re: Merging databases by renaming variables

At 11:22 AM 11/20/2006, Richard Ristow wrote:
>At 06:53 PM 11/17/2006, Bob Schacht asked:
>
>>I have two largish databases (250 fields, 80-300 records) that have
>>data from the same questionnaire
>>
>>My problem is this: the 'field' [i.e., variable] names are all different
>>between the two databases. Furthermore, both databases are still
>>'alive'-- cases are being added, although one will be closed as of November 30.

Richard and Hal,
Thank you for your previous responses. I have discovered that my problem is
a bit more complicated than this, in that the fields that are supposed to
be the same often differ in type, width or "measure" (Nominal or Scale).
See more below.

>>. . . I can use the RENAME VARIABLES command in a syntax file that will
>>change the variable names from one file's set of names to the other, and
>>if more cases come in, I can use the same syntax file with those. . . .
>
>Your way will work; see also the advice from Hal 9000, especially the note
>about protecting the original data.
>
>My taste would be to use the /RENAME subcommand on my ADD FILES, rather
>than a separate RENAME VARIABLES. That does mean writing the ADD FILES in
>syntax, rather than using the menu version "Data/Merge Files/Add cases".

Because some of the fields that are supposed to match, don't match in type,
width, or measure, do I need to handle that issue first, before using the
Add Files?

Also, I seem to recall from dialogue here on the L that changing a
variable's type is not possible with syntax, but only through the user
interface via "Variable view", and clicking on the gray box on the right
side of the "variable type" cell, which brings up a menu of variable types.
Do I remember correctly?

The way I understand it, then, is that I need to
1. Open the Variable View for the donor database
2. Manually change as needed the type, width and measure of each variable
so that it will correspond with the type, width and measure of the intended
variables in the receiving database
3. Use the Add Files command with the /Rename subcommand to match the donor
and recipient variables, and add the cases from the donor database.

Is this right? Any easier way to do it?

Thanks,
Bob


Richard Ristow

Re: Merging databases by renaming variables

At 02:26 PM 11/30/2006, Bob Schacht wrote:

>>>I have two largish databases (250 fields, 80-300 records) that have
>>>data from the same questionnaire [to be concatenated].
>>>
>>>The 'field' [i.e., variable] names are all different between the two
>>>databases.
>
>I have discovered that my problem is a bit more complicated, in that
>the fields that are supposed to be the same often differ in type,
>width or "measure" (Nominal or Scale).
>See more below.

I wrote that
>>my taste would be to use the /RENAME subcommand on my ADD FILES,
>>rather than a separate RENAME VARIABLES.
>
>Because some of the fields that are supposed to match, don't match in
>type, width, or measure, do I need to handle that issue first, before
>using the Add Files?

No. As I wrote before, having incompatible variable names actually
makes this problem easier.

Suppose you have File_A and File_B. Variable names for corresponding
quantities are different; for example, the answer to question 1 is
called A_Ques1 in file A, and B_Ques1 in file B. (Of course, the
correspondence of names wouldn't be that neat.) Suppose that you want
to keep the variable names and attributes as they are in File_A.

Then,

ADD FILES
/FILE=File_A
/FILE=File_B.

For variables with identical types and lengths in the two files (don't
worry about measurement level), add RENAME clauses for File_B. If, say,
variables for questions 1 and 3 match this way,

ADD FILES
/FILE=File_A
/FILE=File_B/RENAME=(B_Ques1, B_Ques3
= A_Ques1, A_Ques3).

For variables that don't have identical types and lengths, after the
ADD FILES command,

* If they're both strings and the lengths are different, use simple
COMPUTE statements:
. COMPUTE A_Ques2 = B_Ques2.
(But be careful of losing text if the variable from file B is longer.)

* If the variable in file A is numeric and the variable from file B is
character, say two spaces long,
. COMPUTE A_Ques4 = NUMBER(B_Ques4,F2).
(But be careful of any values in file B that can't be converted to
numbers.)

* Similarly, if the variable in file A is string and the variable from
file B is numeric,
. COMPUTE A_Ques6 = STRING(B_Ques6,F2).

After, you may want to delete all variables from file B that were
converted by such COMPUTE statements. But that's as you prefer.
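
The pieces above can be put together roughly as follows. This is only a sketch: the quoted file names and the from_b flag are hypothetical additions, not part of the recipe as posted. One caveat worth flagging is that a plain COMPUTE runs on every case, so it would also overwrite A_Ques2, A_Ques4 and A_Ques6 for the cases that came from File_A (where the File_B variables are blank or missing). The sketch therefore uses the /IN subcommand to flag File_B cases and copies values only for them:

```spss
* Hypothetical file names, and from_b is a flag variable added for this sketch.
ADD FILES
  /FILE='file_a.sav'
  /FILE='file_b.sav'
    /RENAME=(B_Ques1 B_Ques3 = A_Ques1 A_Ques3)
    /IN=from_b.
* Copy mismatched items only for cases that came from File_B.
IF (from_b = 1) A_Ques2 = B_Ques2.
IF (from_b = 1) A_Ques4 = NUMBER(B_Ques4, F2).
IF (from_b = 1) A_Ques6 = STRING(B_Ques6, F2).
* Drop the File_B copies and the flag when saving the combined file.
SAVE OUTFILE='combined.sav'
  /DROP=B_Ques2 B_Ques4 B_Ques6 from_b.
```

The formats F2 in NUMBER and STRING are the ones from the example above and would need to match the actual widths in the data.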


Gary Oliver

Inserting Section Headings as nodes in Output file

Colleagues

My syntax files are full of comments, either clustering similar
calculations on similar variables together or reminding me what I am
doing with SPSS commands.

Of course SPSS then does the calculations, and they appear in the
output file (and in the explorer view of the output file) as a series of
nodes. I would like to insert, before some of those nodes in my output
file, nodes saying things like "Attitude Variables Measured Directly",
"Attitude Variables Measured Indirectly", etc.

Please can anyone suggest ways for me to achieve this?

TIA/gary

Unit Co-ordinator for Business Information Systems
(A Postgraduate Master of Commerce and Master of Business unit)
School of Business
The University of Sydney
------------------------
Building H69, Office 437
Corner Codrington Street & Rose Street
Darlington 2006
Australia
--------------------------------------
E-mail: [hidden email]
------------------------
Location details:
Travelling from Broadway, turn south off City Road
Navigate toward the Aquatic Centre
------------------------
University Map: http://db.auth.usyd.edu.au/directories/map/index.stm
University Website:
www.usyd.edu.au
Faculty Website
www.econ.usyd.edu.au
------------------------
Faculty Student Information Office
(Timetables, Special Consideration)
Merewether Building
Enter from City Road side
e-mail: [hidden email]
Phone: 9351-3076
----------------------------------
Executive Officer for Business Information Systems
Katy Roy
Room 347, Building H69
E-mail: [hidden email]
Phone: 9036 9432
---------------------------------

David Wasserman

Re: Inserting Section Headings as nodes in Output file

Have you tried the ECHO command?
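
For illustration, a minimal sketch of how ECHO might be placed ahead of each group of procedures. The variable names are hypothetical and FREQUENCIES simply stands in for whatever analysis follows:

```spss
* Hypothetical variable names, with FREQUENCIES as a stand-in procedure.
ECHO "Attitude Variables Measured Directly".
FREQUENCIES VARIABLES=att_direct1 att_direct2.

ECHO "Attitude Variables Measured Indirectly".
FREQUENCIES VARIABLES=att_indirect1 att_indirect2.
```

Each ECHO line writes its quoted text to the output just before the results of the procedure that follows it.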


Peck, Jon

Re: Inserting Section Headings as nodes in Output file

In reply to this post by Gary Oliver

If you want these comments to appear in the log items in the Viewer, you can use echo or, with preferences set to show the syntax in the log (which is now the default in SPSS 15), comment commands will do this.

If you want text blocks inserted, which would make the text stand out from syntax echoing and such and you want this synchronized with your actual output (which you probably do want), then you need a scripting solution. For synchronization, you need SPSS 14.0.2 or later and the Python programmability Plug-In.

But then it is very simple to insert these text blocks using the viewer module ViewerText.insert method. I can post details if you want to go this route.

HTH,
Jon Peck


Bob Schacht-3

Re: Merging databases by renaming variables

In reply to this post by Richard Ristow

At 12:55 PM 11/30/2006, Richard Ristow wrote:
>At 02:26 PM 11/30/2006, Bob Schacht wrote:
>
>>>>I have two largish databases (250 fields, 80-300 records) that have
>>>>data from the same questionnaire [to be concatenated].

Thanks for your helpful response, Richard.
And yes, concatenated is the appropriate word.

What I meant by "same questionnaire" is that most people responded by
pencil and paper in the old fashioned way, while others responded to an
electronic version of the same questionnaire on Zoomerang. Zoomerang
provides an Excel file of the results, whereas I had constructed an SPSS
database, based on the printed version. It would have been easier (?) if I
had started out by using the initial Zoomerang files to make an Excel file,
then making an SPSS file from that, and using that as the database to
record the responses from the pencil and paper questionnaires

Some interesting differences: Electronically, if a question has six
possible responses and I want them to choose only one, I can force that in
Zoomerang, because if the respondent clicks on a second choice, the first
choice mark is removed by the software. But in the pencil and paper
version, people can ignore the directions and mark more than one choice, no
matter what the directions say. If only one or two people do that, it
doesn't matter. But if many people do that, I have a problem. Unless I'm
going to choose arbitrarily among their choices, I have to convert the
response format to a field for each choice, as if it were a "mark all that
apply" question, rather than one field with a code for the different
choices. In this sense, although the questions in the electronic and
written forms are the same, the responses are not!

>>>>The 'field' [i.e., variable] names are all different between the two
>>>>databases.
>>
>>I have discovered that my problem is a bit more complicated, in that
>>the fields that are supposed to be the same often differ in type,
>>width or "measure" (Nominal or Scale).
>>See more below.

I have also discovered since then, that if SPSS can't figure out the type
of variable it is reading, it automatically defaults to Numeric 8.2.
Consequently, many of the inconsistencies in type and width that I have
encountered seem to be due to this default designation.

>I wrote that
>>>my taste would be to use the /RENAME subcommand on my ADD FILES,
>>>rather than a separate RENAME VARIABLES.
>>
>>Because some of the fields that are supposed to match, don't match in
>>type, width, or measure, do I need to handle that issue first, before
>>using the Add Files?
>
>No. As I wrote before, having incompatible variable names actually
>makes this problem easier.
>
>Suppose you have File_A and File_B. Variable names for corresponding
>quantities are different; for example, the answer to question 1 is
>called A_Ques1 in file A, and B_Ques1 in file B. (Of course, the
>correspondence of names wouldn't be that neat.) Suppose that you want
>to keep the variable names and attributes as they are in File_A.

Thanks for these detailed examples!

>Then,
>
>ADD FILES
> /FILE=File_A
> /FILE=File_B.
>
>For variables with identical types and lengths in the two files (don't
>worry about measurement level), add RENAME clauses for File_B. If, say,
>variables for questions 1 and 3 match this way,
>
>
>ADD FILES
> /FILE=File_A
> /FILE=File_B/RENAME=(B_Ques1, B_Ques3
> = A_Ques1, A_Ques3).
>
>For variables that don't have identical types and lengths, after the
>ADD FILES command,
>
>* If they're both strings and the lengths are different, use simple
>COMPUTE statements:
>. COMPUTE A_Ques2 = B_Ques2.
>(But be careful of losing text if the variable from file B is longer.)
>
>* If the variable in file A is numeric and the variable from file B is
>character, say two spaces long,
>. COMPUTE A_Ques4 = NUMBER(B_Ques4,F2).
>(But be careful of any values in file B that can't be converted to
>numbers.)
>
>* Similarly, if the variable in file A is string and the variable from
>file B is numeric,
>. COMPUTE A_Ques6 = STRING(B_Ques6,F2).
>
>After, you may want to delete all variables from file B that were
>converted by such COMPUTE statements. But that's as you prefer.

Again, thanks for these detailed examples.

Bob


Art Kendall-2

Re: Merging databases by renaming variables

The format should be whatever you set in the defaults.
Go to <edit><options> <data>; about halfway down on the left of the box,
set the default for new variables to whatever you would like.

The default I use is zero.

or in syntax before bringing in the data
set format = f6.0.
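
For variables that have already been read in with the 8.2 default, the display format can also be changed afterward in syntax. A small sketch, assuming hypothetical variable names and taking F2.0 only as an example format:

```spss
* Hypothetical variable names, and F2.0 is only an example format.
FORMATS A_Ques1 A_Ques3 (F2.0).
```

FORMATS changes only how numeric values are displayed and written, not the stored values, and it does not convert between numeric and string types.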

Art Kendall
Social Research Consultants

Bob Schacht wrote: <parts snipped>

>
> I have also discovered since then, that if SPSS can't figure out the type
> of variable it is reading, it automatically defaults to Numeric 8.2.
> Consequently, many of the inconsistencies in type and width that I have
> encountered seem to be due to this default designation.

Richard Ristow

Re: Merging databases by renaming variables

In reply to this post by Bob Schacht-3

Bob Schacht raised an interesting further issue:

At 04:51 PM 12/1/2006, Bob Schacht wrote:

>What I meant by "same questionnaire" is that most people responded by
>pencil and paper in the old fashioned way, while others responded to
>an electronic version of the same questionnaire on Zoomerang.
>Zoomerang provides an Excel file of the results, whereas I had
>constructed an SPSS database, based on the printed version. It would
>have been easier (?) if I had started out by using the initial
>Zoomerang files to make an Excel file, then making an SPSS file from
>that, and using that as the database to record the responses from the
>pencil and paper questionnaires.

That's probably true, though see below. Of course, we'd all find our
work easier if, at the beginning, we could always make the decisions
that, at the end, will turn out to have been the best.

>Some interesting differences: Electronically, if a question has six
>possible responses and I want them to choose only one, I can force
>that in Zoomerang. But in the pencil and paper version, people can
>ignore the directions and mark more than one choice, no matter what
>the directions say. If only one or two people do that, it doesn't
>matter. But if many people do that, I have a problem. Unless I'm going
>to choose arbitrarily among their choices, I have to convert the
>response format to a field for each choice, as if it were a "mark all
>that apply" question, rather than one field with a code for the
>different choices. In this sense, although the questions in the
>electronic and written forms are the same, the responses are not!

Or, in fact, the questions are effectively NOT the same, in that the
'space' of possible responses is different. And that is an interesting
difference, in fact a fascinating one. It isn't a data-management
matter; it's a substantive observation about your project. What does it
seem the multiple responses mean?

There's always the 'ballot' solution: reject any case where more than
one was chosen. I don't know; maybe, below about 5% incidence of
multiple answers, I'd take this alternative.

Otherwise, consider the nature of the question, and the answers.

. If it's a Likert scale or some such, you could take something like
the mid-range of the responses. But I'd want another variable
indicating there was more than one response, and maybe the largest and
smallest response. If somebody marked both '1' and '5' on a scale, it's
fair to think you've no idea what they meant.

. If it seems inherently multiple-response ("What make of car do you
own?"), probably represent the paper results as a multiple-response
item, and consider changing the electronic version to allow multiple
responses.

. If the question invites ambiguity ("Which is your favorite kind of
cake?") or uncertainty ("Where are you going for your next vacation?"),
I don't know. Probably, consult somebody with deep knowledge about such
kinds of survey question, and the underlying psychological theory.
Superficially, I seem to recall that forced-choice questions (one
response taken) are considered different from the same question with
multiple responses allowed, possibly with advantages and disadvantages
of each.

. Finally, of course, if there is a significant number of multiple
responses, recognize that the questions are NOT the same on the paper
and electronic versions, though they're phrased the same, and use
caution in interpreting, accordingly.
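
The mid-range idea in the first point above could be sketched like this. It assumes, purely for illustration, that the paper responses were keyed as six adjacent indicator variables pick1 to pick6 (1 if that scale point was marked, 0 if not); those names, and n_marked, lo_resp, hi_resp, multi and midrange, are all hypothetical:

```spss
* Assumed layout is pick1 TO pick6, adjacent in the file, coded 1 if marked.
COUNT n_marked = pick1 TO pick6 (1).
NUMERIC lo_resp hi_resp (F1.0).
DO REPEAT pick = pick1 TO pick6 / pos = 1 TO 6.
. IF (pick = 1 AND MISSING(lo_resp)) lo_resp = pos.
. IF (pick = 1) hi_resp = pos.
END REPEAT.
* Flag multiple marks and take the mid-range of the marked points.
COMPUTE multi = (n_marked > 1).
COMPUTE midrange = (lo_resp + hi_resp) / 2.
EXECUTE.
```

With lo_resp, hi_resp and multi kept alongside midrange, a case that marked both 1 and 5 remains visible, and the 'ballot' rule could be applied later by selecting on multi.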

-Best of luck,
Richard

Bob Schacht-3

Re: Merging databases by renaming variables

At 01:38 PM 12/4/2006, Richard Ristow wrote:

>Bob Schacht raised an interesting further issue:
>
>At 04:51 PM 12/1/2006, Bob Schacht wrote:
>
>>What I meant by "same questionnaire" is that most people responded by
>>pencil and paper in the old fashioned way, while others responded to an
>>electronic version of the same questionnaire on Zoomerang. Zoomerang
>>provides an Excel file of the results, whereas I had constructed an SPSS
>>database, based on the printed version. It would have been easier (?) if
>>I had started out by using the initial Zoomerang files to make an Excel
>>file, then making an SPSS file from that, and using that as the database
>>to record the responses from the pencil and paper questionnaires.
>[snip]
>>Some interesting differences: Electronically, if a question has six
>>possible responses and I want them to choose only one, I can force that
>>in Zoomerang. But in the pencil and paper version, people can ignore the
>>directions and mark more than one choice, no matter what the directions
>>say. If only one or two people do that, it doesn't matter. But if many
>>people do that, I have a problem. Unless I'm going to choose arbitrarily
>>among their choices, I have to convert the response format to a field for
>>each choice, as if it were a "mark all that apply" question, rather than
>>one field with a code for the different choices. In this sense, although
>>the questions in the electronic and written forms are the same, the
>>responses are not!
>
>Or, in fact, the questions are effectively NOT the same, in that the
>'space' of possible responses is different. And that is an interesting
>difference, in fact a fascinating one. It isn't a data-management matter;
>it's a substantive observation about your project. What does it seem the
>multiple responses mean?

In some cases, it means that the question was poorly designed! That's the
trouble with using freshly minted, untested questions sprung full blown
from the forehead of Zeus (or, in this case, me). In the case of one of the
questions under discussion, I designed what I thought at the time were
mutually exclusive alternatives arranged in what was intended to be a
scalar dimension. But it turned out that adjacent alternatives were not
always mutually exclusive.

>There's always the 'ballot' solution: reject any case where more than one
>was chosen. I don't know; maybe, below about 5% incidence of multiple
>answers, I'd take this alternative.

Me, too. But the incidence seemed to be more than 5%.

>Otherwise, consider the nature of the question, and the answers.
>
>. If it's a Likert scale or some such, you could take something like the
>mid-range of the responses. But I'd want another variable indicating there
>was more than one response, and maybe the largest and smallest response.
>If somebody marked both '1' and '5' on a scale, it's fair to think you've
>no idea what they meant.

Yes on all counts (see above). In fact, we briefly considered this strategy.

>. If it seems inherently multiple-response ("What make of car do you
>own?"), probably represent the paper results as a multiple-response item,
>and consider changing the electronic version to allow multiple responses.

The question was intended to place respondents with disabilities on a scale
of independence ranging from "independent" to "dependent." In other words,
we wanted to place the respondents on a scale, the underlying value of
which was "how independent are you?"

>. If the question invites ambiguity ("Which is your favorite kind of
>cake?") or uncertainty ("Where are you going for your next vacation?"), I
>don't know.

Well, it wasn't *meant* to invite ambiguity, but it did.

>Probably, consult somebody with deep knowledge about such kinds of survey
>question, and the underlying psychological theory. Superficially, I seem
>to recall that forced-choice questions (one response taken) are considered
>different from the same question with multiple responses allowed, possibly
>with advantages and disadvantages of each.
>
>. Finally, of course, if there is a significant number of multiple
>responses, recognize that the questions are NOT the same on the paper and
>electronic versions, though they're phrased the same, and use caution in
>interpreting, accordingly.
>
>-Best of luck,
> Richard

I like your first "Otherwise" alternative best for this situation. Thanks.
Bob


Bob Schacht-3

Re: Merging databases by renaming variables

In reply to this post by Richard Ristow

At 12:55 PM 11/30/2006, Richard Ristow wrote, in response to my previous
question:

[snip]

>Suppose you have File_A and File_B. Variable names for corresponding
>quantities are different; for example, the answer to question 1 is
>called A_Ques1 in file A, and B_Ques1 in file B. (Of course, the
>correspondence of names wouldn't be that neat.) Suppose that you want
>to keep the variable names and attributes as they are in File_A.
>
>Then,
>
>ADD FILES
> /FILE=File_A
> /FILE=File_B.
>
>For variables with identical types and lengths in the two files (don't
>worry about measurement level), add RENAME clauses for File_B. If, say,
>variables for questions 1 and 3 match this way,
>
>ADD FILES
> /FILE=File_A
> /FILE=File_B/RENAME=(B_Ques1, B_Ques3
> = A_Ques1, A_Ques3).
>
>For variables that don't have identical types and lengths, after the
>ADD FILES command,
>
>* If they're both strings and the lengths are different, use simple
>COMPUTE statements:
>. COMPUTE A_Ques2 = B_Ques2.
>(But be careful of losing text if the variable from file B is longer.)
>
>* If the variable in file A is numeric and the variable from file B is
>character, say two spaces long,
>. COMPUTE A_Ques4 = NUMBER(B_Ques4,F2).
>(But be careful of any values in file B that can't be converted to
>numbers.)
>
>* Similarly, if the variable in file A is string and the variable from
>file B is numeric,
>. COMPUTE A_Ques6 = STRING(B_Ques6,F2).

Richard,
Thanks for these suggestions.
Can these be listed in any order?
Could I do the whole thing just with COMPUTE statements? i.e., instead of
RENAME=(B_Ques1, B_Ques3 = A_Ques1, A_Ques3).
Could I use instead

COMPUTE A_Ques1 = B_Ques1
COMPUTE A_Ques3 = B_Ques3
etc.???

Thanks,
Bob

Richard Ristow

Re: Merging databases by renaming variables

At 03:07 PM 12/8/2006, Bob Schacht wrote:

>At 12:55 PM 11/30/2006, Richard Ristow wrote, in response to my
>previous
>question:
>
>>Suppose you have File_A and File_B. Suppose that you want
>>to keep the variable names and attributes as they are in File_A.
>>
>>Then,
>>
>>ADD FILES
>> /FILE=File_A
>> /FILE=File_B.
>>
>>For variables with identical types and lengths in the two files
>>(don't
>>worry about measurement level), add RENAME clauses for File_B.
>>
>>For variables that don't have identical types and lengths, after the
>>ADD FILES command [copy values using COMPUTE statements].
>
>Thanks for these suggestions.
>Can these be listed in any order?

Sure. You have to be careful about the order you want your variables
in. But if all the quantities you want exist in File_A, and the
variables in File_A are in the order you want, that will take care of
itself.

>Could I do the whole thing just with COMPUTE statements? i.e., instead
>of
>RENAME=(B_Ques1, B_Ques3 = A_Ques1, A_Ques3).
>Could I use instead
>
>COMPUTE A_Ques1 = B_Ques1
>COMPUTE A_Ques3 = B_Ques3

Again, sure you could. Doing it with COMPUTEs leaves variables B_Ques1
and B_Ques3 in the combined file; doing it with RENAME drops those
variables. But that's likely not a large drawback, since you'll
probably be dropping variables anyway.

You could even do it with a big DO REPEAT:

DO REPEAT A_stuff = <all File_A variables except the keys>
/B_stuff = <all File_B variables except the keys>.
. COMPUTE A_stuff = B_stuff.
END REPEAT.

That won't completely work, and will have to be tweaked, if
. There are any variables in one file but not the other.
. There are any variables that are character in one file and numeric in
the other.
. The same variables occur in both files, but not in the same order.
(This is the most subtle, as the code could run without errors but give
badly wrong results.)

Otherwise, it will work fine. The values from File_B will be in the
corresponding variables in File_A, with the names and attributes of the
File_A variables. As before, drop the File_B variables when and as you
like.

-Cheers and good luck,
Richard

Bob Schacht-3

Again Re: Merging databases by renaming variables

At 12:05 PM 12/8/2006, Richard Ristow wrote:

>At 03:07 PM 12/8/2006, Bob Schacht wrote:
>
>>At 12:55 PM 11/30/2006, Richard Ristow wrote, in response to my previous
>>question:
>>
>>>Suppose you have File_A and File_B. Suppose that you want
>>>to keep the variable names and attributes as they are in File_A.
>>>
>>>Then,
>>>
>>>ADD FILES
>>> /FILE=File_A
>>> /FILE=File_B.
>>>
>>>. . . For variables that don't have identical types and lengths, after the
>>>ADD FILES command [copy values using COMPUTE statements such as COMPUTE
>>>A_Ques2 = B_Ques2.]
>>
>>Thanks for these suggestions.
>>Can these be listed in any order?
>
>Sure. You have to be careful about the order you want your variables in.
>But if all the quantities you want exist in File_A, and the variables in
>File_A are in the order you want, that will take care of itself.
>
>>Could I do the whole thing just with COMPUTE statements? i.e., instead of
>>RENAME=(B_Ques1, B_Ques3 = A_Ques1, A_Ques3).
>>Could I use instead
>>
>>COMPUTE A_Ques1 = B_Ques1
>>COMPUTE A_Ques3 = B_Ques3
>
>Again, sure you could. Doing it with COMPUTEs leaves variables B_Ques1 and
>B_Ques3 in the combined file; doing it with RENAME drops those variables.
>But that's likely not a large drawback, since you'll probably be dropping
>variables anyway.

I was at first confused, because in the RENAME statement, FILE_A variable
names are on the *RIGHT* of the equal sign, whereas in the COMPUTE
statements, they're on the LEFT side of the = sign. I think I understand
the implications now.

My next question is about listing RENAME and COMPUTE statements in the same
(?) FILE subcommand.
Or... should the COMPUTE statements all follow the FILE_A subcommand,
rather than the FILE_B subcommand?
To use your examples in more fully fleshed out form, should I use

>>>ADD FILES
>>> /FILE=File_A/COMPUTE A_Ques3 = B_Ques3
>>> /FILE=File_B/RENAME=(B_Ques1 = A_Ques1).

or should I use

ADD FILES
/FILE=File_A
/FILE=File_B/RENAME=(B_Ques1 = A_Ques1)
/COMPUTE A_Ques3 = B_Ques3.

Thanks,
Bob

Robert M. Schacht, Ph.D. <[hidden email]>
Pacific Basin Rehabilitation Research & Training Center
1268 Young Street, Suite #204
Research Center, University of Hawaii
Honolulu, HI 96814

Richard Ristow

Re: Merging databases by renaming variables

There's a 'whoops', an IMPORTANT point that I missed, near the end of
this posting; I may have caused you trouble by omitting it.

At 05:06 PM 12/12/2006, Bob Schacht wrote:

>>>Could I do the whole thing just with COMPUTE statements? i.e.,
>>>instead of
>>>RENAME=(B_Ques1, B_Ques3 = A_Ques1, A_Ques3).
>>>Could I use instead
>>>
>>>COMPUTE A_Ques1 = B_Ques1
>>>COMPUTE A_Ques3 = B_Ques3
>>
>>Again, sure you could.
>
>My next question is about listing RENAME and COMPUTE statements in the
>same
>(?) FILE subcommand. Or... should the COMPUTE statements all follow
>the FILE_A subcommand, rather than the FILE_B subcommand?

OK, and I'll say more below, but:

/RENAME is a subcommand of ADD FILES; each /RENAME belongs to a
particular /FILE on the list; but,

COMPUTE is a separate statement, and has to follow the whole of the ADD
FILES command.

>To use your examples in more fully fleshed out form, should I use
>
>ADD FILES
> /FILE=File_A/COMPUTE A_Ques3 = B_Ques3
> /FILE=File_B/RENAME=(B_Ques1 = A_Ques1).
>or
>ADD FILES
> /FILE=File_A
> /FILE=File_B/RENAME=(B_Ques1 = A_Ques1)
> /COMPUTE A_Ques3 = B_Ques3.

Neither. In both cases, you're writing /COMPUTE as a subcommand of ADD
FILES. You'd write,

ADD FILES
/FILE=File_A
/FILE=File_B/RENAME=(B_Ques1 = A_Ques1).

COMPUTE A_Ques3 = B_Ques3.

That is, the COMPUTE is an entirely separate statement. (I've left a
blank line, to make that clear.) It's a transformation statement that
is part of the transformation program that follows the ADD FILES, and
it acts on the data from the files as combined by the ADD FILES.

*** HERE'S WHAT I MISSED: You want to use COMPUTEs to copy values from
the File_B variables to the File_A variables *if* the data is in the
File_B variables. If it's already in the File_A variables, those
COMPUTEs will wipe out the good data. So you have to do the COMPUTEs
*only* for records that came from File_B. Like this:

ADD FILES
/FILE=File_A
/FILE=File_B/IN=B_Record /* New subcommand
/RENAME=(B_Ques1 = A_Ques1).

DO IF B_Record EQ 1.
. COMPUTE A_Ques3 = B_Ques3.
END IF.

That adds two things:
. The /IN subcommand, which applies to File_B, creates variable
B_Record, which is 1 in records that come from File_B, and 0 in records
that come from File_A.
. The DO IF, which tests variable B_Record, and executes the COMPUTEs
only for records that come from File_B.

You can have all the COMPUTEs within a single DO IF, and probably
should:

DO IF B_Record EQ 1.
. COMPUTE A_Ques3 = B_Ques3.
. COMPUTE A_Ques4 = B_QUES4.
END IF.

And so forth. And you see I've written

. COMPUTE A_Ques3 = B_Ques3.
instead of
COMPUTE A_Ques3 = B_Ques3.

They're both valid. The first is 'pseudo-indented'; i.e., the command
starts in the first column of the line (with the period), but the keyword
COMPUTE is set back. I use pseudo-indents inside constructs like DO IF,
because I think it makes the code easier to read.
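
With many variables to copy, the DO REPEAT suggested earlier in the thread
can probably go inside the same guard. A minimal sketch, not run, assuming
B_Record is the flag created by /IN as above, and that the two lists pair
up one-for-one in the same order (the variable names are only the
illustrative ones used in this thread):

```spss
* Sketch only: B_Record is the /IN flag from the ADD FILES above.
* Each A_ variable must line up with its B_ counterpart, in order.
DO IF B_Record EQ 1.
DO REPEAT A_stuff = A_Ques3 A_Ques4
         /B_stuff = B_Ques3 B_Ques4.
.  COMPUTE A_stuff = B_stuff.
END REPEAT.
END IF.
```

The same cautions apply as for the unguarded DO REPEAT: a list that is out
of order will run without errors but copy values into the wrong variables.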

-Good luck,
Richard

Bob Schacht-3

Ristow Re: Merging databases by renaming variables

At 02:18 PM 12/12/2006, Richard Ristow wrote:
>There's a 'whoops', an IMPORTANT point that I missed, near the end of this
>posting; I may have caused you trouble by omitting it.

>. . . COMPUTE is a separate statement, and has to follow the whole of the
>ADD FILES command.
>
>. . . You'd write,
>
>ADD FILES
> /FILE=File_A
> /FILE=File_B/RENAME=(B_Ques1 = A_Ques1).
>
>COMPUTE A_Ques3 = B_Ques3.
>
>That is, the COMPUTE is an entirely separate statement. (I've left a blank
>line, to make that clear.) It's a transformation statement that is part of
>the transformation program that follows the ADD FILES, and it acts on the
>data from the files as combined by the ADD FILES.

Good point. Thanks for pointing that out.

>*** HERE'S WHAT I MISSED: You want to use COMPUTEs to copy values from
>the File_B variables to the File_A variables *if* the data is in the
>File_B variables. If it's already in the File_A variables, those COMPUTEs
>will wipe out the good data.

Point well taken!

>So you have to do the COMPUTEs *only* for records that came from File_B.
>Like this:
>
>ADD FILES
> /FILE=File_A
> /FILE=File_B/IN=B_Record /* New subcommand
> /RENAME=(B_Ques1 = A_Ques1).
>
>DO IF B_Record EQ 1.
>. COMPUTE A_Ques3 = B_Ques3.
>END IF.
>
>That adds two things:
>. The /IN subcommand, which applies to File_B, creates variable B_Record,
>which is 1 in records that come from File_B, and 0 in records that come
>from File_A.

Thanks for this suggestion. I may already have a variable that can function
in that way.

>. The DO IF, which tests variable B_Record, and executes the COMPUTEs only
>for records that come from File_B.
>
>You can have all the COMPUTEs within a single DO IF, and probably should:
>
>DO IF B_Record EQ 1.
>. COMPUTE A_Ques3 = B_Ques3.
>. COMPUTE A_Ques4 = B_QUES4.
>END IF.

Yes, I agree. What this means, in effect, is that when variables that are
supposed to match don't match in type or length, when I merge the files,
the ADD command should add the (somewhat redundant) variables (i.e., all
the ones that are not renamed, which will be most of them) as well as add
on the new cases. I then use the COMPUTE statements, as above, to correctly
fill in all the variables in File_A, and then I can DROP all of the
duplicate variables originating from File_B. Right?

Thanks,
Bob

Richard Ristow

Re: Ristow Re: Merging databases by renaming variables

At 08:57 PM 12/12/2006, Bob Schacht wrote:

>>. The DO IF, which tests variable B_Record, and executes the COMPUTEs
>>only for records that come from File_B.
>>
>>You can have all the COMPUTEs within a single DO IF, and probably
>>should:
>>
>>DO IF B_Record EQ 1.
>>. COMPUTE A_Ques3 = B_Ques3.
>>. COMPUTE A_Ques4 = B_QUES4.
>>END IF.
>
>[So] when variables that are supposed to match don't match in type or
>length, when I merge the files, the ADD command should add the
>(somewhat redundant) variables (i.e., all the ones that are not
>renamed, which will be most of them) as well as add on the new cases.

Yes. That's automatic with ADD FILES: all variables from both files are
kept, except those explicitly DROPped.
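
A hedged sketch of the later clean-up step, assuming the values have
already been copied and that the leftover File_B variables are the
illustrative B_Ques3 and B_Ques4. It keeps the B_Record flag as a record
of where each case came from; add it to the /DROP list if it isn't wanted:

```spss
* Sketch only: run after the COMPUTEs have copied the File_B values.
* Rewrites the active file without the leftover File_B variables.
ADD FILES
  /FILE=*
  /DROP=B_Ques3 B_Ques4.
EXECUTE.
```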

>I then use the COMPUTE statements, as above, to correctly fill in all
>the variables in File_A, and then I can DROP all of the duplicate
>variables originating from File_B. Right?

You've got it. Good luck to you!
Richard

Bob Schacht-3

Re: Merging databases by renaming variables

In reply to this post by Richard Ristow

At 02:18 PM 12/12/2006, Richard Ristow wrote:
[snip]

>*** HERE'S WHAT I MISSED: You want to use COMPUTEs to copy values from
>the File_B variables to the File_A variables *if* the data is in the
>File_B variables. If it's already in the File_A variables, those COMPUTEs
>will wipe out the good data. So you have to do the COMPUTEs *only* for
>records that came from File_B. Like this:
>
>ADD FILES
> /FILE=File_A
> /FILE=File_B/IN=B_Record /* New subcommand
> /RENAME=(B_Ques1 = A_Ques1).

Richard,
OK, what I actually wrote was

ADD FILES
/FILE=Needs_Assessment_rev2
/FILE=FinalComplete /IN=Zoomrang
/RENAME=(Question4Ididntgetanyhelpforthis= Serv_2a)
etc.

But this earned me the following error message:
> >Error # 5109 in column 1. Text:
> >The first subcommand is not FILE or TABLE. The entire command is skipped.
> >This command not executed.

So, what am I doing wrong here?

Thanks,
Bob

Richard Ristow

Re: Merging databases by renaming variables

At 04:27 PM 12/18/2006, Bob Schacht wrote:

>OK, what I actually wrote was
>
>ADD FILES
> /FILE=Needs_Assessment_rev2
> /FILE=FinalComplete /IN=Zoomrang
> /RENAME=(Question4Ididntgetanyhelpforthis= Serv_2a)
>etc.
>
>But this earned me the following error message:
>>Error # 5109 in column 1. Text:
>>The first subcommand is not FILE or TABLE. The entire command is
>>skipped.
>>This command not executed.
>
>So, what am I doing wrong here?

I'm not sure. It seems to work for me. Below is SPSS draft output. I
haven't put in any of the COMPUTE statements.

* C:\Documents and Settings\Richard\My Documents .
* \Technical\spssx-l\Z-2006d .
* \2006-12-18 Schacht - Merging databases by renaming variables.SPS.

* In response to (follow-up) posting .
* Date: Mon, 18 Dec 2006 11:27:46 -1000 .
* From: Bob Schacht <[hidden email]> .
* Subject: Re: Merging databases by renaming variables .
* To: [hidden email] .

* Syntax error using ADD FILES. .

* =================== .
* APPENDIX: TEST DATA .
* =================== .
* Uses 'dataset' data (SPSS 14.0 and above).

NEW FILE.
* "File_A" .......... .
DATA LIST LIST
/ID (F3)
A_Ques1 (A8)
A_Ques2 (F2)
A_Ques3 (A10)
A_Ques4 (F2).
BEGIN DATA
001 Agree 17 Why_not? 42
003 Not_Me 02 Maybe 99
END DATA.
DATASET NAME Needs_Assessment_rev2.

Dataset Name
|-----------------------------|---------------------------|
|Output Created |19-DEC-2006 03:05:07 |
|-----------------------------|---------------------------|

LIST.

List
|-----------------------------|---------------------------|
|Output Created |19-DEC-2006 03:05:07 |
|-----------------------------|---------------------------|

[Needs_Assessment_rev2]

 ID A_Ques1  A_Ques2 A_Ques3    A_Ques4

  1 Agree         17 Why_not?        42
  3 Not_Me         2 Maybe           99

Number of cases read: 2 Number of cases listed: 2

NEW FILE.
* "File_B" .......... .
DATA LIST LIST
/ID (F3)
B_Ques1 (A8)
B_Ques2 (F2)
B_Ques3 (A10)
B_Ques4 (F4).
BEGIN DATA
002 Uh_uh 07 Nevermore! 451
004 Cool! 13 Yeah_man 987
END DATA.
DATASET NAME FinalComplete.

Dataset Name
|-----------------------------|---------------------------|
|Output Created |19-DEC-2006 03:05:08 |
|-----------------------------|---------------------------|

LIST.

List
|-----------------------------|---------------------------|
|Output Created |19-DEC-2006 03:05:08 |
|-----------------------------|---------------------------|

[FinalComplete]

 ID B_Ques1  B_Ques2 B_Ques3    B_Ques4

  2 Uh_uh          7 Nevermore!     451
  4 Cool!         13 Yeah_man       987

Number of cases read: 2 Number of cases listed: 2

* .......... The ADD FILES ................ .

ADD FILES
/FILE=Needs_Assessment_rev2
/FILE=FinalComplete /IN=Zoomrang
/RENAME=(B_Ques1 = A_Ques1).
LIST.

List
|-----------------------------|---------------------------|
|Output Created |19-DEC-2006 03:05:24 |
|-----------------------------|---------------------------|
 ID A_Ques1  A_Ques2 A_Ques3    A_Ques4 B_Ques2 B_Ques3    B_Ques4 Zoomrang

  1 Agree         17 Why_not?        42       .                  .        0
  3 Not_Me         2 Maybe           99       .                  .        0
  2 Uh_uh          .                  .       7 Nevermore!     451        1
  4 Cool!          .                  .      13 Yeah_man       987        1

Number of cases read: 4 Number of cases listed: 4
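
The COMPUTE step left out of the run above could look something like the
following for this test data. This is a sketch and has not been run. It
assumes the Zoomrang flag from the /IN subcommand above; widening the
display format is also an assumption about what one would want, since
A_Ques4 was read as F2 while B_Ques4 holds three-digit values:

```spss
* Sketch only, not run: copy File_B values into the File_A variables.
DO IF Zoomrang EQ 1.
.  COMPUTE A_Ques2 = B_Ques2.
.  COMPUTE A_Ques3 = B_Ques3.
.  COMPUTE A_Ques4 = B_Ques4.
END IF.
* A_Ques4 was read as F2, but 451 and 987 need three digits.
FORMATS A_Ques4 (F4).
LIST VARIABLES=ID A_Ques1 TO A_Ques4 Zoomrang.
```

The B_Ques2 through B_Ques4 variables are still in the file at this point
and can be dropped once the copied values have been checked.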