SPSSX Discussion

combining two datasets


Khaing Soe-2

combining two datasets

Dear All,

One question:

I have two data sets in which nearly 400 cases, identified by household number, are common to both. The two data sets share some variables and also have different ones.

How can I create a new data set with these common cases only?

Please advise.

Regards, Khaing Soe

John F Hall

Re: combining two datasets

Click Help > Command Syntax Reference and check out MATCH FILES in the manual.

Open one file to display the data editor (this will be the open file * in the syntax below). If you have a recent release of SPSS, open the second file as well.

Click File > New > Syntax to open a new syntax editor and write something like

match files file = * / file = <file2.sav> /by <household> .

If you can't have more than one file open, you need to specify the full path for file2. This syntax will pick up the variables from the first file named, plus any different ones from the second file. The default is all variables. If you only want certain ones, use the subcommand /KEEP <your varlist>.
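A fuller sketch of the steps above, for anyone following along. The folder C:\data, the file names, and the variables household, age and income are hypothetical placeholders, not taken from the original poster's data. MATCH FILES with /BY expects each file to be sorted in ascending order on the key variable, so the sketch sorts both files first. Note that on its own this keeps every household found in either file; it does not yet restrict the result to the common cases.

```spss
* Hypothetical paths and variable names, substitute your own.
* Sort the second file by the key and save the sorted copy.
GET FILE='C:\data\file2.sav'.
SORT CASES BY household.
SAVE OUTFILE='C:\data\file2_sorted.sav'.

* Open and sort the first file, which becomes the active dataset.
GET FILE='C:\data\file1.sav'.
SORT CASES BY household.

* Match on household and keep only selected variables.
MATCH FILES
  /FILE=*
  /FILE='C:\data\file2_sorted.sav'
  /BY household
  /KEEP=household age income.
EXECUTE.
```

Leaving out /KEEP retains all variables from both files.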

John Hall
[hidden email]
http://surveyresearch.weebly.com


Bruce Weaver

Re: combining two datasets


For some examples of what John describes here, see the "Merging (MATCH merging) SPSS data files" tutorial found on this page:

http://www.ats.ucla.edu/stat/spss/topics/data_management.htm

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/


John F Hall

Re: combining two datasets

Bruce

To save people scrolling, it's http://www.ats.ucla.edu/stat/spss/modules/merge.htm

ATS is one of the best sites around for SPSS tutorials (along with your own, Raynald's and John Samuel's at Indiana).

John


Melissa Ives

Re: combining two datasets

In reply to this post by Khaing Soe-2

In MATCH FILES, look at the use of /IN= to create variables indicating whether each file contributed to the matched case:

match files file = * / in=infile1/ file = <file2.sav> / in=infile2/ by <household> .

Your resulting dataset will have two additional indicator (0/1) variables (infile1 and infile2) that are 1 if there was a record in the indicated file.

You can run a crosstab of infile1 by infile2 to get a 2x2 table. The cell where both variables = 1 is what you seem to be asking for ("common cases only"). To get that subset of records, use: Select if infile1=1 and infile2=1.
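Put together, the approach might look like the sketch below. The paths and the output file name are hypothetical, and it assumes both files are already sorted by the household variable.

```spss
* Hypothetical paths and names. Both files must already be sorted by household.
MATCH FILES
  /FILE='C:\data\file1.sav' /IN=infile1
  /FILE='C:\data\file2.sav' /IN=infile2
  /BY household.
EXECUTE.

* 2x2 table of households found in one file, the other, or both.
CROSSTABS /TABLES=infile1 BY infile2.

* Keep only households present in both files, then save without the flags.
SELECT IF (infile1 = 1 AND infile2 = 1).
EXECUTE.
SAVE OUTFILE='C:\data\common_cases.sav' /DROP=infile1 infile2.
```

Checking the crosstab before running SELECT IF is a cheap way to confirm that the count in the both-files cell is close to the expected 400 or so.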

If you really meant common variables only, that is a different answer.

Melissa


Richard Ristow

Re: combining two datasets

In reply to this post by John F Hall

At 06:45 AM 9/17/2010, John F Hall wrote:

> Open one file to display the data editor (this will be the open file * in the syntax below). If you have a recent release of SPSS, open the second file as well.
>
> Click File > New > syntax to open a new syntax editor and write something like
>
> match files file = * / file = <file2.sav> /by <household> .
>
> You need to specify the full pathway for file2 if you can't have more than one file open.

It can be more direct to name both files, rather than first loading file 1 and using "file=*":

match files file = <file1.sav> / file = <file2.sav> /by <household> .

That's often forgotten, since the menus only generate syntax with "file=*".

(As later posters have written, you'll probably want "/IN=" clauses so you can identify those cases the two datasets have in common.)

> This syntax will pick up variables from the first file named, plus any different ones from the second file.

NOW, your datasets have "same as well as different variables". What do you want to have happen for those variables that are the same? Do you want the values from file1, or both values, or what?

One way to keep both is to use a "/RENAME=" clause for file 2, to give new names to its copy of the common variables. Another (if you have many variables in common) is to use ADD FILES instead of MATCH FILES, and interleave the two files rather than joining their records.
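Both options can be sketched roughly as follows. The paths and the names age, income, age_f2, income_f2 and source are hypothetical, and both files are assumed to be sorted by household. Without /RENAME, MATCH FILES takes the value of a same-named variable from the first file listed. The two commands are alternatives; run one or the other, since each replaces the active dataset.

```spss
* Hypothetical names, assuming age and income exist in both files.
* Option 1 joins records and keeps the file 2 copies under new names.
MATCH FILES
  /FILE='C:\data\file1.sav' /IN=infile1
  /FILE='C:\data\file2.sav' /RENAME=(age income = age_f2 income_f2) /IN=infile2
  /BY household.
EXECUTE.

* Option 2 interleaves records instead, with source = 1 for rows from file 2.
ADD FILES
  /FILE='C:\data\file1.sav'
  /FILE='C:\data\file2.sav' /IN=source
  /BY household.
EXECUTE.
```

With option 2, a household present in both files appears as two consecutive rows rather than as one wide row.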