SPSSX Discussion

Merging Two Files on Two Key Variables and Different Samples

Classic

List

Threaded

3 messages Options

David-673

Merging Two Files on Two Key Variables and Different Samples

Hi all,

Long time reader, first time poster. I am having a problem trying to merge
two SPSS files. I need to add new variables from the second data set with
the following constraints:

1. Data #1 has less observations than Data #2 (like 10k).

2. In both sets, some participants have data across multiple groups, so I
often have 2-3 cases with the same ID but different group ID, like this:

------------------
ID Group IV
------------------
345 A 100
345 B 20
345 C 1
------------------

Using the Merge Files function in SPSS causes a mismatch of the data. I
tried to use the /TABLE and /BY command, but not sure how to include the
Group as a second key variable.

Any tips most appreciated!

best,
dave

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Maguin, Eugene

Re: Merging Two Files on Two Key Variables and Different Samples

David,

Sort cases by id and group. The by keyword can accept a number of variables.
So:

Match files file=.../table=.../By id group.

Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Richard Ristow

Re: Merging Two Files on Two Key Variables and Different Samples

In reply to this post by David-673

At 09:06 PM 10/10/2008, David wrote:

>I am trying to merge two SPSS files. I need to add new variables
>from the second data set with the following constraints:
>
>1. Data #1 has less observations than Data #2 (like 10k).

That, in itself, is no problem, for any number of observations.

>2. In both sets, some participants have data across multiple groups,
>so I often have 2-3 cases with the same ID but different group ID, like this:
>
>------------------
>ID Group IV
>------------------
>345 A 100
>345 B 20
>345 C 1
>------------------
>
>Using the Merge Files function in SPSS causes a mismatch of the
>data. I tried to use the /TABLE and /BY command, but not sure how
>to include the Group as a second key variable.

It sounds like the combination of ID and Group identifies cases
uniquely in both files; and that you want variables combined from the
two files for cases that match on both variables. (If that's not the
case, we'll have to try again.)

>[I'm] not sure how to include the Group as a second key variable.

All you have to do is add it on the /BY subcommand. The following
works, with some test data; see the run, below. Complete test data
and code are in an APPENDIX at the end of this posting.

MATCH FILES /FILE=*
/FILE='File2'
/BY ID Group.

Here's the test run. Is this what you were looking for?

File 1:
|-----------------------------|---------------------------|
|Output Created |12-OCT-2008 15:27:35 |
|-----------------------------|---------------------------|
[File1]
ID Group IV

345 A 100
345 B 20
345 C 1
345 E 17
678 B 11
678 C 99
678 D 121
910 A 987

Number of cases read: 8 Number of cases listed: 8

File 2:
|-----------------------------|---------------------------|
|Output Created |12-OCT-2008 15:27:36 |
|-----------------------------|---------------------------|
[File2]
ID Group Letter

123 E Alpha
345 B Beta
345 C Gamma
345 D Delta
345 E Epsilon
678 C Zeta
678 D Eta
678 E Theta

Number of cases read: 8 Number of cases listed: 8

MERGE:
DATASET ACTIVATE File1 WINDOW=FRONT.

MATCH FILES /FILE=*
/FILE='File2'
/BY ID Group.
LIST.

List
|-----------------------------|---------------------------|
|Output Created |12-OCT-2008 15:29:39 |
|-----------------------------|---------------------------|
[File1]
ID Group IV Letter

123 E . Alpha
345 A 100
345 B 20 Beta
345 C 1 Gamma
345 D . Delta
345 E 17 Epsilon
678 B 11
678 C 99 Zeta
678 D 121 Eta
678 E . Theta
910 A 987

Number of cases read: 11 Number of cases listed: 11
=============================
APPENDIX: Test data, and code
=============================
DATA LIST LIST/
ID Group IV
(F4, A2, F5).
BEGIN DATA
345 A 100
345 B 20
345 C 1
345 E 17
678 B 11
678 C 99
678 D 121
910 A 987
END DATA.
DATASET NAME File1 WINDOW=FRONT.
LIST.

DATA LIST LIST/
ID Group Letter
(F4, A2, A8).
BEGIN DATA
123 E Alpha
345 B Beta
345 C Gamma
345 D Delta
345 E Epsilon
678 C Zeta
678 D Eta
678 E Theta
END DATA.
DATASET NAME File2 WINDOW=FRONT.
LIST.

DATASET ACTIVATE File1 WINDOW=FRONT.

MATCH FILES /FILE=*
/FILE='File2'
/BY ID Group.

LIST.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD