Merging Two Files on Two Key Variables and Different Samples

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Merging Two Files on Two Key Variables and Different Samples

David-673
Hi all,

Long time reader, first time poster.  I am having a problem trying to merge
two SPSS files.  I need to add new variables from the second data set with
the following constraints:

1. Data #1 has less observations than Data #2 (like 10k).

2. In both sets, some participants have data across multiple groups, so I
often have 2-3 cases with the same ID but different group ID, like this:

------------------
ID    Group   IV
------------------
345    A      100
345    B      20
345    C      1
------------------

Using the Merge Files function in SPSS causes a mismatch of the data.  I
tried to use the /TABLE and /BY command, but not sure how to include the
Group as a second key variable.

Any tips most appreciated!

best,
dave

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Merging Two Files on Two Key Variables and Different Samples

Maguin, Eugene
David,

Sort cases by id and group. The by keyword can accept a number of variables.
So:

Match files file=.../table=.../By id group.


Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Merging Two Files on Two Key Variables and Different Samples

Richard Ristow
In reply to this post by David-673
At 09:06 PM 10/10/2008, David wrote:

>I am trying to merge two SPSS files.  I need to add new variables
>from the second data set with the following constraints:
>
>1. Data #1 has less observations than Data #2 (like 10k).

That, in itself, is no problem, for any number of observations.

>2. In both sets, some participants have data across multiple groups,
>so I often have 2-3 cases with the same ID but different group ID, like this:
>
>------------------
>ID    Group   IV
>------------------
>345    A      100
>345    B      20
>345    C      1
>------------------
>
>Using the Merge Files function in SPSS causes a mismatch of the
>data.  I tried to use the /TABLE and /BY command, but not sure how
>to include the Group as a second key variable.

It sounds like the combination of ID and Group identifies cases
uniquely in both files; and that you want variables combined from the
two files for cases that match on both variables. (If that's not the
case, we'll have to try again.)

>[I'm] not sure how to include the Group as a second key variable.

All you have to do is add it on the /BY subcommand. The following
works, with some test data; see the run, below. Complete test data
and code are in an APPENDIX at the end of this posting.

MATCH FILES /FILE=*
  /FILE='File2'
  /BY ID Group.

Here's the test run. Is this what you were looking for?

File 1:
|-----------------------------|---------------------------|
|Output Created               |12-OCT-2008 15:27:35       |
|-----------------------------|---------------------------|
[File1]
   ID Group    IV

  345 A       100
  345 B        20
  345 C         1
  345 E        17
  678 B        11
  678 C        99
  678 D       121
  910 A       987

Number of cases read:  8    Number of cases listed:  8


File 2:
|-----------------------------|---------------------------|
|Output Created               |12-OCT-2008 15:27:36       |
|-----------------------------|---------------------------|
[File2]
   ID Group Letter

  123 E     Alpha
  345 B     Beta
  345 C     Gamma
  345 D     Delta
  345 E     Epsilon
  678 C     Zeta
  678 D     Eta
  678 E     Theta

Number of cases read:  8    Number of cases listed:  8

MERGE:
DATASET ACTIVATE File1 WINDOW=FRONT.

MATCH FILES /FILE=*
  /FILE='File2'
  /BY ID Group.
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |12-OCT-2008 15:29:39       |
|-----------------------------|---------------------------|
[File1]
   ID Group    IV Letter

  123 E         . Alpha
  345 A       100
  345 B        20 Beta
  345 C         1 Gamma
  345 D         . Delta
  345 E        17 Epsilon
  678 B        11
  678 C        99 Zeta
  678 D       121 Eta
  678 E         . Theta
  910 A       987

Number of cases read:  11    Number of cases listed:  11
=============================
APPENDIX: Test data, and code
=============================
DATA LIST LIST/
     ID    Group   IV
    (F4,   A2,     F5).
BEGIN DATA
     345    A      100
     345    B      20
     345    C      1
     345    E      17
     678    B      11
     678    C      99
     678    D      121
     910    A      987
END DATA.
DATASET NAME     File1 WINDOW=FRONT.
LIST.


DATA LIST LIST/
     ID    Group   Letter
    (F4,   A2,     A8).
BEGIN DATA
     123    E      Alpha
     345    B      Beta
     345    C      Gamma
     345    D      Delta
     345    E      Epsilon
     678    C      Zeta
     678    D      Eta
     678    E      Theta
END DATA.
DATASET NAME     File2 WINDOW=FRONT.
LIST.


DATASET ACTIVATE File1 WINDOW=FRONT.

MATCH FILES /FILE=*
  /FILE='File2'
  /BY ID Group.

LIST.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD