Merging datasets: Different cases and different variables

Merging datasets: Different cases and different variables

Matthew Reeder
Hi all,

  I have a question that seems fairly basic. I'm interested in merging cases from two fairly large datasets (we'll call them set1 and set2) together into a merged set (set3), but it's a bit trickier than that. Below are the key points.

  + Some of these variables are shared between datasets (i.e., variables that appear in both sets, with the same range of possible values, with values having the same meanings), while other variables are unique to one dataset or the other (i.e., variables that appear in only one of the two datasets).

  Set1 might contain variables v1, v2, v3, v4, v5, v6, while set2 contains variables v1, v2, v3, v6, v7, v8.

  + The cases in both datasets are all unique. Cases appearing in set1 do not appear among the cases in set2, and vice versa.

  I want to merge set1 and set2 together such that all cases appear and v1-v8 are all present in set3. Data for the shared variables should all appear under those shared variables (e.g., v1, v2, v3, v6). I need the unique variables to appear, as well. In situations where a unique variable had not been in one of the two datasets (e.g., shown above, v4 does not appear in set2), I need values for those cases to be sysmis or 9999 or something comparable. Therefore, set2 would have SYSMIS or 9999 values for v4.

  Is there an efficient way of going about this that does not involve identifying all of the unique variables between the datasets beforehand, computing them separately in each dataset, and then merging? I couldn't find this addressed in any of the SPSS manuals, which usually discuss either (1) adding variables between datasets with shared cases or (2) adding cases between datasets with shared variables. This seems to me a fairly common procedure, so I'm guessing there is some way of going about it?


  Thanks a ton for any help,

  - Matt


Re: Merging datasets: Different cases and different variables

Hector Maletta
         Matthew,
         Just merge the two files. SPSS will do exactly what you want:
cases from a file that lacks a variable will be sysmis on that variable.
The syntax is:

         MATCH FILES /FILE='SET1.SAV' /FILE='SET2.SAV' /BY SET IDVAR.

         This assumes there is a variable called SET identifying the set
and another variable called IDVAR that uniquely identifies each case, and
that both files are sorted by those variables.
         If the BY clause is omitted, the two files are matched in the
order the cases appear, i.e. the first case in set 1 with the first case in
set 2, etc., which is not what you want. Therefore you NEED a variable
identifying the set. However, if you have a variable identifying the set, in
your case you do not need a variable identifying the case, since the cases
are different in the two sets.

         Hector


Re: Merging datasets: Different cases and different variables

Richard Ristow
In reply to this post by Matthew Reeder
At 01:15 PM 6/14/2007, Matthew Reeder wrote:

>I'm interested in merging cases from two datasets (we'll call them
>set1 and set2) together into a merged set (set3). Below are the key
>points.
>
>   + Some of these variables are shared between datasets (i.e.,
> variables that appear in both sets, with the same range of possible
> values, with values having the same meanings), while other variables
> are unique to one dataset or the other (i.e., variables that appear
> in only one of the two datasets).
>
>   Set1 might contain variables v1, v2, v3, v4, v5, v6, while set2
> contains variables v1, v2, v3, v6, v7, v8.
>
>   + The cases in both datasets are all unique. Cases appearing in
> set1 do not appear among the cases in the set2 and vice versa.

Good. That makes it really easy.

>   I want to merge set1 and set2 together such that all cases appear
> and v1-v8 are all present in set3. Data for the shared variables
> should all appear under those shared variables (e.g., v1, v2, v3,
> v6). I need the unique variables to appear, as well. In situations
> where a unique variable had not been in one of the two datasets
> (e.g., shown above, v4 does not appear in set2), I need values for
> those cases to be sysmis or 9999 or something comparable.

ADD FILES
   /FILE=set1
   /FILE=set2.

SAVE OUTFILE=set3 /* if desired */.

does exactly what you want. If there's a variable or variables on which
the file is to be sorted, make sure that set1 and set2 are sorted by
those variables, and use /BY, naming those variables, on the ADD FILES.
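
A minimal sketch of the sorted variant just described, not a definitive recipe. It assumes both files share a key variable, hypothetically named id; the file names and the flag names in_set1 and in_set2 are likewise assumptions. The /IN subcommand is optional and only records which file each case came from.

```spss
* Sketch only, with assumed file names and an assumed key variable id.
GET FILE='SET1.SAV'.
SORT CASES BY id.
SAVE OUTFILE='SET1_SORTED.SAV'.

GET FILE='SET2.SAV'.
SORT CASES BY id.
SAVE OUTFILE='SET2_SORTED.SAV'.

* Interleave the cases by id and flag the source file of each case.
ADD FILES
  /FILE='SET1_SORTED.SAV' /IN=in_set1
  /FILE='SET2_SORTED.SAV' /IN=in_set2
  /BY id.
EXECUTE.
```

Cases from set2 should then have in_set1 = 0 and in_set2 = 1, which makes it easy to tell a value that is missing because the variable was never collected from one that is missing for other reasons.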

Re: More on merging datasets: Different cases and different variables

Hector Maletta
In reply to this post by Matthew Reeder
         Even simpler:
         Since the cases in one data set are not repeated in the other data
set, you can also use ADD FILES instead of MATCH FILES, without needing any
BY clause: ADD FILES /FILE='SET1.SAV' /FILE='SET2.SAV'.

         Hector
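
If 9999 is preferred over sysmis, as the original question allowed, something along these lines might follow the ADD FILES step. This is a sketch that assumes v4, v5, v7 and v8 are numeric; string variables are padded with blanks rather than sysmis and would need different handling.

```spss
* Sketch only, assuming v4 v5 v7 v8 are numeric.
ADD FILES
  /FILE='SET1.SAV'
  /FILE='SET2.SAV'.

* Replace system-missing with 9999 and declare 9999 as user-missing.
RECODE v4 v5 v7 v8 (SYSMIS=9999).
MISSING VALUES v4 v5 v7 v8 (9999).
EXECUTE.
```

Note that the RECODE also converts any values that were already sysmis in the source files, so 9999 will not distinguish "variable absent from this file" from "value missing in the original data".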



Re: More on merging datasets: Different cases and different variables

Matthew Reeder
Hey Hector,

  Thanks for the ADD FILES and MATCH FILES lines; I didn't realize it would be as simple as this. I had used the GUI to do this before (Data --> Merge Files), which was where I kept having a hard time. I tried both of these and they worked well.


  - matt
