|
Hi all,
I have a question that seems fairly basic. I'm interested in merging cases from two fairly large datasets (we'll call them set1 and set2) together into a merged set (set3), but it's a bit trickier than that. Below are the key points.

+ Some of the variables are shared between datasets (i.e., variables that appear in both sets, with the same range of possible values, with values having the same meanings), while other variables are unique to one dataset or the other (i.e., variables that appear in only one of the two datasets). Set1 might contain variables v1, v2, v3, v4, v5, v6, while set2 contains variables v1, v2, v3, v6, v7, v8.

+ The cases in both datasets are all unique. Cases appearing in set1 do not appear among the cases in set2 and vice versa.

I want to merge set1 and set2 together such that all cases appear and v1-v8 are all present in set3. Data for the shared variables should all appear under those shared variables (e.g., v1, v2, v3, v6). I need the unique variables to appear, as well. In situations where a unique variable had not been in one of the two datasets (e.g., shown above, v4 does not appear in set2), I need values for those cases to be sysmis or 9999 or something comparable. Therefore, the set2 cases would have SYSMIS or 9999 values for v4.

Is there an efficient way of going about this that does not involve identifying all of the unique variables between the datasets beforehand, computing them separately in each dataset, and then merging? I couldn't find this addressed in any of the SPSS manuals, in that they usually discuss either (1) adding variables between datasets with shared cases or (2) adding cases between datasets with shared variables. This would seem to me to be a fairly common procedure, so I'm guessing there is some way of going about it?

Thanks a ton for any help,
- Matt
|
Matthew,
Just merge the two files. SPSS will do exactly what you want. Cases without a variable will be sysmis. The syntax is:

  MATCH FILES /FILE='SET1.SAV' /FILE='SET2.SAV' /BY SET IDVAR.

This assumes there is a variable called SET identifying the set, and another variable called IDVAR that uniquely identifies each case. If the BY clause is omitted, the two files are matched in the order the cases appear, i.e. the first case in set 1 with the first case in set 2, etc., which is not what you want. Therefore you NEED a variable identifying the set. However, if you have a variable identifying the set, in your case you do not need a variable identifying the case, since the cases are different in the two sets.

Hector

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Matthew Reeder
Sent: 14 June 2007 14:16
To: [hidden email]
Subject: Merging datasets: Different cases and different variables
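
For readers who want to try the MATCH FILES route from this reply, here is a minimal sketch. It rests on assumptions not stated in the thread: that neither file already contains a SET variable (so it is computed here), that both files contain IDVAR, and that it is acceptable to write sorted copies under the hypothetical names SET1_SORTED.SAV and SET2_SORTED.SAV. MATCH FILES with BY expects every input file to be sorted in ascending order of the BY variables, which is why the sort step is included.

```spss
* Sketch only: SET1_SORTED.SAV and SET2_SORTED.SAV are hypothetical file names.
GET FILE='SET1.SAV'.
* Tag every case with the set it came from (assumes no existing SET variable).
COMPUTE SET = 1.
SORT CASES BY SET IDVAR.
SAVE OUTFILE='SET1_SORTED.SAV'.

GET FILE='SET2.SAV'.
COMPUTE SET = 2.
SORT CASES BY SET IDVAR.
SAVE OUTFILE='SET2_SORTED.SAV'.

* No SET and IDVAR combination occurs in both files, so no cases are paired.
MATCH FILES /FILE='SET1_SORTED.SAV' /FILE='SET2_SORTED.SAV' /BY SET IDVAR.
EXECUTE.
```

Because the two files never share a key, every case from both files is carried into the result, and variables that a case's source file lacks are left system-missing.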
|
In reply to this post by Matthew Reeder
At 01:15 PM 6/14/2007, Matthew Reeder wrote:
> I'm interested in merging cases from two datasets (we'll call them
> set1 and set2) together into a merged set (set3). Below are the key
> points.
>
> + Some of these variables are shared between datasets (i.e.,
>   variables that appear in both sets, with the same range of possible
>   values, with values having the same meanings), while other variables
>   are unique to one dataset or the other (i.e., variables that appear
>   in only one of the two datasets).
>
>   Set1 might contain variables v1, v2, v3, v4, v5, v6, while set2
>   contains variables v1, v2, v3, v6, v7, v8.
>
> + The cases in both datasets are all unique. Cases appearing in
>   set1 do not appear among the cases in the set2 and vice versa.

Good. That makes it really easy.

> I want to merge set1 and set2 together such that all cases appear
> and v1-v8 are all present in set3. Data for the shared variables
> should all appear under those shared variables (e.g., v1, v2, v3,
> v6). I need the unique variables to appear, as well. In situations
> where a unique variable had not been in one of the two datasets
> (e.g., shown above, v4 does not appear in set2), I need values for
> those cases to be sysmis or 9999 or something comparable.

  ADD FILES /FILE=set1 /FILE=set2.
  SAVE OUTFILE=set3 /* if desired */.

does exactly what you want.

If there's a variable or variables on which the file is to be sorted, make sure that set1 and set2 are sorted by those variables, and use /BY, naming those variables, on the ADD FILES.
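
To see the ADD FILES behavior on something small before running it on the real files, a toy version can be sketched as follows. The data values are made up for illustration, and the sketch assumes an SPSS version that supports named datasets (DATASET NAME), so that set1 and set2 can be referenced by name as in the syntax above.

```spss
* Hypothetical toy data standing in for set1 (variables v1 to v6).
DATA LIST FREE / v1 v2 v3 v4 v5 v6.
BEGIN DATA
1 10 20 30 40 50
2 11 21 31 41 51
END DATA.
DATASET NAME set1.

* Hypothetical toy data standing in for set2 (no v4 or v5, but v7 and v8).
DATA LIST FREE / v1 v2 v3 v6 v7 v8.
BEGIN DATA
3 12 22 52 60 70
4 13 23 53 61 71
END DATA.
DATASET NAME set2.

* Stack the cases of both datasets into a new active dataset.
ADD FILES /FILE=set1 /FILE=set2.
DATASET NAME set3.
LIST.
```

In the listing, the two cases from set2 should show system-missing values for v4 and v5, and the two cases from set1 should show system-missing values for v7 and v8.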
|
In reply to this post by Matthew Reeder
Even more simple:
Since the cases in one data set are not repeated in the other data set, in fact you can also use ADD FILES instead of MATCH FILES, without necessarily using any BY clause:

  ADD FILES /FILE='SET1.SAV' /FILE='SET2.SAV'.

Hector

-----Original Message-----
From: Hector Maletta [mailto:[hidden email]]
Sent: 14 June 2007 14:39
To: 'Matthew Reeder'; '[hidden email]'
Subject: RE: Merging datasets: Different cases and different variables
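
If the original request for 9999 rather than system-missing still applies, the ADD FILES call in this reply could be extended roughly as below. This is a sketch under assumptions: v4, v5, v7 and v8 are taken to be numeric, SET3.SAV is a hypothetical output name, and from_set1 is a hypothetical flag variable created by the optional /IN subcommand (1 for cases read from SET1.SAV, 0 otherwise).

```spss
* Stack the two files and flag which cases came from SET1.SAV.
ADD FILES /FILE='SET1.SAV' /IN=from_set1
          /FILE='SET2.SAV'.

* Replace system-missing with 9999 on the variables unique to one file.
RECODE v4 v5 v7 v8 (SYSMIS=9999).
* Declare 9999 as user-missing so it is excluded from statistics.
MISSING VALUES v4 v5 v7 v8 (9999).

SAVE OUTFILE='SET3.SAV'.
```

Note that the RECODE also converts any values that were already system-missing in the source files, so the flag variable is the more reliable way to tell where a case came from.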
|
Hey Hector,
Thanks for the ADD FILES and MATCH FILES lines; I didn't realize it would be as simple as this. I used the GUI to do this before (Data --> Merge Files), which was where I kept having a hard time. I tried both of these and they worked well.

- matt
