Hi all, may I ask for your advice on combining multiple data sets that have the same variable/column names but different data and varying numbers of records per person? I received five data sets, each ranging from 150,000 – 200,000 records. Each data set has the same number (47) of columns/variable names, but the number of cases/rows varies by subject (each subject has a unique 11-13 character alphanumeric identifier). A partial example is shown below. Moreover: 1) the same subjects may appear in two or more data sets; and 2) while the columns in each of the five data sets share the same variable names, the data within the columns is not always the same.
Initially I thought of using AGGREGATE to collapse each file, but then I wasn't sure how to handle the next step: combining the five data sets into one when they have the same column/variable names but may or may not have similar data. So I don't think MATCH FILES alone is an option. Hope someone can guide me out of this bog. Thank you very much! Elle
Elle, What, precisely, does this mean: “. . . the data within the columns is not always the same”? What it means to me is that var1 in some data sets is, for example, height and in other data sets it is weight. So, the ordinary thing to do would be to rename columns as needed so that all records in each column contain the same data. Why not do that here? Ignoring that, what is the issue with varying numbers of records by person? Are records duplicated across the data sets, and do you want to build a final data set with unduplicated records? Gene Maguin
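As a rough illustration of the rename-then-stack idea, here is a minimal SPSS syntax sketch. Everything named in it is hypothetical and not taken from the original post: the file names (set1.sav through set5.sav), the identifier variable subject_id, and the variables var1 and weight. It assumes the files are saved as .sav, that the identifier has the same name and string width in every file (ALTER TYPE may be needed first if the widths differ), and that var1 holds weight rather than height in the second file only.

```
* Sketch only: file names, subject_id, var1 and weight are assumed names.
* Stack the five files, renaming a mismatched column on the way in.
* Each IN flag is 1 for cases that came from that file and 0 otherwise.
ADD FILES
  /FILE='set1.sav' /IN=in_set1
  /FILE='set2.sav' /RENAME=(var1=weight) /IN=in_set2
  /FILE='set3.sav' /IN=in_set3
  /FILE='set4.sav' /IN=in_set4
  /FILE='set5.sav' /IN=in_set5.
EXECUTE.

* Add a per-subject record count without collapsing the stacked file.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=subject_id
  /n_records=N.
```

Stacking with ADD FILES keeps every record, so the varying number of rows per subject is not a problem at this stage. The in_set flags and n_records then show which subjects appear in more than one file, which is useful before deciding whether and how to collapse with AGGREGATE.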