SPSSX Discussion

? Combine multiple data sets having same headers but different data and varying subject records

elle lists

? Combine multiple data sets having same headers but different data and varying subject records

Hi all,

May I ask for your advice on combining multiple data sets having the same variable/column names but with different data and varying numbers of records by person? I received five data sets each ranging from 150,000 – 200,000 records. Each data set has the same number (47) of columns/variable names but the number of cases/rows varies by subject (each subject has a unique 11-13 character alphanumeric identifier). A partial example is shown below. Moreover: 1) the same subjects may appear in two or more data sets; and 2) while the columns in each of the five data sets share the same variable names the data within the columns is not always the same.


id	VAR1	VAR2	VAR3

ZW5KLREQ1O1EET	46	1	C

B8484_EWUO#IK	62	1	G

B8484_EWUO#IK	56	1	G

RO1ILWSQ#TD8BT	41	1	G

RO1ILWSQ#TD8BT	32	1	G

RO1ILWSQ#TD8BT	55	1	G

41PAEGQ@FGKN	68	0	D

41PAEGQ@FGKN	71	0	D

41PAEGQ@FGKN	74	0	D

41PAEGQ@FGKN	55	0	D

T@1PEDF7@KM	62	1	G

Initially I thought of using AGGREGATE to collapse each file, but then I wasn't sure how to handle the next step: combining the five data sets into one when they have the same column/variable names but may or may not contain similar data. So I don't think MATCH FILES alone is an option.

Hope someone can guide me out of this bog. Thank you very much!

Elle

Maguin, Eugene

Re: ? Combine multiple data sets having same headers but different data and varying subject records

Elle,

What, precisely, does this mean: “. . . . the data within the columns is not always the same”? What it means to me is that var1 in some data sets is, for example, height, and in other data sets it is weight. So the ordinary thing to do would be to rename columns as needed so that all records in each column contain the same kind of data. Why not do that here?

Ignoring that, what is the issue with varying numbers of records by person? Are records duplicated across the datasets and you want to build a final data set with unduplicated records?
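
If the answer to that last question is yes, the job can be pictured as: stack the files, tag each row with the file it came from, then drop rows that are exact copies. In SPSS, stacking is normally done with ADD FILES. The sketch below is in Python rather than SPSS syntax, purely to show the logic. The function names, the "source" field, and the two tiny files are hypothetical; the rows are borrowed from the example in the question, and it assumes "duplicate" means identical in every column.

```python
from collections import Counter

def stack_with_source(datasets):
    """Append rows from every data set, tagging each row with its origin."""
    combined = []
    for name, rows in datasets.items():
        for row in rows:
            combined.append({**row, "source": name})
    return combined

def drop_exact_duplicates(rows, ignore=("source",)):
    """Keep the first copy of each record.

    Rows that differ in any data field are all kept, so a subject
    can still have several records in the result.
    """
    seen = set()
    unique = []
    for row in rows:
        # Compare on every field except the bookkeeping ones.
        key = tuple(sorted((k, v) for k, v in row.items() if k not in ignore))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# Two tiny hypothetical files reusing rows from the example in the question.
file_a = [
    {"id": "B8484_EWUO#IK", "VAR1": 62, "VAR2": 1, "VAR3": "G"},
    {"id": "B8484_EWUO#IK", "VAR1": 56, "VAR2": 1, "VAR3": "G"},
]
file_b = [
    {"id": "B8484_EWUO#IK", "VAR1": 62, "VAR2": 1, "VAR3": "G"},  # repeats a row of file_a
    {"id": "T@1PEDF7@KM", "VAR1": 62, "VAR2": 1, "VAR3": "G"},
]

stacked = stack_with_source({"file_a": file_a, "file_b": file_b})
unique = drop_exact_duplicates(stacked)
records_per_id = Counter(row["id"] for row in unique)
```

Keeping the source tag makes it possible to see afterwards which file each surviving record came from, and the per-id count shows how many records each subject still has. If the same column holds different measures in different files, the renaming step above has to come first, otherwise the comparison is meaningless.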

Gene Maguin

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of elle lists
Sent: Monday, October 17, 2011 8:18 AM
To: [hidden email]
Subject: ? Combine multiple data sets having same headers but different data and varying subject records

