? Combine multiple data sets having same headers but different data and varying subject records

elle lists

Hi all,

May I ask for your advice on combining multiple data sets that have the same variable/column names but different data and varying numbers of records per person? I received five data sets, each with 150,000 to 200,000 records. Each data set has the same 47 columns/variable names, but the number of cases/rows varies by subject (each subject has a unique 11-13 character alphanumeric identifier). A partial example is shown below. Moreover: 1) the same subjects may appear in two or more data sets; and 2) while the columns in each of the five data sets share the same variable names, the data within the columns is not always the same.

id              VAR1  VAR2  VAR3
ZW5KLREQ1O1EET  46    1     C
B8484_EWUO#IK   62    1     G
B8484_EWUO#IK   56    1     G
RO1ILWSQ#TD8BT  41    1     G
RO1ILWSQ#TD8BT  32    1     G
RO1ILWSQ#TD8BT  55    1     G
41PAEGQ@FGKN    68    0     D
41PAEGQ@FGKN    71    0     D
41PAEGQ@FGKN    74    0     D
41PAEGQ@FGKN    55    0     D
T@1PEDF7@KM     62    1     G

Initially I thought of using AGGREGATE to collapse each file, but then I wasn't sure how to handle the next step: combining the five data sets into one when they have the same column/variable names but may or may not contain similar data. So I don't think MATCH FILES alone is an option.

Hope someone can guide me out of this bog.  Thank you very much!

Elle


Re: ? Combine multiple data sets having same headers but different data and varying subject records

Maguin, Eugene

Elle,

What, precisely, does this mean: “. . . the data within the columns is not always the same”? What it means to me is that VAR1 in some data sets is, for example, height and in other data sets it is weight. So the ordinary thing to do would be to rename columns as needed so that every record in a given column contains the same kind of data. Why not do that here?
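
A minimal sketch of that renaming idea, written in plain Python rather than SPSS syntax (in SPSS itself the corresponding command would be RENAME VARIABLES). The file labels, the `RENAME_MAPS` table, the `rename_columns` helper, and the height/weight values are all hypothetical and only illustrate the approach; they are not taken from the actual data.

```python
# Hypothetical rename maps: one per data set, old column name -> new name.
# Which column means what in which file is an assumption for illustration.
RENAME_MAPS = {
    "file1": {"VAR1": "height"},
    "file2": {"VAR1": "weight"},
}


def rename_columns(rows, mapping):
    """Return new row dicts with keys renamed per mapping; other keys are kept."""
    return [{mapping.get(key, key): value for key, value in row.items()}
            for row in rows]


# Made-up records: same column name, different meaning in each file.
file1 = [{"id": "A1", "VAR1": 170}]
file2 = [{"id": "A1", "VAR1": 65}]

renamed1 = rename_columns(file1, RENAME_MAPS["file1"])
renamed2 = rename_columns(file2, RENAME_MAPS["file2"])
```

Once each file has been renamed this way, a column name means the same thing everywhere, and the files can be stacked without mixing unlike quantities.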

Ignoring that, what is the issue with varying numbers of records by person? Are records duplicated across the datasets and you want to build a final data set with unduplicated records?
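
If the goal does turn out to be an unduplicated final file, the logic might look roughly like the sketch below. It is plain Python, not SPSS syntax, and it assumes that "duplicate" means a record identical on every column; the `stack_unique` helper, the file labels, and the assignment of the sample rows to files are made up for illustration.

```python
from collections import Counter


def stack_unique(datasets):
    """Stack rows from several data sets, skipping exact duplicate records.

    datasets: dict of label -> list of row dicts that share the same keys.
    Each kept row gains a 'source' key naming the data set it was first seen in.
    Note that exact repeats within a single data set are dropped as well.
    """
    seen = set()
    combined = []
    for label, rows in datasets.items():
        for row in rows:
            key = tuple(sorted(row.items()))  # hashable fingerprint of the record
            if key in seen:
                continue
            seen.add(key)
            combined.append({**row, "source": label})
    return combined


# Illustrative rows only; which file each row belongs to is an assumption.
datasets = {
    "file1": [
        {"id": "B8484_EWUO#IK", "VAR1": 62, "VAR2": 1, "VAR3": "G"},
        {"id": "B8484_EWUO#IK", "VAR1": 56, "VAR2": 1, "VAR3": "G"},
    ],
    "file2": [
        {"id": "B8484_EWUO#IK", "VAR1": 62, "VAR2": 1, "VAR3": "G"},  # repeats a file1 record
        {"id": "T@1PEDF7@KM", "VAR1": 62, "VAR2": 1, "VAR3": "G"},
    ],
}

combined = stack_unique(datasets)
records_per_id = Counter(row["id"] for row in combined)
```

Keeping a `source` tag on every row makes it possible to see later which file a record came from, and the per-id counts show how many distinct records each subject ends up with.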

Gene Maguin

 

 


