SPSSX Discussion

Re: ? Combine multiple data sets having same headers but different data and varying subject record


elle lists

Re: ? Combine multiple data sets having same headers but different data and varying subject record

Hi Gene,

Thanks so much for your help. Here's an example which may help illustrate the problem. The two data sets have the same variable names with corresponding info on Score and Status. However, in Dataset #2, three records sharing the same IDs as shown in Dataset #1 have different score and status info.

I'd like to build one data file from the 5 separate files where: 1) there's one record per row, but which 2) also captures instances where the data within variables differ between data sets. So besides having Score and Status, there would be something like Score1 and Status1 to show the different info in Dataset #2. I think AGGREGATE will address the first issue of collapsing the data but I'm puzzled on how to address the second issue.

Elle

DATA SET #1			DATA SET #2

ID	SCORE	STATUS	ID	SCORE	STATUS
ZW5KLREQ1O1EET	46	1	ZW5KLREQ1O1EET	52	2
B8484_EWUO#IK	62	1	B8484_EWUO#IK	63	2
B8484_EWUO#IK	62	1	X1#27WB1FGKN	44	1
RO1ILWSQ#TD8BT	41	1	23_BILPW0F@QBT	77	3
RO1ILWSQ#TD8BT	41	1	RO1ILWSQ#TD8BT	44	2

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Gene Maguin
Sent: Monday, October 17, 2011 3:11 AM
To: [hidden email]
Subject: Re: ? Combine multiple data sets having same headers but different data and varying subject records

Elle,

What, precisely, does this mean: “. . . . the data within the columns is not always the same”? What it means to me is that var1 in some datasets is, for example, height and in other datasets it is weight. So, the ordinary thing to do would be to rename columns as needed so that all records in each column contain the same data. Why not do that here?

Ignoring that, what is the issue with varying numbers of records by person? Are records duplicated across the datasets and you want to build a final data set with unduplicated records?

Gene Maguin

David Marso

Re: ? Combine multiple data sets having same headers but different data and varying subject record

Administrator

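* Stack the five files and flag the source file of each case in IN1 to IN5.
ADD FILES
/ FILE=FILE1 / IN=IN1
/ FILE=FILE2 / IN=IN2
/ FILE=FILE3 / IN=IN3
/ FILE=FILE4 / IN=IN4
/ FILE=FILE5 / IN=IN5.
* RNUM is the number (1 to 5) of the file the case came from.
COMPUTE RNUM=IN1*1 + IN2*2 + IN3*3 + IN4*4 + IN5*5.
* Create SCORE1 to SCORE5 and STATUS1 to STATUS5 and fill the slot for the source file.
VECTOR SCORE(5) STATUS(5).
COMPUTE SCORE(RNUM)=SCORE.
COMPUTE STATUS(RNUM)=STATUS.
* Collapse to one record per ID and keep the non-missing value in each slot.
AGGREGATE OUTFILE *
/ BREAK ID
/ SCORE1 TO SCORE5 STATUS1 TO STATUS5=MAX(SCORE1 TO SCORE5 STATUS1 TO STATUS5).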

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

elle lists

Re: ? Combine multiple data sets having same headers but different data and varying subject record

In reply to this post by elle lists

Hi David,

Thanks very much for the syntax. I'm still a long way from being syntax-literate, so my apologies in advance if I'm not comprehending it. It seems (I think) that the syntax takes SCORE and STATUS from each data set to create a final data set that contains SCORE1 to SCORE5 and STATUS1 to STATUS5. If so, then I'm thinking I should also include the corresponding subject ID variable from the 5 data sets as well.

Thank you for your example as it's giving me a better sense of how to proceed.

Elle


Maguin, Eugene

Re: ? Combine multiple data sets having same headers but different data and varying subject record

In reply to this post by elle lists

Elle,

No, you don't need to because (read up on the aggregate command) the aggregate command first sorts the file by the break variable and then writes a new record for each group of records with the same id value.
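
To illustrate the point about the break variable, here is a minimal sketch on toy data. The IDs, the values, and the output names MAXSCORE and NREC are made up for this example and do not come from Elle's files.

```spss
* Hypothetical toy data in which the ID AAA appears on two rows.
DATA LIST LIST / ID (A14) SCORE (F3.0).
BEGIN DATA
AAA 46
BBB 62
AAA 52
END DATA.

* Write one output record per distinct ID value.
AGGREGATE OUTFILE=*
  /BREAK=ID
  /MAXSCORE=MAX(SCORE)
  /NREC=N.
LIST.
```

The listing should show two records: AAA with MAXSCORE 52 and NREC 2, and BBB with MAXSCORE 62 and NREC 1. ID is carried along automatically because it is the break variable, so it does not need to be named among the aggregated variables.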

Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD


elle lists

Re: ? Combine multiple data sets having same headers but different data and varying subject record

In reply to this post by elle lists

Gene, thanks very much for the explanation. Will check the manual for more on "aggregate". ;) Thank you, too, David Marso. Greatly appreciate your and Gene's help.

Best,

Elle


David Marso

Re: ? Combine multiple data sets having same headers but different data and varying subject record

Administrator

Please note that my posted solution will drop info if there are multiple occurrences of the same ID within any of the 5 files. Such can be resolved but requires additional logic (hint: ADD/SORT/LAG/CASESTOVARS).
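
One possible reading of that hint, sketched on toy data for two files only. This is an assumption about what the ADD/SORT/LAG/CASESTOVARS sequence could look like, not the posted solution itself. The dataset contents and the helper variables SRC and SEQ are hypothetical.

```spss
* Hypothetical toy data standing in for two of the five files.
DATA LIST LIST / ID (A14) SCORE (F3.0) STATUS (F1.0).
BEGIN DATA
AAA 46 1
BBB 62 1
BBB 62 1
END DATA.
DATASET NAME FILE1.

DATA LIST LIST / ID (A14) SCORE (F3.0) STATUS (F1.0).
BEGIN DATA
AAA 52 2
BBB 63 2
END DATA.
DATASET NAME FILE2.

* ADD: stack the files and record the source file number in SRC.
ADD FILES
  /FILE=FILE1 /IN=IN1
  /FILE=FILE2 /IN=IN2.
COMPUTE SRC = IN1*1 + IN2*2.

* SORT: bring all records for an ID together, ordered by source file.
SORT CASES BY ID SRC.

* LAG: number the records within each ID and source file.
COMPUTE SEQ = 1.
IF (ID = LAG(ID) AND SRC = LAG(SRC)) SEQ = LAG(SEQ) + 1.
FORMATS SRC SEQ (F2.0).

* CASESTOVARS: one row per ID with columns indexed by file and within-file order.
CASESTOVARS
  /ID=ID
  /INDEX=SRC SEQ
  /DROP=IN1 IN2.
LIST.
```

Because SEQ separates repeated IDs within a file, the second BBB record from FILE1 gets its own set of columns instead of being collapsed away. With five files, the ADD FILES and COMPUTE SRC lines would be extended in the same pattern.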


elle lists

Re: ? Combine multiple data sets having same headers but different data and varying subject record

In reply to this post by elle lists

There are many occurrences of same IDs (i.e., multiple records for the same
person(s)) occurring throughout the 5 files so thanks very much for the
warning and greatly appreciate the hint.

Elle


Maguin, Eugene

Re: ? Combine multiple data sets having same headers but different data and varying subject record

In reply to this post by elle lists

Elle,

I assume you have a version of SPSS that includes the CASESTOVARS command.
Read the documentation on this command in the syntax reference. You can set
up the command sequence in two different ways, and the way you choose
determines the naming structure of the resulting variables. One way is this
(abbreviated syntax):

Add files ....
Sort cases by id.
Casestovars /id=id.

The result is variables named like this:
var1.1 var1.2 var1.3 ... var1.7, where the value for var1.3 may come from
file 2 for one case and from file 1 for another case.

Another way is this:

Get file1.
Sort cases by id.
Casestovars /id=id.
Save file1a.   <<< note new name to preserve source file

Repeat for file2, file3, file4, file5. Then:

Add files file1a, file2a ... file5a.
Sort cases by id.
Casestovars /id=id.

The result is variables named like this:
var1.1.1 var1.1.2 var1.2.1 var1.2.2 var1.3.1 var1.3.2 ... In other words,
var1.x.y
where the '.y' refers to the file the value came from, and the '.x' refers
to the value's order within that file.

Casestovars is more direct, but David's syntax can be modified to do the
same thing.
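
For what it's worth, the first approach might be spelled out roughly as below. This is only a sketch under assumptions: the file names file1.sav to file5.sav, the flags in1 to in5, and the helper variables source and seq are made up for illustration, and I have not run it against your data. The seq counter follows the ADD/SORT/LAG/CASESTOVARS hint, so repeated IDs within a file are kept rather than dropped.

```spss
* Stack the five files; each IN flag is 1 for cases read from that file.
ADD FILES
  /FILE='file1.sav' /IN=in1
  /FILE='file2.sav' /IN=in2
  /FILE='file3.sav' /IN=in3
  /FILE='file4.sav' /IN=in4
  /FILE='file5.sav' /IN=in5.
* Record which file each case came from (hypothetical helper variable).
COMPUTE source = in1*1 + in2*2 + in3*3 + in4*4 + in5*5.
* Bring the records for each ID together, in file order.
SORT CASES BY id source.
* Number the records within each ID by comparing with the previous case.
COMPUTE seq = 1.
IF (id = LAG(id)) seq = LAG(seq) + 1.
* Restructure to one row per ID; seq supplies the numeric suffix.
CASESTOVARS
  /ID=id
  /INDEX=seq
  /DROP=in1 in2 in3 in4 in5.
```

Because source is carried along, each set of restructured values should come with a matching source.1, source.2, and so on, showing which file it came from.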

Gene Maguin

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
elle
Sent: Tuesday, October 18, 2011 9:10 PM
To: [hidden email]
Subject: Re: ? Combine multiple data sets having same headers but different
data and varying subject record

There are many occurrences of same IDs (i.e., multiple records for the same
person(s)) occurring throughout the 5 files so thanks very much for the
warning and greatly appreciate the hint.

Elle

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
David Marso
Sent: Tuesday, October 18, 2011 12:37 PM
To: [hidden email]
Subject: Re: ? Combine multiple data sets having same headers but different
data and varying subject record

Please note that my posted solution will drop info if there are multiple
occurrences of the same ID within any of the 5 files. Such can be resolved
but requires additional logic (hint: ADD/SORT/LAG/CASESTOVARS).

--
View this message in context:
http://spssx-discussion.1045642.n5.nabble.com/Re-Combine-multiple-data-sets-having-same-headers-but-different-data-and-varying-subject-record-tp4911745p4915728.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

David Marso

Re: ? Combine multiple data sets having same headers but different data and varying subject record

Administrator

Questions which plague me are:
How is it that these disparate values arise?
Is there a temporal structure to their occurrence?
Do you wish to retain *ALL* occurrences or just the unique values?
How many actual columns are there in the data? CASESTOVARS (C2V) may well generate *MANY* new data columns.
What analyses do you wish to perform?
Often, data in a "wide" format are a general PIA to analyze in comparison to "long" format.
So, spelling out the middle/long term goal state might be helpful in determining best path forward.
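
If only the unique values matter, a long-format alternative might look something like the sketch below. It assumes the five files are saved as file1.sav to file5.sav (hypothetical names) and that ID, SCORE and STATUS are the only variables of interest; n_records and n_versions are made-up helper names.

```spss
* Stack the five files in long format (file names are hypothetical).
ADD FILES
  /FILE='file1.sav'
  /FILE='file2.sav'
  /FILE='file3.sav'
  /FILE='file4.sav'
  /FILE='file5.sav'.
* Keep one row per distinct ID, SCORE, STATUS combination.
AGGREGATE OUTFILE=*
  /BREAK=id score status
  /n_records = N.
* Add the number of distinct combinations found for each ID.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
  /BREAK=id
  /n_versions = N.
```

IDs with n_versions greater than 1 are the ones whose SCORE or STATUS differs somewhere across the files, and the data stay in long format for later analysis.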
HTH, David
--

elle lists wrote:
>
> Gene, thanks very much for the explanation. Will check the manual for more
> on "aggregate". ;) Thank you, too, David Marso. Greatly appreciate your
> and Gene's help.
>
> Best,
>
> Elle
>
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:SPSSX-L@.UGA] On Behalf Of Gene
> Maguin
> Sent: Tuesday, October 18, 2011 3:21 AM
> To: SPSSX-L@.UGA
> Subject: Re: ? Combine multiple data sets having same headers but
> different data and varying subject record
>
> Elle,
>
> No, you don't need to because (read up on the aggregate command) the
> aggregate command first sorts the file by the break variable and then
> writes a new record for each group of records with the same id value.
>
> Gene Maguin
>
>
>
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:SPSSX-L@.UGA] On Behalf Of elle
> Sent: Tuesday, October 18, 2011 7:25 AM
> To: SPSSX-L@.UGA
> Subject: Re: ? Combine multiple data sets having same headers but
> different data and varying subject record
>
> Hi David,
>
> Thanks very much for the syntax. I'm still a way from being
> syntax-literate so my apologies in advance if I'm not comprehending
> the syntax. It seems (I think) that the syntax is taking SCORE and
> STATUS from each data set to create a final data set that contains
> SCORE1 to SCORE5 and STATUS1 to STATUS5. If so, then I'm thinking I
> should also include the corresponding subject ID variable from the 5
> data sets as well.
>
> Thank you for your example as it's giving me a better sense of how to
> proceed.
>
> Elle
>
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:SPSSX-L@.UGA] On Behalf Of David
> Marso
> Sent: Monday, October 17, 2011 1:50 PM
> To: SPSSX-L@.UGA
> Subject: Re: ? Combine multiple data sets having same headers but
> different data and varying subject record
>
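> * Stack the five files; each IN flag is 1 for cases read from that file.
> ADD FILES
>   /FILE=FILE1 /IN=IN1
>   /FILE=FILE2 /IN=IN2
>   /FILE=FILE3 /IN=IN3
>   /FILE=FILE4 /IN=IN4
>   /FILE=FILE5 /IN=IN5.
> * RNUM is the number (1 to 5) of the file each case came from.
> COMPUTE RNUM=IN1*1 + IN2*2 + IN3*3 + IN4*4 + IN5*5.
> * Copy SCORE and STATUS into the slot belonging to the source file.
> VECTOR SCORE(5) STATUS(5).
> COMPUTE SCORE(RNUM)=SCORE.
> COMPUTE STATUS(RNUM)=STATUS.
> * Collapse to one record per ID, keeping the values from each file.
> AGGREGATE OUTFILE=*
>   /BREAK=ID
>   /SCORE1 TO SCORE5 STATUS1 TO STATUS5=MAX(SCORE1 TO SCORE5 STATUS1 TO STATUS5).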
>
>
> elle lists wrote:
>>
>> Hi Gene,
>>
>> Thanks so much for your help. Here's an example which may help
>> illustrate the problem. The two data sets have the same variable
>> names with corresponding info on Score and Status. However, in
>> Dataset #2, three records sharing the same IDs as shown in Dataset #1
>> have different score and status info.
>>
>> I'd like to build one data file from the 5 separate files where: 1)
>> there's one record per row, but which 2) also captures instances
>> where the data within variables differ between data sets. So besides
>> having Score and Status, there would be something like Score1 and
>> Status1 to show the different info in Dataset #2. I think AGGREGATE
>> will address the first issue of collapsing the data but I'm puzzled
>> on how to address the second issue.
>>
>>
>> Elle
>>
>> DATA SET #1                    DATA SET #2
>> ID              SCORE  STATUS  ID              SCORE  STATUS
>> ZW5KLREQ1O1EET  46     1       ZW5KLREQ1O1EET  52     2
>> B8484_EWUO#IK   62     1       B8484_EWUO#IK   63     2
>> B8484_EWUO#IK   62     1       X1#27WB1FGKN    44     1
>> RO1ILWSQ#TD8BT  41     1       23_BILPW0F@QBT  77     3
>> RO1ILWSQ#TD8BT  41     1       RO1ILWSQ#TD8BT  44     2
>>
>> *From:* SPSSX(r) Discussion [mailto:SPSSX-L@.UGA] *On Behalf Of *Gene
>> Maguin
>> *Sent:* Monday, October 17, 2011 3:11 AM
>> *To:* SPSSX-L@.UGA
>> *Subject:* Re: ? Combine multiple data sets having same headers but
>> different data and varying subject records
>>
>>
>>
>> Elle,
>>
>> What, precisely, does this mean: ". . . . the data within the
>> columns is not always the same." What it means to me is that var1 in
>> some datasets is, for example, height and in other data sets it is
>> weight. So, the ordinary thing to do would be to rename columns as
>> needed so that all records in each column contained the same data.
>> Why not do that here?
>>
>> Ignoring that, what is the issue with varying numbers of records by
>> person?
>> Are records duplicated across the datasets and you want to build a
>> final data set with unduplicated records?
>>
>> Gene Maguin
>>
>
>
> --
> View this message in context:
> http://spssx-discussion.1045642.n5.nabble.com/Re-Combine-multiple-data-sets-having-same-headers-but-different-data-and-varying-subject-record-tp4911745p4911943.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.