SPSSX Discussion

Merge Files

Classic

List

Threaded

3 messages Options

Jignesh Sutar

Merge Files

Dear list,

I'm in need of a helping hand,

I have 2 data sets in the following format

Data1: v1
Resp1 1
Resp2 1
Resp3 1
Resp4 1
Resp5 1

Resp6 2
Resp7 2
Resp8 2
Resp9 2

Data2: v1
RResp1 1
RResp2 1
RResp3 1

RResp4 2
RResp5 2

I am in need to restructure and merge these two datasets so they are in the
following structure,

Resp1_RResp1
Resp1_RResp2
Resp1_RResp3
Resp2_RResp1
Resp2_RResp2
Resp2_RResp3
Resp3_RResp1
Resp3_RResp2
Resp3_RResp3

Resp6_RResp4
Resp6_RResp5
Resp7_RResp4
Resp7_RResp5
Resp8_RResp4
Resp8_RResp5
Resp9_RResp4
Resp9_RResp5

Any suggestions on how to do this effeceintly would be most welcomed,

Many thanks in advance.

Jigs

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Melissa Ives

Re: Merge Files

Seems like you could merge them with a rename statement for each file
and with a \keep= subcommand to put the variables in the order you
desire.

Something like this, but you'll need to fill in your own :
Match files
file=file1/rename (resp1 to resp5=Resp11 <etc. using the names
you want to use>)/
file=file2/rename (resp1 to resp5=Resp21 <etc. using the names
you want to use>)/
keep=<list variables in the order you want>/
by ?? (if you have/need a by variable).

Resp11 means file1 response 1.
Resp21 means file2 response 1.

It looks like you want to combine the responses so that the Resp1_RResp1
would be 11?

If so, does this do what you want?
Compute Resp1_RResp1=(resp11*10)+resp21.
Compute Resp1_RResp2=(resp11*10)+resp22.

Then you'd just continue this format through whatever combinations you
want such as:
Compute Resp6_RResp4=(resp16*10)+resp24.
Compute Resp6_RResp5=(resp16*10)+resp25.
Compute Resp7_RResp4=(resp17*10)+resp24.
Compute Resp7_RResp5=(resp17*10)+resp25.

Melissa

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
J Sutar
Sent: Monday, December 03, 2007 8:45 AM
To: [hidden email]
Subject: [SPSSX-L] Merge Files

Dear list,

I'm in need of a helping hand,

I have 2 data sets in the following format

Data1: v1
Resp1 1
Resp2 1
Resp3 1
Resp4 1
Resp5 1

Resp6 2
Resp7 2
Resp8 2
Resp9 2

Data2: v1
RResp1 1
RResp2 1
RResp3 1

RResp4 2
RResp5 2

I am in need to restructure and merge these two datasets so they are in
the following structure,

Resp1_RResp1
Resp1_RResp2
Resp1_RResp3
Resp2_RResp1
Resp2_RResp2
Resp2_RResp3
Resp3_RResp1
Resp3_RResp2
Resp3_RResp3

Resp6_RResp4
Resp6_RResp5
Resp7_RResp4
Resp7_RResp5
Resp8_RResp4
Resp8_RResp5
Resp9_RResp4
Resp9_RResp5

Any suggestions on how to do this effeceintly would be most welcomed,

Many thanks in advance.

Jigs

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command SIGNOFF SPSSX-L For a list
of commands to manage subscriptions, send the command INFO REFCARD

PRIVILEGED AND CONFIDENTIAL INFORMATION
This transmittal and any attachments may contain PRIVILEGED AND
CONFIDENTIAL information and is intended only for the use of the
addressee. If you are not the designated recipient, or an employee
or agent authorized to deliver such transmittals to the designated
recipient, you are hereby notified that any dissemination,
copying or publication of this transmittal is strictly prohibited. If
you have received this transmittal in error, please notify us
immediately by replying to the sender and delete this copy from your
system. You may also call us at (309) 827-6026 for assistance.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Richard Ristow

Re: [Many-to-many] Merge Files

In reply to this post by Jignesh Sutar

At 09:44 AM 12/3/2007, J Sutar wrote:

>I have 2 data sets in the following format

|-----------------------------|---------------------------|
|Output Created |11-DEC-2007 16:43:32 |
|-----------------------------|---------------------------|
[Data1]
v1 Data1

1 Resp1
1 Resp2
1 Resp3
1 Resp4
1 Resp5
2 Resp6
2 Resp7
2 Resp8
2 Resp9

Number of cases read: 9 Number of cases listed: 9

|-----------------------------|---------------------------|
|Output Created |11-DEC-2007 16:43:32 |
|-----------------------------|---------------------------|
[Data2]
v1 Data2

1 RResp1
1 RResp2
1 RResp3
2 RResp4
2 RResp5

Number of cases read: 5 Number of cases listed: 5

>I need to restructure and merge these two datasets so they are in
>the following structure,
>
>Resp1_RResp1
>Resp1_RResp2
>Resp1_RResp3
>Resp2_RResp1
>Resp2_RResp2
[...]

This is a "many-to-many" merge: every record in Data1 joined with
every record in Data2 with the same value of the grouping variable, v1.

Probably the simplest way is the "long-wide-long" method. In this
method, CASESTOVARS converts one of the datasets to 'wide' form, with
one record per value of v1. That makes the merge many-to-one, which
can be done with MATCH FILES with the /TABLE subcommand. Finally,
VARSTOCASES converts the wide-form data back to long form, completing
the merge.

The VARSTOCASES must be written taking into account both
. The list of variables, in the dataset that's converted to wide form
. The maximum number of cases in a group, in the same dataset.
That's an interesting Python exercise; I haven't done it yet.

With that reservation, here's the solution. It's SPSS 14 draft
output, and includes listings of the intermediate files in the
process. At the end, variables Resp1 and Resp2 are still separate
string variables, but it's easy to concatenate them to the form you requested.

* ..... Convert one of the files (I'm using Data2) to 'wide' form .
DATASET ACTIVATE Data2.
DATASET COPY Wide2.
DATASET ACTIVATE Wide2 WINDOW=FRONT.
CASESTOVARS
/ID = v1
/GROUPBY = VARIABLE .

Cases to Variables
|-----------------------------|---------------------------|
|Output Created |11-DEC-2007 16:43:36 |
|-----------------------------|---------------------------|
[Wide2]

Generated Variables
|---------|-------|
|Original |Result |
|Variabl |-------|
|e |Name |
|-------|-|-------|
|Data2 |1|Data2.1|
| |2|Data2.2|
| |3|Data2.3|
|-------|-|-------|

Processing Statistics
|---------------|---|
|Cases In |5 |
|Cases Out |2 |
|---------------|---|
|Cases In/Cases |2.5|
|Out | |
|---------------|---|
|Variables In |2 |
|Variables Out |4 |
|---------------|---|
|Index Values |3 |
|---------------|---|

. /**/ LIST /*-*/.
List
|-----------------------------|---------------------------|
|Output Created |11-DEC-2007 16:43:37 |
|-----------------------------|---------------------------|
[Wide2]
v1 Data2.1 Data2.2 Data2.3

1 RResp1 RResp2 RResp3
2 RResp4 RResp5

Number of cases read: 2 Number of cases listed: 2

* ..... Now it's a "many-to-one" merge, since Data2, in ........ .
* ..... 'wide' form, has only one record per group (v1) ........ .

MATCH FILES
/FILE =Data1
/TABLE=Wide2
/BY v1.

. /**/ LIST /*-*/.
List
|-----------------------------|---------------------------|
|Output Created |11-DEC-2007 16:43:40 |
|-----------------------------|---------------------------|
v1 Data1 Data2.1 Data2.2 Data2.3

1 Resp1 RResp1 RResp2 RResp3
1 Resp2 RResp1 RResp2 RResp3
1 Resp3 RResp1 RResp2 RResp3
1 Resp4 RResp1 RResp2 RResp3
1 Resp5 RResp1 RResp2 RResp3
2 Resp6 RResp4 RResp5
2 Resp7 RResp4 RResp5
2 Resp8 RResp4 RResp5
2 Resp9 RResp4 RResp5

Number of cases read: 9 Number of cases listed: 9

* ..... Convert the Data2 data back to long form, ........ .
* ..... completing the merge. ........ .
* ..... (This is the code that depends on the dictionary ........ .
* ..... and the data for Data2.) ........ .

VARSTOCASES
/MAKE Data2 FROM Data2.1 Data2.2 Data2.3
/KEEP = ALL
/NULL = DROP.

Variables to Cases
|-----------------------------|---------------------------|
|Output Created |11-DEC-2007 16:43:40 |
|-----------------------------|---------------------------|
Generated Variables
|-----|------|
|Name |Label |
|-----|------|
|Data2|<none>|
|-----|------|

Processing Statistics
|-------------|-|
|Variables In |5|
|Variables Out|3|
|-------------|-|

LIST.
List
|-----------------------------|---------------------------|
|Output Created |11-DEC-2007 16:43:41 |
|-----------------------------|---------------------------|
v1 Data1 Data2

1 Resp1 RResp1
1 Resp1 RResp2
1 Resp1 RResp3
1 Resp2 RResp1
1 Resp2 RResp2
1 Resp2 RResp3
1 Resp3 RResp1
1 Resp3 RResp2
1 Resp3 RResp3
1 Resp4 RResp1
1 Resp4 RResp2
1 Resp4 RResp3
1 Resp5 RResp1
1 Resp5 RResp2
1 Resp5 RResp3
2 Resp6 RResp4
2 Resp6 RResp5
2 Resp7 RResp4
2 Resp7 RResp5
2 Resp8 RResp4
2 Resp8 RResp5
2 Resp9 RResp4
2 Resp9 RResp5

Number of cases read: 23 Number of cases listed: 23
===================
APPENDIX: Test data
===================
* ................................................................. .
* ................. Test data ................. .

DATA LIST LIST
/Data1 v1
(A8, F2).
BEGIN DATA
Resp1 1
Resp2 1
Resp3 1
Resp4 1
Resp5 1
Resp6 2
Resp7 2
Resp8 2
Resp9 2
END DATA.
ADD FILES /FILE=*/ KEEP=v1 ALL.
SORT CASES BY v1.
DATASET NAME Data1 WINDOW=FRONT.

DATA LIST LIST
/Data2 v1
(A8, F2).
BEGIN DATA
RResp1 1
RResp2 1
RResp3 1
RResp4 2
RResp5 2
END DATA.
ADD FILES /FILE=*/ KEEP=v1 ALL.
SORT CASES BY v1.
DATASET NAME Data2 WINDOW=FRONT.

* ................................................................. .
* ................. Show test data ................. .

DATASET ACTIVATE Data1 WINDOW=FRONT.
LIST.
DATASET ACTIVATE Data2 WINDOW=FRONT.
LIST.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD