Merge Files

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Merge Files

Jignesh Sutar
Dear list,

I'm in need of a helping hand,

I have 2 data sets in the following format

Data1:    v1
Resp1    1
Resp2    1
Resp3    1
Resp4    1
Resp5    1

 Resp6    2
Resp7    2
Resp8    2
Resp9    2

Data2:     v1
RResp1   1
RResp2   1
RResp3   1

 RResp4   2
RResp5   2


I am in need to restructure and merge these two datasets so they are in the
following structure,

Resp1_RResp1
Resp1_RResp2
Resp1_RResp3
Resp2_RResp1
Resp2_RResp2
Resp2_RResp3
 Resp3_RResp1
Resp3_RResp2
Resp3_RResp3

Resp6_RResp4
Resp6_RResp5
Resp7_RResp4
Resp7_RResp5
Resp8_RResp4
Resp8_RResp5
Resp9_RResp4
Resp9_RResp5

Any suggestions on how to do this effeceintly would be most welcomed,

Many thanks in advance.

Jigs

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Merge Files

Melissa Ives
Seems like you could merge them with a rename statement for each file
and with a \keep= subcommand to put the variables in the order you
desire.

Something like this, but you'll need to fill in your own :
Match files
        file=file1/rename (resp1 to resp5=Resp11 <etc. using the names
you want to use>)/
        file=file2/rename (resp1 to resp5=Resp21 <etc. using the names
you want to use>)/
        keep=<list variables in the order you want>/
        by ?? (if you have/need a by variable).

Resp11 means file1 response 1.
Resp21 means file2 response 1.

It looks like you want to combine the responses so that the Resp1_RResp1
would be 11?

If so, does this do what you want?
Compute Resp1_RResp1=(resp11*10)+resp21.
Compute Resp1_RResp2=(resp11*10)+resp22.

Then you'd just continue this format through whatever combinations you
want such as:
Compute Resp6_RResp4=(resp16*10)+resp24.
Compute Resp6_RResp5=(resp16*10)+resp25.
Compute Resp7_RResp4=(resp17*10)+resp24.
Compute Resp7_RResp5=(resp17*10)+resp25.

Melissa

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
J Sutar
Sent: Monday, December 03, 2007 8:45 AM
To: [hidden email]
Subject: [SPSSX-L] Merge Files

Dear list,

I'm in need of a helping hand,

I have 2 data sets in the following format

Data1:    v1
Resp1    1
Resp2    1
Resp3    1
Resp4    1
Resp5    1

 Resp6    2
Resp7    2
Resp8    2
Resp9    2

Data2:     v1
RResp1   1
RResp2   1
RResp3   1

 RResp4   2
RResp5   2


I am in need to restructure and merge these two datasets so they are in
the following structure,

Resp1_RResp1
Resp1_RResp2
Resp1_RResp3
Resp2_RResp1
Resp2_RResp2
Resp2_RResp3
 Resp3_RResp1
Resp3_RResp2
Resp3_RResp3

Resp6_RResp4
Resp6_RResp5
Resp7_RResp4
Resp7_RResp5
Resp8_RResp4
Resp8_RResp5
Resp9_RResp4
Resp9_RResp5

Any suggestions on how to do this effeceintly would be most welcomed,

Many thanks in advance.

Jigs

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command SIGNOFF SPSSX-L For a list
of commands to manage subscriptions, send the command INFO REFCARD


PRIVILEGED AND CONFIDENTIAL INFORMATION
This transmittal and any attachments may contain PRIVILEGED AND
CONFIDENTIAL information and is intended only for the use of the
addressee. If you are not the designated recipient, or an employee
or agent authorized to deliver such transmittals to the designated
recipient, you are hereby notified that any dissemination,
copying or publication of this transmittal is strictly prohibited. If
you have received this transmittal in error, please notify us
immediately by replying to the sender and delete this copy from your
system. You may also call us at (309) 827-6026 for assistance.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: [Many-to-many] Merge Files

Richard Ristow
In reply to this post by Jignesh Sutar
At 09:44 AM 12/3/2007, J Sutar wrote:

>I have 2 data sets in the following format

|-----------------------------|---------------------------|
|Output Created               |11-DEC-2007 16:43:32       |
|-----------------------------|---------------------------|
[Data1]
v1 Data1

  1 Resp1
  1 Resp2
  1 Resp3
  1 Resp4
  1 Resp5
  2 Resp6
  2 Resp7
  2 Resp8
  2 Resp9

Number of cases read:  9    Number of cases listed:  9

|-----------------------------|---------------------------|
|Output Created               |11-DEC-2007 16:43:32       |
|-----------------------------|---------------------------|
[Data2]
v1 Data2

  1 RResp1
  1 RResp2
  1 RResp3
  2 RResp4
  2 RResp5

Number of cases read:  5    Number of cases listed:  5

>I need to restructure and merge these two datasets so they are in
>the following structure,
>
>Resp1_RResp1
>Resp1_RResp2
>Resp1_RResp3
>Resp2_RResp1
>Resp2_RResp2
[...]

This is a "many-to-many" merge: every record in Data1 joined with
every record in Data2 with the same value of the grouping variable, v1.

Probably the simplest way is the "long-wide-long" method. In this
method, CASESTOVARS converts one of the datasets to 'wide' form, with
one record per value of v1. That makes the merge many-to-one, which
can be done with MATCH FILES with the /TABLE subcommand. Finally,
VARSTOCASES converts the wide-form data back to long form, completing
the merge.

The VARSTOCASES must be written taking into account both
. The list of variables, in the dataset that's converted to wide form
. The maximum number of cases in a group, in the same dataset.
That's an interesting Python exercise; I haven't done it yet.

With that reservation, here's the solution. It's SPSS 14 draft
output, and includes listings of the intermediate files in the
process. At the end, variables Resp1 and Resp2 are still separate
string variables, but it's easy to concatenate them to the form you requested.

*  .....   Convert one of the files (I'm using Data2) to 'wide' form .
DATASET ACTIVATE Data2.
DATASET COPY     Wide2.
DATASET ACTIVATE Wide2 WINDOW=FRONT.
CASESTOVARS
   /ID = v1
   /GROUPBY = VARIABLE .

Cases to Variables
|-----------------------------|---------------------------|
|Output Created               |11-DEC-2007 16:43:36       |
|-----------------------------|---------------------------|
[Wide2]

Generated Variables
|---------|-------|
|Original |Result |
|Variabl  |-------|
|e        |Name   |
|-------|-|-------|
|Data2  |1|Data2.1|
|       |2|Data2.2|
|       |3|Data2.3|
|-------|-|-------|

Processing Statistics
|---------------|---|
|Cases In       |5  |
|Cases Out      |2  |
|---------------|---|
|Cases In/Cases |2.5|
|Out            |   |
|---------------|---|
|Variables In   |2  |
|Variables Out  |4  |
|---------------|---|
|Index Values   |3  |
|---------------|---|

.  /**/  LIST /*-*/.
List
|-----------------------------|---------------------------|
|Output Created               |11-DEC-2007 16:43:37       |
|-----------------------------|---------------------------|
[Wide2]
v1 Data2.1  Data2.2  Data2.3

  1 RResp1   RResp2   RResp3
  2 RResp4   RResp5

Number of cases read:  2    Number of cases listed:  2


*  .....   Now it's a "many-to-one" merge, since Data2, in  ........ .
*  .....   'wide' form, has only one record per group (v1)  ........ .

MATCH FILES
   /FILE =Data1
   /TABLE=Wide2
   /BY v1.

.  /**/  LIST /*-*/.
List
|-----------------------------|---------------------------|
|Output Created               |11-DEC-2007 16:43:40       |
|-----------------------------|---------------------------|
v1 Data1    Data2.1  Data2.2  Data2.3

  1 Resp1    RResp1   RResp2   RResp3
  1 Resp2    RResp1   RResp2   RResp3
  1 Resp3    RResp1   RResp2   RResp3
  1 Resp4    RResp1   RResp2   RResp3
  1 Resp5    RResp1   RResp2   RResp3
  2 Resp6    RResp4   RResp5
  2 Resp7    RResp4   RResp5
  2 Resp8    RResp4   RResp5
  2 Resp9    RResp4   RResp5

Number of cases read:  9    Number of cases listed:  9


*  .....   Convert the Data2 data back to long form,        ........ .
*  .....   completing the merge.                            ........ .
*  .....   (This is the code that depends on the dictionary ........ .
*  .....   and the data for Data2.)                         ........ .


VARSTOCASES
   /MAKE Data2 FROM Data2.1 Data2.2 Data2.3
   /KEEP = ALL
   /NULL = DROP.

Variables to Cases
|-----------------------------|---------------------------|
|Output Created               |11-DEC-2007 16:43:40       |
|-----------------------------|---------------------------|
Generated Variables
|-----|------|
|Name |Label |
|-----|------|
|Data2|<none>|
|-----|------|

Processing Statistics
|-------------|-|
|Variables In |5|
|Variables Out|3|
|-------------|-|

LIST.
List
|-----------------------------|---------------------------|
|Output Created               |11-DEC-2007 16:43:41       |
|-----------------------------|---------------------------|
v1 Data1    Data2

  1 Resp1    RResp1
  1 Resp1    RResp2
  1 Resp1    RResp3
  1 Resp2    RResp1
  1 Resp2    RResp2
  1 Resp2    RResp3
  1 Resp3    RResp1
  1 Resp3    RResp2
  1 Resp3    RResp3
  1 Resp4    RResp1
  1 Resp4    RResp2
  1 Resp4    RResp3
  1 Resp5    RResp1
  1 Resp5    RResp2
  1 Resp5    RResp3
  2 Resp6    RResp4
  2 Resp6    RResp5
  2 Resp7    RResp4
  2 Resp7    RResp5
  2 Resp8    RResp4
  2 Resp8    RResp5
  2 Resp9    RResp4
  2 Resp9    RResp5

Number of cases read:  23    Number of cases listed:  23
===================
APPENDIX: Test data
===================
*  ................................................................. .
*  .................   Test data                   ................. .

DATA LIST LIST
   /Data1    v1
   (A8,      F2).
BEGIN DATA
    Resp1    1
    Resp2    1
    Resp3    1
    Resp4    1
    Resp5    1
    Resp6    2
    Resp7    2
    Resp8    2
    Resp9    2
END DATA.
ADD  FILES /FILE=*/ KEEP=v1 ALL.
SORT CASES BY v1.
DATASET NAME     Data1 WINDOW=FRONT.


DATA LIST LIST
   /Data2    v1
   (A8,      F2).
BEGIN DATA
    RResp1   1
    RResp2   1
    RResp3   1
    RResp4   2
    RResp5   2
END DATA.
ADD  FILES /FILE=*/ KEEP=v1 ALL.
SORT CASES BY v1.
DATASET NAME     Data2 WINDOW=FRONT.

*  ................................................................. .
*  .................   Show test data              ................. .

DATASET ACTIVATE Data1 WINDOW=FRONT.
LIST.
DATASET ACTIVATE Data2 WINDOW=FRONT.
LIST.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD