Merging Dissimilar Files

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Merging Dissimilar Files

Matthew Boswell
First, a late thank you to all who had suggestions about my last query.

Second, we are trying to combine two datasets, each having longitudinal
data.  Each has a unique set of variables to contribute and each covers a
slightly different span of years.

DATASET A
STATE    YEAR    AGE    DATE
ALA      1980    20      5
ALA      1981    17      7
DEL      1980    23      7
DEL      1981    22      8


DATASET B
STATE    YEAR    HEIGHT   WEIGHT
ALA      1978     175      225
ALA      1979     200      250
ALA      1980     225      200
DEL      1978     150      150
DEL      1979     175      190
DEL      1980     180      160

  I like how in Raynald's book it says (regarding merging two or more files)
"Now that we've had a good laugh over the liklihood that all of the files
have been the exact same structure with the exact name variables...."  (p.
92)....Its nice to know that I am not alone!!

We would like the final dataset to have all the cases and all the
vars,looking like:

state  year   age   date    height  weight

ALA    1978                 175     160
ALA    1979                 200     250
ALA    1980   20     5      225     200
ALA    1981   17     7
etc.


  not sure if this is an ADD or a MATCH files, Hector had some MATCH
suggestions in the archives that I tried, but nothing worked; how can I
match without having a unique identifier- all this stuff is longitudinal, is
there any way to have a BY STATE YEAR type command in there?
Thanks!

Matt B.
Reply | Threaded
Open this post in threaded view
|

Re: Merging Dissimilar Files

Raynald Levesque
Hi Matthew,

The following assumes
- there is at most 1 record per file for a given state,year
- files are sorted by state year.

DATASET CLOSE ALL.
DATA LIST LIST /state(A3) year(F4) age(F3) date(F4).
BEGIN DATA
ALA      1980    20      5
ALA      1981    17      7
DEL      1980    23      7
DEL      1981    22      8
END DATA.
DATASET NAME A.

DATA LIST LIST /state(A3) year(F4) height(F3) weight(F3).
BEGIN DATA
ALA      1978     175      225
ALA      1979     200      250
ALA      1980     225      200
DEL      1978     150      150
DEL      1979     175      190
DEL      1980     180      160
END DATA.
DATASET NAME B.

ADD FILES FILE=A
        /FILE=B
        /BY=state year.
AGGREGATE OUTFILE=*
        /PRE
        /BREAK=state year
        /age date height weight=FIRST(age date height weight).
LIST.

* Result is:


state year age date height weight

ALA   1978   .    .   175    225
ALA   1979   .    .   200    250
ALA   1980  20    5   225    200
ALA   1981  17    7     .      .
DEL   1978   .    .   150    150
DEL   1979   .    .   175    190
DEL   1980  23    7   180    160
DEL   1981  22    8     .      .


Number of cases read:  8    Number of cases listed:  8

Regards

Raynald Levesque
www.spsstools.net



-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Matthew Boswell
Sent: December 30, 2006 2:57 AM
To: [hidden email]
Subject: Merging Dissimilar Files

First, a late thank you to all who had suggestions about my last query.

Second, we are trying to combine two datasets, each having longitudinal
data.  Each has a unique set of variables to contribute and each covers a
slightly different span of years.

DATASET A
STATE    YEAR    AGE    DATE
ALA      1980    20      5
ALA      1981    17      7
DEL      1980    23      7
DEL      1981    22      8


DATASET B
STATE    YEAR    HEIGHT   WEIGHT
ALA      1978     175      225
ALA      1979     200      250
ALA      1980     225      200
DEL      1978     150      150
DEL      1979     175      190
DEL      1980     180      160

  I like how in Raynald's book it says (regarding merging two or more files)
"Now that we've had a good laugh over the liklihood that all of the files
have been the exact same structure with the exact name variables...."  (p.
92)....Its nice to know that I am not alone!!

We would like the final dataset to have all the cases and all the
vars,looking like:

state  year   age   date    height  weight

ALA    1978                 175     160
ALA    1979                 200     250
ALA    1980   20     5      225     200
ALA    1981   17     7
etc.


  not sure if this is an ADD or a MATCH files, Hector had some MATCH
suggestions in the archives that I tried, but nothing worked; how can I
match without having a unique identifier- all this stuff is longitudinal, is
there any way to have a BY STATE YEAR type command in there?
Thanks!

Matt B.
Reply | Threaded
Open this post in threaded view
|

Re: Merging Dissimilar Files

Matthew Boswell
In reply to this post by Matthew Boswell
Thanks for the syntax, it worked like a charm- the only modifications were
to specify the .sav files and then indicate the proper outfile (a new data
set).


Checking the command syntax for these new commands, I think I can see a
little bit of the difference between ADD/MATCH/AGGREGATE; here I am clearly
aggregating cases, based on the fact that I am adding both new cases and new
variables to a new dataset.

BREAK was one of those commands that I would constantly see come up in other
people's syntax, here on the list, but could never piece together
intuitively from reading the syntax.  Basically, this was the command I was
looking for in the sense that I had two "keys" to identify cases (apologies
to database people out there)?  I can also sort the BREAK vars in a variety
of ways, VAR1 (A) VAR2 (D) etc??

Now I get to write the real syntax for the real datasets in question- many
more cases and vars.  Half the fun in asking these questions is making up
these hypothetical datasets!

Thanks again
Matt B.
Reply | Threaded
Open this post in threaded view
|

Re: Merging Dissimilar Files

Richard Ristow
In reply to this post by Raynald Levesque
At 12:25 PM 12/30/2006, Raynald Levesque wrote:

>The following assumes
>- there is at most 1 record per file for a given state,year
>- files are sorted by state year.

[test data omitted]

>ADD FILES FILE=A
>         /FILE=B
>         /BY=state year.
>AGGREGATE OUTFILE=*
>         /PRE
>         /BREAK=state year
>         /age date height weight=FIRST(age date height weight).
>LIST.

MATCH FILES gives a working alternative syntax, unless I'm missing
something. It makes the same assumptions listed above. I think it's
simpler; it doesn't require a list of all non-key variables in the two
files. Also, though I'm not checking, I don't think the FIRST function
to AGGREGATE will work as desired for string variables without blank
declared as missing - and that includes all long strings. (However,
replacing FIRST by MAX will ordinarily work, as almost all non-blank
string values compare as higher than blank strings.)

The following is SPSS draft output, using the test data from Raynald's
post. (To self: syntax & output files not kept separately.)


MATCH FILES
      /FILE=A
      /FILE=B
      /BY STATE YEAR.

LIST.

List
|--------------------------|------------------------|
|Output Created            |31-DEC-2006 01:17:21    |
|--------------------------|------------------------|
state year age date height weight

ALA   1978   .    .   175    225
ALA   1979   .    .   200    250
ALA   1980  20    5   225    200
ALA   1981  17    7     .      .
DEL   1978   .    .   150    150
DEL   1979   .    .   175    190
DEL   1980  23    7   180    160
DEL   1981  22    8     .      .

Number of cases read:  8    Number of cases listed:  8


 From Raynald's posting, for comparison:

>* Result is:
>
>state year age date height weight
>
>ALA   1978   .    .   175    225
>ALA   1979   .    .   200    250
>ALA   1980  20    5   225    200
>ALA   1981  17    7     .      .
>DEL   1978   .    .   150    150
>DEL   1979   .    .   175    190
>DEL   1980  23    7   180    160
>DEL   1981  22    8     .      .