SPSSX Discussion

Merging Dissimilar Files

Classic

List

Threaded

4 messages Options

Matthew Boswell

Merging Dissimilar Files

First, a late thank you to all who had suggestions about my last query.

Second, we are trying to combine two datasets, each having longitudinal
data. Each has a unique set of variables to contribute and each covers a
slightly different span of years.

DATASET A
STATE YEAR AGE DATE
ALA 1980 20 5
ALA 1981 17 7
DEL 1980 23 7
DEL 1981 22 8

DATASET B
STATE YEAR HEIGHT WEIGHT
ALA 1978 175 225
ALA 1979 200 250
ALA 1980 225 200
DEL 1978 150 150
DEL 1979 175 190
DEL 1980 180 160

I like how in Raynald's book it says (regarding merging two or more files)
"Now that we've had a good laugh over the liklihood that all of the files
have been the exact same structure with the exact name variables...." (p.
92)....Its nice to know that I am not alone!!

We would like the final dataset to have all the cases and all the
vars,looking like:

state year age date height weight

ALA 1978 175 160
ALA 1979 200 250
ALA 1980 20 5 225 200
ALA 1981 17 7
etc.

not sure if this is an ADD or a MATCH files, Hector had some MATCH
suggestions in the archives that I tried, but nothing worked; how can I
match without having a unique identifier- all this stuff is longitudinal, is
there any way to have a BY STATE YEAR type command in there?
Thanks!

Matt B.

Raynald Levesque

Re: Merging Dissimilar Files

Hi Matthew,

The following assumes
- there is at most 1 record per file for a given state,year
- files are sorted by state year.

DATASET CLOSE ALL.
DATA LIST LIST /state(A3) year(F4) age(F3) date(F4).
BEGIN DATA
ALA 1980 20 5
ALA 1981 17 7
DEL 1980 23 7
DEL 1981 22 8
END DATA.
DATASET NAME A.

DATA LIST LIST /state(A3) year(F4) height(F3) weight(F3).
BEGIN DATA
ALA 1978 175 225
ALA 1979 200 250
ALA 1980 225 200
DEL 1978 150 150
DEL 1979 175 190
DEL 1980 180 160
END DATA.
DATASET NAME B.

ADD FILES FILE=A
/FILE=B
/BY=state year.
AGGREGATE OUTFILE=*
/PRE
/BREAK=state year
/age date height weight=FIRST(age date height weight).
LIST.

* Result is:

state year age date height weight

ALA 1978 . . 175 225
ALA 1979 . . 200 250
ALA 1980 20 5 225 200
ALA 1981 17 7 . .
DEL 1978 . . 150 150
DEL 1979 . . 175 190
DEL 1980 23 7 180 160
DEL 1981 22 8 . .

Number of cases read: 8 Number of cases listed: 8

Regards

Raynald Levesque
www.spsstools.net

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Matthew Boswell
Sent: December 30, 2006 2:57 AM
To: [hidden email]
Subject: Merging Dissimilar Files

First, a late thank you to all who had suggestions about my last query.

Second, we are trying to combine two datasets, each having longitudinal
data. Each has a unique set of variables to contribute and each covers a
slightly different span of years.

DATASET A
STATE YEAR AGE DATE
ALA 1980 20 5
ALA 1981 17 7
DEL 1980 23 7
DEL 1981 22 8

DATASET B
STATE YEAR HEIGHT WEIGHT
ALA 1978 175 225
ALA 1979 200 250
ALA 1980 225 200
DEL 1978 150 150
DEL 1979 175 190
DEL 1980 180 160

I like how in Raynald's book it says (regarding merging two or more files)
"Now that we've had a good laugh over the liklihood that all of the files
have been the exact same structure with the exact name variables...." (p.
92)....Its nice to know that I am not alone!!

We would like the final dataset to have all the cases and all the
vars,looking like:

state year age date height weight

ALA 1978 175 160
ALA 1979 200 250
ALA 1980 20 5 225 200
ALA 1981 17 7
etc.

not sure if this is an ADD or a MATCH files, Hector had some MATCH
suggestions in the archives that I tried, but nothing worked; how can I
match without having a unique identifier- all this stuff is longitudinal, is
there any way to have a BY STATE YEAR type command in there?
Thanks!

Matt B.

Matthew Boswell

Re: Merging Dissimilar Files

In reply to this post by Matthew Boswell

Thanks for the syntax, it worked like a charm- the only modifications were
to specify the .sav files and then indicate the proper outfile (a new data
set).

Checking the command syntax for these new commands, I think I can see a
little bit of the difference between ADD/MATCH/AGGREGATE; here I am clearly
aggregating cases, based on the fact that I am adding both new cases and new
variables to a new dataset.

BREAK was one of those commands that I would constantly see come up in other
people's syntax, here on the list, but could never piece together
intuitively from reading the syntax. Basically, this was the command I was
looking for in the sense that I had two "keys" to identify cases (apologies
to database people out there)? I can also sort the BREAK vars in a variety
of ways, VAR1 (A) VAR2 (D) etc??

Now I get to write the real syntax for the real datasets in question- many
more cases and vars. Half the fun in asking these questions is making up
these hypothetical datasets!

Thanks again
Matt B.

Richard Ristow

Re: Merging Dissimilar Files

In reply to this post by Raynald Levesque

At 12:25 PM 12/30/2006, Raynald Levesque wrote:

>The following assumes
>- there is at most 1 record per file for a given state,year
>- files are sorted by state year.

[test data omitted]

>ADD FILES FILE=A
> /FILE=B
> /BY=state year.
>AGGREGATE OUTFILE=*
> /PRE
> /BREAK=state year
> /age date height weight=FIRST(age date height weight).
>LIST.

MATCH FILES gives a working alternative syntax, unless I'm missing
something. It makes the same assumptions listed above. I think it's
simpler; it doesn't require a list of all non-key variables in the two
files. Also, though I'm not checking, I don't think the FIRST function
to AGGREGATE will work as desired for string variables without blank
declared as missing - and that includes all long strings. (However,
replacing FIRST by MAX will ordinarily work, as almost all non-blank
string values compare as higher than blank strings.)

The following is SPSS draft output, using the test data from Raynald's
post. (To self: syntax & output files not kept separately.)

MATCH FILES
/FILE=A
/FILE=B
/BY STATE YEAR.

LIST.

List
|--------------------------|------------------------|
|Output Created |31-DEC-2006 01:17:21 |
|--------------------------|------------------------|
state year age date height weight

ALA 1978 . . 175 225
ALA 1979 . . 200 250
ALA 1980 20 5 225 200
ALA 1981 17 7 . .
DEL 1978 . . 150 150
DEL 1979 . . 175 190
DEL 1980 23 7 180 160
DEL 1981 22 8 . .

Number of cases read: 8 Number of cases listed: 8

From Raynald's posting, for comparison:

>* Result is:
>
>state year age date height weight
>
>ALA 1978 . . 175 225
>ALA 1979 . . 200 250
>ALA 1980 20 5 225 200
>ALA 1981 17 7 . .
>DEL 1978 . . 150 150
>DEL 1979 . . 175 190
>DEL 1980 23 7 180 160
>DEL 1981 22 8 . .