First, a late thank you to all who had suggestions about my last query.
Second, we are trying to combine two datasets, each having longitudinal data. Each has a unique set of variables to contribute and each covers a slightly different span of years. DATASET A STATE YEAR AGE DATE ALA 1980 20 5 ALA 1981 17 7 DEL 1980 23 7 DEL 1981 22 8 DATASET B STATE YEAR HEIGHT WEIGHT ALA 1978 175 225 ALA 1979 200 250 ALA 1980 225 200 DEL 1978 150 150 DEL 1979 175 190 DEL 1980 180 160 I like how in Raynald's book it says (regarding merging two or more files) "Now that we've had a good laugh over the liklihood that all of the files have been the exact same structure with the exact name variables...." (p. 92)....Its nice to know that I am not alone!! We would like the final dataset to have all the cases and all the vars,looking like: state year age date height weight ALA 1978 175 160 ALA 1979 200 250 ALA 1980 20 5 225 200 ALA 1981 17 7 etc. not sure if this is an ADD or a MATCH files, Hector had some MATCH suggestions in the archives that I tried, but nothing worked; how can I match without having a unique identifier- all this stuff is longitudinal, is there any way to have a BY STATE YEAR type command in there? Thanks! Matt B. |
Hi Matthew,
The following assumes - there is at most 1 record per file for a given state,year - files are sorted by state year. DATASET CLOSE ALL. DATA LIST LIST /state(A3) year(F4) age(F3) date(F4). BEGIN DATA ALA 1980 20 5 ALA 1981 17 7 DEL 1980 23 7 DEL 1981 22 8 END DATA. DATASET NAME A. DATA LIST LIST /state(A3) year(F4) height(F3) weight(F3). BEGIN DATA ALA 1978 175 225 ALA 1979 200 250 ALA 1980 225 200 DEL 1978 150 150 DEL 1979 175 190 DEL 1980 180 160 END DATA. DATASET NAME B. ADD FILES FILE=A /FILE=B /BY=state year. AGGREGATE OUTFILE=* /PRE /BREAK=state year /age date height weight=FIRST(age date height weight). LIST. * Result is: state year age date height weight ALA 1978 . . 175 225 ALA 1979 . . 200 250 ALA 1980 20 5 225 200 ALA 1981 17 7 . . DEL 1978 . . 150 150 DEL 1979 . . 175 190 DEL 1980 23 7 180 160 DEL 1981 22 8 . . Number of cases read: 8 Number of cases listed: 8 Regards Raynald Levesque www.spsstools.net -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Matthew Boswell Sent: December 30, 2006 2:57 AM To: [hidden email] Subject: Merging Dissimilar Files First, a late thank you to all who had suggestions about my last query. Second, we are trying to combine two datasets, each having longitudinal data. Each has a unique set of variables to contribute and each covers a slightly different span of years. DATASET A STATE YEAR AGE DATE ALA 1980 20 5 ALA 1981 17 7 DEL 1980 23 7 DEL 1981 22 8 DATASET B STATE YEAR HEIGHT WEIGHT ALA 1978 175 225 ALA 1979 200 250 ALA 1980 225 200 DEL 1978 150 150 DEL 1979 175 190 DEL 1980 180 160 I like how in Raynald's book it says (regarding merging two or more files) "Now that we've had a good laugh over the liklihood that all of the files have been the exact same structure with the exact name variables...." (p. 92)....Its nice to know that I am not alone!! We would like the final dataset to have all the cases and all the vars,looking like: state year age date height weight ALA 1978 175 160 ALA 1979 200 250 ALA 1980 20 5 225 200 ALA 1981 17 7 etc. not sure if this is an ADD or a MATCH files, Hector had some MATCH suggestions in the archives that I tried, but nothing worked; how can I match without having a unique identifier- all this stuff is longitudinal, is there any way to have a BY STATE YEAR type command in there? Thanks! Matt B. |
In reply to this post by Matthew Boswell
Thanks for the syntax, it worked like a charm- the only modifications were
to specify the .sav files and then indicate the proper outfile (a new data set). Checking the command syntax for these new commands, I think I can see a little bit of the difference between ADD/MATCH/AGGREGATE; here I am clearly aggregating cases, based on the fact that I am adding both new cases and new variables to a new dataset. BREAK was one of those commands that I would constantly see come up in other people's syntax, here on the list, but could never piece together intuitively from reading the syntax. Basically, this was the command I was looking for in the sense that I had two "keys" to identify cases (apologies to database people out there)? I can also sort the BREAK vars in a variety of ways, VAR1 (A) VAR2 (D) etc?? Now I get to write the real syntax for the real datasets in question- many more cases and vars. Half the fun in asking these questions is making up these hypothetical datasets! Thanks again Matt B. |
In reply to this post by Raynald Levesque
At 12:25 PM 12/30/2006, Raynald Levesque wrote:
>The following assumes >- there is at most 1 record per file for a given state,year >- files are sorted by state year. [test data omitted] >ADD FILES FILE=A > /FILE=B > /BY=state year. >AGGREGATE OUTFILE=* > /PRE > /BREAK=state year > /age date height weight=FIRST(age date height weight). >LIST. MATCH FILES gives a working alternative syntax, unless I'm missing something. It makes the same assumptions listed above. I think it's simpler; it doesn't require a list of all non-key variables in the two files. Also, though I'm not checking, I don't think the FIRST function to AGGREGATE will work as desired for string variables without blank declared as missing - and that includes all long strings. (However, replacing FIRST by MAX will ordinarily work, as almost all non-blank string values compare as higher than blank strings.) The following is SPSS draft output, using the test data from Raynald's post. (To self: syntax & output files not kept separately.) MATCH FILES /FILE=A /FILE=B /BY STATE YEAR. LIST. List |--------------------------|------------------------| |Output Created |31-DEC-2006 01:17:21 | |--------------------------|------------------------| state year age date height weight ALA 1978 . . 175 225 ALA 1979 . . 200 250 ALA 1980 20 5 225 200 ALA 1981 17 7 . . DEL 1978 . . 150 150 DEL 1979 . . 175 190 DEL 1980 23 7 180 160 DEL 1981 22 8 . . Number of cases read: 8 Number of cases listed: 8 From Raynald's posting, for comparison: >* Result is: > >state year age date height weight > >ALA 1978 . . 175 225 >ALA 1979 . . 200 250 >ALA 1980 20 5 225 200 >ALA 1981 17 7 . . >DEL 1978 . . 150 150 >DEL 1979 . . 175 190 >DEL 1980 23 7 180 160 >DEL 1981 22 8 . . |
Free forum by Nabble | Edit this page |