|
I want to something like a MATCH FILES FILE=x /TABLE=y /BY z . Except the
"y" file is not unduplicated on "z". This is much easier to explain with an example: /* First file - id & name, unduplicated (pre-sorted) . data list /id 1 name 3-10 (a) . begin data 1 Ann 2 Bill 3 Chris 4 David 5 Ethel 6 Frank end data . save outfile=UndupComplete . /* Second file - id & major, duplicated but subset of id (pre-sorted) . data list /id 1 Major 3-6 (a) . begin data 2 ENGL 3 HIST 3 THEO 5 ENGL end data . save outfile=DupPartial . The result I want is a dataset that looks like: id name Major 1 Ann 2 Bill ENGL 3 Chris HIST 3 Chris THEO 4 David 5 Ethel ENGL 6 Frank Is there a good way to do this? (I'm using SPSS 14). Thanks. Catherine |
|
Catherine,
no problem as far as the ID is not duplicated. MATCH FILES /FILE= UNDUPCOMPLETE/FILE= DupPartial/by ID. This produces exactly what you want. Hector -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Catherine Kubitschek Sent: 27 September 2007 22:34 To: [hidden email] Subject: "Match files" with duplicates in table I want to something like a MATCH FILES FILE=x /TABLE=y /BY z . Except the "y" file is not unduplicated on "z". This is much easier to explain with an example: /* First file - id & name, unduplicated (pre-sorted) . data list /id 1 name 3-10 (a) . begin data 1 Ann 2 Bill 3 Chris 4 David 5 Ethel 6 Frank end data . save outfile=UndupComplete . /* Second file - id & major, duplicated but subset of id (pre-sorted) . data list /id 1 Major 3-6 (a) . begin data 2 ENGL 3 HIST 3 THEO 5 ENGL end data . save outfile=DupPartial . The result I want is a dataset that looks like: id name Major 1 Ann 2 Bill ENGL 3 Chris HIST 3 Chris THEO 4 David 5 Ethel ENGL 6 Frank Is there a good way to do this? (I'm using SPSS 14). Thanks. Catherine |
|
I'm sorry. I put a double negative into my message. My problem is that
the id IS duplicated in what would be the table. -Catherine At 9/27/2007 10:01 PM, Hector Maletta wrote: > Catherine, > no problem as far as the ID is not duplicated. > MATCH FILES /FILE= UNDUPCOMPLETE/FILE= DupPartial/by ID. > > This produces exactly what you want. > > Hector > > -----Original Message----- >From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of >Catherine Kubitschek >Sent: 27 September 2007 22:34 >To: [hidden email] >Subject: "Match files" with duplicates in table > > I want to something like a MATCH FILES FILE=x /TABLE=y /BY z . >Except the > "y" file is not unduplicated on "z". > > This is much easier to explain with an example: > > /* First file - id & name, unduplicated (pre-sorted) . > data list /id 1 name 3-10 (a) . > begin data > 1 Ann > 2 Bill > 3 Chris > 4 David > 5 Ethel > 6 Frank > end data . > save outfile=UndupComplete . > > /* Second file - id & major, duplicated but subset of id >(pre-sorted) . > data list /id 1 Major 3-6 (a) . > begin data > 2 ENGL > 3 HIST > 3 THEO > 5 ENGL > end data . > save outfile=DupPartial . > > The result I want is a dataset that looks like: > > id name Major > > 1 Ann > 2 Bill ENGL > 3 Chris HIST > 3 Chris THEO > 4 David > 5 Ethel ENGL > 6 Frank > > Is there a good way to do this? (I'm using SPSS 14). > Thanks. > > Catherine |
|
In reply to this post by Catherine Kubitschek
Bless you for really good test data! And exactly the right kind of
subject line: Stating the nature of the problem, succinctly, with no text except that statement. At 09:34 PM 9/27/2007, Catherine Kubitschek wrote: >I want to something like a MATCH FILES FILE=x /TABLE=y /BY z . Except >the >"y" file is not unduplicated on "z". The result I want is a dataset >that looks like: > >id name Major > > 1 Ann > 2 Bill ENGL > 3 Chris HIST > 3 Chris THEO > 4 David > 5 Ethel ENGL > 6 Frank One the way to do this is with MATCH FILES, ADD FILES, and SELECT. (I tried a couple of variant solutions, but I don't like them.) Following is SPSS 14 draft output. I'm using datasets instead of your scratch .SAV files. |-----------------------------|---------------------------| |Output Created |28-SEP-2007 12:37:31 | |-----------------------------|---------------------------| [UndupComplete] id name 1 Ann 2 Bill 3 Chris 4 David 5 Ethel 6 Frank Number of cases read: 6 Number of cases listed: 6 |-----------------------------|---------------------------| |Output Created |28-SEP-2007 12:37:31 | |-----------------------------|---------------------------| [DupPartial] id Major 2 ENGL 3 HIST 3 THEO 5 ENGL Number of cases read: 4 Number of cases listed: 4 * ... I. Double-match. May not give desired results if "MAJOR" . * ... is ever blank in 'DupPartial' . MATCH FILES /TABLE=UndupComplete /FILE =DupPartial /BY ID. DATASET NAME I_DoubleMatch WINDOW=FRONT. . /*-- LIST /*-*/. ADD FILES /FILE = * /FILE = UndupComplete /BY ID. . /*-- LIST /*-*/. SELECT IF MAJOR NE ' ' OR MISSING(LAG(ID)) OR ID NE LAG(ID) . LIST. List |-----------------------------|---------------------------| |Output Created |28-SEP-2007 12:37:33 | |-----------------------------|---------------------------| [I_DoubleMatch] id name Major 1 Ann 2 Bill ENGL 3 Chris HIST 3 Chris THEO 4 David 5 Ethel ENGL 6 Frank Number of cases read: 7 Number of cases listed: 7 =================== APPENDIX: Test data =================== * ................................................................. . * ................. Test data, from posting ..................... . * (Excellent. I didn't have to touch it, except replacing SAVE by . * DATASET commands.) . /* First file - id & name, unduplicated (pre-sorted) . data list /id 1 name 3-10 (a) . begin data 1 Ann 2 Bill 3 Chris 4 David 5 Ethel 6 Frank end data . * ... save outfile=UndupComplete . DATASET NAME UndupComplete WINDOW=FRONT. /* Second file - id & major, duplicated but subset of id (pre-sorted) . data list /id 1 Major 3-6 (a) . begin data 2 ENGL 3 HIST 3 THEO 5 ENGL end data . * ... save outfile=DupPartial . DATASET NAME DupPartial WINDOW=FRONT. DATASET ACTIVATE UndupComplete WINDOW=FRONT. LIST. DATASET ACTIVATE DupPartial WINDOW=FRONT. LIST. |
|
In reply to this post by Catherine Kubitschek
Catherine, Hector's solution produces a result of:
ID NAME MAJOR ----------------------------------------------- 1 Ann 2 Bill Engl 3 Chris Hist 3 Theo 4 David 5 Ethel Engl 6 Frank ...even though you get error message: File #2 KEY: 3 >Warning # 5132 >Duplicate key in a file. The BY variables do not uniquely identify each >case on the indicated file. Please check the results carefully. If I were you, I'd do it this way, sort by id (a) and name (d) and compute name as equal to lag(name) if id = lag(id). Maybe another approach would be to flatten out the file with duplicates using casestovars: casestovars /id = id. Then do the match file and you get: ID NAME MAJOR.1 MAJOR.2 ----------------------------------------------------------------- 1 Ann 2 Bill Engl 3 Chris Hist Theo 4 David 5 Ethel Engl 6 Frank Hope that helped! -Gary |
|
I have pursued privately my dialog with Catherine and the final
result is thus: MATCH FILES /TABLE = UNDUPCOMPLETE /FILE = DupPartial/by ID. MATCH FILES /FILE = * /FILE= UNDUPCOMPLETE /BY ID. SAVE OUTFILE =NEWFILE. The first MATCH assigns data from the list of students to each student-course record in the second file. The second MATCH will have the following effect: each student-course already included in the active file (resulting from the first step) will be included; each student not included in the active file (omitted in the results of the first step because not registered in any course) will be included too, but with a blank course. There might be a more elegant solution, but it is late here and I cannot think of anything better right now. Hector -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Hal 9000 Sent: 28 September 2007 14:30 To: [hidden email] Subject: Re: "Match files" with duplicates in table Catherine, Hector's solution produces a result of: ID NAME MAJOR ----------------------------------------------- 1 Ann 2 Bill Engl 3 Chris Hist 3 Theo 4 David 5 Ethel Engl 6 Frank ...even though you get error message: File #2 KEY: 3 >Warning # 5132 >Duplicate key in a file. The BY variables do not uniquely identify each >case on the indicated file. Please check the results carefully. If I were you, I'd do it this way, sort by id (a) and name (d) and compute name as equal to lag(name) if id = lag(id). Maybe another approach would be to flatten out the file with duplicates using casestovars: casestovars /id = id. Then do the match file and you get: ID NAME MAJOR.1 MAJOR.2 ----------------------------------------------------------------- 1 Ann 2 Bill Engl 3 Chris Hist Theo 4 David 5 Ethel Engl 6 Frank Hope that helped! -Gary |
|
Cool!
On 9/28/07, Hector Maletta <[hidden email]> wrote: > I have pursued privately my dialog with Catherine and the final > result is thus: > > MATCH FILES > /TABLE = UNDUPCOMPLETE > /FILE = DupPartial/by ID. > MATCH FILES /FILE = * /FILE= UNDUPCOMPLETE /BY ID. > > SAVE OUTFILE =NEWFILE. > > The first MATCH assigns data from the list of students to each > student-course record in the second file. The second MATCH will have the > following effect: each student-course already included in the active file > (resulting from the first step) will be included; each student not included > in the active file (omitted in the results of the first step because not > registered in any course) will be included too, but with a blank course. > > There might be a more elegant solution, but it is late here and I > cannot think of anything better right now. > > > > Hector > > > > -----Original Message----- > From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Hal > 9000 > Sent: 28 September 2007 14:30 > To: [hidden email] > Subject: Re: "Match files" with duplicates in table > > Catherine, Hector's solution produces a result of: > > ID NAME MAJOR > ----------------------------------------------- > 1 Ann > 2 Bill Engl > 3 Chris Hist > 3 Theo > 4 David > 5 Ethel Engl > 6 Frank > > ...even though you get error message: > > File #2 > KEY: 3 > > >Warning # 5132 > >Duplicate key in a file. The BY variables do not uniquely > identify each > >case on the indicated file. Please check the results carefully. > > If I were you, I'd do it this way, sort by id (a) and name (d) and > compute name as equal to lag(name) if id = lag(id). > > Maybe another approach would be to flatten out the file with > duplicates using casestovars: > > casestovars > /id = id. > > Then do the match file and you get: > > ID NAME MAJOR.1 MAJOR.2 > ----------------------------------------------------------------- > 1 Ann > 2 Bill Engl > 3 Chris Hist Theo > 4 David > 5 Ethel Engl > 6 Frank > > Hope that helped! > -Gary > > > |
|
In reply to this post by Hal 9000
Yep, provided that there are not too many duplicates I
usually also use: sort cases by id. casestovars /id=id. match files... Many duplicates + many records sometimes make spss crash when you attempt to run a casestovars command. Cheers!! Albert-Jan --- Hal 9000 <[hidden email]> wrote: > Catherine, Hector's solution produces a result of: > > ID NAME MAJOR > ----------------------------------------------- > 1 Ann > 2 Bill Engl > 3 Chris Hist > 3 Theo > 4 David > 5 Ethel Engl > 6 Frank > > ...even though you get error message: > > File #2 > KEY: 3 > > >Warning # 5132 > >Duplicate key in a file. The BY variables do not > uniquely identify each > >case on the indicated file. Please check the > results carefully. > > If I were you, I'd do it this way, sort by id (a) > and name (d) and > compute name as equal to lag(name) if id = lag(id). > > Maybe another approach would be to flatten out the > file with > duplicates using casestovars: > > casestovars > /id = id. > > Then do the match file and you get: > > ID NAME MAJOR.1 MAJOR.2 > > 1 Ann > 2 Bill Engl > 3 Chris Hist Theo > 4 David > 5 Ethel Engl > 6 Frank > > Hope that helped! > -Gary > Cheers! Albert-Jan ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Did you know that 87.166253% of all statistics claim a precision of results that is not justified by the method employed? [HELMUT RICHTER] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ____________________________________________________________________________________ Be a better Globetrotter. Get better travel answers from someone who knows. Yahoo! Answers - Check it out. http://answers.yahoo.com/dir/?link=list&sid=396545469 |
|
In reply to this post by Hector Maletta
At 02:30 PM 9/28/2007, Hector Maletta wrote:
>My final result is thus: SPSS 14 draft output, test data as before: MATCH FILES /TABLE = UNDUPCOMPLETE /FILE = DupPartial/by ID. MATCH FILES /FILE = * /FILE= UNDUPCOMPLETE /BY ID. LIST. List |-----------------------------|---------------------------| |Output Created |30-SEP-2007 17:47:58 | |-----------------------------|---------------------------| id name Major 1 Ann 2 Bill ENGL File #1 KEY: 3 >Warning # 5132 >Duplicate key in a file. The BY variables do not uniquely identify each >case on the indicated file. Please check the results carefully. 3 Chris HIST 3 Chris THEO 4 David 5 Ethel ENGL 6 Frank Number of cases read: 7 Number of cases listed: 7 (Gary Moser, <[hidden email]>, tested earlier, and reported the same error message that's here.) Hector wrote, >There might be a more elegant solution, but it >is late here and I cannot think of anything better right now. THERE'S a question of aesthetics. I expected, myself, that a neat double MATCH FILES would do it. I rejected it without writing the code because of the problem of non-unique keys, as reported in the warning message above. But Hector's code gives the right result, and I think will do so reliably. I don't like to put code in production that gives warning messages. The last thing you want is to leave users either blasè or confused about whether to pay attention to warning messages. But Hector's code does work, and it's simpler and clearer than the MATCH FILES/ADD FILES logic I posted. Now we can be like one of those TV shows where viewers vote on which singer, or something, is the best. Or maybe, simply, de gustibus non disputandum est. |
|
That "warning" message is more of a note and usually can be ignored for a
FILE file. It is however, a fatal error for a TABLE file. -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Richard Ristow Sent: Sunday, September 30, 2007 4:40 PM To: [hidden email] Subject: Re: "Match files" with duplicates in table At 02:30 PM 9/28/2007, Hector Maletta wrote: >My final result is thus: SPSS 14 draft output, test data as before: MATCH FILES /TABLE = UNDUPCOMPLETE /FILE = DupPartial/by ID. MATCH FILES /FILE = * /FILE= UNDUPCOMPLETE /BY ID. LIST. List |-----------------------------|---------------------------| |Output Created |30-SEP-2007 17:47:58 | |-----------------------------|---------------------------| id name Major 1 Ann 2 Bill ENGL File #1 KEY: 3 >Warning # 5132 >Duplicate key in a file. The BY variables do not uniquely identify each >case on the indicated file. Please check the results carefully. 3 Chris HIST 3 Chris THEO 4 David 5 Ethel ENGL 6 Frank Number of cases read: 7 Number of cases listed: 7 (Gary Moser, <[hidden email]>, tested earlier, and reported the same error message that's here.) Hector wrote, >There might be a more elegant solution, but it >is late here and I cannot think of anything better right now. THERE'S a question of aesthetics. I expected, myself, that a neat double MATCH FILES would do it. I rejected it without writing the code because of the problem of non-unique keys, as reported in the warning message above. But Hector's code gives the right result, and I think will do so reliably. I don't like to put code in production that gives warning messages. The last thing you want is to leave users either blasè or confused about whether to pay attention to warning messages. But Hector's code does work, and it's simpler and clearer than the MATCH FILES/ADD FILES logic I posted. Now we can be like one of those TV shows where viewers vote on which singer, or something, is the best. Or maybe, simply, de gustibus non disputandum est. |
|
The problem with Catherine's situation is that she wants first to
assign student's data to all course registrations, and then she wants to add student data for all students not registered in any course. The first problem is well solved with a TABLE subcommand, using the general student list as a TABLE and the course registration list as a file. Of course, the latter may contain duplicate ID records since the same students may have registered in more than one course. For the second problem, I responded in haste and late at night, sorry. Before matching or adding, one needs to exclude from the student list all students already included before, i.e. exclude all students registered for courses, and then ADD to the course file the remaining students from the student list. This complicates things a bit. The complete process may look like this: *Assign student data to course registration records. MATCH FILES/TABLE='UNDUPCOMPLETE.SAV'/FILE='DupPartial.SAV'/by ID. SAVE OUTFILE 'COURSEFILE.SAV'. *Flag records with course registration. COMPUTE REGCOURSE=1. *Aggregate registration records by student. AGGREGATE OUTFILE=*/PRESORTED/BREAK ID /REGCOURSE=MAX(REGCOURSE). *Match records of student registered in courses, with student list. MATCH FILES /FILE 'UNDUPCOMPLETE.SAV'/FILE=*/BY ID. *Exclude from list all student registered in courses. SELECT IF SYSMIS(REGCOURSE) OR REGCOURSE=0. *Add registered and non-registered students in a single list. ADD FILES /FILE 'COURSEFILE.SAV'/FILE *. SAVE OUTFILE 'FINAL.SAV'. This is untested. Hope it works. Hector -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of ViAnn Beadle Sent: 01 October 2007 10:08 To: [hidden email] Subject: Re: "Match files" with duplicates in table That "warning" message is more of a note and usually can be ignored for a FILE file. It is however, a fatal error for a TABLE file. -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Richard Ristow Sent: Sunday, September 30, 2007 4:40 PM To: [hidden email] Subject: Re: "Match files" with duplicates in table At 02:30 PM 9/28/2007, Hector Maletta wrote: >My final result is thus: SPSS 14 draft output, test data as before: MATCH FILES /TABLE = UNDUPCOMPLETE /FILE = DupPartial/by ID. MATCH FILES /FILE = * /FILE= UNDUPCOMPLETE /BY ID. LIST. List |-----------------------------|---------------------------| |Output Created |30-SEP-2007 17:47:58 | |-----------------------------|---------------------------| id name Major 1 Ann 2 Bill ENGL File #1 KEY: 3 >Warning # 5132 >Duplicate key in a file. The BY variables do not uniquely identify each >case on the indicated file. Please check the results carefully. 3 Chris HIST 3 Chris THEO 4 David 5 Ethel ENGL 6 Frank Number of cases read: 7 Number of cases listed: 7 (Gary Moser, <[hidden email]>, tested earlier, and reported the same error message that's here.) Hector wrote, >There might be a more elegant solution, but it >is late here and I cannot think of anything better right now. THERE'S a question of aesthetics. I expected, myself, that a neat double MATCH FILES would do it. I rejected it without writing the code because of the problem of non-unique keys, as reported in the warning message above. But Hector's code gives the right result, and I think will do so reliably. I don't like to put code in production that gives warning messages. The last thing you want is to leave users either blasè or confused about whether to pay attention to warning messages. But Hector's code does work, and it's simpler and clearer than the MATCH FILES/ADD FILES logic I posted. Now we can be like one of those TV shows where viewers vote on which singer, or something, is the best. Or maybe, simply, de gustibus non disputandum est. |
| Free forum by Nabble | Edit this page |
