Hi everyone,
Sorry if this is really basic, but can anyone tell me how I can easily identify and remove repeat cases from a dataset using syntax? I have a large teaching evaluation dataset (over 6,700 cases) in which several hundred students (identified by their student ID numbers) have filled in the same survey more than once. Ideally, I'd like to remove the repeat cases at random so that each student contributes only one set of feedback.

Thanks in advance for any advice,
Paul

Dr. Paul Ginns
Survey Officer
Institute For Teaching and Learning
Room 385 Carslaw Building F07
The University of Sydney NSW 2006
Australia
Phone: (02) 9351 3607
Fax: (02) 9351 4331
Email: [hidden email]
http://www.itl.usyd.edu.au/aboutus/paulginns.htm
At 06:18 PM 2/22/2007, Paul Ginns wrote:
>I have a large teaching evaluation dataset (over 6700 cases) where
>there are several hundred students (identified by their student ID
>numbers) who have filled in the same survey more than once. Ideally,
>I'd like to remove repeat cases at random so each student has provided
>one set of feedback only.

First, of course, you're throwing away information by discarding all but one of the survey instances each student has filled out. That may be the way to get a reasonable set for analysis, but the differences between surveys by the same student are likely to be illuminating. And if you have any other useful key, such as the date the survey was filled out, a criterion like earliest or latest instance may be better than random.

That said, try this. Tested; SPSS 15 draft output.
<WRR: Code & listing not saved separately. Test data hand-entered in data editor.>

LIST.

List
Output Created  24-FEB-2007 10:34:46
[Surveys]

Stdt_ID Name     Q01 Q02   Q03

      1 Joe      Yes OK    Tomorrow
      2 Pete     No  Good  Today
      2 Pete     No  Great Yesterday
      3 Ann      Yes Bad   Yesterday
      4 Betty    Yes Great Yesterday
      4 Betty    No  Awful Tomorrow
      4 Betty    No  OK    Today
      5 Sam      No  OK    Today
      6 Xenephon No  Awful Yesterday
      6 Xenephon No  Bad   Yesterday
      6 Xenephon Yes OK    Today
      6 Xenephon Yes Good  Today
      6 Xenephon No  Great Tomorrow
      7 Elroy    Yes OK    Today

Number of cases read:  14    Number of cases listed:  14

* Count how many times each student filled out the survey.
SORT CASES BY Stdt_ID.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
  /BREAK=Stdt_ID
  /Repeats 'No. of times student filled out survey' = NU.
LIST.
List

Stdt_ID Name     Q01 Q02   Q03       Repeats

      1 Joe      Yes OK    Tomorrow        1
      2 Pete     No  Good  Today           2
      2 Pete     No  Great Yesterday       2
      3 Ann      Yes Bad   Yesterday       1
      4 Betty    Yes Great Yesterday       3
      4 Betty    No  Awful Tomorrow        3
      4 Betty    No  OK    Today           3
      5 Sam      No  OK    Today           1
      6 Xenephon No  Awful Yesterday       5
      6 Xenephon No  Bad   Yesterday       5
      6 Xenephon Yes OK    Today           5
      6 Xenephon Yes Good  Today           5
      6 Xenephon No  Great Tomorrow        5
      7 Elroy    Yes OK    Today           1

Number of cases read:  14    Number of cases listed:  14

* At each student's first case, reset the counter and draw an instance number from 1 to Repeats.
* LAG(Stdt_ID) is missing for the very first case, hence the MISSING test.
DO IF Stdt_ID NE LAG(Stdt_ID) OR MISSING(LAG(Stdt_ID)).
.  COMPUTE #INSTANCE = 0.
.  COMPUTE #SELECT   = TRUNC(RV.UNIFORM(1,Repeats+1)).
END IF.
* Scratch (#) variables keep their values from case to case, so #INSTANCE counts within student.
COMPUTE #INSTANCE = #INSTANCE + 1.
* Keep only the randomly drawn instance.
SELECT IF #INSTANCE = #SELECT.
LIST.

List

Stdt_ID Name     Q01 Q02   Q03       Repeats

      1 Joe      Yes OK    Tomorrow        1
      2 Pete     No  Good  Today           2
      3 Ann      Yes Bad   Yesterday       1
      4 Betty    Yes Great Yesterday       3
      5 Sam      No  OK    Today           1
      6 Xenephon No  Bad   Yesterday       5
      7 Elroy    Yes OK    Today           1

Number of cases read:  7    Number of cases listed:  7
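For anyone who needs the same result outside SPSS, the two steps above (count each student's surveys, then draw one instance number per student and keep only that case) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not SPSS code: each case is assumed to be a tuple whose first field is the student ID, the sample data are the 14 hand-entered test cases from the listings, and `count_repeats`, `keep_one_at_random` and `keep_earliest` are hypothetical helper names. `keep_earliest` illustrates the earliest-instance criterion mentioned above; it assumes a sortable date field, which the test data do not have.

```python
import random
from collections import Counter
from itertools import groupby
from operator import itemgetter

# Test data from the listings above: (Stdt_ID, Name, Q01, Q02, Q03).
SURVEYS = [
    (1, "Joe", "Yes", "OK", "Tomorrow"),
    (2, "Pete", "No", "Good", "Today"),
    (2, "Pete", "No", "Great", "Yesterday"),
    (3, "Ann", "Yes", "Bad", "Yesterday"),
    (4, "Betty", "Yes", "Great", "Yesterday"),
    (4, "Betty", "No", "Awful", "Tomorrow"),
    (4, "Betty", "No", "OK", "Today"),
    (5, "Sam", "No", "OK", "Today"),
    (6, "Xenephon", "No", "Awful", "Yesterday"),
    (6, "Xenephon", "No", "Bad", "Yesterday"),
    (6, "Xenephon", "Yes", "OK", "Today"),
    (6, "Xenephon", "Yes", "Good", "Today"),
    (6, "Xenephon", "No", "Great", "Tomorrow"),
    (7, "Elroy", "Yes", "OK", "Today"),
]

# Assumption: the student ID is the first field (the break variable).
student_id = itemgetter(0)


def count_repeats(records):
    """Surveys per student ID, analogous to Repeats = NU in AGGREGATE."""
    return Counter(student_id(r) for r in records)


def keep_one_at_random(records, rng=random):
    """Keep one randomly chosen survey per student.

    Mirrors the SPSS logic: within each student's block of cases, draw an
    instance number uniformly from 1..Repeats and keep only that instance.
    """
    kept = []
    # Sorting first plays the role of SORT CASES BY Stdt_ID.
    for _, block in groupby(sorted(records, key=student_id), key=student_id):
        block = list(block)
        chosen = rng.randint(1, len(block))  # 1..Repeats, both ends inclusive
        kept.append(block[chosen - 1])
    return kept


def keep_earliest(records, date_of):
    """Keep each student's earliest survey; date_of returns a sortable date."""
    kept = []
    for _, block in groupby(sorted(records, key=student_id), key=student_id):
        kept.append(min(block, key=date_of))
    return kept


if __name__ == "__main__":
    print(count_repeats(SURVEYS))
    for row in keep_one_at_random(SURVEYS, random.Random(2007)):
        print(row)
```

Passing a seeded `random.Random` instance makes the draw reproducible; with the default module-level generator, each run may keep a different instance for students with repeats.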