identifying and removing repeat cases

Paul Ginns
Hi everyone,

Sorry if this is really basic, but can anyone tell me how I can easily
identify and remove repeat cases from a dataset using syntax? I have a large
teaching evaluation dataset (over 6700 cases) where there are several
hundred students (identified by their student ID numbers) who have filled in
the same survey more than once. Ideally, I'd like to remove repeat cases at
random so each student has provided one set of feedback only.

Thanks in advance for any advice,

Paul

Dr. Paul Ginns
Survey Officer
Institute For Teaching and Learning
Room 385 Carslaw Building F07
The University of Sydney  NSW  2006
Australia

Phone:  (02) 9351 3607
Fax:    (02) 9351 4331
Email: [hidden email]
http://www.itl.usyd.edu.au/aboutus/paulginns.htm

Re: identifying and removing repeat cases

Richard Ristow
At 06:18 PM 2/22/2007, Paul Ginns wrote:

>I have a large teaching evaluation dataset (over 6700 cases) where
>there are several hundred students (identified by their student ID
>numbers) who have filled in the same survey more than once. Ideally,
>I'd like to remove repeat cases at random so each student has provided
>one set of feedback only.

First, of course, you're throwing away information by discarding all
but one of the survey instances each student filled out. That may be
the right way to get a reasonable set for analysis, but the differences
between surveys by the same student are likely to be illuminating. And
if you have any other useful key, such as the date the survey was filled
out, a criterion like earliest or latest instance may be better than
random.
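If a date (or another ordering key) is on file, the earliest-instance rule is easy to express. The sketch below is Python, not SPSS, and assumes each record carries a hypothetical `date_filled` value; the records and dates are made up purely for illustration.

```python
from datetime import date

# Hypothetical records: (student_id, date_filled, answer).
# The date field is an assumption; the real dataset may not have one.
surveys = [
    (2, date(2007, 2, 10), "Good"),
    (2, date(2007, 2, 3), "Great"),
    (4, date(2007, 2, 12), "Awful"),
    (4, date(2007, 2, 1), "OK"),
    (4, date(2007, 2, 5), "Great"),
    (5, date(2007, 2, 7), "OK"),
]


def keep_earliest(records):
    """Return one record per student ID: the one with the earliest date."""
    earliest = {}
    for rec in records:
        sid, filled, _answer = rec
        # Strict "<" means a tie on the date keeps the record seen first.
        if sid not in earliest or filled < earliest[sid][1]:
            earliest[sid] = rec
    # Sort by student ID so the output order is predictable.
    return [earliest[sid] for sid in sorted(earliest)]


for rec in keep_earliest(surveys):
    print(rec)
```

Keeping the latest instance instead only requires changing the comparison to `>`.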

That said, try this. Tested; SPSS 15 draft output. <WRR: Code & listing
not saved separately. Test data hand-entered in data editor.>

LIST.

List
|-----------------------------|---------------------------|
|Output Created               |24-FEB-2007 10:34:46       |
|-----------------------------|---------------------------|
[Surveys]

Stdt_ID Name       Q01      Q02      Q03

     1   Joe        Yes      OK       Tomorrow
     2   Pete       No       Good     Today
     2   Pete       No       Great    Yesterday
     3   Ann        Yes      Bad      Yesterday
     4   Betty      Yes      Great    Yesterday
     4   Betty      No       Awful    Tomorrow
     4   Betty      No       OK       Today
     5   Sam        No       OK       Today
     6   Xenephon   No       Awful    Yesterday
     6   Xenephon   No       Bad      Yesterday
     6   Xenephon   Yes      OK       Today
     6   Xenephon   Yes      Good     Today
     6   Xenephon   No       Great    Tomorrow
     7   Elroy      Yes      OK       Today

Number of cases read:  14    Number of cases listed:  14


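* Sort so that each student's surveys are adjacent; the selection step below relies on this.
* Then attach to every case the number of surveys that student filled out.
SORT CASES BY Stdt_ID.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
    /BREAK=Stdt_ID
    /Repeats 'No. of times student filled out survey' = NU.
LIST.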

List
[Surveys]

Stdt_ID Name       Q01      Q02      Q03        Repeats

     1   Joe        Yes      OK       Tomorrow         1
     2   Pete       No       Good     Today            2
     2   Pete       No       Great    Yesterday        2
     3   Ann        Yes      Bad      Yesterday        1
     4   Betty      Yes      Great    Yesterday        3
     4   Betty      No       Awful    Tomorrow         3
     4   Betty      No       OK       Today            3
     5   Sam        No       OK       Today            1
     6   Xenephon   No       Awful    Yesterday        5
     6   Xenephon   No       Bad      Yesterday        5
     6   Xenephon   Yes      OK       Today            5
     6   Xenephon   Yes      Good     Today            5
     6   Xenephon   No       Great    Tomorrow         5
     7   Elroy      Yes      OK       Today            1

Number of cases read:  14    Number of cases listed:  14
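The counting step has a direct analogue outside SPSS. Here is a Python sketch using the test cases listed above, with only ID and name kept; `add_repeats` is a hypothetical helper name, not part of any library. It also covers the "identify" half of the question by listing the IDs whose count is above 1.

```python
from collections import Counter

# The 14 hand-entered test cases from the listing, reduced to (Stdt_ID, Name).
cases = [
    (1, "Joe"), (2, "Pete"), (2, "Pete"), (3, "Ann"),
    (4, "Betty"), (4, "Betty"), (4, "Betty"), (5, "Sam"),
    (6, "Xenephon"), (6, "Xenephon"), (6, "Xenephon"),
    (6, "Xenephon"), (6, "Xenephon"), (7, "Elroy"),
]


def add_repeats(rows):
    """Append to each row the number of rows that share its student ID.

    Plays the role of AGGREGATE ... MODE=ADDVARIABLES /BREAK=Stdt_ID with NU:
    every row keeps its own values and gains the group count.
    """
    counts = Counter(row[0] for row in rows)
    return [(*row, counts[row[0]]) for row in rows]


with_repeats = add_repeats(cases)
for row in with_repeats:
    print(row)

# Identification step: IDs of students who filled out the survey more than once.
repeaters = sorted({row[0] for row in with_repeats if row[-1] > 1})
print("Students with repeat surveys:", repeaters)
```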


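* At each student's first case, reset the instance counter and draw at random which instance (1 to Repeats) to keep.
DO IF    Stdt_ID NE LAG(Stdt_ID)
       OR    MISSING(LAG(Stdt_ID)).
.  COMPUTE #INSTANCE = 0.
.  COMPUTE #SELECT   = TRUNC(RV.UNIFORM(1,Repeats+1)).
END IF.
* Scratch (#) variables keep their values from case to case, so this numbers each student's cases 1, 2, 3, and so on.
COMPUTE   #INSTANCE = #INSTANCE + 1.
* Keep only the case whose instance number was drawn.
SELECT IF #INSTANCE = #SELECT.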

LIST.

List
[Surveys]

Stdt_ID Name       Q01      Q02      Q03        Repeats

     1   Joe        Yes      OK       Tomorrow         1
     2   Pete       No       Good     Today            2
     3   Ann        Yes      Bad      Yesterday        1
     4   Betty      Yes      Great    Yesterday        3
     5   Sam        No       OK       Today            1
     6   Xenephon   No       Bad      Yesterday        5
     7   Elroy      Yes      OK       Today            1

Number of cases read:  7    Number of cases listed:  7
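For anyone who wants to check the selection logic outside SPSS, here is a Python sketch of what the DO IF / LAG / SELECT IF block does. It re-expresses the idea and is not SPSS syntax; the tuple layout, the function name `pick_one_per_student`, and the seed are illustrative assumptions. The random draw uses `random.randint(1, repeats)` in place of `TRUNC(RV.UNIFORM(1,Repeats+1))`. A uniform value on [1, Repeats+1) falls in each unit interval [k, k+1) with probability 1/Repeats, so truncating it gives every instance number from 1 to Repeats the same chance, which is what `randint` gives directly.

```python
import random

# (Stdt_ID, Name, Repeats) for the 14 test cases, sorted by Stdt_ID as after
# SORT CASES; Repeats is the per-student count from the AGGREGATE step.
cases = [
    (1, "Joe", 1), (2, "Pete", 2), (2, "Pete", 2), (3, "Ann", 1),
    (4, "Betty", 3), (4, "Betty", 3), (4, "Betty", 3), (5, "Sam", 1),
    (6, "Xenephon", 5), (6, "Xenephon", 5), (6, "Xenephon", 5),
    (6, "Xenephon", 5), (6, "Xenephon", 5), (7, "Elroy", 1),
]


def pick_one_per_student(rows, rng):
    """Keep one randomly chosen row per student ID.

    Rows must already be grouped by ID: a new student is detected only by
    comparing each row's ID with the previous row's, like the LAG test.
    """
    kept = []
    prev_id = None          # plays the role of LAG(Stdt_ID); None = "missing"
    instance = select = 0   # counterparts of #INSTANCE and #SELECT
    for row in rows:
        sid, _name, repeats = row
        if sid != prev_id:
            # First row of a new student: reset the counter and choose which
            # instance to keep, uniformly from 1..repeats.
            instance = 0
            select = rng.randint(1, repeats)
        instance += 1
        if instance == select:
            kept.append(row)
        prev_id = sid
    return kept


rng = random.Random(2007)  # fixed seed only so a run can be repeated
for row in pick_one_per_student(cases, rng):
    print(row)
```

Because Repeats equals the number of rows in each group, the drawn instance always falls inside the group and exactly one row per student survives. If a student's rows were not adjacent, they would be treated as several separate groups, and that student could end up with no row or with more than one. That is why the sort comes first.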