SPSSX Discussion

identifying and removing repeat cases


Paul Ginns

identifying and removing repeat cases

Hi everyone,

Sorry if this is really basic, but can anyone tell me how I can easily
identify and remove repeat cases from a dataset using syntax? I have a large
teaching evaluation dataset (over 6700 cases) where there are several
hundred students (identified by their student ID numbers) who have filled in
the same survey more than once. Ideally, I'd like to remove repeat cases at
random so each student has provided one set of feedback only.

Thanks in advance for any advice,

Paul

Dr. Paul Ginns
Survey Officer
Institute For Teaching and Learning
Room 385 Carslaw Building F07
The University of Sydney NSW 2006
Australia

Phone: (02) 93513607
Fax: (02) 9351 4331
Email: [hidden email]
http://www.itl.usyd.edu.au/aboutus/paulginns.htm

Richard Ristow

Re: identifying and removing repeat cases

At 06:18 PM 2/22/2007, Paul Ginns wrote:

>I have a large teaching evaluation dataset (over 6700 cases) where
>there are several hundred students (identified by their student ID
>numbers) who have filled in the same survey more than once. Ideally,
>I'd like to remove repeat cases at random so each student has provided
>one set of feedback only.

First, of course, you're throwing away information by discarding all
but one of the survey instances the student has filled out. That may be
the way to have a reasonable set for analysis, but the difference
between surveys by the same student is likely illuminating. And if you
have any other useful key, like date the survey was filled out, some
criterion like earliest or latest instance may be better than random.
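
As a minimal sketch of the earliest-instance alternative: the syntax below assumes a date variable named SurveyDate, which is hypothetical and not part of the test data in this post. The flag name Earliest is likewise only illustrative. MATCH FILES with /FIRST marks the first record in each Stdt_ID group once the file is sorted.

```spss
* SurveyDate is a hypothetical variable holding the date each survey was completed.
SORT CASES BY Stdt_ID SurveyDate.
* Flag the first record within each student, that is, the earliest after sorting.
MATCH FILES /FILE=* /BY Stdt_ID /FIRST=Earliest.
SELECT IF Earliest = 1.
EXECUTE.
```

Using /LAST in place of /FIRST should keep the latest instance instead.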

That said, try this. Tested; SPSS 15 draft output. <WRR: Code & listing
not saved separately. Test data hand-entered in data editor.>

LIST.

List
|-----------------------------|---------------------------|
|Output Created |24-FEB-2007 10:34:46 |
|-----------------------------|---------------------------|
[Surveys]

Stdt_ID Name     Q01 Q02   Q03

      1 Joe      Yes OK    Tomorrow
      2 Pete     No  Good  Today
      2 Pete     No  Great Yesterday
      3 Ann      Yes Bad   Yesterday
      4 Betty    Yes Great Yesterday
      4 Betty    No  Awful Tomorrow
      4 Betty    No  OK    Today
      5 Sam      No  OK    Today
      6 Xenephon No  Awful Yesterday
      6 Xenephon No  Bad   Yesterday
      6 Xenephon Yes OK    Today
      6 Xenephon Yes Good  Today
      6 Xenephon No  Great Tomorrow
      7 Elroy    Yes OK    Today

Number of cases read: 14 Number of cases listed: 14

SORT CASES BY Stdt_ID.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
/BREAK=Stdt_ID
/Repeats 'No. of times student filled out survey' = NU.
LIST.

List
|-----------------------------|---------------------------|
|Output Created |24-FEB-2007 10:34:46 |
|-----------------------------|---------------------------|
[Surveys]

Stdt_ID Name     Q01 Q02   Q03       Repeats

      1 Joe      Yes OK    Tomorrow        1
      2 Pete     No  Good  Today           2
      2 Pete     No  Great Yesterday       2
      3 Ann      Yes Bad   Yesterday       1
      4 Betty    Yes Great Yesterday       3
      4 Betty    No  Awful Tomorrow        3
      4 Betty    No  OK    Today           3
      5 Sam      No  OK    Today           1
      6 Xenephon No  Awful Yesterday       5
      6 Xenephon No  Bad   Yesterday       5
      6 Xenephon Yes OK    Today           5
      6 Xenephon Yes Good  Today           5
      6 Xenephon No  Great Tomorrow        5
      7 Elroy    Yes OK    Today           1

Number of cases read: 14 Number of cases listed: 14
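
To identify the repeat cases before removing anything, a minimal sketch using only the Repeats variable created by the AGGREGATE above. TEMPORARY limits the selection to the LIST that follows, so the full file stays available for the next step.

```spss
* List only the records of students who filled out the survey more than once.
TEMPORARY.
SELECT IF Repeats > 1.
LIST.
```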

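* Scratch variables (names starting with #) keep their values from case to case.
* At the first record for each student, reset the counter and draw a random.
* integer from 1 to Repeats, marking which of that student's records to keep.
DO IF Stdt_ID NE LAG(Stdt_ID)
OR MISSING(LAG(Stdt_ID)).
. COMPUTE #INSTANCE = 0.
. COMPUTE #SELECT = TRUNC(RV.UNIFORM(1,Repeats+1)).
END IF.
* Count records within the student and keep only the one that was drawn.
COMPUTE #INSTANCE = #INSTANCE + 1.
SELECT IF #INSTANCE = #SELECT.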

LIST.

List
|-----------------------------|---------------------------|
|Output Created |24-FEB-2007 10:34:46 |
|-----------------------------|---------------------------|
[Surveys]

Stdt_ID Name     Q01 Q02   Q03       Repeats

      1 Joe      Yes OK    Tomorrow        1
      2 Pete     No  Good  Today           2
      3 Ann      Yes Bad   Yesterday       1
      4 Betty    Yes Great Yesterday       3
      5 Sam      No  OK    Today           1
      6 Xenephon No  Bad   Yesterday       5
      7 Elroy    Yes OK    Today           1

Number of cases read: 7 Number of cases listed: 7
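
Because the record kept for each student is drawn at random, rerunning the job will generally keep different records. A sketch for making the selection repeatable is to set the seed before the DO IF block above. This assumes the SPSS 12-compatible random number generator is active; if the Mersenne Twister generator is in use, SET MTINDEX is the corresponding setting. The seed value is arbitrary.

```spss
* Fix the seed so that the same records are kept each time the job is run.
* The value 20070224 is arbitrary and only illustrative.
SET SEED=20070224.
```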