removing duplicates within categories


removing duplicates within categories

Keval Khichadia
Hi,
I would like to count the number of students enrolled in a specific course. I am using the syntax:
aggregate outfile = * mode = addvariables overwrite = yes
/break = course call
/EnrollCount = N(ID).
Is there a way I can adjust this to count only the unique IDs within each course and call?
If not, is there syntax that can be used to remove duplicate IDs within course and call?
Thanks,
Keval




====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: removing duplicates within categories

Carolyn Catenhauser
You could add this before your syntax. It identifies duplicate IDs within
each course and call, then filters them out.

* Identify Duplicate Cases.
SORT CASES BY ID(A) course(A) call(A).
* Match on ID, course and call so repeats are flagged within each course and call.
MATCH FILES /FILE = * /BY ID course call
 /FIRST = PrimaryFirst /LAST = PrimaryLast.
* MatchSequence is 0 for unique cases and 1, 2, ... within a duplicate group.
DO IF (PrimaryFirst).
COMPUTE MatchSequence = 1 - PrimaryLast.
ELSE.
COMPUTE MatchSequence = MatchSequence + 1.
END IF.
LEAVE MatchSequence.
FORMAT MatchSequence (f7).
COMPUTE InDupGrp = MatchSequence > 0.
SORT CASES InDupGrp(D).
MATCH FILES /FILE = * /DROP = PrimaryFirst InDupGrp.
VARIABLE LABELS PrimaryLast 'Indicator of each last matching case as Primary'
  MatchSequence 'Sequential count of matching cases'.
VALUE LABELS PrimaryLast 0 'Duplicate Case' 1 'Primary Case'.
VARIABLE LEVEL PrimaryLast (ORDINAL)
     /MatchSequence (SCALE).
FREQUENCIES VARIABLES = PrimaryLast MatchSequence.
EXECUTE.

* Keep only the primary (last) case of each duplicate group.
USE ALL.
COMPUTE filter_$ = (PrimaryLast = 1).
VARIABLE LABEL filter_$ 'PrimaryLast = 1 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.
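
A more compact variant of the same idea, offered as an untested sketch to run in place of the filter rather than alongside it: flag the first record for each student within a course and call, then sum that flag. The flag name FirstInCourse is an assumption introduced here; course, call, ID and EnrollCount come from the original post. Because no cases are filtered or dropped, this keeps MODE = ADDVARIABLES.

```spss
* Sort so that each student's records within a course and call are adjacent.
SORT CASES BY course call ID.
* FirstInCourse (hypothetical name) is 1 on the first such record and 0 on repeats.
MATCH FILES /FILE = * /BY course call ID /FIRST = FirstInCourse.
* Summing the flag counts each ID once per course and call.
AGGREGATE OUTFILE = * MODE = ADDVARIABLES OVERWRITE = YES
    /BREAK = course call
    /EnrollCount = SUM(FirstInCourse).
```

Every original record then carries EnrollCount, and the flag can be deleted afterwards if it is not wanted.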

 

________________________________________________________________________
Carolyn Catenhauser, M.A. | Service Management Group | Research Manager | [hidden email] | 816.841.5611


Re: removing duplicates within categories

Richard Ristow
In reply to this post by Keval Khichadia
At 01:23 PM 7/9/2008, Keval Khichadia wrote:

>Hi, I would like to count the number of students enrolled in a
>specific course. I am using the syntax:
>aggregate outfile = * mode = addvariables overwrite = yes
>            /break       = course call
>            /EnrollCount = N(ID).
>
>Is there a way I can adjust this to only count the unique number of
>ID'S within course and call?

Try this (untested). It will not work with MODE=ADDVARIABLES; create
the new aggregated file with one record per course-call combination,
then MATCH FILES back if needed.

AGGREGATE OUTFILE=*
    /BREAK = course call ID
    /NRcds 'No. of records for this course, this student' = N.

AGGREGATE OUTFILE=*
    /BREAK = course call
    /NStdt 'No. of unique students in course' = N.
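
The "MATCH FILES back" step might look like the following untested sketch. It assumes a version of SPSS that supports named datasets, and the dataset names Enroll, PerStudent and PerCourse are hypothetical. Writing each aggregate to its own dataset, rather than to OUTFILE=*, keeps the original records available for the merge.

```spss
* Assumes the enrollment records are the active dataset.
DATASET NAME Enroll.

* Step 1: one record per course, call and student.
DATASET DECLARE PerStudent.
AGGREGATE OUTFILE = 'PerStudent'
    /BREAK = course call ID
    /NRcds = N.

* Step 2: one record per course and call, counting the student records.
DATASET ACTIVATE PerStudent.
DATASET DECLARE PerCourse.
AGGREGATE OUTFILE = 'PerCourse'
    /BREAK = course call
    /NStdt = N.

* Step 3: attach the unique-student count to every original record.
DATASET ACTIVATE Enroll.
SORT CASES BY course call.
MATCH FILES /FILE = * /TABLE = 'PerCourse' /BY course call.
EXECUTE.
```

The sort before MATCH FILES matters because a table lookup expects both inputs in BY-variable order; AGGREGATE output is already ordered by its break variables.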
