SPSSX Discussion

Dedup involving time variable

5 messages

fchickering

Dedup involving time variable

I want to keep only one instance of each set of rows that share the same crn, day, and time. How would I do that?
In this example I want to keep rows 1-3 and 5-6.
Thanks in advance.

number crn day time
1 20021 W 9:00
2 20021 T 8:00
3 20468 M 9:15
4 20468 M 9:15
5 20555 Th 2:00
6 20555 Th 3:00

Art Kendall

Re: Dedup involving time variable

get ....
Click <Data> <Identify Duplicate Cases>, choose the matching variables, and exit the GUI via <Paste>.
Switch to the syntax window.
Run the new part of the syntax.
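
For reference, the syntax pasted by the Identify Duplicate Cases dialog tends to look roughly like the sketch below. This is an approximation, not captured output: the exact commands, the indicator name PrimaryLast, and the labels depend on the SPSS version and the options chosen in the dialog. The final SELECT IF is an added step (an assumption about what you want to do next), since the dialog flags duplicates rather than deleting them.

```spss
* Approximate shape of the pasted syntax (details vary by version and dialog options).
SORT CASES BY crn(A) day(A) time(A).
MATCH FILES
  /FILE=*
  /BY crn day time
  /LAST=PrimaryLast.
VALUE LABELS PrimaryLast 0 'Duplicate Case' 1 'Primary Case'.
FREQUENCIES VARIABLES=PrimaryLast.
* Added step, not produced by the dialog: keep only the primary case in each group.
SELECT IF (PrimaryLast = 1).
EXECUTE.
```

With /LAST as the indicator, the row kept from the 20468 pair would typically be number 4 rather than number 3. If that matters, choosing the first case in each group as primary in the dialog should produce /FIRST instead.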

Art Kendall
Social Research Consultants

In reply to the post by fchickering of 1/23/2014, 3:09 PM.

Bruce Weaver

Re: Dedup involving time variable

Administrator

In reply to this post by fchickering

MATCH FILES and ADD FILES both have a /FIRST subcommand that flags the first row in each group of rows sharing the same values on the BY variables (the file must be sorted by those variables first). Then use SELECT IF to keep only the flagged rows. E.g.,

NEW FILE.
DATASET CLOSE ALL.

DATA LIST list / number crn(2f5.0) day(a2) time(time5).
BEGIN DATA
1 20021 W 9:00
2 20021 T 8:00
3 20468 M 9:15
4 20468 M 9:15
5 20555 Th 2:00
6 20555 Th 3:00
END DATA.

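* Sort so that rows sharing crn, day and time are adjacent.
SORT CASES BY crn day time.
* Flag the first row of each crn-day-time group (FirstRec = 1).
MATCH FILES
 /FILE = *
 /BY crn day time
 /FIRST = FirstRec.
EXECUTE.
* Keep only the flagged rows, then restore the original order.
SELECT IF FirstRec.
SORT CASES BY number.
LIST.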

Output:
number crn day time FirstRec

1 20021 W 9:00 1
2 20021 T 8:00 1
3 20468 M 9:15 1
5 20555 Th 2:00 1
6 20555 Th 3:00 1

Number of cases read: 5 Number of cases listed: 5

Alternatively, you could use the LAG function. But MATCH FILES with /FIRST seems to me a bit tidier when one is checking for equivalence on more than one or two variables.
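
A minimal sketch of the LAG alternative, assuming the same variable names as in the example data. The flag name IsDup is made up for illustration. Each row is compared with the row immediately before it after sorting, so the comparison has to be spelled out once per variable, which is where MATCH FILES starts to look tidier.

```spss
* Sort so that rows sharing crn, day and time are adjacent.
SORT CASES BY crn day time.
* IsDup is a hypothetical flag name, set to 1 when a row repeats the previous row.
COMPUTE IsDup = 0.
IF (crn = LAG(crn) AND day = LAG(day) AND time = LAG(time)) IsDup = 1.
* Drop the repeats and restore the original order.
SELECT IF (IsDup = 0).
SORT CASES BY number.
LIST.
```

For the first row LAG has nothing to look back at, so the condition is not true and IsDup stays 0. The result should be one row per crn-day-time combination.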

Oh yeah...you could also use Data > Identify Duplicate Cases in the GUI. It will end up generating syntax something like my MATCH FILES above, IIRC.

HTH.


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."


fchickering

RE: Dedup involving time variable

Thank you!

In reply to the message from Bruce Weaver sent Thursday, January 23, 2014, 4:05 PM.

David Marso

Re: Dedup involving time variable

Administrator

In reply to this post by Bruce Weaver

You don't need the EXECUTE between MATCH FILES and SELECT IF.
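
For illustration, the same job without the EXECUTE might look like the sketch below. It assumes the example data are already the active dataset. MATCH FILES and SELECT IF stay pending until a command that reads the data comes along, and the SORT CASES that follows is such a command, so the extra data pass forced by EXECUTE is not needed.

```spss
SORT CASES BY crn day time.
* Flag the first row of each crn-day-time group.
MATCH FILES
 /FILE = *
 /BY crn day time
 /FIRST = FirstRec.
* No EXECUTE here: the SORT CASES below reads the data and applies the selection.
SELECT IF FirstRec.
SORT CASES BY number.
LIST.
```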
---

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
"Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"