Dedup involving time variable

Dedup involving time variable

fchickering
I want to keep only one instance of any rows that share the same crn, day, and time.  How would I do that?
In this example I want to keep rows 1-3 and 5-6.
Thanks in advance.



number crn day time
1 20021 W 9:00
2 20021 T 8:00
3 20468 M 9:15
4 20468 M 9:15
5 20555 Th 2:00
6 20555 Th 3:00
Re: Dedup involving time variable

Art Kendall
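GET .... (open your data file).
Click <Data> <Identify Duplicate Cases>, choose crn, day, and time as the variables that define matching cases, and exit the GUI via <Paste>.
Switch to the syntax window.
Run the new part of the syntax.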

Art Kendall
Social Research Consultants
On 1/23/2014 3:09 PM, fchickering [via SPSSX Discussion] wrote:
Re: Dedup involving time variable

Bruce Weaver
Administrator
In reply to this post by fchickering
MATCH FILES and ADD FILES both have a /FIRST subcommand that flags the first row in each group of rows sharing the same values on a set of key variables.  You can then keep only the flagged rows with SELECT IF.  E.g.,


NEW FILE.
DATASET CLOSE ALL.

DATA LIST list / number crn(2f5.0) day(a2) time(time5).
BEGIN DATA
1 20021 W 9:00
2 20021 T 8:00
3 20468 M 9:15
4 20468 M 9:15
5 20555 Th 2:00
6 20555 Th 3:00
END DATA.

SORT CASES by crn day time.
MATCH FILES
  file = * /
  by crn day time /
  first = FirstRec.
EXECUTE.
SELECT IF FirstRec.
SORT CASES by number.
LIST.

Output:
number   crn day  time FirstRec
 
     1 20021 W    9:00     1
     2 20021 T    8:00     1
     3 20468 M    9:15     1
     5 20555 Th   2:00     1
     6 20555 Th   3:00     1
 
Number of cases read:  5    Number of cases listed:  5


Alternatively, you could use the LAG function.  But MATCH FILES with /FIRST seems to me a bit tidier when one is checking for equivalence on more than one or two variables.
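
The LAG approach might look something like the sketch below. It is not from the original post: the flag name dup is made up for illustration, and it assumes the same crn, day, and time variables as the example data. Once the file is sorted on the three keys, a row is a duplicate when all three match the row just before it.

```spss
* Sketch only: dup is a hypothetical flag name, not from the original post.
SORT CASES BY crn day time.

* Flag a row when all three keys match the previous row.
* The $CASENUM test keeps the first row, where LAG has no earlier case to return.
COMPUTE dup = ($CASENUM > 1 AND crn = LAG(crn) AND day = LAG(day) AND time = LAG(time)).

* Run the pending COMPUTE so the flag is set on every row before any are dropped.
EXECUTE.

SELECT IF (dup = 0).
SORT CASES BY number.
LIST.
```

The EXECUTE between COMPUTE and SELECT IF is a cautious choice: it keeps the LAG comparison separate from the step that removes rows. Every extra key variable adds another LAG comparison, which is why /FIRST stays tidier as the list of keys grows.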

Oh yeah...you could also use Data > Identify Duplicate Cases in the GUI.  It will end up generating syntax something like my MATCH FILES above, IIRC.

HTH.


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
RE: Dedup involving time variable

fchickering

Thank you!

 

From: Bruce Weaver [via SPSSX Discussion] [mailto:ml-node+[hidden email]]
Sent: Thursday, January 23, 2014 4:05 PM
To: Fran Chickering
Subject: Re: Dedup involving time variable

 

Re: Dedup involving time variable

David Marso
Administrator
In reply to this post by Bruce Weaver
You don't need the EXECUTE.  The SORT CASES that follows SELECT IF reads the data anyway.
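
For illustration, here is a sketch of Bruce Weaver's syntax with the EXECUTE left out. It assumes the example data are already the active dataset and changes nothing else.

```spss
* Sketch: same steps as the syntax posted earlier in this thread, minus EXECUTE.
SORT CASES BY crn day time.
MATCH FILES
  FILE = * /
  BY crn day time /
  FIRST = FirstRec.
* No EXECUTE here: SORT CASES below reads the data and so carries out the pending MATCH FILES and SELECT IF.
SELECT IF FirstRec.
SORT CASES BY number.
LIST.
```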
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
"Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"