Hi All:
I tried to use LAG to delete duplicates but failed, and I am sure there must be an error here. Is there anyone who can help me fix this error?

NUMERIC DUP (F4).
COMPUTE DUP=0.
IF SSN=LAG(SSN) DUP=LAG(DUP) + 1.
SELECT IF DUP=1.
EXE.

I got all records wiped out.

Thanks.

Charles
Try using Data -> Identify Duplicate Cases.
-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Charles Deng
Sent: Wednesday, June 21, 2006 4:11 PM
To: [hidden email]
Subject: Syntax Lag
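For readers who prefer syntax to the menu, here is a minimal sketch of the kind of duplicate flagging that the Identify Duplicate Cases suggestion above points to. It assumes the key variable is SSN, as in the original post. FirstCase is a hypothetical variable name, and the syntax the dialog actually pastes may differ in detail.

```spss
* Sketch only. FirstCase is a hypothetical name, not from the thread.
SORT CASES BY SSN.
MATCH FILES
  /FILE=*
  /BY SSN
  /FIRST=FirstCase.
* FirstCase is 1 for the first case of each SSN and 0 for later ones.
SELECT IF FirstCase=1.
EXECUTE.
```

Because the flag comes from MATCH FILES rather than from LAG, this approach should not depend on where SELECT IF sits relative to an EXECUTE.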
In reply to this post by Charles Deng
Try this:

COMPUTE DUP=1.
EXE.
IF SSN=LAG(SSN) DUP=LAG(DUP) + 1.
EXE.
SELECT IF DUP = 1.
EXE.
In reply to this post by Charles Deng
Your third line should have read:

IF (LAG(SSN) = SSN) DUP=LAG(DUP) + 1.

Since the original IF command produced a value of 0 for all records, the SELECT command, looking for values of 1, effectively deleted all cases.

Victor
In reply to this post by Charles Deng
Hi Charles,

Try this:

sort cases by ssn.
compute dup=0.
if (ssn=lag(ssn,1)) dup=1.
select if dup=0.

I would guess that your original syntax will yield only the second of any duplicated cases, while this will keep the first case of each duplicated set as well as all unique cases in the file, removing the duplicates. Don't forget to sort by ssn first; that could explain why your syntax didn't yield any records. Either that, or the data set didn't have any duplicates.

Craig
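Craig's last point, that the file may simply contain no duplicates, can be checked before deleting anything. This is a minimal sketch, assuming a version of SPSS whose AGGREGATE command supports MODE=ADDVARIABLES. N_SSN is a hypothetical variable name.

```spss
* Sketch only. N_SSN is a hypothetical name, not from the thread.
* Adds to every case the number of cases sharing its SSN.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=SSN
  /N_SSN=N.
* Any value above 1 in this table means that SSN is duplicated.
FREQUENCIES VARIABLES=N_SSN.
```

If every case shows N_SSN=1, there is nothing to remove and an empty result from a "keep the duplicates" selection is expected.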
In reply to this post by Charles Deng
At 04:11 PM 6/21/2006, Charles Deng wrote:
>I tried to use LAG to delete duplicates but failed and I am sure there
>must be an error here. Is there anyone who can help me fix this error?
>
>NUMERIC DUP (F4).
>COMPUTE DUP=0.
>IF SSN=LAG(SSN) DUP=LAG(DUP) + 1.
>SELECT IF DUP=1.
>EXE.
>
>I got all records wiped out.

You have a couple of problems. The simplest is that you probably wanted to keep only the first instance of each set of duplicate records. For that, use SELECT IF DUP=0, and it works. This is draft output:

GET FILE=TESTDATA.
LIST.

List
|-------------------------|------------------------|
|Output Created           |22-JUN-2006 12:45:06    |
|-------------------------|------------------------|
SEQ# SSN         NAME      INSTANCE

01   111-23-4567 Aaron     A
02   112-34-5478 Benjamin  A
03   112-34-5478 Benjamin  B
04   112-34-5478 Benjamin  C
05   113-45-6789 Catherine A
06   114-46-9876 Dilbert   A
07   114-46-9876 Dilbert   B
08   115-98-7654 Earnest   A
09   116-87-6543 Francis   A
10   116-87-6543 Francis   B
11   117-65-4321 Gerald    A

Number of cases read:  11    Number of cases listed:  11

* Correcting SELECT IF ....................            .
NUMERIC DUP (F4).
COMPUTE DUP=0.
IF SSN=LAG(SSN) DUP=LAG(DUP) + 1.
SELECT IF DUP=0  /* <- Select DUP=0 first instances  */.
LIST.

List
|-------------------------|------------------------|
|Output Created           |22-JUN-2006 12:45:06    |
|-------------------------|------------------------|
SEQ# SSN         NAME      INSTANCE  DUP

01   111-23-4567 Aaron     A           0
02   112-34-5478 Benjamin  A           0
05   113-45-6789 Catherine A           0
06   114-46-9876 Dilbert   A           0
08   115-98-7654 Earnest   A           0
09   116-87-6543 Francis   A           0
11   117-65-4321 Gerald    A           0

Number of cases read:  7    Number of cases listed:  7
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

But when you selected for DUP=1, why did you get no cases at all? Sometimes, when you're selecting across groups of cases, an EXECUTE is necessary. Without it, the first record is deleted because it's not a duplicate. Then the second record is the new 'first' record, so IT'S deleted; and nothing is ever selected. With nothing retained ahead of it, LAG has no previous case to compare against, so the IF never fires and DUP stays 0.
Look at the output of the PRINT statement: every record is record 1, and every record has DUP=0. So no records are taken; "LIST" gives "Number of cases read: 0".

* As originally posted ....................            .
* Does not work.                                       .
GET FILE=TESTDATA.
NUMERIC DUP (F4).
COMPUTE DUP=0.
IF SSN=LAG(SSN) DUP=LAG(DUP) + 1.
PRINT / ' Record' $CASENUM ': '
        SEQ# SSN NAME INSTANCE ' Dup=' DUP.
SELECT IF DUP=1  /* <- Select DUP=1 later instances  */.
LIST.

List
|-------------------------|------------------------|
|Output Created           |22-JUN-2006 12:45:06    |
|-------------------------|------------------------|
C:\Documents and Settings\Richard\My Documents
  \Temporary\SPSS\2006-06-21 Deng - Syntax Lag.SAV

 Record  1 : 01 111-23-4567 Aaron     A  Dup=   0
 Record  1 : 02 112-34-5478 Benjamin  A  Dup=   0
 Record  1 : 03 112-34-5478 Benjamin  B  Dup=   0
 Record  1 : 04 112-34-5478 Benjamin  C  Dup=   0
 Record  1 : 05 113-45-6789 Catherine A  Dup=   0
 Record  1 : 06 114-46-9876 Dilbert   A  Dup=   0
 Record  1 : 07 114-46-9876 Dilbert   B  Dup=   0
 Record  1 : 08 115-98-7654 Earnest   A  Dup=   0
 Record  1 : 09 116-87-6543 Francis   A  Dup=   0
 Record  1 : 10 116-87-6543 Francis   B  Dup=   0
 Record  1 : 11 117-65-4321 Gerald    A  Dup=   0

Number of cases read:  0    Number of cases listed:  0
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

If you put in an EXECUTE statement, then you *will* get the records for which DUP=1.

* Adding EXECUTE ....................                  .
GET FILE=TESTDATA.
NUMERIC DUP (F4).
COMPUTE DUP=0.
IF SSN=LAG(SSN) DUP=LAG(DUP) + 1.
PRINT / ' Record' $CASENUM ': '
        SEQ# SSN NAME INSTANCE ' Dup=' DUP.
EXECUTE  /* <- This EXECUTE is needed  */.
 Record  1 : 01 111-23-4567 Aaron     A  Dup=   0
 Record  2 : 02 112-34-5478 Benjamin  A  Dup=   0
 Record  3 : 03 112-34-5478 Benjamin  B  Dup=   1
 Record  4 : 04 112-34-5478 Benjamin  C  Dup=   2
 Record  5 : 05 113-45-6789 Catherine A  Dup=   0
 Record  6 : 06 114-46-9876 Dilbert   A  Dup=   0
 Record  7 : 07 114-46-9876 Dilbert   B  Dup=   1
 Record  8 : 08 115-98-7654 Earnest   A  Dup=   0
 Record  9 : 09 116-87-6543 Francis   A  Dup=   0
 Record 10 : 10 116-87-6543 Francis   B  Dup=   1
 Record 11 : 11 117-65-4321 Gerald    A  Dup=   0

SELECT IF DUP=1  /* <- Select DUP=1 later instances  */.
LIST.

List
|-------------------------|------------------------|
|Output Created           |22-JUN-2006 12:45:06    |
|-------------------------|------------------------|
C:\Documents and Settings\Richard\My Documents
  \Temporary\SPSS\2006-06-21 Deng - Syntax Lag.SAV

SEQ# SSN         NAME      INSTANCE  DUP

03   112-34-5478 Benjamin  B           1
07   114-46-9876 Dilbert   B           1
10   116-87-6543 Francis   B           1

Number of cases read:  3    Number of cases listed:  3
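Putting the replies together, a minimal sketch of the whole job might look like the following. It assumes the goal is to keep the first record for each SSN, and it combines the sort from Craig's reply with the EXECUTE and the SELECT IF DUP=0 discussed above. It has not been run against the original poster's data.

```spss
* Sketch only, assembled from the replies in this thread.
SORT CASES BY SSN.
NUMERIC DUP (F4).
COMPUTE DUP=0.
* DUP counts how many earlier cases in a row share this SSN.
IF SSN=LAG(SSN) DUP=LAG(DUP) + 1.
* Finish the pass so DUP is final before any case is dropped.
EXECUTE.
SELECT IF DUP=0.
LIST.
```

To keep the later copies instead, change the last condition to DUP>0, or to DUP=1 for only the second copy of each SSN, as in the output above.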