Syntax Lag

Charles Deng
Hi All:

I tried to use LAG to delete duplicates but failed, so I am sure there
must be an error here.  Can anyone help me fix it?

NUMERIC DUP (F4).
COMPUTE DUP=0.
IF SSN=LAG(SSN)  DUP=LAG(DUP) + 1.
SELECT IF DUP=1.
EXE.


All of my records were wiped out.

Thanks.


Charles

Re: Syntax Lag

Siraj Ur-rehman
Try using Data -> Identify Duplicate Cases.
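
A syntax counterpart to that menu choice, as a minimal sketch: it assumes
the duplicates are defined by SSN alone and that one case per SSN should
be kept. The flag name FirstCase is made up for illustration and does not
come from the thread.

```spss
* Sort so that cases sharing an SSN are adjacent.
SORT CASES BY SSN (A).
* FirstCase is an illustrative name: 1 = first case for an SSN, 0 = later one.
MATCH FILES /FILE=* /BY SSN /FIRST=FirstCase.
* Keep only the first case for each SSN.
SELECT IF FirstCase=1.
EXECUTE.
```

Because the flag comes from MATCH FILES rather than from LAG, this sketch
does not depend on where the SELECT IF sits relative to a LAG call.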



Re: Syntax Lag

Edward Boadi
Try this:

COMPUTE DUP=1.
EXE.
IF SSN=LAG(SSN)  DUP=LAG(DUP) + 1.
EXE.
SELECT IF DUP = 1.
EXE.



Re: Syntax Lag

Victor Kogler
Your third line should have read:

IF (LAG(SSN) = SSN) DUP=LAG(DUP) + 1.

Since the original IF command produced a value of 0 for all records, the
Select command, looking for values of 1, effectively deleted all cases.


Victor



Re: Syntax Lag

Bauer, Craig
Hi Charles

Try this:

sort cases by ssn.
compute dup=0.
if (ssn=lag(ssn,1)) dup=1.
select if dup=0.

My guess is that your original syntax would keep only the second case of
each set of duplicates, whereas this keeps the first case of each set
along with all unique cases in the file, removing the duplicates.

Don't forget to sort by ssn first. Not sorting could explain why your
original syntax didn't yield any records. Either that, or the data set
didn't have any duplicates.

Craig

Re: Syntax Lag

Richard Ristow
At 04:11 PM 6/21/2006, Charles Deng wrote:

>I tried to use LAG to delete duplicates but failed and I am sure there
>should an erro here.  Is there anyone who can help me fix this erro?
>
>NUMERIC DUP (F4).
>COMPUTE DUP=0.
>IF SSN=LAG(SSN)  DUP=LAG(DUP) + 1.
>SELECT IF DUP=1.
>EXE.
>
>I got all records wiped out.

You have a couple of problems. The simplest is that you probably wanted
to keep only the first instance of each duplicated record. For that, use
SELECT IF DUP=0, which works. This is draft output:

GET FILE=TESTDATA.
LIST.

List
|-------------------------|------------------------|
|Output Created           |22-JUN-2006 12:45:06    |
|-------------------------|------------------------|

SEQ# SSN         NAME      INSTANCE

  01  111-23-4567 Aaron     A
  02  112-34-5478 Benjamin  A
  03  112-34-5478 Benjamin  B
  04  112-34-5478 Benjamin  C
  05  113-45-6789 Catherine A
  06  114-46-9876 Dilbert   A
  07  114-46-9876 Dilbert   B
  08  115-98-7654 Earnest   A
  09  116-87-6543 Francis   A
  10  116-87-6543 Francis   B
  11  117-65-4321 Gerald    A

Number of cases read:  11    Number of cases listed:  11


* Correcting SELECT IF   .................... .
NUMERIC DUP (F4).
COMPUTE DUP=0.
IF SSN=LAG(SSN)  DUP=LAG(DUP) + 1.
SELECT IF DUP=0 /* <- Select DUP=0 first instances */.
LIST.

List
|-------------------------|------------------------|
|Output Created           |22-JUN-2006 12:45:06    |
|-------------------------|------------------------|

SEQ# SSN         NAME      INSTANCE  DUP

  01  111-23-4567 Aaron     A           0
  02  112-34-5478 Benjamin  A           0
  05  113-45-6789 Catherine A           0
  06  114-46-9876 Dilbert   A           0
  08  115-98-7654 Earnest   A           0
  09  116-87-6543 Francis   A           0
  11  117-65-4321 Gerald    A           0

Number of cases read:  7    Number of cases listed:  7

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
But, when you selected for DUP=1, why did you get no cases at all?

Sometimes, when you're selecting across groups of cases, an EXECUTE is
necessary. Without it, SELECT IF runs in the same pass through the data
as the IF. The first record is deleted because it's not a duplicate.
Then the second record is the new 'first' record, so it has nothing to
lag from and it, too, is deleted; and so on, so nothing is ever
selected. Look at the output of the PRINT statement: every record is
record 1, and every record has DUP=0. So no records are kept, and LIST
gives "Number of cases read:  0".

* As originally posted   .................... .
* Does not work.                              .
GET FILE=TESTDATA.
NUMERIC DUP (F4).
COMPUTE DUP=0.
IF SSN=LAG(SSN)  DUP=LAG(DUP) + 1.
PRINT /
   '  Record' $CASENUM ': ' SEQ# SSN NAME INSTANCE ' Dup=' DUP.
SELECT IF DUP=1 /* <- Select DUP=1 later instances */.
LIST.

List
|-------------------------|------------------------|
|Output Created           |22-JUN-2006 12:45:06    |
|-------------------------|------------------------|

C:\Documents and Settings\Richard\My Documents
   \Temporary\SPSS\2006-06-21 Deng - Syntax Lag.SAV

   Record       1 : 01 111-23-4567 Aaron     A  Dup=   0
   Record       1 : 02 112-34-5478 Benjamin  A  Dup=   0
   Record       1 : 03 112-34-5478 Benjamin  B  Dup=   0
   Record       1 : 04 112-34-5478 Benjamin  C  Dup=   0
   Record       1 : 05 113-45-6789 Catherine A  Dup=   0
   Record       1 : 06 114-46-9876 Dilbert   A  Dup=   0
   Record       1 : 07 114-46-9876 Dilbert   B  Dup=   0
   Record       1 : 08 115-98-7654 Earnest   A  Dup=   0
   Record       1 : 09 116-87-6543 Francis   A  Dup=   0
   Record       1 : 10 116-87-6543 Francis   B  Dup=   0
   Record       1 : 11 117-65-4321 Gerald    A  Dup=   0

Number of cases read:  0    Number of cases listed:  0

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
If you put in an EXECUTE statement, then you *will* get the records for
which DUP=1.

* Adding EXECUTE         .................... .
GET FILE=TESTDATA.
NUMERIC DUP (F4).
COMPUTE DUP=0.
IF SSN=LAG(SSN)  DUP=LAG(DUP) + 1.
PRINT /
   '  Record' $CASENUM ': ' SEQ# SSN NAME INSTANCE ' Dup=' DUP.
EXECUTE         /* <- This EXECUTE is needed       */.
   Record       1 : 01 111-23-4567 Aaron     A  Dup=   0
   Record       2 : 02 112-34-5478 Benjamin  A  Dup=   0
   Record       3 : 03 112-34-5478 Benjamin  B  Dup=   1
   Record       4 : 04 112-34-5478 Benjamin  C  Dup=   2
   Record       5 : 05 113-45-6789 Catherine A  Dup=   0
   Record       6 : 06 114-46-9876 Dilbert   A  Dup=   0
   Record       7 : 07 114-46-9876 Dilbert   B  Dup=   1
   Record       8 : 08 115-98-7654 Earnest   A  Dup=   0
   Record       9 : 09 116-87-6543 Francis   A  Dup=   0
   Record      10 : 10 116-87-6543 Francis   B  Dup=   1
   Record      11 : 11 117-65-4321 Gerald    A  Dup=   0
SELECT IF DUP=1 /* <- Select DUP=1 later instances */.
LIST.

List
|-------------------------|------------------------|
|Output Created           |22-JUN-2006 12:45:06    |
|-------------------------|------------------------|
C:\Documents and Settings\Richard\My Documents
   \Temporary\SPSS\2006-06-21 Deng - Syntax Lag.SAV

SEQ# SSN         NAME      INSTANCE  DUP

  03  112-34-5478 Benjamin  B           1
  07  114-46-9876 Dilbert   B           1
  10  116-87-6543 Francis   B           1

Number of cases read:  3    Number of cases listed:  3
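
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Putting the pieces of this thread together, a minimal sketch of the whole
job might look like the following. It assumes the TESTDATA file and the
variable names from the listings above, and that the goal is to keep one
record per SSN. The SORT CASES step follows the advice given earlier in
the thread to sort by SSN first; the test file here happens to be in SSN
order already.

```spss
GET FILE=TESTDATA.
* LAG only looks at the preceding case, so put equal SSNs next to each other.
SORT CASES BY SSN.
NUMERIC DUP (F4).
COMPUTE DUP=0.
* DUP counts the earlier cases with the same SSN (0 = first instance).
IF SSN=LAG(SSN)  DUP=LAG(DUP) + 1.
* Run the pending transformations so DUP is final before any selection.
EXECUTE.
* Keep first instances only, or select DUP>0 to see the later duplicates.
SELECT IF DUP=0.
LIST.
```

The EXECUTE costs an extra pass through the data, but it means the
selection is applied to DUP values that have already been computed for
every case, whichever value of DUP is then selected.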