selecting most recent record

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

selecting most recent record

Raffe, Sydelle, SSA
I have a data set in which there are multiple records for each client. I would like to select the record with the most recent date to keep. I've tried the lag function and restructuring the data. Can anyone help? Thanks as usual.

Sydelle Raffe, Alameda County Social Services Agency
Information Services Division
e:mail:  [hidden email]
phone: 510-271-9174     fax: 510-271-9107
If you have a request for information, please submit an ERPP request form at:  https://alamedasocialservices.org/staff/support_services/statistics_and_reports/odm/index.cfm
Reply | Threaded
Open this post in threaded view
|

Re: selecting most recent record

Edward Boadi
Try this:

Suppose VarID and VarDate are the client ID and date variables respectively

GET FILE = "File Name".
SORT cases by VarID(A)  VarDate (D).
NUMERIC eps_num (F4).
COMPUTE eps_num=0.
IF VarID=LAG(VarID)  eps_num=LAG(eps_num) + 1.
EXECUTE.
SELECT IF eps_num=0 /* Select eps_num=0 , record with the most recent date */.
EXECUTE.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]]On Behalf Of
Raffe, Sydelle, SSA
Sent: Friday, September 15, 2006 2:25 PM
To: [hidden email]
Subject: selecting most recent record


I have a data set in which there are multiple records for each client. I would like to select the record with the most recent date to keep. I've tried the lag function and restructuring the data. Can anyone help? Thanks as usual.

Sydelle Raffe, Alameda County Social Services Agency
Information Services Division
e:mail:  [hidden email]
phone: 510-271-9174     fax: 510-271-9107
If you have a request for information, please submit an ERPP request form at:  https://alamedasocialservices.org/staff/support_services/statistics_and_reports/odm/index.cfm
Reply | Threaded
Open this post in threaded view
|

Re: selecting most recent record

Oliver, Richard
In reply to this post by Raffe, Sydelle, SSA
Is this sort of what you had in mind?

*create some sample data.
data list free /customer (f3) date (adate10).
begin data
1 10/21/2005
2 10/28/2005
1 6/20/1986
2 10/29/2005
end data.
*real job starts here.
AGGREGATE
  /OUTFILE=*
  MODE=ADDVARIABLES
  /BREAK=customer
  /most_recent = LAST(date).
select if date=most_recent.


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Raffe, Sydelle, SSA
Sent: Friday, September 15, 2006 1:25 PM
To: [hidden email]
Subject: selecting most recent record

I have a data set in which there are multiple records for each client. I would like to select the record with the most recent date to keep. I've tried the lag function and restructuring the data. Can anyone help? Thanks as usual.

Sydelle Raffe, Alameda County Social Services Agency Information Services Division
e:mail:  [hidden email]
phone: 510-271-9174     fax: 510-271-9107
If you have a request for information, please submit an ERPP request form at:  https://alamedasocialservices.org/staff/support_services/statistics_and_reports/odm/index.cfm
Reply | Threaded
Open this post in threaded view
|

Re: selecting most recent record

Trondsen, Norman(DHS-CLPPB)
In reply to this post by Raffe, Sydelle, SSA
Sydelle:

Try this.  I use it regularly on Medi-Cal data and lead poisoning case
management data.  After sorting the data set on the key VARS, MATCH
FILES flags the first record for each patient by the date VAR. You then
have options.  You can run freqs on FIRST without deleting anything.  Or
SELECT IF will delete all but the first record.

An alternative after the SORT is to run RANK CASES instead of MATCH
FILES.  This will create a rank variable and number the records for each
ID variable sequentially beginning with 1.

SORT CASES BY
  empid (A) svcdt (A) .

MATCH FILES FILE=*  /BY empid  svcdt / FIRST = first.

SELECT IF first.

 

Norm Trondsen

CDHS-CLPPB

Program Evaluation & Research

850 Marina Bay Parkway

Building P, Third Floor

Richmond, CA  94808-6403

510.620.5655


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Raffe, Sydelle, SSA
Sent: Friday, September 15, 2006 11:25 AM
To: [hidden email]
Subject: selecting most recent record


I have a data set in which there are multiple records for each client. I
would like to select the record with the most recent date to keep. I've
tried the lag function and restructuring the data. Can anyone help?
Thanks as usual.

Sydelle Raffe, Alameda County Social Services Agency Information
Services Division
e:mail:  [hidden email]
phone: 510-271-9174     fax: 510-271-9107
If you have a request for information, please submit an ERPP request
form at:
https://alamedasocialservices.org/staff/support_services/statistics_and_
reports/odm/index.cfm
Reply | Threaded
Open this post in threaded view
|

Re: selecting most recent record

Oliver, Richard
Correct idea, but not quite correct implementation. I think this is what you mean:

MATCH FILES FILE=*  /BY empid / last = last .

You don't want to use both id and date as BY variables; you just want to use id as the BY variable. And you want to use LAST (or sort in descending order) to get the most recent date.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Trondsen, Norman(DHS-CLPPB)
Sent: Friday, September 15, 2006 3:39 PM
To: [hidden email]
Subject: Re: selecting most recent record

Sydelle:

Try this.  I use it regularly on Medi-Cal data and lead poisoning case management data.  After sorting the data set on the key VARS, MATCH FILES flags the first record for each patient by the date VAR. You then have options.  You can run freqs on FIRST without deleting anything.  Or SELECT IF will delete all but the first record.

An alternative after the SORT is to run RANK CASES instead of MATCH FILES.  This will create a rank variable and number the records for each ID variable sequentially beginning with 1.

SORT CASES BY
  empid (A) svcdt (A) .

MATCH FILES FILE=*  /BY empid  svcdt / FIRST = first.

SELECT IF first.



Norm Trondsen

CDHS-CLPPB

Program Evaluation & Research

850 Marina Bay Parkway

Building P, Third Floor

Richmond, CA  94808-6403

510.620.5655


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Raffe, Sydelle, SSA
Sent: Friday, September 15, 2006 11:25 AM
To: [hidden email]
Subject: selecting most recent record


I have a data set in which there are multiple records for each client. I would like to select the record with the most recent date to keep. I've tried the lag function and restructuring the data. Can anyone help?
Thanks as usual.

Sydelle Raffe, Alameda County Social Services Agency Information Services Division
e:mail:  [hidden email]
phone: 510-271-9174     fax: 510-271-9107
If you have a request for information, please submit an ERPP request form at:
https://alamedasocialservices.org/staff/support_services/statistics_and_
reports/odm/index.cfm
Reply | Threaded
Open this post in threaded view
|

Re: selecting most recent record

Carlos Renato-2
Dear Friend

If your problem is this I have a soluction.

Client  Data
   1      20/09/2010
   1      21/09/2010
   1      22/09/2010
   2      01/01/1999
   2      08/01/2000
   3      10/10/2001
   3      21/01/1999

This syntax is sufficient to decide the problem above

* Identify Duplicate Cases.
SORT CASES BY Client(A) Date(A) .
MATCH FILES /FILE = * /BY Client
 /FIRST = PrimaryFirst /LAST = PrimaryLast.
DO IF (PrimaryFirst).
COMPUTE MatchSequence = 1 - PrimaryLast.
ELSE.
COMPUTE MatchSequence = MatchSequence + 1.
END IF.
LEAVE MatchSequence.
FORMAT MatchSequence (f7).
COMPUTE InDupGrp = MatchSequence > 0.
SORT CASES InDupGrp(D).
MATCH FILES /FILE = * /DROP = PrimaryFirst InDupGrp MatchSequence.
VARIABLE LABELS PrimaryLast 'Indicator of each last matching case as
Primary' .
VALUE LABELS PrimaryLast 0 'Duplicate Case' 1 'Primary Case'.
VARIABLE LEVEL PrimaryLast (ORDINAL).
FREQUENCIES VARIABLES = PrimaryLast .
EXECUTE.
FILTER OFF.
USE ALL.
SELECT IF(NOT(PrimaryLast =0)).
EXECUTE .

Only the last case of each client is in the current file now.

Good bye.

Carlos Renato
Statistician
+55 (81)35811246

2006/9/15, Oliver, Richard <[hidden email]>:

>
> Correct idea, but not quite correct implementation. I think this is what
> you mean:
>
> MATCH FILES FILE=*  /BY empid / last = last .
>
> You don't want to use both id and date as BY variables; you just want to
> use id as the BY variable. And you want to use LAST (or sort in descending
> order) to get the most recent date.
>
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
> Trondsen, Norman(DHS-CLPPB)
> Sent: Friday, September 15, 2006 3:39 PM
> To: [hidden email]
> Subject: Re: selecting most recent record
>
> Sydelle:
>
> Try this.  I use it regularly on Medi-Cal data and lead poisoning case
> management data.  After sorting the data set on the key VARS, MATCH FILES
> flags the first record for each patient by the date VAR. You then have
> options.  You can run freqs on FIRST without deleting anything.  Or SELECT
> IF will delete all but the first record.
>
> An alternative after the SORT is to run RANK CASES instead of MATCH
> FILES.  This will create a rank variable and number the records for each ID
> variable sequentially beginning with 1.
>
> SORT CASES BY
>   empid (A) svcdt (A) .
>
> MATCH FILES FILE=*  /BY empid  svcdt / FIRST = first.
>
> SELECT IF first.
>
>
>
> Norm Trondsen
>
> CDHS-CLPPB
>
> Program Evaluation & Research
>
> 850 Marina Bay Parkway
>
> Building P, Third Floor
>
> Richmond, CA  94808-6403
>
> 510.620.5655
>
>
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
> Raffe, Sydelle, SSA
> Sent: Friday, September 15, 2006 11:25 AM
> To: [hidden email]
> Subject: selecting most recent record
>
>
> I have a data set in which there are multiple records for each client. I
> would like to select the record with the most recent date to keep. I've
> tried the lag function and restructuring the data. Can anyone help?
> Thanks as usual.
>
> Sydelle Raffe, Alameda County Social Services Agency Information Services
> Division
> e:mail:  [hidden email]
> phone: 510-271-9174     fax: 510-271-9107
> If you have a request for information, please submit an ERPP request form
> at:
> https://alamedasocialservices.org/staff/support_services/statistics_and_
> reports/odm/index.cfm
>



--
Carlos Renato
Reply | Threaded
Open this post in threaded view
|

Re: selecting most recent record

Raffe, Sydelle, SSA
In reply to this post by Raffe, Sydelle, SSA
Thanks to all who responded. I used Edward Boadie's approach and it worked fine.

Sydelle Raffe, Alameda County Social Services Agency
Information Services Division
e:mail:  [hidden email]
phone: 510-271-9174     fax: 510-271-9107
If you have a request for information, please submit an ERPP request form at:  https://alamedasocialservices.org/staff/support_services/statistics_and_reports/odm/index.cfm