I have a data set in which there are multiple records for each client. I would like to select the record with the most recent date to keep. I've tried the lag function and restructuring the data. Can anyone help? Thanks as usual.
Sydelle Raffe, Alameda County Social Services Agency Information Services Division e:mail: [hidden email] phone: 510-271-9174 fax: 510-271-9107 If you have a request for information, please submit an ERPP request form at: https://alamedasocialservices.org/staff/support_services/statistics_and_reports/odm/index.cfm |
Try this:
Suppose VarID and VarDate are the client ID and date variables respectively GET FILE = "File Name". SORT cases by VarID(A) VarDate (D). NUMERIC eps_num (F4). COMPUTE eps_num=0. IF VarID=LAG(VarID) eps_num=LAG(eps_num) + 1. EXECUTE. SELECT IF eps_num=0 /* Select eps_num=0 , record with the most recent date */. EXECUTE. -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]]On Behalf Of Raffe, Sydelle, SSA Sent: Friday, September 15, 2006 2:25 PM To: [hidden email] Subject: selecting most recent record I have a data set in which there are multiple records for each client. I would like to select the record with the most recent date to keep. I've tried the lag function and restructuring the data. Can anyone help? Thanks as usual. Sydelle Raffe, Alameda County Social Services Agency Information Services Division e:mail: [hidden email] phone: 510-271-9174 fax: 510-271-9107 If you have a request for information, please submit an ERPP request form at: https://alamedasocialservices.org/staff/support_services/statistics_and_reports/odm/index.cfm |
In reply to this post by Raffe, Sydelle, SSA
Is this sort of what you had in mind?
*create some sample data. data list free /customer (f3) date (adate10). begin data 1 10/21/2005 2 10/28/2005 1 6/20/1986 2 10/29/2005 end data. *real job starts here. AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK=customer /most_recent = LAST(date). select if date=most_recent. -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Raffe, Sydelle, SSA Sent: Friday, September 15, 2006 1:25 PM To: [hidden email] Subject: selecting most recent record I have a data set in which there are multiple records for each client. I would like to select the record with the most recent date to keep. I've tried the lag function and restructuring the data. Can anyone help? Thanks as usual. Sydelle Raffe, Alameda County Social Services Agency Information Services Division e:mail: [hidden email] phone: 510-271-9174 fax: 510-271-9107 If you have a request for information, please submit an ERPP request form at: https://alamedasocialservices.org/staff/support_services/statistics_and_reports/odm/index.cfm |
In reply to this post by Raffe, Sydelle, SSA
Sydelle:
Try this. I use it regularly on Medi-Cal data and lead poisoning case management data. After sorting the data set on the key VARS, MATCH FILES flags the first record for each patient by the date VAR. You then have options. You can run freqs on FIRST without deleting anything. Or SELECT IF will delete all but the first record. An alternative after the SORT is to run RANK CASES instead of MATCH FILES. This will create a rank variable and number the records for each ID variable sequentially beginning with 1. SORT CASES BY empid (A) svcdt (A) . MATCH FILES FILE=* /BY empid svcdt / FIRST = first. SELECT IF first. Norm Trondsen CDHS-CLPPB Program Evaluation & Research 850 Marina Bay Parkway Building P, Third Floor Richmond, CA 94808-6403 510.620.5655 -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Raffe, Sydelle, SSA Sent: Friday, September 15, 2006 11:25 AM To: [hidden email] Subject: selecting most recent record I have a data set in which there are multiple records for each client. I would like to select the record with the most recent date to keep. I've tried the lag function and restructuring the data. Can anyone help? Thanks as usual. Sydelle Raffe, Alameda County Social Services Agency Information Services Division e:mail: [hidden email] phone: 510-271-9174 fax: 510-271-9107 If you have a request for information, please submit an ERPP request form at: https://alamedasocialservices.org/staff/support_services/statistics_and_ reports/odm/index.cfm |
Correct idea, but not quite correct implementation. I think this is what you mean:
MATCH FILES FILE=* /BY empid / last = last . You don't want to use both id and date as BY variables; you just want to use id as the BY variable. And you want to use LAST (or sort in descending order) to get the most recent date. -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Trondsen, Norman(DHS-CLPPB) Sent: Friday, September 15, 2006 3:39 PM To: [hidden email] Subject: Re: selecting most recent record Sydelle: Try this. I use it regularly on Medi-Cal data and lead poisoning case management data. After sorting the data set on the key VARS, MATCH FILES flags the first record for each patient by the date VAR. You then have options. You can run freqs on FIRST without deleting anything. Or SELECT IF will delete all but the first record. An alternative after the SORT is to run RANK CASES instead of MATCH FILES. This will create a rank variable and number the records for each ID variable sequentially beginning with 1. SORT CASES BY empid (A) svcdt (A) . MATCH FILES FILE=* /BY empid svcdt / FIRST = first. SELECT IF first. Norm Trondsen CDHS-CLPPB Program Evaluation & Research 850 Marina Bay Parkway Building P, Third Floor Richmond, CA 94808-6403 510.620.5655 -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Raffe, Sydelle, SSA Sent: Friday, September 15, 2006 11:25 AM To: [hidden email] Subject: selecting most recent record I have a data set in which there are multiple records for each client. I would like to select the record with the most recent date to keep. I've tried the lag function and restructuring the data. Can anyone help? Thanks as usual. Sydelle Raffe, Alameda County Social Services Agency Information Services Division e:mail: [hidden email] phone: 510-271-9174 fax: 510-271-9107 If you have a request for information, please submit an ERPP request form at: https://alamedasocialservices.org/staff/support_services/statistics_and_ reports/odm/index.cfm |
Dear Friend
If your problem is this I have a soluction. Client Data 1 20/09/2010 1 21/09/2010 1 22/09/2010 2 01/01/1999 2 08/01/2000 3 10/10/2001 3 21/01/1999 This syntax is sufficient to decide the problem above * Identify Duplicate Cases. SORT CASES BY Client(A) Date(A) . MATCH FILES /FILE = * /BY Client /FIRST = PrimaryFirst /LAST = PrimaryLast. DO IF (PrimaryFirst). COMPUTE MatchSequence = 1 - PrimaryLast. ELSE. COMPUTE MatchSequence = MatchSequence + 1. END IF. LEAVE MatchSequence. FORMAT MatchSequence (f7). COMPUTE InDupGrp = MatchSequence > 0. SORT CASES InDupGrp(D). MATCH FILES /FILE = * /DROP = PrimaryFirst InDupGrp MatchSequence. VARIABLE LABELS PrimaryLast 'Indicator of each last matching case as Primary' . VALUE LABELS PrimaryLast 0 'Duplicate Case' 1 'Primary Case'. VARIABLE LEVEL PrimaryLast (ORDINAL). FREQUENCIES VARIABLES = PrimaryLast . EXECUTE. FILTER OFF. USE ALL. SELECT IF(NOT(PrimaryLast =0)). EXECUTE . Only the last case of each client is in the current file now. Good bye. Carlos Renato Statistician +55 (81)35811246 2006/9/15, Oliver, Richard <[hidden email]>: > > Correct idea, but not quite correct implementation. I think this is what > you mean: > > MATCH FILES FILE=* /BY empid / last = last . > > You don't want to use both id and date as BY variables; you just want to > use id as the BY variable. And you want to use LAST (or sort in descending > order) to get the most recent date. > > -----Original Message----- > From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of > Trondsen, Norman(DHS-CLPPB) > Sent: Friday, September 15, 2006 3:39 PM > To: [hidden email] > Subject: Re: selecting most recent record > > Sydelle: > > Try this. I use it regularly on Medi-Cal data and lead poisoning case > management data. After sorting the data set on the key VARS, MATCH FILES > flags the first record for each patient by the date VAR. You then have > options. You can run freqs on FIRST without deleting anything. Or SELECT > IF will delete all but the first record. > > An alternative after the SORT is to run RANK CASES instead of MATCH > FILES. This will create a rank variable and number the records for each ID > variable sequentially beginning with 1. > > SORT CASES BY > empid (A) svcdt (A) . > > MATCH FILES FILE=* /BY empid svcdt / FIRST = first. > > SELECT IF first. > > > > Norm Trondsen > > CDHS-CLPPB > > Program Evaluation & Research > > 850 Marina Bay Parkway > > Building P, Third Floor > > Richmond, CA 94808-6403 > > 510.620.5655 > > > -----Original Message----- > From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of > Raffe, Sydelle, SSA > Sent: Friday, September 15, 2006 11:25 AM > To: [hidden email] > Subject: selecting most recent record > > > I have a data set in which there are multiple records for each client. I > would like to select the record with the most recent date to keep. I've > tried the lag function and restructuring the data. Can anyone help? > Thanks as usual. > > Sydelle Raffe, Alameda County Social Services Agency Information Services > Division > e:mail: [hidden email] > phone: 510-271-9174 fax: 510-271-9107 > If you have a request for information, please submit an ERPP request form > at: > https://alamedasocialservices.org/staff/support_services/statistics_and_ > reports/odm/index.cfm > -- Carlos Renato |
In reply to this post by Raffe, Sydelle, SSA
Thanks to all who responded. I used Edward Boadie's approach and it worked fine.
Sydelle Raffe, Alameda County Social Services Agency Information Services Division e:mail: [hidden email] phone: 510-271-9174 fax: 510-271-9107 If you have a request for information, please submit an ERPP request form at: https://alamedasocialservices.org/staff/support_services/statistics_and_reports/odm/index.cfm |
Free forum by Nabble | Edit this page |