|
I have a rather large dataset which includes multiple charges for inmates
admitted to and released from county jail. More specifically, the dataset contains an InmateId (Num), AdmissionID(Num), AdmissionDate (Date22), ReleaseDate (Date22), and ChargeCode (Num). I would like to create a new dataset that includes total number of inmates in jail in a given month along with âaverage daily countâ. Here is a sample of the current dataset. InmateID AdmissionID AdmissionDate ReleaseDate 11111 1 1-Apr-2001 2-Apr-2001 11111 1 1-Apr-2001 2-Apr-2001 22222 2 22-Sep-2002 23-Aug-2003 33333 3 30-Aug-2003 2-Feb-2004 33333 4 1-Jul-2001 4-Jul-2001 33333 4 1-Jul-2001 4-Jul-2001 44444 5 23-May-2004 27-Jun-2004 44444 5 23-May-2004 27-Jun-2004 44444 6 25-May-2009 26-May-2009 New DataSet with VARS MonthYear TotalInmates AvgPop(perMonthYear) Any suggested syntax would be greatly appreciated and I am also wondering about data validation options for the first dataset. For example, I would like to set up parameters that flag cases where the dates donât make sense. For example, an inmate release date occurs prior to admission date, or more complete validation such as an inmate is admitted; however, the release date suggests that the person is still in jail. Thanks, Damir ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
Hi Damir, Since I asked this question to the list late last year, I feel compelled to be the one to help and pay it forward. Originally Richard Ristow and Gene Maguin helped me. The first question I had is why there are duplicate records? I used the Identify Duplicate Cases syntax that can be generated by clicking Data>Identify Duplicates. As for the bad data, you'll see I wrote two conditions for bad data - first where the release data comes before the admit date and the second when the admission date on the current record is before the Inmate has been released on a prior episode. You can add any other conditions as you wish. I sometimes analyze data from a secure detention facility with 55 beds and we are never completely empty, so this code is written on the condition that there are no days where the facility is empty. Since you are working with jail data, I assume that assumption is safe also. If that's not the case, you can create your own table with the number of days in different months and years going back until you cover all year month combinations you're interested in and match those days onto the Census Days DATASET.
You will need to change where the Census Days dataset is saved to and retrieved from by changing the location
in the LOOP XSAVE and also in the GET FILE piece of the syntax to a folder you have access to. If there are a lot of issues with bad data, you can get accurate results regardless of the bad data by running the Identify Duplicate Cases on the same InmateID and Today and drop any duplicates. For example, if the same InmateID was in on the same day during two different episodes, you could drop the duplicates and still have an accurate count even though the underlying data may have issues. To illustrate this, say you have an InmateID 54223 that has an episode from 11/22/2008 to 12/15/2008 and also an episode from 12/13/2008 to 12/30/2008...running the syntax below without fixing the data beforehand will create the days with overlap multiple times, there would be two days for 12/13/2008, 12/14/2008, and 12/15/2008, which would inflate your average daily population. But if you run an Identify Duplicate Cases on InmateID and variable Today, only of of the two days that have overlap will be flagged as primary. You can then delete those that are flagged as duplicates and know that your aggregate data will not include any individual held on the same day, even though the underlying data may say that. I hope I've made sense. FYI, a census day is any day where the inmate has been in the faciliy, which includes both the admission and release date. Thanks, *Sample Data. DATASET NAME OriginalData. * Identify Duplicate Cases. SELECT IF PrimaryLast=1. DELETE VARIABLE PrimaryLast. SORT CASES BY InmateID AdmissionDate. COMPUTE BadData=0. FREQ BadData. *You will want to clean the data here. NUMERIC Today (DATE11) LOOP Today = AdmissionDate to ReleaseDate BY TIME.DAYS(1). GET FILE='H:\Syntax Files\Listserv ADP Example - unroll.sav'. COMPUTE Year=XDATE.YEAR(Today). *Census Days. * Identify Duplicate Cases. RENAME VARIABLES PrimaryLast=UniqueDay. DATASET DECLARE DaysinMonth. DATASET DECLARE CensusDays. MATCH FILES /FILE=* COMPUTE AvgPop=CensusDays/DaysInMonth. On Thu, May 28, 2009 at 1:13 PM, Damir <[hidden email]> wrote: I have a rather large dataset which includes multiple charges for inmates |
|
Dear Ariel,
Thanks for the interesting syntax. However (please excuse me if I'm mistaking), there seem to be 2 errors in it. I think that IF (LAG(InmateID,1)=InmateID) AND (LAG(ReleaseDate,1)>AdmitDate BadData=1. EXE. should have been IF (LAG(InmateID,1)=InmateID) AND (LAG(ReleaseDate,1)>AdmissionDate) BadData=1. EXE. Kind regards, Ruben Date: Fri, 29 May 2009 11:47:20 -0500 From: [hidden email] Subject: Re: Avg. Daily Population and Validation To: [hidden email] Hi Damir, Since I asked this question to the list late last year, I feel compelled to be the one to help and pay it forward. Originally Richard Ristow and Gene Maguin helped me. The first question I had is why there are duplicate records? I used the Identify Duplicate Cases syntax that can be generated by clicking Data>Identify Duplicates. As for the bad data, you'll see I wrote two conditions for bad data - first where the release data comes before the admit date and the second when the admission date on the current record is before the Inmate has been released on a prior episode. You can add any other conditions as you wish. I sometimes analyze data from a secure detention facility with 55 beds and we are never completely empty, so this code is written on the condition that there are no days where the facility is empty. Since you are working with jail data, I assume that assumption is safe also. If that's not the case, you can create your own table with the number of days in different months and years going back until you cover all year month combinations you're interested in and match those days onto the Census Days DATASET.
You will need to change where the Census Days dataset is saved to and retrieved from by changing the location
If there are a lot of issues with bad data, you can get accurate results regardless of the bad data by running the Identify Duplicate Cases on the same InmateID and Today and drop any duplicates. For example, if the same InmateID was in on the same day during two different episodes, you could drop the duplicates and still have an accurate count even though the underlying data may have issues. To illustrate this, say you have an InmateID 54223 that has an episode from 11/22/2008 to 12/15/2008 and also an episode from 12/13/2008 to 12/30/2008...running the syntax below without fixing the data beforehand will create the days with overlap multiple times, there would be two days for 12/13/2008, 12/14/2008, and 12/15/2008, which would inflate your average daily population. But if you run an Identify Duplicate Cases on InmateID and variable Today, only of of the two days that have overlap will be flagged as primary. You can then delete those that are flagged as duplicates and know that your aggregate data will not include any individual held on the same day, even though the underlying data may say that. I hope I've made sense.in the LOOP XSAVE and also in the GET FILE piece of the syntax to a folder you have access to. FYI, a census day is any day where the inmate has been in the faciliy, which includes both the admission and release date. Thanks, Ariel *Sample Data. DATA LIST LIST /InmateID(F8) AdmissionID (A7) AdmissionDate (ADATE) ReleaseDate (ADATE). BEGIN DATA 11111 1 4/1/2001 4/2/2001 11111 1 4/1/2001 4/2/2001 22222 2 9/22/2002 8/23/2003 33333 3 8/30/2003 2/2/2004 33333 4 7/1/2001 7/4/2001 33333 4 7/1/2001 7/4/2001 44444 5 5/23/2004 6/27/2004 44444 5 5/23/2004 6/27/2004 44444 6 5/25/2009 5/26/2009 END DATA. DATASET NAME OriginalData. * Identify Duplicate Cases. SORT CASES BY InmateID(A) AdmissionID(A) AdmissionDate(A) ReleaseDate(A). MATCH FILES /FILE=* /BY InmateID AdmissionID AdmissionDate ReleaseDate /FIRST=PrimaryFirst /LAST=PrimaryLast. DO IF (PrimaryFirst). COMPUTE MatchSequence=1-PrimaryLast. ELSE. COMPUTE MatchSequence=MatchSequence+1. END IF. LEAVE MatchSequence. FORMAT MatchSequence (f7). COMPUTE InDupGrp=MatchSequence>0. SORT CASES InDupGrp(D). MATCH FILES /FILE=* /DROP=PrimaryFirst InDupGrp MatchSequence. VARIABLE LABELS PrimaryLast 'Indicator of each last matching case as Primary'. VALUE LABELS PrimaryLast 0 'Duplicate Case' 1 'Primary Case'. VARIABLE LEVEL PrimaryLast (ORDINAL). FREQUENCIES VARIABLES=PrimaryLast. EXECUTE. SELECT IF PrimaryLast=1. EXE. DELETE VARIABLE PrimaryLast. SORT CASES BY InmateID AdmissionDate. COMPUTE BadData=0. IF ReleaseDate<AdmissionDate BadData=1. IF (LAG(InmateID,1)=InmateID) AND (LAG(ReleaseDate,1)>AdmitDate BadData=1. EXE. FREQ BadData. *You will want to clean the data here. NUMERIC Today (DATE11) /AdmitToday DischToday (F2). VAR LABEL Today "Date, within patient's hospital stay" AdmitToday "Flag: Patient admitted this date" DischToday "Flag: Patient discharged this date". LOOP Today = AdmissionDate to ReleaseDate BY TIME.DAYS(1). . COMPUTE AdmitToday = (Today EQ AdmissionDate). . COMPUTE DischToday = (Today EQ ReleaseDate). . XSAVE OUTFILE='H:\Syntax Files\Listserv ADP Example - unroll.sav' /KEEP=InmateID AdmissionID AdmissionDate ReleaseDate Today AdmitToday DischToday. END LOOP. EXECUTE /* this one is needed */. GET FILE='H:\Syntax Files\Listserv ADP Example - unroll.sav'. DATASET NAME CensusDaysCreated WINDOW=FRONT. COMPUTE Year=XDATE.YEAR(Today). COMPUTE Month=XDATE.MONTH(Today). EXE. *Census Days. CROSSTABS Year by Month. * Identify Duplicate Cases. SORT CASES BY Today(A). MATCH FILES /FILE=* /BY Today /FIRST=PrimaryFirst /LAST=PrimaryLast. DO IF (PrimaryFirst). COMPUTE MatchSequence=1-PrimaryLast. ELSE. COMPUTE MatchSequence=MatchSequence+1. END IF. LEAVE MatchSequence. FORMAT MatchSequence (f7). COMPUTE InDupGrp=MatchSequence>0. SORT CASES InDupGrp(D). MATCH FILES /FILE=* /DROP=PrimaryFirst InDupGrp MatchSequence. VARIABLE LABELS PrimaryLast 'Indicator of each last matching case as Primary'. VALUE LABELS PrimaryLast 0 'Duplicate Case' 1 'Primary Case'. VARIABLE LEVEL PrimaryLast (ORDINAL). FREQUENCIES VARIABLES=PrimaryLast. EXECUTE. RENAME VARIABLES PrimaryLast=UniqueDay. DATASET DECLARE DaysinMonth. AGGREGATE /OUTFILE='DaysinMonth' /BREAK=Year Month /DaysInMonth=SUM(UniqueDay). DATASET DECLARE CensusDays. AGGREGATE /OUTFILE='CensusDays' /BREAK=Year MOnth /Admissions=SUM(AdmitToday) /Releases=SUM(DischToday) /CensusDays=N. DATASET ACTIVATE CensusDays. MATCH FILES /FILE=* /TABLE='DaysinMonth' /BY Year Month. EXECUTE. COMPUTE AvgPop=CensusDays/DaysInMonth. EXE. On Thu, May 28, 2009 at 1:13 PM, Damir <[hidden email]> wrote: I have a rather large dataset which includes multiple charges for inmates Express yourself instantly with MSN Messenger! MSN Messenger |
|
Hi Ruben,
Thanks for catching my errors...the Admission Date in my datasets is always AdmitDate and I did forget to add a round bracket. Best, Ariel On Sat, May 30, 2009 at 4:54 PM, Ruben van den Berg <[hidden email]> wrote:
|
| Free forum by Nabble | Edit this page |
