Avg. Daily Population and Validation

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Avg. Daily Population and Validation

DKUKEC
I have a rather large dataset which includes multiple charges for inmates
admitted to and released from county jail.  More specifically, the dataset
contains an InmateId (Num), AdmissionID(Num),  AdmissionDate (Date22),
ReleaseDate (Date22), and ChargeCode (Num).  I would like to create a new
dataset that includes total number of inmates in jail in a given month
along with “average daily count”.

Here is a sample of the current dataset.

InmateID AdmissionID AdmissionDate ReleaseDate
11111  1  1-Apr-2001  2-Apr-2001
11111  1  1-Apr-2001  2-Apr-2001
22222  2  22-Sep-2002  23-Aug-2003
33333  3  30-Aug-2003  2-Feb-2004
33333  4  1-Jul-2001  4-Jul-2001
33333  4  1-Jul-2001  4-Jul-2001
44444  5  23-May-2004  27-Jun-2004
44444  5  23-May-2004  27-Jun-2004
44444  6  25-May-2009  26-May-2009

New DataSet with VARS

MonthYear TotalInmates AvgPop(perMonthYear)

Any suggested syntax would be greatly appreciated and I am also wondering
about data validation options for the first dataset.  For example, I would
like to set up parameters that flag cases where the dates don’t make
sense.  For example, an inmate release date occurs prior to admission date,
or more complete validation such as an inmate is admitted; however, the
release date suggests that the person is still in jail.

Thanks,
Damir

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Avg. Daily Population and Validation

ariel barak

Hi Damir,

Since I asked this question to the list late last year, I feel compelled to be the one to help and pay it forward. Originally Richard Ristow and Gene Maguin helped me. The first question I had is why there are duplicate records? I used the Identify Duplicate Cases syntax that can be generated by clicking Data>Identify Duplicates. As for the bad data, you'll see I wrote two conditions for bad data - first where the release data comes before the admit date and the second when the admission date on the current record is before the Inmate has been released on a prior episode. You can add any other conditions as you wish.

I sometimes analyze data from a secure detention facility with 55 beds and we are never completely empty, so this code is written on the condition that there are no days where the facility is empty. Since you are working with jail data, I assume that assumption is safe also. If that's not the case, you can create your own table with the number of days in different months and years going back until you cover all year month combinations you're interested in and match those days onto the Census Days DATASET.
 
You will need to change where the Census Days dataset is saved to and retrieved from by changing the location
in the LOOP XSAVE and also in the GET FILE piece of the syntax to a folder you have access to.

If there are a lot of issues with bad data, you can get accurate results regardless of the bad data by running the Identify Duplicate Cases on the same InmateID and Today and drop any duplicates. For example, if the same InmateID was in on the same day during two different episodes, you could drop the duplicates and still have an accurate count even though the underlying data may have issues. To illustrate this, say you have an InmateID 54223 that has an episode from 11/22/2008 to 12/15/2008 and also an episode from 12/13/2008 to 12/30/2008...running the syntax below without fixing the data beforehand will create the days with overlap multiple times, there would be two days for 12/13/2008, 12/14/2008, and 12/15/2008, which would inflate your average daily population. But if you run an Identify Duplicate Cases on InmateID and variable Today, only of of the two days that have overlap will be flagged as primary. You can then delete those that are flagged as duplicates and know that your aggregate data will not include any individual held on the same day, even though the underlying data may say that. I hope I've made sense.

FYI, a census day is any day where the inmate has been in the faciliy, which includes both the admission and release date.

Thanks,
Ariel

*Sample Data.
DATA LIST LIST /InmateID(F8) AdmissionID (A7) AdmissionDate (ADATE) ReleaseDate (ADATE).
BEGIN DATA
11111 1 4/1/2001 4/2/2001
11111 1 4/1/2001 4/2/2001
22222 2 9/22/2002 8/23/2003
33333 3 8/30/2003 2/2/2004
33333 4 7/1/2001 7/4/2001
33333 4 7/1/2001 7/4/2001
44444 5 5/23/2004 6/27/2004
44444 5 5/23/2004 6/27/2004
44444 6 5/25/2009 5/26/2009
END DATA.

DATASET NAME OriginalData.

* Identify Duplicate Cases.
SORT CASES BY InmateID(A) AdmissionID(A) AdmissionDate(A) ReleaseDate(A).
MATCH FILES
  /FILE=*
  /BY InmateID AdmissionID AdmissionDate ReleaseDate
  /FIRST=PrimaryFirst
  /LAST=PrimaryLast.
DO IF (PrimaryFirst).
COMPUTE  MatchSequence=1-PrimaryLast.
ELSE.
COMPUTE  MatchSequence=MatchSequence+1.
END IF.
LEAVE  MatchSequence.
FORMAT  MatchSequence (f7).
COMPUTE  InDupGrp=MatchSequence>0.
SORT CASES InDupGrp(D).
MATCH FILES
  /FILE=*
  /DROP=PrimaryFirst InDupGrp MatchSequence.
VARIABLE LABELS  PrimaryLast 'Indicator of each last matching case as Primary'.
VALUE LABELS  PrimaryLast 0 'Duplicate Case' 1 'Primary Case'.
VARIABLE LEVEL  PrimaryLast (ORDINAL).
FREQUENCIES VARIABLES=PrimaryLast.
EXECUTE.

SELECT IF PrimaryLast=1.
EXE.

DELETE VARIABLE PrimaryLast.

SORT CASES BY InmateID AdmissionDate.

COMPUTE BadData=0.
IF ReleaseDate<AdmissionDate BadData=1.
IF (LAG(InmateID,1)=InmateID) AND (LAG(ReleaseDate,1)>AdmitDate BadData=1.
EXE.

FREQ BadData.

*You will want to clean the data here.

NUMERIC Today            (DATE11)
   /AdmitToday DischToday (F2).
VAR LABEL
    Today      "Date, within patient's hospital stay"
    AdmitToday "Flag: Patient admitted this date"
    DischToday "Flag: Patient discharged this date".

LOOP  Today = AdmissionDate to ReleaseDate BY TIME.DAYS(1).
.  COMPUTE AdmitToday = (Today EQ AdmissionDate).
.  COMPUTE DischToday = (Today EQ ReleaseDate).
.  XSAVE OUTFILE='H:\Syntax Files\Listserv ADP Example - unroll.sav'
    /KEEP=InmateID AdmissionID AdmissionDate ReleaseDate Today AdmitToday DischToday.
END LOOP.
EXECUTE  /* this one is needed */.

GET FILE='H:\Syntax Files\Listserv ADP Example - unroll.sav'.
DATASET NAME CensusDaysCreated WINDOW=FRONT.

COMPUTE Year=XDATE.YEAR(Today).
COMPUTE Month=XDATE.MONTH(Today).
EXE.

*Census Days.
CROSSTABS Year by Month.

* Identify Duplicate Cases.
SORT CASES BY Today(A).
MATCH FILES
  /FILE=*
  /BY Today
  /FIRST=PrimaryFirst
  /LAST=PrimaryLast.
DO IF (PrimaryFirst).
COMPUTE  MatchSequence=1-PrimaryLast.
ELSE.
COMPUTE  MatchSequence=MatchSequence+1.
END IF.
LEAVE  MatchSequence.
FORMAT  MatchSequence (f7).
COMPUTE  InDupGrp=MatchSequence>0.
SORT CASES InDupGrp(D).
MATCH FILES
  /FILE=*
  /DROP=PrimaryFirst InDupGrp MatchSequence.
VARIABLE LABELS  PrimaryLast 'Indicator of each last matching case as Primary'.
VALUE LABELS  PrimaryLast 0 'Duplicate Case' 1 'Primary Case'.
VARIABLE LEVEL  PrimaryLast (ORDINAL).
FREQUENCIES VARIABLES=PrimaryLast.
EXECUTE.

RENAME VARIABLES PrimaryLast=UniqueDay.

DATASET DECLARE DaysinMonth.
AGGREGATE
  /OUTFILE='DaysinMonth'
  /BREAK=Year Month
  /DaysInMonth=SUM(UniqueDay).

DATASET DECLARE CensusDays.
AGGREGATE
  /OUTFILE='CensusDays'
  /BREAK=Year MOnth
  /Admissions=SUM(AdmitToday)
  /Releases=SUM(DischToday)
  /CensusDays=N.
DATASET ACTIVATE CensusDays.

MATCH FILES /FILE=*
  /TABLE='DaysinMonth'
  /BY  Year Month.
EXECUTE.

COMPUTE AvgPop=CensusDays/DaysInMonth.
EXE.

On Thu, May 28, 2009 at 1:13 PM, Damir <[hidden email]> wrote:
I have a rather large dataset which includes multiple charges for inmates
admitted to and released from county jail.  More specifically, the dataset
contains an InmateId (Num), AdmissionID(Num),  AdmissionDate (Date22),
ReleaseDate (Date22), and ChargeCode (Num).  I would like to create a new
dataset that includes total number of inmates in jail in a given month
along with “average daily count”.

Here is a sample of the current dataset.

InmateID AdmissionID AdmissionDate ReleaseDate
11111  1  1-Apr-2001  2-Apr-2001
11111  1  1-Apr-2001  2-Apr-2001
22222  2  22-Sep-2002  23-Aug-2003
33333  3  30-Aug-2003  2-Feb-2004
33333  4  1-Jul-2001  4-Jul-2001
33333  4  1-Jul-2001  4-Jul-2001
44444  5  23-May-2004  27-Jun-2004
44444  5  23-May-2004  27-Jun-2004
44444  6  25-May-2009  26-May-2009

New DataSet with VARS

MonthYear TotalInmates AvgPop(perMonthYear)

Any suggested syntax would be greatly appreciated and I am also wondering
about data validation options for the first dataset.  For example, I would
like to set up parameters that flag cases where the dates don’t make
sense.  For example, an inmate release date occurs prior to admission date,
or more complete validation such as an inmate is admitted; however, the
release date suggests that the person is still in jail.

Thanks,
Damir

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Reply | Threaded
Open this post in threaded view
|

Re: Avg. Daily Population and Validation

Ruben Geert van den Berg
Dear Ariel,
 
Thanks for the interesting syntax. However (please excuse me if I'm mistaking), there seem to be 2 errors in it.
 
I think that
 
IF (LAG(InmateID,1)=InmateID) AND (LAG(ReleaseDate,1)>AdmitDate BadData=1.
EXE.
 
should have been
 
IF (LAG(InmateID,1)=InmateID) AND (LAG(ReleaseDate,1)>AdmissionDate) BadData=1.
EXE.

Kind regards,
 
Ruben


 



 

Date: Fri, 29 May 2009 11:47:20 -0500
From: [hidden email]
Subject: Re: Avg. Daily Population and Validation
To: [hidden email]

Hi Damir,
Since I asked this question to the list late last year, I feel compelled to be the one to help and pay it forward. Originally Richard Ristow and Gene Maguin helped me. The first question I had is why there are duplicate records? I used the Identify Duplicate Cases syntax that can be generated by clicking Data>Identify Duplicates. As for the bad data, you'll see I wrote two conditions for bad data - first where the release data comes before the admit date and the second when the admission date on the current record is before the Inmate has been released on a prior episode. You can add any other conditions as you wish.
I sometimes analyze data from a secure detention facility with 55 beds and we are never completely empty, so this code is written on the condition that there are no days where the facility is empty. Since you are working with jail data, I assume that assumption is safe also. If that's not the case, you can create your own table with the number of days in different months and years going back until you cover all year month combinations you're interested in and match those days onto the Census Days DATASET.
 
You will need to change where the Census Days dataset is saved to and retrieved from by changing the location
in the LOOP XSAVE and also in the GET FILE piece of the syntax to a folder you have access to.
If there are a lot of issues with bad data, you can get accurate results regardless of the bad data by running the Identify Duplicate Cases on the same InmateID and Today and drop any duplicates. For example, if the same InmateID was in on the same day during two different episodes, you could drop the duplicates and still have an accurate count even though the underlying data may have issues. To illustrate this, say you have an InmateID 54223 that has an episode from 11/22/2008 to 12/15/2008 and also an episode from 12/13/2008 to 12/30/2008...running the syntax below without fixing the data beforehand will create the days with overlap multiple times, there would be two days for 12/13/2008, 12/14/2008, and 12/15/2008, which would inflate your average daily population. But if you run an Identify Duplicate Cases on InmateID and variable Today, only of of the two days that have overlap will be flagged as primary. You can then delete those that are flagged as duplicates and know that your aggregate data will not include any individual held on the same day, even though the underlying data may say that. I hope I've made sense.
FYI, a census day is any day where the inmate has been in the faciliy, which includes both the admission and release date.
Thanks,
Ariel
*Sample Data.
DATA LIST LIST /InmateID(F8) AdmissionID (A7) AdmissionDate (ADATE) ReleaseDate (ADATE).
BEGIN DATA
11111 1 4/1/2001 4/2/2001
11111 1 4/1/2001 4/2/2001
22222 2 9/22/2002 8/23/2003
33333 3 8/30/2003 2/2/2004
33333 4 7/1/2001 7/4/2001
33333 4 7/1/2001 7/4/2001
44444 5 5/23/2004 6/27/2004
44444 5 5/23/2004 6/27/2004
44444 6 5/25/2009 5/26/2009
END DATA.
DATASET NAME OriginalData.
* Identify Duplicate Cases.
SORT CASES BY InmateID(A) AdmissionID(A) AdmissionDate(A) ReleaseDate(A).
MATCH FILES
  /FILE=*
  /BY InmateID AdmissionID AdmissionDate ReleaseDate
  /FIRST=PrimaryFirst
  /LAST=PrimaryLast.
DO IF (PrimaryFirst).
COMPUTE  MatchSequence=1-PrimaryLast.
ELSE.
COMPUTE  MatchSequence=MatchSequence+1.
END IF.
LEAVE  MatchSequence.
FORMAT  MatchSequence (f7).
COMPUTE  InDupGrp=MatchSequence>0.
SORT CASES InDupGrp(D).
MATCH FILES
  /FILE=*
  /DROP=PrimaryFirst InDupGrp MatchSequence.
VARIABLE LABELS  PrimaryLast 'Indicator of each last matching case as Primary'.
VALUE LABELS  PrimaryLast 0 'Duplicate Case' 1 'Primary Case'.
VARIABLE LEVEL  PrimaryLast (ORDINAL).
FREQUENCIES VARIABLES=PrimaryLast.
EXECUTE.
SELECT IF PrimaryLast=1.
EXE.
DELETE VARIABLE PrimaryLast.
SORT CASES BY InmateID AdmissionDate.
COMPUTE BadData=0.
IF ReleaseDate<AdmissionDate BadData=1.
IF (LAG(InmateID,1)=InmateID) AND (LAG(ReleaseDate,1)>AdmitDate BadData=1.
EXE.
FREQ BadData.
*You will want to clean the data here.
NUMERIC Today            (DATE11)
   /AdmitToday DischToday (F2).
VAR LABEL
    Today      "Date, within patient's hospital stay"
    AdmitToday "Flag: Patient admitted this date"
    DischToday "Flag: Patient discharged this date".
LOOP  Today = AdmissionDate to ReleaseDate BY TIME.DAYS(1).
.  COMPUTE AdmitToday = (Today EQ AdmissionDate).
.  COMPUTE DischToday = (Today EQ ReleaseDate).
.  XSAVE OUTFILE='H:\Syntax Files\Listserv ADP Example - unroll.sav'
    /KEEP=InmateID AdmissionID AdmissionDate ReleaseDate Today AdmitToday DischToday.
END LOOP.
EXECUTE  /* this one is needed */.
GET FILE='H:\Syntax Files\Listserv ADP Example - unroll.sav'.
DATASET NAME CensusDaysCreated WINDOW=FRONT.
COMPUTE Year=XDATE.YEAR(Today).
COMPUTE Month=XDATE.MONTH(Today).
EXE.
*Census Days.
CROSSTABS Year by Month.
* Identify Duplicate Cases.
SORT CASES BY Today(A).
MATCH FILES
  /FILE=*
  /BY Today
  /FIRST=PrimaryFirst
  /LAST=PrimaryLast.
DO IF (PrimaryFirst).
COMPUTE  MatchSequence=1-PrimaryLast.
ELSE.
COMPUTE  MatchSequence=MatchSequence+1.
END IF.
LEAVE  MatchSequence.
FORMAT  MatchSequence (f7).
COMPUTE  InDupGrp=MatchSequence>0.
SORT CASES InDupGrp(D).
MATCH FILES
  /FILE=*
  /DROP=PrimaryFirst InDupGrp MatchSequence.
VARIABLE LABELS  PrimaryLast 'Indicator of each last matching case as Primary'.
VALUE LABELS  PrimaryLast 0 'Duplicate Case' 1 'Primary Case'.
VARIABLE LEVEL  PrimaryLast (ORDINAL).
FREQUENCIES VARIABLES=PrimaryLast.
EXECUTE.
RENAME VARIABLES PrimaryLast=UniqueDay.
DATASET DECLARE DaysinMonth.
AGGREGATE
  /OUTFILE='DaysinMonth'
  /BREAK=Year Month
  /DaysInMonth=SUM(UniqueDay).
DATASET DECLARE CensusDays.
AGGREGATE
  /OUTFILE='CensusDays'
  /BREAK=Year MOnth
  /Admissions=SUM(AdmitToday)
  /Releases=SUM(DischToday)
  /CensusDays=N.
DATASET ACTIVATE CensusDays.
MATCH FILES /FILE=*
  /TABLE='DaysinMonth'
  /BY  Year Month.
EXECUTE.
COMPUTE AvgPop=CensusDays/DaysInMonth.
EXE.


On Thu, May 28, 2009 at 1:13 PM, Damir <[hidden email]> wrote:
I have a rather large dataset which includes multiple charges for inmates
admitted to and released from county jail.  More specifically, the dataset
contains an InmateId (Num), AdmissionID(Num),  AdmissionDate (Date22),
ReleaseDate (Date22), and ChargeCode (Num).  I would like to create a new
dataset that includes total number of inmates in jail in a given month
along with “average daily count”.

Here is a sample of the current dataset.

InmateID AdmissionID AdmissionDate ReleaseDate
11111  1  1-Apr-2001  2-Apr-2001
11111  1  1-Apr-2001  2-Apr-2001
22222  2  22-Sep-2002  23-Aug-2003
33333  3  30-Aug-2003  2-Feb-2004
33333  4  1-Jul-2001  4-Jul-2001
33333  4  1-Jul-2001  4-Jul-2001
44444  5  23-May-2004  27-Jun-2004
44444  5  23-May-2004  27-Jun-2004
44444  6  25-May-2009  26-May-2009

New DataSet with VARS

MonthYear TotalInmates AvgPop(perMonthYear)

Any suggested syntax would be greatly appreciated and I am also wondering
about data validation options for the first dataset.  For example, I would
like to set up parameters that flag cases where the dates don’t make
sense.  For example, an inmate release date occurs prior to admission date,
or more complete validation such as an inmate is admitted; however, the
release date suggests that the person is still in jail.

Thanks,
Damir

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD



Express yourself instantly with MSN Messenger! MSN Messenger
Reply | Threaded
Open this post in threaded view
|

Re: Avg. Daily Population and Validation

ariel barak
Hi Ruben,

Thanks for catching my errors...the Admission Date in my datasets is always AdmitDate and I did forget to add a round bracket.

Best,
Ariel

On Sat, May 30, 2009 at 4:54 PM, Ruben van den Berg <[hidden email]> wrote:
Dear Ariel,
 
Thanks for the interesting syntax. However (please excuse me if I'm mistaking), there seem to be 2 errors in it.
 
I think that
 
IF (LAG(InmateID,1)=InmateID) AND (LAG(ReleaseDate,1)>AdmitDate BadData=1.
EXE.
 
should have been
 
IF (LAG(InmateID,1)=InmateID) AND (LAG(ReleaseDate,1)>AdmissionDate) BadData=1.
EXE.

Kind regards,
 
Ruben


 



 

Date: Fri, 29 May 2009 11:47:20 -0500
From: [hidden email]

Subject: Re: Avg. Daily Population and Validation
To: [hidden email]


Hi Damir,
Since I asked this question to the list late last year, I feel compelled to be the one to help and pay it forward. Originally Richard Ristow and Gene Maguin helped me. The first question I had is why there are duplicate records? I used the Identify Duplicate Cases syntax that can be generated by clicking Data>Identify Duplicates. As for the bad data, you'll see I wrote two conditions for bad data - first where the release data comes before the admit date and the second when the admission date on the current record is before the Inmate has been released on a prior episode. You can add any other conditions as you wish.
I sometimes analyze data from a secure detention facility with 55 beds and we are never completely empty, so this code is written on the condition that there are no days where the facility is empty. Since you are working with jail data, I assume that assumption is safe also. If that's not the case, you can create your own table with the number of days in different months and years going back until you cover all year month combinations you're interested in and match those days onto the Census Days DATASET.
 
You will need to change where the Census Days dataset is saved to and retrieved from by changing the location
in the LOOP XSAVE and also in the GET FILE piece of the syntax to a folder you have access to.
If there are a lot of issues with bad data, you can get accurate results regardless of the bad data by running the Identify Duplicate Cases on the same InmateID and Today and drop any duplicates. For example, if the same InmateID was in on the same day during two different episodes, you could drop the duplicates and still have an accurate count even though the underlying data may have issues. To illustrate this, say you have an InmateID 54223 that has an episode from 11/22/2008 to 12/15/2008 and also an episode from 12/13/2008 to 12/30/2008...running the syntax below without fixing the data beforehand will create the days with overlap multiple times, there would be two days for 12/13/2008, 12/14/2008, and 12/15/2008, which would inflate your average daily population. But if you run an Identify Duplicate Cases on InmateID and variable Today, only of of the two days that have overlap will be flagged as primary. You can then delete those that are flagged as duplicates and know that your aggregate data will not include any individual held on the same day, even though the underlying data may say that. I hope I've made sense.
FYI, a census day is any day where the inmate has been in the faciliy, which includes both the admission and release date.
Thanks,
Ariel
*Sample Data.
DATA LIST LIST /InmateID(F8) AdmissionID (A7) AdmissionDate (ADATE) ReleaseDate (ADATE).
BEGIN DATA
11111 1 4/1/2001 4/2/2001
11111 1 4/1/2001 4/2/2001
22222 2 9/22/2002 8/23/2003
33333 3 8/30/2003 2/2/2004
33333 4 7/1/2001 7/4/2001
33333 4 7/1/2001 7/4/2001
44444 5 5/23/2004 6/27/2004
44444 5 5/23/2004 6/27/2004
44444 6 5/25/2009 5/26/2009
END DATA.
DATASET NAME OriginalData.
* Identify Duplicate Cases.
SORT CASES BY InmateID(A) AdmissionID(A) AdmissionDate(A) ReleaseDate(A).
MATCH FILES
  /FILE=*
  /BY InmateID AdmissionID AdmissionDate ReleaseDate
  /FIRST=PrimaryFirst
  /LAST=PrimaryLast.
DO IF (PrimaryFirst).
COMPUTE  MatchSequence=1-PrimaryLast.
ELSE.
COMPUTE  MatchSequence=MatchSequence+1.
END IF.
LEAVE  MatchSequence.
FORMAT  MatchSequence (f7).
COMPUTE  InDupGrp=MatchSequence>0.
SORT CASES InDupGrp(D).
MATCH FILES
  /FILE=*
  /DROP=PrimaryFirst InDupGrp MatchSequence.
VARIABLE LABELS  PrimaryLast 'Indicator of each last matching case as Primary'.
VALUE LABELS  PrimaryLast 0 'Duplicate Case' 1 'Primary Case'.
VARIABLE LEVEL  PrimaryLast (ORDINAL).
FREQUENCIES VARIABLES=PrimaryLast.
EXECUTE.
SELECT IF PrimaryLast=1.
EXE.
DELETE VARIABLE PrimaryLast.
SORT CASES BY InmateID AdmissionDate.
COMPUTE BadData=0.
IF ReleaseDate<AdmissionDate BadData=1.
IF (LAG(InmateID,1)=InmateID) AND (LAG(ReleaseDate,1)>AdmitDate BadData=1.
EXE.
FREQ BadData.
*You will want to clean the data here.
NUMERIC Today            (DATE11)
   /AdmitToday DischToday (F2).
VAR LABEL
    Today      "Date, within patient's hospital stay"
    AdmitToday "Flag: Patient admitted this date"
    DischToday "Flag: Patient discharged this date".
LOOP  Today = AdmissionDate to ReleaseDate BY TIME.DAYS(1).
.  COMPUTE AdmitToday = (Today EQ AdmissionDate).
.  COMPUTE DischToday = (Today EQ ReleaseDate).
.  XSAVE OUTFILE='H:\Syntax Files\Listserv ADP Example - unroll.sav'
    /KEEP=InmateID AdmissionID AdmissionDate ReleaseDate Today AdmitToday DischToday.
END LOOP.
EXECUTE  /* this one is needed */.
GET FILE='H:\Syntax Files\Listserv ADP Example - unroll.sav'.
DATASET NAME CensusDaysCreated WINDOW=FRONT.
COMPUTE Year=XDATE.YEAR(Today).
COMPUTE Month=XDATE.MONTH(Today).
EXE.
*Census Days.
CROSSTABS Year by Month.
* Identify Duplicate Cases.
SORT CASES BY Today(A).
MATCH FILES
  /FILE=*
  /BY Today
  /FIRST=PrimaryFirst
  /LAST=PrimaryLast.
DO IF (PrimaryFirst).
COMPUTE  MatchSequence=1-PrimaryLast.
ELSE.
COMPUTE  MatchSequence=MatchSequence+1.
END IF.
LEAVE  MatchSequence.
FORMAT  MatchSequence (f7).
COMPUTE  InDupGrp=MatchSequence>0.
SORT CASES InDupGrp(D).
MATCH FILES
  /FILE=*
  /DROP=PrimaryFirst InDupGrp MatchSequence.
VARIABLE LABELS  PrimaryLast 'Indicator of each last matching case as Primary'.
VALUE LABELS  PrimaryLast 0 'Duplicate Case' 1 'Primary Case'.
VARIABLE LEVEL  PrimaryLast (ORDINAL).
FREQUENCIES VARIABLES=PrimaryLast.
EXECUTE.
RENAME VARIABLES PrimaryLast=UniqueDay.
DATASET DECLARE DaysinMonth.
AGGREGATE
  /OUTFILE='DaysinMonth'
  /BREAK=Year Month
  /DaysInMonth=SUM(UniqueDay).
DATASET DECLARE CensusDays.
AGGREGATE
  /OUTFILE='CensusDays'
  /BREAK=Year MOnth
  /Admissions=SUM(AdmitToday)
  /Releases=SUM(DischToday)
  /CensusDays=N.
DATASET ACTIVATE CensusDays.
MATCH FILES /FILE=*
  /TABLE='DaysinMonth'
  /BY  Year Month.
EXECUTE.
COMPUTE AvgPop=CensusDays/DaysInMonth.
EXE.


On Thu, May 28, 2009 at 1:13 PM, Damir <[hidden email]> wrote:
I have a rather large dataset which includes multiple charges for inmates
admitted to and released from county jail.  More specifically, the dataset
contains an InmateId (Num), AdmissionID(Num),  AdmissionDate (Date22),
ReleaseDate (Date22), and ChargeCode (Num).  I would like to create a new
dataset that includes total number of inmates in jail in a given month
along with “average daily count”.

Here is a sample of the current dataset.

InmateID AdmissionID AdmissionDate ReleaseDate
11111  1  1-Apr-2001  2-Apr-2001
11111  1  1-Apr-2001  2-Apr-2001
22222  2  22-Sep-2002  23-Aug-2003
33333  3  30-Aug-2003  2-Feb-2004
33333  4  1-Jul-2001  4-Jul-2001
33333  4  1-Jul-2001  4-Jul-2001
44444  5  23-May-2004  27-Jun-2004
44444  5  23-May-2004  27-Jun-2004
44444  6  25-May-2009  26-May-2009

New DataSet with VARS

MonthYear TotalInmates AvgPop(perMonthYear)

Any suggested syntax would be greatly appreciated and I am also wondering
about data validation options for the first dataset.  For example, I would
like to set up parameters that flag cases where the dates don’t make
sense.  For example, an inmate release date occurs prior to admission date,
or more complete validation such as an inmate is admitted; however, the
release date suggests that the person is still in jail.

Thanks,
Damir

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD



Express yourself instantly with MSN Messenger! MSN Messenger