Help with Creating ADP


Help with Creating ADP

Stella Vasquez

I have a rather large (36894 records) dataset covering the past 15 years. Each record represents a juvenile's intake and release dates and times. I am trying to determine a method to calculate average daily population--that is, for any given year, how many juveniles we had.

 

Here is a small sample of the database:

 

ID     StartDateTime     EndDateTime       Status
32742  12/29/2015 10:17  07/12/2016 11:07  2
32746  01/05/2016 10:28  02/28/2016 1:08   2
32756  01/19/2016 10:10  08/16/2016 12:20  2
32759  01/26/2016 11:08  08/16/2016 12:07  2
32759  11/10/2016 13:04  11/22/2016 9:24   2
32759  11/22/2016 9:24   11/23/2016 11:06  2
32759  11/23/2016 11:06                    2
32760  01/26/2016 10:42  06/10/2016 6:20   2
32763  01/26/2016 8:27   10/18/2016 11:35  2
32767  02/02/2016 9:27   12/23/2016 5:59   2
32768  02/02/2016 8:42   07/12/2016 9:59   2
32770  02/02/2016 8:42   06/21/2016 12:56  2
32770  08/08/2016 8:49   08/15/2016 14:44  2
32770  08/15/2016 14:44  08/18/2016 10:37  2
32770  08/18/2016 10:37  12/20/2016 10:18  2
32771  02/02/2016 8:42   08/02/2016 11:05  2
32772  02/09/2016 9:33                     2
32773  02/02/2016 10:23  11/22/2016 9:49   2
32774  02/02/2016 10:23                    2
32775  02/02/2016 13:30  06/28/2016 11:56  2
32780  02/09/2016 9:56                     2
32781  02/09/2016 10:42  10/02/2016 6:23   2



Any suggestions?


Stella Vasquez, Continuous Improvement Bureau

Arizona Department of Juvenile Corrections | 1624 W. Adams St., Phoenix, AZ 85007 | 602-542-2272 | [hidden email]

 






Re: Help with Creating ADP

Andy W
There are probably better solutions, but one way is to expand the data to one record per person-day using the LOOP and XSAVE trick. Then you can aggregate the number of observations per day. Example below.

************************************************************************.
*Change to a location on your machine.
FILE HANDLE save /NAME = "C:\Users\axw161530\Desktop\XSAVE_Examp".

*Example data.
DATA LIST LIST(",") /Id (F5.0) BeginDate (ADATE10) BeginTime (TIME5) EndDate (ADATE10) EndTime (TIME5).
BEGIN DATA
32742,12/29/2015, 10:17,07/12/2016, 11:07
32746,01/05/2016, 10:28,02/28/2016, 1:08
32756,01/19/2016, 10:10,08/16/2016, 12:20
32759,01/26/2016, 11:08,08/16/2016, 12:07
32759,11/10/2016, 13:04,11/22/2016, 9:24
32759,11/22/2016,  9:24,11/23/2016, 11:06
32759,11/23/2016, 11:06,
END DATA.
DATASET NAME Examp.

*Recoding missing end dates to current date time, assuming they are still in the facility.
IF MISSING(EndDate) EndDate = $TIME.

*Now looping over the dates for begin and end.
NUMERIC DayIn (ADATE10).
LOOP #i = 0 TO DATEDIFF(EndDate,BeginDate,"DAYS").
  COMPUTE DayIn = DATESUM(BeginDate,#i,"DAYS").
  XSAVE OUTFILE = "save\ExpandedData.sav" /KEEP Id DayIn.
END LOOP.

GET FILE = "save\ExpandedData.sav".
DATASET NAME ExpandedData.
DATASET DECLARE DayCount.
AGGREGATE OUTFILE = 'DayCount'
  /BREAK DayIn
  /JuvCount = N.
************************************************************************.

Even with 40,000 records, the expanded data should not be too large a file for SPSS to deal with fairly quickly.
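And if you then want the annual figure Stella described, a quick sketch from there is to average the daily counts within each year (Year is just a scratch name here; note that any day with nobody in the facility would not show up in DayCount):

DATASET ACTIVATE DayCount.
*Calendar year of each day.
COMPUTE Year = XDATE.YEAR(DayIn).
*Annual ADP = mean of the daily counts within each year.
MEANS TABLES = JuvCount BY Year /CELLS = MEAN COUNT.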
Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Re: Help with Creating ADP

Rich Ulrich
In reply to this post by Stella Vasquez

Please be aware, and make your client aware, that there are some arbitrary decisions to be made about Who is Counted.

Also, be aware that a simple method of counting can be confused by decisions in the record-keeping: you might want to edit the file to fit some definitions before you do something like applying Andy's algorithm.


- What do they /want/ when they say "average daily population", and does that differ from "how many juveniles did we have"? For instance, 32746 checks out at 1:08 in the morning of the 28th. Should he be counted as being present on the 28th? (If not, you could edit the Exit, for this run, to read 23:59 on the 27th.)
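For instance, with the Begin/End variables in Andy's example you could push such an exit back to the prior day before expanding (a hypothetical tweak; the 6 a.m. cutoff here is arbitrary):

*Hypothetical: treat an early-morning release as belonging to the previous day.
*In a day-based expansion this is the same as setting the exit to 23:59 the day before.
DO IF NOT MISSING(EndTime) AND EndTime < TIME.HMS(6).
  COMPUTE EndDate = DATESUM(EndDate, -1, "DAYS").
END IF.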


- I see that 32759 has a Startdate dated exactly the same as the Enddate on the previous line. A simple count would include him twice for that day.

- If "How many different juveniles were in the year" is an interesting question, that requires some special counting or aggregation within the years.


Hope this helps.

--

Rich Ulrich



Re: Help with Creating ADP

Andy W
Agree, good advice Rich.

My example would not double count a line that had the same beginning and end date, but one thing it would do is expand the data backwards if the end date is before the begin date. (That might be OK if the dates were simply transposed; it wouldn't be OK if it was just a data error though.)

My original code example would double count if a person went in and out multiple times on the same day.

So if you had a person

12/29/2015 10:17 to 12/30/2015 11:17

and then that person went right back in later that day:

12/30/2015 19:17 to 1/10/2016 10:17

My code would spit out two 12/30 days for that person. One way to not double count though is to eliminate duplicate Ids + dates on the expanded dataset. See the updated example below.

************************************************************************.
*Change to a location on your machine.
FILE HANDLE save /NAME = "C:\Users\axw161530\Desktop\XSAVE_Examp".

*Example data.
DATA LIST LIST(",") /Id (F5.0) BeginDate (ADATE10) BeginTime (TIME5) EndDate (ADATE10) EndTime (TIME5).
BEGIN DATA
32742,12/29/2015, 10:17,07/12/2016, 11:07
32746,01/05/2016, 10:28,02/28/2016, 1:08
32756,01/19/2016, 10:10,08/16/2016, 12:20
32759,01/26/2016, 11:08,08/16/2016, 12:07
32759,11/10/2016, 13:04,11/22/2016, 9:24
32759,11/22/2016,  9:24,11/23/2016, 11:06
32759,11/23/2016, 11:06,
END DATA.
DATASET NAME Examp.

*Recoding missing end dates to current date time, assuming they are still in the facility.
IF MISSING(EndDate) EndDate = $TIME.

*Now looping over the dates for begin and end.
NUMERIC DayIn (ADATE10).
LOOP #i = 0 TO DATEDIFF(EndDate,BeginDate,"DAYS").
  COMPUTE DayIn = DATESUM(BeginDate,#i,"DAYS").
  XSAVE OUTFILE = "save\ExpandedData.sav" /KEEP Id DayIn.
END LOOP.

GET FILE = "save\ExpandedData.sav".
DATASET NAME ExpandedData.
*Eliminate duplicate person plus days.
SORT CASES BY Id DayIn.
MATCH FILES FILE = * /FIRST = Keep /BY Id DayIn.
SELECT IF Keep = 1.
DATASET DECLARE DayCount.
AGGREGATE OUTFILE = 'DayCount'
  /BREAK DayIn
  /JuvCount = N.
************************************************************************.

This does not count partial days; if you wanted to do fractional values for them, that would take a little more work.
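If you did want fractional credit, a rough sketch would be to compute, inside the loop, how much of each day's 86,400 seconds the stay overlaps. This assumes the Begin/End times are kept and pasted onto the dates (and that open cases have an end time filled in, not just an end date):

*Rough sketch of fractional days - a variation on the loop above.
COMPUTE #BeginDT = BeginDate + BeginTime.
COMPUTE #EndDT = EndDate + EndTime.
NUMERIC DayIn (ADATE10) / Frac (F6.3).
LOOP #i = 0 TO DATEDIFF(EndDate,BeginDate,"DAYS").
  COMPUTE DayIn = DATESUM(BeginDate,#i,"DAYS").
  *Seconds of the stay falling on this calendar day, as a share of 86,400.
  COMPUTE Frac = (MIN(#EndDT,DayIn + 86400) - MAX(#BeginDT,DayIn)) / 86400.
  XSAVE OUTFILE = "save\ExpandedFrac.sav" /KEEP Id DayIn Frac.
END LOOP.
*Then aggregate with /JuvFrac = SUM(Frac) instead of N.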

For counting unique individuals per year I would not do it quite like this, but you could get there from this data as well. With the expanded dataset compute a year variable, then aggregate to unique IDs plus year, then do a second aggregate up to the year variable (or just a frequency table would do). A sketch is below.
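Something like this, continuing from the ExpandedData dataset above (PersonYear is just a name made up for the intermediate dataset):

*Rough sketch: unique juveniles per year from the expanded person-day data.
DATASET ACTIVATE ExpandedData.
COMPUTE Year = XDATE.YEAR(DayIn).
*Collapse to one record per juvenile within each year.
DATASET DECLARE PersonYear.
AGGREGATE OUTFILE = 'PersonYear'
  /BREAK Id Year
  /NDays = N.
*Each remaining record is one juvenile in one year, so a frequency of Year gives the count.
DATASET ACTIVATE PersonYear.
FREQUENCIES VARIABLES = Year.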
Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/