I have a rather large (36894 records) dataset covering the past 15 years. Each record represents a juveniles intake and release dates and times. I am trying to determine
a method to calculate average daily population--that Here is a small sample of the database:
Stella Vasquez, Continuous Improvement Bureau Arizona Department of Juvenile Corrections | 1624 W. Adams St., Phoenix, AZ 85007
| 602-542-2272 | [hidden email] **The information contained in this communication is privileged and confidential and is intended solely for the individual[s] and/or entities named herein. This information is not to be disseminated. If you have received this message in error, please reply to this message and notify the sender of the error and then permanently delete the message and sent item. Thank you.** ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
There are probably better solutions, but one way is to expand the data using the LOOP and XSAVE trick to days. Then you can aggregate the number of observations to days. Example below.
************************************************************************. *Change to a location on your machine. FILE HANDLE save /NAME = "C:\Users\axw161530\Desktop\XSAVE_Examp". *Example data. DATA LIST LIST(",") /Id (F1.0) BeginDate (ADATE10) BeginTime (TIME5) EndDate (ADATE10) EndTime (TIME5). BEGIN DATA 32742,12/29/2015, 10:17,07/12/2016, 11:07 32746,01/05/2016, 10:28,02/28/2016, 1:08 32756,01/19/2016, 10:10,08/16/2016, 12:20 32759,01/26/2016, 11:08,08/16/2016, 12:07 32759,11/10/2016, 13:04,11/22/2016, 9:24 32759,11/22/2016, 9:24,11/23/2016, 11:06 32759,11/23/2016, 11:06, END DATA. DATASET NAME Examp. *Recoding missing end dates to current date time, assuming they are still in the facility. IF MISSING(EndDate) EndDate = $TIME. *Now looping over the dates for begin and end. NUMERIC DayIn (ADATE10). LOOP #i = 0 TO DATEDIFF(EndDate,BeginDate,"DAYS"). COMPUTE DayIn = DATESUM(BeginDate,#i,"DAYS"). XSAVE OUTFILE = "save\ExpandedData.sav" /KEEP Id DayIn. END LOOP. GET FILE = "save\ExpandedData.sav". DATASET NAME ExpandedData. DATASET DECLARE DayCount. AGGREGATE OUTFILE = 'DayCount' /BREAK DayIn /JuvCount = N. ************************************************************************. Even with 40,000 records this expanded data should not be too large a file for SPSS to deal with fairly quickly. |
In reply to this post by Stella Vasquez
Please be aware, and make your client aware, that there are some arbitrary decisions to be made about Who is Counted.
Also, be aware that a simple method of counting can be confused by decisions in the record-keeping: You might want to
edit the file to fit some definitions before you do something like applying Andy's algorithm.
- What do they /want/ when they say "average daily population", and does that differ from "how many juveniles did we have"? For instance, 32746 checks out at 1:08 in the morning of the 28th. Should he be counted as being present on the 28th? (If not,,
you could edit the Exit, for this run, to read 23:59 on the 27th.)
- I see that 32759 has a Startdate dated exactly the same as the Enddate on the previous line. A simple count would include him twice for that day.
If "How many different juveniles were in the year" is an interesting question, that requires some special counting or aggregation within the years.
Hope this helps. -- Rich Ulrich From: SPSSX(r) Discussion <[hidden email]> on behalf of Stella Vasquez <[hidden email]>
Sent: Tuesday, January 17, 2017 12:12:41 PM To: [hidden email] Subject: Help with Creating ADP I have a rather large (36894 records) dataset covering the past 15 years. Each record represents a juveniles intake and release dates and times. I am trying to determine
a method to calculate average daily population--that Here is a small sample of the database:
Stella Vasquez, Continuous Improvement Bureau Arizona Department of Juvenile Corrections | 1624 W. Adams St., Phoenix, AZ 85007
| 602-542-2272 | [hidden email] **The information contained in this communication is privileged and confidential and is intended solely for the individual[s] and/or entities named herein. This information is not to be disseminated. If you have received this message in error, please reply to this message and notify the sender of the error and then permanently delete the message and sent item. Thank you.** ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Agree, good advice Rich.
My example set would not double count a line that had the same beginning end data, but one thing it would do is if the end date is before the begin date it would expand the data backwards. (This might be ok if it is because the dates were transposed, it wouldn't be ok if that was just a data error though.) My original code example would double count if a person was input/output multiple times on the same day. So if you had a person 12/29/2015 10:17 to 12/30/2015 11:17 and then that person went right back in later that day: 12/30/2015 19:17 to 1/10/2016 10:17 My code would spit out two 12/30 days for that person. One way to not double count though is to eliminate duplicate Ids + dates on the expanded dataset. See the updated example below. ************************************************************************. *Change to a location on your machine. FILE HANDLE save /NAME = "C:\Users\axw161530\Desktop\XSAVE_Examp". *Example data. DATA LIST LIST(",") /Id (F1.0) BeginDate (ADATE10) BeginTime (TIME5) EndDate (ADATE10) EndTime (TIME5). BEGIN DATA 32742,12/29/2015, 10:17,07/12/2016, 11:07 32746,01/05/2016, 10:28,02/28/2016, 1:08 32756,01/19/2016, 10:10,08/16/2016, 12:20 32759,01/26/2016, 11:08,08/16/2016, 12:07 32759,11/10/2016, 13:04,11/22/2016, 9:24 32759,11/22/2016, 9:24,11/23/2016, 11:06 32759,11/23/2016, 11:06, END DATA. DATASET NAME Examp. *Recoding missing end dates to current date time, assuming they are still in the facility. IF MISSING(EndDate) EndDate = $TIME. *Now looping over the dates for begin and end. NUMERIC DayIn (ADATE10). LOOP #i = 0 TO DATEDIFF(EndDate,BeginDate,"DAYS"). COMPUTE DayIn = DATESUM(BeginDate,#i,"DAYS"). XSAVE OUTFILE = "save\ExpandedData.sav" /KEEP Id DayIn. END LOOP. GET FILE = "save\ExpandedData.sav". DATASET NAME ExpandedData. *Eliminate duplicate person plus days. SORT CASES BY Id DayIn. MATCH FILES FILE = * /FIRST = Keep /BY Id DayIn. SELECT IF Keep = 1. DATASET DECLARE DayCount. AGGREGATE OUTFILE = 'DayCount' /BREAK DayIn /JuvCount = N. ************************************************************************. This does not count partial days, if you wanted to do fractional values for them that would take alittle more work. Counting unique individuals per year I would not do it like this, but you could get there from this data as well. With the expanded dataset compute a year, then aggregate to unique IDs plus year, then a second aggregate up to the year variable (or just a frequency table would do). |
Free forum by Nabble | Edit this page |