Hi,

I am seeking to summarise traffic count data collected by automated counting systems. I have hourly counts for multiple days at multiple sites. Directionality is important, as many locations have a peak direction of flow (towards the CBD has higher counts in the AM period, whilst away from the CBD typically has higher flows in the PM period). For my requirements I am seeking details on an average school day, as this provides the typical flows used for planning purposes. I have a calendar and can readily restrict my data to the approximately 200 school days in the year.

This is “real” data with issues! Low counts may be genuine, resulting from weather, accidents, or up-stream or down-stream congestion; may reflect partially faulty equipment which records only some traffic; or may simply be genuine variation. Consequently, the median tends to be a more reliable central measure than the mean, as the low counts can be more readily discarded. For each site, direction and hour I would like to obtain the median and selected percentiles, say 10, 25, 50, 75 & 90 (clearly 50 is the median). The percentiles can be used to provide a “confidence interval” (not in the strict statistical sense).

With the AGGREGATE command I can readily obtain the mean, median and many other statistics, but seemingly not percentiles. The FREQUENCIES command has a PERCENTILES subcommand which allows me to specify the required percentiles, but to obtain this for each site, direction and hour I need to SORT and SPLIT the file.
The output from this command is a frequency reporting table, which is not easy to incorporate with the data obtained via the AGGREGATE command. I have contemplated using the FGT(varlist,value) function within AGGREGATE to find the fraction of cases greater than a certain value. However, with the large variation in counts between different sites (multi-lane roads vs busier single-lane roads) and times of day, this is not really practical.

I have also considered sorting the data by site, direction, hour and count so that the data is in sequential order, and then deriving the percentiles by my own calculations. To achieve this I can use the AGGREGATE command to calculate the number of records for each site, direction and hour and write this record count onto the file. I can then create a sequence id (reset to one for the first record of each new site, direction and hour) and derive the percentile, along the lines of the sketch below.
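In outline (again with illustrative variable names, and assuming the count variable is called Count):

   SORT CASES BY Site Direction Hour Count.
   * Attach the number of records in each Site/Direction/Hour group.
   AGGREGATE OUTFILE=* MODE=ADDVARIABLES
     /BREAK=Site Direction Hour
     /n=N.
   * Sequence id, reset to 1 at the start of each new group.
   COMPUTE seq = 1.
   IF (Site = LAG(Site) AND Direction = LAG(Direction) AND Hour = LAG(Hour))
      seq = LAG(seq) + 1.
   * Each record's percentile position within its group.
   COMPUTE pctpos = 100 * seq / n.
   EXECUTE.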
This is a cumbersome process, so I am hoping there is a better alternative.

The final file that I am seeking will have the following variables: Site, direction, hour, mean, median, p10, p25, p50, p75, p90. I am seeking suggestions as to how I can summarise my data to obtain the variables listed above.

Thanking you in advance.

Cheers
Frank Milthorpe

Frank Milthorpe
Bureau of Statistics and Analytics
Freight, Strategy and Planning
T 02 8588 5559
BSA 02 8202 2702
Here is one way.
1. Sort the file by Site, Direction, Hour and TrafficCount.
2. Use CASESTOVARS to create a file with 200+ variables, var001 to var215 (say), which will be in order of increasing counts.
3. Use MEAN.100(var001 TO var215) to get the mean.
4a. If every cell had n = 215, then you just need to pick out the var-numbers that correspond to the percentiles you want; save the file with just those few, renaming them. Or,
4b. If the n is not equal to 215 but varies all over the place, you could use each cell's actual n and set up your algorithm to pick out the positions of the desired percentiles, indexing into a VECTOR over the 215 count variables.

A sketch is below. Does this do what you need?
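For instance (names and the 215 maximum are illustrative; by default CASESTOVARS will name the new variables Count.1 to Count.215 rather than var001 to var215, and the nearest-rank rule used here is only one way to define a percentile):

   SORT CASES BY Site Direction Hour Count.
   * One row per Site/Direction/Hour; Count.1, Count.2, ... hold the
   * daily counts in ascending order.
   CASESTOVARS
     /ID=Site Direction Hour.
   COMPUTE n = NVALID(Count.1 TO Count.215).
   COMPUTE meancnt = MEAN.100(Count.1 TO Count.215).
   VECTOR c = Count.1 TO Count.215.
   * Nearest-rank percentiles.
   COMPUTE p50 = c(MAX(1, RND(.50 * n))).
   COMPUTE p90 = c(MAX(1, RND(.90 * n))).
   EXECUTE.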
--
Rich Ulrich
In reply to this post by Frank Milthorpe-2
Study the RANK command. NTILES followed by MATCH FILES using the FIRST function should suffice; see the sketch below.
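For example (variable names are placeholders; within each group, the first case of each decile approximates that decile's lower cut point):

   SORT CASES BY Site Direction Hour Count.
   * Assign each case to a decile within its Site/Direction/Hour group.
   RANK VARIABLES=Count (A) BY Site Direction Hour
     /NTILES(10) INTO decile.
   * Keep only the first case of each decile; its Count approximates
   * the decile boundary.
   MATCH FILES FILE=* /BY Site Direction Hour decile /FIRST=first.
   SELECT IF first = 1.
   EXECUTE.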
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
In reply to this post by Frank Milthorpe-2
Look up RANK with the NTILES and BY options.
Art Kendall
Social Research Consultants