Clustering Census Data Time Series Variables


Clustering Census Data Time Series Variables

David Short
The author has deleted this message.

Re: Clustering Census Data Time Series Variables

Art Kendall
How many cities do you have?
How many time points?
How many variables do you have repeats (times) for?

Do you know how to find a correlation/similarity/dissimilarity between the time series (sets of repeated measures) for 2 cities?
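To make the question concrete: for two cities whose series cover the same years, two common choices are the Pearson correlation, which compares the shape of the trajectories and ignores their level, and the Euclidean distance, which is dominated by differences in size. The minimal Python sketch below computes both for made-up employment counts and turns the correlation into a dissimilarity as 1 - r. It is an illustration only, not an SPSS procedure, and it assumes the two series are already aligned year by year.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two aligned series of equal length."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must be aligned and have at least 2 points")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # Undefined (division by zero) if either series is constant.
    return sxy / math.sqrt(sxx * syy)


def euclidean(x, y):
    """Straight-line distance between two aligned series; sensitive to level."""
    if len(x) != len(y):
        raise ValueError("series must be aligned")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical employment counts for the same four years in two cities:
# similar growth shape, very different size.
city_a = [100, 110, 121, 133]
city_b = [50, 55, 60, 66]

r = pearson_r(city_a, city_b)
print("correlation:", round(r, 4))
print("1 - r dissimilarity:", round(1 - r, 4))
print("Euclidean distance:", round(euclidean(city_a, city_b), 1))
```

Here the correlation is close to 1 while the Euclidean distance is large: the two measures answer different questions (same growth pattern versus same size), so the choice between them, or standardizing the series first, matters for what the clusters will mean.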

Art Kendall
Social Research Consultants

Re: Clustering Census Data Time Series Variables

David Short
Art,

To answer your questions: I have 8000 cities with time series of 5-50 data points each. I want to cluster some cities on just one feature or characteristic. In a second analysis, I want to cluster the cities based on multiple feature time series (the latter would be similar to clustering the SKUs of a warehouse distribution center using sales, number of returns, etc.).

I mostly use the SPSS GUI to get dissimilarity and similarity output. However, I don't see any procedure that handles time series data the way I need: one that preserves both structure and scale, can cope with serial correlation, etc. And without a cookbook, I wouldn't know how to apply the dissimilarity algorithms in SPSS or how to pick the right one for time series from those available.

Short

Re: Clustering Census Data Time Series Variables

David Short
Art,
As to the number of variables, it is census data, so it could be hundreds of variables, but realistically only 5-20.

Short

Re: Clustering Census Data Time Series Variables

Art Kendall
In your discipline, how would you calculate some kind of proximity/correlation/similarity/tracking measure between the sets of repeats/times for 1 pair of cities?

There are many ways to cluster cases once such a measure is chosen. If there is no measure typically used for comparing time series in your discipline, perhaps other list members can suggest measures for comparing the time series of 2 places.
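As an illustration of one of those many ways, here is a minimal pure-Python sketch of agglomerative clustering with average linkage, driven by a 1 - r (correlation-based) dissimilarity between made-up series. It is a stand-in for a proper clustering routine, not SPSS's own hierarchical clustering procedure, and its naive loops would be far too slow for 8000 cities; it only shows how a chosen pairwise measure feeds a clustering step.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two aligned series of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(dist, k):
    """Naive agglomerative clustering with average linkage.

    dist: symmetric matrix (list of lists) of pairwise dissimilarities.
    k: number of clusters at which to stop merging.
    Returns a list of clusters, each a sorted list of item indices.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Mean dissimilarity over all cross-cluster pairs.
                total = sum(dist[a][b] for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # j > i, so index i is still valid
    return clusters


# Hypothetical series for four cities over the same four years.
series = {
    "A": [100, 110, 121, 133],
    "B": [50, 56, 61, 67],
    "C": [80, 75, 71, 66],
    "D": [200, 190, 178, 170],
}
names = list(series)
dist = [[1 - pearson_r(series[p], series[q]) for q in names] for p in names]
groups = average_linkage(dist, k=2)
print([[names[i] for i in g] for g in groups])
```

With this measure the two growing cities end up together and the two shrinking cities end up together, regardless of their very different sizes; swapping in a level-sensitive distance would change that.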

Art Kendall
Social Research Consultants

Re: Clustering Census Data Time Series Variables

Maguin, Eugene
This is not something I know anything about but I'm curious enough to ask. In a later reply to Art you say you have data on 8,000 cities and 5-50 data points for each city (I assume you mean that the employment data series varies in length from city to city with the shortest being 5 and the longest being 50. If that is not correct, let's say that it is for my question.) I'm curious about the analysis. Do you envision fitting an ARIMA or growth curve model to the data for each city and then clustering cities based on the model coefficients or are you planning to cluster the data series themselves?
Gene Maguin


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of David Short
Sent: Sunday, June 19, 2016 3:39 AM
To: [hidden email]
Subject: Clustering Census Data Time Series Variables

Hello,

I'm using SPSS 23 and Windows 7 Ultimate. My question is: in SPSS, is it possible to cluster time series data? For example, if I want to find clusters of cities based on their time series for the variable "number of people employed" over a period of 20 years, is this possible?
One goal is to find (cluster) cities with similar growth patterns. If it's not possible, do you have a suggestion of where to look for a solution?

Much Thanks,
David Short




--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Clustering-Census-Data-Time-Series-Variables-tp5732488.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD


Re: Clustering Census Data Time Series Variables

Jon Peck
It seems to me that there isn't really enough data to fit individual ARIMA or similar models city by city. Why not estimate a simpler city-specific growth curve using CURVEFIT, then aggregate the other variables appropriately so that you have a dataset of 8000 cases, each consisting of the aggregated variables and the coefficient(s) from CURVEFIT, and feed that to a clustering routine? You might want to standardize at least the employment data first in order to make the CURVEFIT coefficients comparable.
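The data flow of this suggestion (reduce each city's series to a growth coefficient, then cluster the cities on it) can be sketched outside SPSS. In this minimal Python sketch, a straight-line least-squares fit stands in for a CURVEFIT linear model; "standardize" is interpreted, as an assumption, as dividing each city's employment by its own mean so the slope reads as a proportion of average employment per year; and a plain k-means with k = 2 stands in for the clustering routine. City names and counts are invented.

```python
def ols_slope(t, y):
    """Least-squares slope of y on t (straight-line trend)."""
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    num = sum((a - mt) * (b - my) for a, b in zip(t, y))
    den = sum((a - mt) ** 2 for a in t)
    return num / den


def growth_coefficient(years, employed):
    """Slope of employment scaled by its own mean (assumed standardization)."""
    mean = sum(employed) / len(employed)
    return ols_slope(years, [v / mean for v in employed])


def kmeans(points, k, iters=100):
    """Plain k-means; deterministic start from the first k points."""
    centers = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical cities: (years observed, number employed); lengths differ.
cities = {
    "A": ([2000, 2001, 2002, 2003, 2004, 2005], [100, 104, 108, 113, 117, 122]),
    "C": ([2003, 2004, 2005, 2006, 2007], [500, 490, 478, 470, 459]),
    "B": ([1998, 1999, 2000, 2001, 2002, 2003, 2004], [40, 42, 43, 45, 47, 48, 50]),
    "D": ([2001, 2002, 2003, 2004, 2005, 2006], [900, 880, 865, 850, 832, 815]),
}
names = list(cities)
# One row per city; other aggregated (and standardized) variables
# could be appended as extra columns in each row.
features = [[growth_coefficient(*cities[name])] for name in names]
labels, centers = kmeans(features, k=2)
for name, f, lab in zip(names, features, labels):
    print(name, round(f[0], 4), "cluster", lab)
```

Because the slope is scaled by each city's own mean, a city with about 40 workers and one with about 100 can share a cluster when they grow at a similar relative rate, which is the point of standardizing before fitting.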

One unavoidable problem is that, because the time series differ in length, the curves will be estimated over different time periods, and macro effects will contaminate the growth estimates. You might be able to check this by selecting a subset of the data covering the same time period for every city and seeing how that affects the results.
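The check described above, refitting on a common time window, might look like this minimal sketch. It assumes, hypothetically, that each city's data is a dict mapping year to number employed; it keeps only cities observed in every year of a chosen window and reports each city's mean-scaled growth slope on its full series next to the slope on the shared window. Large gaps between the two would hint at period (macro) effects. Data and window are invented.

```python
def ols_slope(t, y):
    """Least-squares slope of y on t."""
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    num = sum((a - mt) * (b - my) for a, b in zip(t, y))
    den = sum((a - mt) ** 2 for a in t)
    return num / den


def scaled_slope(years, values):
    """Slope of values divided by their mean (assumed standardization)."""
    mean = sum(values) / len(values)
    return ols_slope(years, [v / mean for v in values])


def compare_on_window(panel, start, end):
    """For cities covering every year in [start, end], return
    {city: (slope on full series, slope on the window only)}."""
    window = list(range(start, end + 1))
    if len(window) < 2:
        raise ValueError("window must span at least two years")
    result = {}
    for city, series in panel.items():
        if not all(y in series for y in window):
            continue  # city does not cover the whole window; leave it out
        full_years = sorted(series)
        full = scaled_slope(full_years, [series[y] for y in full_years])
        shared = scaled_slope(window, [series[y] for y in window])
        result[city] = (full, shared)
    return result


# Hypothetical panel: year -> number employed, with uneven coverage.
panel = {
    "A": {2000: 100, 2001: 104, 2002: 108, 2003: 113, 2004: 117, 2005: 122},
    "B": {2002: 40, 2003: 42, 2004: 43},
    "C": {2004: 500, 2005: 490, 2006: 478},
}
for city, (full, shared) in compare_on_window(panel, 2002, 2004).items():
    print(city, "full:", round(full, 4), "window:", round(shared, 4))
```

City C is dropped because it does not cover 2002-2003; with real data, the trade-off is that a wider common window keeps more years but fewer cities.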

On Mon, Jun 20, 2016 at 7:23 AM, Maguin, Eugene <[hidden email]> wrote:



--
Jon K Peck
[hidden email]
