Correlation between two time series variables

Correlation between two time series variables

E. Bernardo
Dear All,

The goal is to test if there is correlation between two continuous time series variables (Dependent=Y_t and Independent=X_t). The time series variables run from January 2010 up to December 2013. Can I use Linear Regression analysis so that the independent variables in my regression equation are X_t and Time? Comments on the use of Linear Regression Approach for time series data are welcome.

Thank you.
Eins
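
For readers who want to try the regression described in this post, here is a minimal Python sketch, assuming NumPy is available. It fits Y_t on X_t and a linear time index by ordinary least squares and then computes the Durbin-Watson statistic on the residuals, which is one common check for serial correlation (values near 2 suggest little lag-1 autocorrelation). The function names `fit_ols_with_time` and `durbin_watson` and the synthetic data are hypothetical and do not come from the thread.

```python
import numpy as np

def fit_ols_with_time(y, x):
    """Regress y on an intercept, x, and a linear time index (0, 1, 2, ...)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    t = np.arange(len(y), dtype=float)
    design = np.column_stack([np.ones_like(t), x, t])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    return beta, residuals

def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

# Synthetic monthly data, 48 points (Jan 2010 to Dec 2013), for illustration only.
rng = np.random.default_rng(0)
x = rng.normal(size=48)
y = 1.0 + 2.0 * x + 0.5 * np.arange(48) + rng.normal(scale=0.1, size=48)
beta, resid = fit_ols_with_time(y, x)
print("intercept, slope on X, slope on Time:", beta)
print("Durbin-Watson:", durbin_watson(resid))
```

The coefficient estimates can be computed regardless of dependence; it is the standard errors and tests that become unreliable when the residuals are autocorrelated.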

Re: Correlation between two time series variables

Ryan
You should not employ OLS regression to calculate the correlation between two variables under a scenario in which each variable has been measured repeatedly [on the same subject] because it violates the assumption of independence. 

A linear mixed model could be employed to accurately estimate the linear correlation.

Ryan


On Mon, Jun 9, 2014 at 12:38 AM, E. Bernardo <[hidden email]> wrote:
Dear All,

The goal is to test if there is correlation between two continuous time series variables (Dependent=Y_t and Independent=X_t). The time series variables run from January 2010 up to December 2013. Can I use Linear Regression analysis so that the independent variables in my regression equation are X_t and Time? Comments on the use of Linear Regression Approach for time series data are welcome.

Thank you.
Eins


Re: Correlation between two time series variables

Rich Ulrich
In a spirit of liberality, I would say that you can always calculate whatever
coefficient you want, and the definition of the coefficient probably stays the
same.  But here is what you lose when your assumptions fail badly: all the
statistical *tests* and the standard errors that generate those tests.   So, you
can get the r, but not the significance of the r.  A sample of hundreds might
"act as if" (for the purpose of testing) it has an N of just 2 or 3... which would
have a ridiculously large standard error.

Simple "repeated measures" for one subject is sometimes acceptable, if your
tests are within-subject and there is no time-series effect, that is, auto-correlation
between neighboring observations.  But the lack of "independence" is mainly
what puts time-series analyses into a category separate from ordinary regression.

Whenever there is dependency, special approaches are needed to devise tests that
either (a) adjust for the dependency by reducing the effective degrees of freedom,
or (b) remove the dependency from these data, or (c) use a different approach to
generate the standard errors for tests.
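
Option (a) can be sketched in Python, assuming NumPy. One common approximation, often attributed to Bartlett, treats both series as roughly first-order autoregressive and shrinks the sample size by a factor built from the product of their lag-1 autocorrelations. The function names and the example series below are hypothetical, and the formula is only one of several adjustments in use.

```python
import numpy as np

def lag1_autocorr(series):
    """Lag-1 sample autocorrelation of a 1-D series."""
    s = np.asarray(series, dtype=float)
    d = s - s.mean()
    return float(np.sum(d[:-1] * d[1:]) / np.sum(d ** 2))

def effective_n(x, y):
    """Approximate effective N for testing r between two AR(1)-like series."""
    n = len(x)
    r1 = lag1_autocorr(x) * lag1_autocorr(y)
    return n * (1.0 - r1) / (1.0 + r1)

# Two smooth (hence strongly autocorrelated) hypothetical series of 48 values.
t = np.arange(48)
x = np.sin(t / 8.0)
y = np.cos(t / 8.0)
print("nominal N:", len(x), "effective N:", round(effective_n(x, y), 1))
```

The effective N would then replace the nominal N when computing the degrees of freedom for the test of r.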

One approach to obtaining approximate tests for time series is to remove the
dependency among serial data points by (for instance) removing linear and other
simple polynomial trends.  Unfortunately, for cross-correlations, you also
run the risk of removing too much of the *true* effect at the same time.
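
A minimal Python sketch of the detrending idea, assuming NumPy. It subtracts a least-squares polynomial trend from each series and correlates what is left. The helper names `detrend` and `corr` and the two example series are hypothetical; the series are built so that the only thing they share is an upward trend.

```python
import numpy as np

def detrend(series, degree=1):
    """Subtract a least-squares polynomial trend of the given degree."""
    s = np.asarray(series, dtype=float)
    t = np.arange(len(s), dtype=float)
    coeffs = np.polyfit(t, s, degree)
    return s - np.polyval(coeffs, t)

def corr(a, b):
    """Pearson correlation between two equal-length series."""
    return float(np.corrcoef(a, b)[0, 1])

# Hypothetical series: unrelated fluctuations riding on upward trends.
t = np.arange(48, dtype=float)
x = 1.0 * t + np.sin(t)
y = 0.5 * t + np.cos(1.7 * t)
print("r on raw levels:   ", round(corr(x, y), 3))
print("r after detrending:", round(corr(detrend(x), detrend(y)), 3))
```

If the real relationship between the two series is itself a slow, trend-like movement, this step would remove it along with the nuisance trend, which is the risk noted above.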

I was surprised a few days ago when I was reminded that SPSS does have some
time-series provisions, over and beyond generating lag(X,p).  I can't comment more
on those, except that I don't remember reading questions about them.

--
Rich Ulrich



Date: Mon, 9 Jun 2014 08:57:09 -0400
From: [hidden email]
Subject: Re: Correlation between two time series variables
To: [hidden email]

You should not employ OLS regression to calculate the correlation between two variables under a scenario in which each variable has been measured repeatedly [on the same subject] because it violates the assumption of independence. 

A linear mixed model could be employed to accurately estimate the linear correlation.

Ryan



Re: Correlation between two time series variables

Anthony Babinec

Take a look at the following website:

http://www.tylervigen.com/

Many of the examples show spurious correlation involving two nonstationary series. So, researcher beware!
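
The warning can be reproduced with a small simulation. The Python sketch below, assuming NumPy, builds two independent random walks (a standard example of nonstationary series) and compares the correlation of their levels with the correlation of their first differences. The seed, series length, and variable names are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000
# Two independent random walks: cumulative sums of unrelated white noise.
steps_x = rng.normal(size=n)
steps_y = rng.normal(size=n)
walk_x = np.cumsum(steps_x)
walk_y = np.cumsum(steps_y)

r_levels = np.corrcoef(walk_x, walk_y)[0, 1]
# First differences recover the independent increments.
r_diffs = np.corrcoef(np.diff(walk_x), np.diff(walk_y))[0, 1]

print("r between levels:           ", round(r_levels, 3))
print("r between first differences:", round(r_diffs, 3))
```

Rerunning with different seeds shows the point: the correlation between levels can land far from zero in either direction even though the walks are unrelated, while the correlation between differences stays close to zero.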

 

Tony Babinec



Re: Correlation between two time series variables

Ryan
In reply to this post by Rich Ulrich
A simple correlation coefficient computed on all data which assumes independent pairs may very well differ from a correlation coefficient which takes into account the dependent nature of the repeated measurements. This is a complex issue that deserves more discussion, but no time right now. 

Ryan

On Jun 9, 2014, at 3:41 PM, Rich Ulrich <[hidden email]> wrote:

In a spirit of liberality, I would say that you can always calculate whatever
coefficient you want, and the definition of the coefficient probably stays the
same.  But here is what you lose when your assumptions fail badly: all the
statistical *tests* and the standard errors that generate those tests.   So, you
can get the r, but not the significance of the r.  A sample of hundreds might
"act as if" (for the purpose of testing) it has an N of just 2 or 3... which would
have a ridiculously large standard error.

Simple "repeated measures" for one subject is sometimes acceptable, if your
tests are within-subject and there is no time-series effect, that is, auto-correlation
between neighboring observations.  But the lack of "independence" is mainly
what puts time-series analyses into a category separate from ordinary regression.

Whenever there is dependency, special approaches are needed to devise tests that
either (a) adjust for the dependency by reducing the effective degrees of freedom,
or (b) remove the dependency from these data, or (c) use a different approach to
generate the standard errors for tests.

One approach to obtaining approximate tests for time series is to remove the
dependency among serial data points by (for instance) removing linear and other
simple polynomial trends.  Unfortunately, for cross-correlations, you also
run the risk of removing too much of the *true* effect at the same time.

I was surprised a few days ago when I was reminded that SPSS does have some
time-series provisions, over and beyond generating lag(X,p).  I can't comment more
on those, except that I don't remember reading questions about them.

--
Rich Ulrich




Re: Correlation between two time series variables

Ryan
All,

A quick follow-up. I decided to explore whether the Pearson Product Moment Correlation Coefficient calculated assuming independent pairs differs from one estimated in a way that accounts for dependent pairs, using a fully balanced design without any missing data. I used the SPSS sample data file "dietstudy.sav", including only the first two time points.

After giving this much thought as to how to appropriately parameterize the model to take into account the non-independent pairs, I restructured the data from wide to long and wrote the following code. 

MIXED y BY variable time
/FIXED = variable
/RANDOM variable | SUBJECT(patid) COVTYPE(UN)
/REPEATED = variable | subject(patid*time) COVTYPE(UN)
/PRINT=SOLUTION G R
/METHOD=ML.

Using G and R, I calculated the variance covariance matrix for the observations, typically denoted as V. 

Recall that: V = Z*G*TRANSPOS(Z) + R

where 

Z = design matrix for the random effects
G = covariance matrix for the random effects
R = covariance matrix for the residual errors

(By the way, I shouldn't have had to calculate V. SPSS really ought to provide this matrix as it does for G and R.)

I then transformed the covariance (first off-diagonal element) to a correlation by dividing the covariance by the product of the square roots of the variances of the two variables (first two main diagonal elements). The estimated correlation coefficient was .1703.
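
The matrix arithmetic described above can be sketched in Python, assuming NumPy. The G, R, and Z values below are made-up numbers chosen so the result is easy to check by hand. They are not the estimates from dietstudy.sav, and the helper `cov_to_corr` is a hypothetical name.

```python
import numpy as np

# Hypothetical 2x2 matrices for one subject measured on two variables.
Z = np.eye(2)                            # design matrix for the random effects
G = np.array([[4.0, 1.0], [1.0, 9.0]])   # covariance of the random effects
R = np.array([[5.0, 0.5], [0.5, 7.0]])   # covariance of the residual errors

V = Z @ G @ Z.T + R                      # V = Z*G*TRANSPOS(Z) + R

def cov_to_corr(v, i=0, j=1):
    """Correlation implied by covariance matrix v between elements i and j."""
    return float(v[i, j] / np.sqrt(v[i, i] * v[j, j]))

print(V)
print("implied correlation:", cov_to_corr(V))
```

With these numbers V has variances 9 and 16 and covariance 1.5, so the implied correlation is 1.5 / (3 * 4) = 0.125.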

Then I went back to the original file which was in wide format, stacked the data from the first two time points only into two variables, and calculated the Pearson Product Moment Correlation Coefficient, and lo and behold the estimated correlation coefficient was .1703.

Next, I decided to include all five time points (still a fully balanced design), and the correlation calculated from V, obtained by solving the equation above using G and R from the same MIXED model, was -.0151, which again matched the Pearson Product Moment Correlation Coefficient (assuming independent pairs).

I had initially thought the correlation would change if one were to take into account repeated measurements, but upon further thought I questioned whether it would remain the same in a fully balanced design, which of course led me to explore this further. Anyway, using this specific dietstudy.sav sample file, the coefficients were identical, as another poster had suggested would likely be the case.

Note that I left out the other variables provided in the sample data file, even though it certainly would have made sense to include them if my objectives were different. I wanted to keep the statistical model as close as possible to what had been discussed previously. 

Best,

Ryan


On Mon, Jun 9, 2014 at 5:26 PM, Ryan Black <[hidden email]> wrote:
A simple correlation coefficient computed on all data which assumes independent pairs may very well differ from a correlation coefficient which takes into account the dependent nature of the repeated measurements. This is a complex issue that deserves more discussion, but no time right now. 

Ryan


correlation between two time series variables?

E. Bernardo
In reply to this post by E. Bernardo
Dear all,

I have two continuous time series variables X_t and Y_t, where t = 1, 2, ...12 (monthly data). I want to compute the correlation coefficient between X_t and Y_t. Any suggestion on what correlation coefficient is appropriate considering that they are time series variables?

Thanks a lot.
Eins
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Re: correlation between two time series variables?

Maguin, Eugene

Ryan posted a data example on 6/15/14 with the subject line “Correlation between two time series variables”. His syntax was

MIXED y BY variable time
/FIXED = variable
/RANDOM variable | SUBJECT(patid) COVTYPE(UN)
/REPEATED = variable | subject(patid*time) COVTYPE(UN)
/PRINT=SOLUTION G R
/METHOD=ML.

Gene Maguin

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of E. Bernardo
Sent: Tuesday, July 01, 2014 3:32 AM
To: [hidden email]
Subject: correlation between two time series variables?

Dear all,

I have two continuous time series variables X_t and Y_t, where t = 1, 2, ...12 (monthly data). I want to compute the correlation coefficient between X_t and Y_t. Any suggestion on what correlation coefficient is appropriate considering that they are time series variables?

Thanks a lot.
Eins


Re: correlation between two time series variables?

David Greenberg
In reply to this post by E. Bernardo
That the data are time series is irrelevant. With two continuous
variables, you'd want to compute Pearson's r. That the data are time
series is relevant for the interpretation of any correlation. You'd
want to worry about spuriousness from common trends, common
seasonality, and so forth. But if the correlation is what you really
want, that is not relevant. David Greenberg, Sociology Department, New
York University.
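
To make the computational point concrete, here is a minimal Python sketch, assuming NumPy, that computes Pearson's r directly from its definition. The twelve monthly values are invented for illustration and the function name `pearson_r` is hypothetical. Nothing in the formula uses the time ordering: reordering the (x, y) pairs together leaves r unchanged.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson product-moment correlation from its definition."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

# Twelve hypothetical monthly values (t = 1, ..., 12), for illustration only.
x = np.array([3.1, 3.4, 3.3, 3.9, 4.2, 4.0, 4.6, 4.8, 5.1, 5.0, 5.6, 5.9])
y = np.array([10.2, 10.0, 10.9, 11.1, 11.0, 11.8, 12.1, 12.0, 12.9, 13.2, 13.1, 13.8])

print("r from the definition:", round(pearson_r(x, y), 4))
print("r from np.corrcoef:   ", round(float(np.corrcoef(x, y)[0, 1]), 4))
print("r with pairs reversed:", round(pearson_r(x[::-1], y[::-1]), 4))
```

Both invented series trend upward, so a large r here would be an example of the common-trend issue raised above rather than evidence of a relationship.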


Re: correlation between two time series variables?

Maurice Vergeer
David is correct.
And if, in the case of time series, you want to see whether values are correlated with earlier observations of the same series or of another series, you need to look at autocorrelations and cross-lagged correlations, available in SPSS (Analyze --> Forecasting --> Autocorrelations or Cross-Correlations). You then get correlations for a number of different lags.
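
Outside the SPSS menus, the idea of a lagged correlation can be sketched in Python, assuming NumPy. The function below simply correlates x[t] with y[t + lag] over the overlapping observations; passing the same series twice gives an autocorrelation. The name `cross_corr` and the example data are hypothetical. The conventional sample cross-correlation function normalizes by the full-series means and variances, so values from this simplified version may differ slightly from what a statistics package reports.

```python
import numpy as np

def cross_corr(x, y, lag):
    """Correlation between x[t] and y[t + lag]; lag may be negative."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if lag > 0:
        a, b = x[:-lag], y[lag:]
    elif lag < 0:
        a, b = x[-lag:], y[:lag]
    else:
        a, b = x, y
    return float(np.corrcoef(a, b)[0, 1])

# Hypothetical example: y repeats x two periods later.
rng = np.random.default_rng(1)
x = rng.normal(size=60)
y = np.roll(x, 2)          # y[t] = x[t - 2]; the first two values wrap around
for lag in range(-3, 4):
    print(lag, round(cross_corr(x, y, lag), 3))
```

In this constructed example the correlation at lag 2 is exactly 1, because y at time t + 2 equals x at time t.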

Hope that helps.
Maurice


On Tue, Jul 1, 2014 at 6:53 PM, David Greenberg <[hidden email]> wrote:
That the data are time series is irrelevant. With two continuous
variables, you'd want to compute Pearson's r. That the data are time
series is relevant for the interpretation of any correlation. You'd
want to worry about spuriousness from common trends, common
seasonality, and so forth. But if the correlation is what you really
want, that is not relevant. David Greenberg, Sociology Department, New
York University.

--
________________________________________________
Maurice Vergeer
To contact me, see http://mauricevergeer.nl/node/5
To see my publications, see http://mauricevergeer.nl/node/1
________________________________________________

Re: correlation between two time series variables?

E. Bernardo
In reply to this post by David Greenberg
Hi David, thank you for your comments.
My goal is to determine if Y_t increases (or decreases) as X_t increases (or decreases) over time. Please note that the relationship between X_t and Y_t is not causal. Sorry, I am now puzzled why correlation for two time series variables is not relevant. Please elaborate.

Eins




Re: correlation between two time series variables?

David Greenberg
Your message is internally contradictory. You say that the
relationship between X and Y is not causal, and yet you want to know what
effect a change in one variable has on change in the other variable.
Time series analysis has many complications that cannot be explained
briefly and clearly in an e-mail message. Fortunately for you there
are lots of good time series textbooks written by econometricians. You
should get some of them and read them. David Greenberg, Sociology
Department, New York University


Re: Correlation between two time series variables

E. Bernardo
In reply to this post by Ryan
Ryan and all, thank you for your comments.
Ryan, the MIXED procedure that you suggested answers my question. In my question, I have dependent and independent time series variables. Thank you.

I have a follow-up question. Can I still apply the MIXED procedure to two time series variables with a non-causal relationship, that is, when the time series variables cannot be classified as either dependent or independent variables?

Thank you.
Eins


Re: correlation between two time series variables?

E. Bernardo
In reply to this post by David Greenberg
Ryan Sorry, but I think my messages in this thread are not contradictory. 

My original question (FIRST MESSAGE for this thread):
On Tue, Jul 1, 2014 at 3:32 AM, E. Bernardo <[hidden email]> wrote:
>> I have two continuous time series variables X_t and Y_t, where t = 1, 2, ...12 (monthly data). I want to compute the correlation coefficient between X_t and Y_t. Any suggestion on what correlation coefficient is appropriate considering that they are time series variables?

My follow-up question (SECOND MESSAGE for this thread)
On Tue, Jul 1, 2014 at 10:40 PM, E. Bernardo <[hidden email]> wrote:
> Hi David, thank you for your comments. My goal is to determine if Y_t increases (or decreases) as X_t increases (or decreases) over time. Please note that the relationship between X_t and Y_t is not causal. Sorry, I am now puzzled why correlation for two time series variables is not relevant. Please elaborate.





On Wednesday, July 2, 2014 10:50 AM, David Greenberg <[hidden email]> wrote:


Your message is internally contradictory. You say that the
relationship between X and Y is not causal, and yet you want to know what
effect a change in one variable has on change in the other variable.
Time series analysis has many complications that cannot be explained
briefly and clearly in an e-mail message. Fortunately for you there
are lots of good time series textbooks written by econometricians. You
should get some of them and read them. David Greenberg, Sociology
Department, New York University

> On Wednesday, July 2, 2014 12:53 AM, David Greenberg <[hidden email]> wrote:
>
>
> That the data are time series is irrelevant. With two continuous
> variables, you'd want to compute Pearson's r. That the data are time
> series is relevant for the interpretation of any correlation. You'd
> want to worry about spuriousness from common trends, common
> seasonality, and so forth. But if the correlation is what you really
> want, that is not relevant. David Greenberg, Sociology Department, New
> York University.
>

Re: correlation between two time series variables?

Ryan
Eins,

I did not state that your questions are contradictory; in fact, I haven't read your follow-up post all that closely. I answered the original question of calculating the correlation between two variables in a repeated measures situation resulting in non-independent pairs. I just happened to stumble upon a SUGI paper (SAS) that describes a model very similar to the one I described earlier here:


This paper coupled with my previous post should provide you with the necessary info to calculate the correlation coefficient between X and Y in the presence of repeated measurements using the model I proposed. Whether or not this is the *optimal* model is something you will need to explore.

To answer another question you posed, this linear MIXED model I proposed (very similar to the one proposed in the SUGI paper) treats both X and Y as dependent variables, which are assumed to arise from a multivariate normal distribution.

An added benefit of this model is that it can handle unbalanced designs in which the number of repeated measurements on each subject may vary.
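
As a rough illustration of what "both X and Y as dependent variables" tends to mean in practice: a bivariate model of this kind is usually fitted to a stacked (long) file, with one response column and an indicator saying whether a row holds X or Y. The Python sketch below shows only that restructuring step. It is not taken from the SUGI paper; the subject IDs, values, and field names are made up, and the model fitting itself is not shown.

```python
# Hypothetical wide-format records: one row per subject per occasion.
# Subject "B" has fewer occasions than "A" (an unbalanced design).
wide = [
    {"subject": "A", "occasion": 1, "x": 5.1, "y": 7.2},
    {"subject": "A", "occasion": 2, "x": 5.9, "y": 7.8},
    {"subject": "A", "occasion": 3, "x": 6.4, "y": 8.5},
    {"subject": "B", "occasion": 1, "x": 4.2, "y": 6.0},
    {"subject": "B", "occasion": 2, "x": 4.8, "y": 6.9},
]


def stack_responses(rows, variables=("x", "y")):
    """Return long-format rows: one per subject, occasion and variable."""
    long_rows = []
    for row in rows:
        for name in variables:
            long_rows.append({
                "subject": row["subject"],
                "occasion": row["occasion"],
                "variable": name,    # indicator: which response this row holds
                "value": row[name],  # the single stacked response column
            })
    return long_rows


stacked = stack_responses(wide)
for row in stacked[:4]:
    print(row)
```

Each subject contributes as many rows as it has measurements, so nothing has to be padded or dropped when subjects differ in the number of occasions. How the covariance between the two responses is then specified depends on the mixed-model procedure being used.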

Best 

Ryan



Re: correlation between two time series variables?

E. Bernardo
In reply to this post by E. Bernardo
Sorry, the message below is not addressed directly to Ryan, but to David Greenberg.

On Wednesday, July 2, 2014 11:14 AM, E. Bernardo <[hidden email]> wrote:


Ryan Sorry, but I think my messages in this thread are not contradictory. 


Re: correlation between two time series variables?

David Greenberg
Let me begin by repeating my answer to your original question, and
then make some additional remarks. The original question was how to
compute the correlation between two continuous time series variables.
I answered that it made no difference whether the data were time
series or not. The formula for Pearson's r is the same.

In a later message I said that your question was contradictory because
you said that the relationship between X and Y was not causal, and yet
you wanted to know the effect on Y of change in X. If the relationship
is not causal it is presumably spurious, and a change in X would not
be expected to produce change in Y.

Ryan posted a paper devoted to analyses of repeated measures. If you
really have time series data (one entity with repeated observations
for the two variables), that paper is irrelevant, as it deals with
panel data (multiple entities measured at repeated times). On the
other hand, if you really have panel data, then your question was
phrased in a misleading manner.

I will end by repeating another observation of mine from an earlier
message. Time series data have many intricacies and complications
(such as trends, seasonality, nonstationarity). You cannot expect a
satisfactory answer to your question by posting it to a listserv and
getting brief suggestions. You need to read textbooks on time series
so that you will know what questions to ask, and have an idea of how
to proceed. It is reasonable to expect people to do this before
posting questions to the list, and you will find the list much more
useful if you do this. Fortunately there are many textbooks available
that will help you, written at various levels of difficulty.

David Greenberg, Sociology Department, New York University
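
To make the point about common trends concrete, here is a small Python sketch. The 12 monthly values are made up for illustration: both series share an upward trend, and each has its own unrelated zig-zag on top. Pearson's r is computed with the same formula on the levels and on the month-to-month changes; differencing is just one simple way to take a trend out, not a recommendation for any particular data set.

```python
import math


def pearson_r(x, y):
    """Pearson's r computed directly from its definition."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def first_differences(series):
    """Change from each observation to the next."""
    return [b - a for a, b in zip(series, series[1:])]


# Made-up monthly data, t = 1..12: a shared upward trend plus
# a different zig-zag pattern in each series.
months = range(1, 13)
wiggle_x = [1, -1] * 6         # period-2 pattern
wiggle_y = [1, 1, -1, -1] * 3  # period-4 pattern
x = [t + w for t, w in zip(months, wiggle_x)]
y = [t + w for t, w in zip(months, wiggle_y)]

r_levels = pearson_r(x, y)
r_changes = pearson_r(first_differences(x), first_differences(y))
print(f"r on levels:  {r_levels:.2f}")
print(f"r on changes: {r_changes:.2f}")
```

With these numbers the levels are strongly correlated while the changes are only weakly (and negatively) related. The coefficient is the same Pearson's r in both cases; what differs is how it should be interpreted.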
