@ David: You say that sufficient data is missing. Could you clarify what I should post to make it easier for you to help?
I realize that I have to be more precise about my goals. I want to do a small agenda-setting study. My research question is: Does Twitter have an influence on Spon (a huge German news website)? Or, more formally: Does the content of Twitter on <any relevant topic> at T1 (e.g. morning of Day 1) have an influence on the content of Spon at T2 (e.g. evening of Day 1)? I used the 2014 European Parliament elections as a test case. I coded <Outlet>, <Day>, <Time> and <Topics> to get an idea whether there may be an influence.
With different content and outlets, this is often done in the literature. Authors correlate <Outlet1_TopicX_T1> (one combined var) with <Outlet2_TopicX_T2> (another combined var), and they also correlate <Outlet2_TopicX_T1> with <Outlet1_TopicX_T2>. These two Pearson correlations are then used to test the significance of the direction (cross-lagged correlation with Rozelle-Campbell baseline). This can be done for a month or so, so that one can monitor quite closely whether there may be any connection. The last step (checking for significance), I think, must be done outside of SPSS and is not my concern right now.
My problem is that I cannot get my data into a shape that lets me correlate <Outlet1_TopicX_T1> (one combined var) with <Outlet2_TopicX_T2> (another combined var) and so on. Does that clarify my approach and problem? |
Administrator
|
Did you even bother to run the syntax I posted to see if it addresses your issues?
-------
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Sorry David, I wasn't at my local PC with SPSS. I have now tried your syntax. It produces an error but still outputs two new files.
File 1 switches cases to vars, right? So that I have n = 343 vars for: ArticleID, Outlet, Time and TopicCount. And File 2 contains total counts for every topic by Tx, Day, Outlet. That looks nice, thank you! However, I still don't understand: How can I use File 2 to compute the mentioned correlations? Maybe I should proceed using your File 2 and an IF statement to produce the combined vars?
Maybe my lack of understanding is due to the error? SPSS says it occurs on line 9 (CORRELATIONS TOTALCount.1 WITH Totalcount.2 .). Translated from the German output: "9 Correlations Text: TOTALCount.1 Command: CORRELATIONS SYNTAX ERROR: A subcommand or variable name is expected here. A variable name may have been misspelled. Valid subcommands for this procedure are VARIABLES, MISSING, PRINT, FORMAT, MATRIX and STATISTICS. Execution of this command was stopped."
I wondered whether the syntax is case sensitive, but the error remains. Maybe it's because in File 2 there is only one var named "TotalCount" but not "TotalCount 1" and "TotalCount 2"? |
I believe there has to be a
DATASET ACTIVATE agg. between the AGGREGATE and the SORT CASES commands. HTH, PR |
Hmm, after looking closer at your data I see you have coded all content variables as 1=Yes and 2=No, rather than 0=No.
You might want to do some data cleaning (several values are 22); see:
DESCRIPTIVES VARIABLES=Outlet TO Future_of_EU
  /STATISTICS=MEAN STDDEV MIN MAX.
Also, if you have values 1 & 2, be careful when interpreting TotalCount (or here countYes), as it is a SUM(...). With some modification of David Marso's code:
* Work on a copy so the original data stay untouched.
DATASET COPY copydata.
DATASET ACTIVATE copydata.
* Recode to 1=Yes, 0=No and flag everything else as missing.
RECODE ElectCamp TO Future_of_EU (1=1) (2=0) (ELSE= -1).
MISSING VALUES ElectCamp TO Future_of_EU (-1).
VALUE LABELS ElectCamp TO Future_of_EU 0 'No' 1 'Yes' -1 '--' .
* One row per article and topic.
VARSTOCASES MAKE topic_count FROM ElectCamp TO Future_of_EU
  /INDEX=content(topic_count).
* Count the Yes codes per outlet, day, time and topic.
DATASET DECLARE agg.
AGGREGATE OUTFILE agg
  /BREAK Outlet day time content
  /countYes=SUM(topic_count)
  /totalCount=N.
DATASET ACTIVATE agg.
SORT CASES BY content day time.
CASESTOVARS ID=content day .
SPLIT FILE BY content.
CORRELATIONS countYes.1 WITH countYes.2 .
Hopefully you have more data days...
HTH /PR |
Thank you, PRogman. That's a good point, I will code 0 = no in the future.
I tried your syntax and it works fine until it comes to the correlations. Here SPSS says that the vars are constant (screenshot: corr.png), an error that I do not quite understand in this case. Why is that happening?
Apart from that, I am still not sure whether the syntax is really computing what I need. Just to be sure:
countYes.1 == Sum of <Yes> in <Twitter>
countYes.2 == Sum of <Yes> in <Spon>
totalCount.1 == Total number of Twitter n's
totalCount.2 == Total number of Spon n's
Is that right? If so, the correlation so far does not differentiate between the time points, since countYes.1 and .2 are each a SUM over all available Ts (that is, T1 & T2). Is that right? |
Sorry for doubleposting...
I just completed David's syntax with PRogman's suggestion. This fixes the output file; however, the correlation in David's file stops with the same error (vars are constant). Here are the two files: David: DAVID.sav PRogman: PROGman.sav
PRogman, in your output, the totalCounts are indeed constant, right?
David, while having a look at your file, I finally realized that TotalCount.1 and .2 are merged vars for Twitter + Spon and T1 & T2. That's a huge step in the right direction. However, I still can't differentiate precisely enough, since I would actually only need to correlate TotalCount.1 and .2 BY <Topic>. Could that be solved by IF content = Cand_Juncker...?
To sum up:
1) The correlations in both files aren't working.
2) TotalCounts differ between the two files (is that because PRogman recoded the data?).
3) Apart from that, to me as a nonspecialist the approach generally looks great. But once the corrs are finally working, how could I differentiate by topic / content (by using IF...)? |
Are there any further suggestions / hints? Please!
|
TotalCount differs because AGGREGATE sums up the values of topic_count: if the value is 1 it is actually a count, if it is 2 the count gets doubled, and if topic_count is 22 it adds an extra 20 counts (i.e. 10 cases). The RECODE command sets anything but 1 or 2 to -1, which is then declared missing.
totalCount gives Twitter and Spon, 317+26=343 cases. If you try
FREQUENCIES VARIABLES=Day
  /STATISTICS=RANGE MINIMUM MAXIMUM MODE .
CROSSTABS
  /TABLES=Outlet BY Time.
you will find that 'day' is a constant (=1) and that Twitter=T1 and Spon=T2. The SPLIT FILE runs CORRELATIONS on every 'content', which is just 1 line each... You might need to consult your local statistician. You could try this:
DATASET ACTIVATE copydata.
SORT CASES BY content outlet.
SPLIT FILE LAYERED BY content.
CROSSTABS
  /TABLES=Outlet BY topic_count
  /FORMAT=AVALUE TABLES
  /STATISTICS=CORR
  /CELLS=COUNT
  /COUNT ROUND CELL.
SPLIT FILE OFF.
/PR |
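To make the point about SUM concrete, here is a minimal sketch in plain Python (not SPSS) of what summing does to 1/2-coded data compared with recoded 0/1 data. The five values are invented for illustration; only the coding scheme (1 = Yes, 2 = No, a stray 22) comes from the thread, and the helper name `recode` is hypothetical.

```python
# Hypothetical codes for one topic across five articles:
# 1 = Yes, 2 = No, 22 = a data-entry error like those found in the real file.
raw_codes = [1, 2, 2, 1, 22]


def recode(value):
    """Mirror RECODE (1=1) (2=0) (ELSE=-1), treating -1 as missing (None)."""
    if value == 1:
        return 1
    if value == 2:
        return 0
    return None


recoded = [recode(v) for v in raw_codes]

# Summing the raw codes mixes Yes, No and the typo into one number.
raw_sum = sum(raw_codes)

# Summing the recoded values counts only the Yes answers; missing is skipped.
count_yes = sum(v for v in recoded if v is not None)

# Number of valid (non-missing) codes.
n_valid = sum(1 for v in recoded if v is not None)

print("raw sum:", raw_sum)
print("count of Yes:", count_yes)
print("valid cases:", n_valid)
```

The raw sum is not a count of anything, which is why the totals in the two files cannot be expected to match.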
Administrator
|
In reply to this post by dave
Other than RTFM and pay attention?
In reply to this post by dave
Dave,
Even after all this back and forth, your stated goals given your data are confusing. We all have provided guesses - but it hasn't seemed to get us anywhere. How about this - you provide an example of the end goal the dataset should be in to conduct the analysis of interest. And then others can try to help get you to that end goal. As I (and I believe others) have said previously, it isn't clear (apparently to me or others on this list) given your original data how you estimate a "correlation" of anything. It appears you only have two time periods in the study in the example data you forwarded. You can't estimate a correlation between "Time 1 and Time 2" if you only have one observation of the time 1 and time 2 process. Andy |
Thanks for your reply, Andy. Just don't abandon me, guys!
What you are saying actually was my initial question: How should I enter my data if the goal is to do a cross-correlation between two time series? It's exactly the problem that I do not know how the data file should look in the end.
I got some feedback here on my initial question and, following these suggestions, entered T1 for Outlet1 and T2 for Outlet2. I haven't coded the other Ts yet (I'll have 14 in the end, two per day for seven days) because I wanted to have the structure fixed first in order to avoid restructuring afterwards. But probably the lack of time points is the reason for the vars being constant?
Maybe a citation from someone who did what I need to do, but speaks about it only in general terms, could help (it's rather long, but it is very clear):
"Cross-correlation is a measure between two variables separated by the appropriate amount of time lag for variable 1, which is believed to have an effect on variable 2, the proposed effect. This model produces two pairs of three different sets of correlations totaling six correlations. The first set is the synchronous correlation, the correlation between variable 1, cause, and variable 2, effect, measured at concurrent times (PX1Y1 and PX2Y2). The second set of correlations is the autocorrelation which is the correlation between the same variable at two different times (PX1X2 and Y1Y2). The third set is the cross-lagged correlation and is the correlation between variable 1 and variable 2 at different times (PX1Y2 and PY1X2). The logic behind using this model in its origin is that if the model has been built with the correct cause and effect identified then the correlation between variable 1, cause, and variable 2, effect, over time (PX1Y2) should be greater than the correlation between variable 2, effect, and variable 1, cause, over time (PY1X2). The two relationships of interest to scholars then are the cross-correlations as they indicate the level of influence between variable 1 and variable 2."
This may be a bit complicated to read, so you may also have a look at this visualization: Unbenannt.png
Does this clarify my burden? |
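The six correlations in the quoted passage can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the four lists are invented numbers, each list position stands for one case (for example one day or one topic), and the function names are hypothetical. The baseline formula (mean of the two synchronous correlations times the root mean square of the two autocorrelations) is the form in which the Rozelle-Campbell baseline is commonly stated in the agenda-setting literature; treat it as an assumption and check it against your own source before relying on it.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Same situation as the "vars are constant" message in SPSS.
        raise ValueError("a variable is constant; correlation is undefined")
    return sxy / sqrt(sxx * syy)


def cross_lagged(x1, x2, y1, y2):
    """Return the six correlations named in the quoted passage."""
    return {
        "PX1Y1": pearson(x1, y1),  # synchronous, time 1
        "PX2Y2": pearson(x2, y2),  # synchronous, time 2
        "PX1X2": pearson(x1, x2),  # autocorrelation of X
        "PY1Y2": pearson(y1, y2),  # autocorrelation of Y
        "PX1Y2": pearson(x1, y2),  # cross-lagged, X first
        "PY1X2": pearson(y1, x2),  # cross-lagged, Y first
    }


def rozelle_campbell_baseline(c):
    """Baseline as commonly stated (an assumption, see the lead-in)."""
    sync_mean = (c["PX1Y1"] + c["PX2Y2"]) / 2
    auto_rms = sqrt((c["PX1X2"] ** 2 + c["PY1Y2"] ** 2) / 2)
    return sync_mean * auto_rms


# Invented counts: X = Twitter, Y = Spon, 1 = morning, 2 = evening.
x1 = [12, 5, 9, 3, 7, 1, 4]
x2 = [10, 6, 8, 4, 9, 2, 3]
y1 = [4, 3, 5, 1, 2, 2, 1]
y2 = [6, 3, 5, 2, 4, 1, 2]

corrs = cross_lagged(x1, x2, y1, y2)
for name, value in corrs.items():
    print(name, round(value, 3))
print("baseline", round(rozelle_campbell_baseline(corrs), 3))
```

Note that each of the four inputs needs several cases with some variation; with a single observation per series, as in the file discussed here, every correlation is undefined.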
Sorry, I can't resist. This will be an entirely inadequate response.
Question: In a seven-day period, how many of the articles on Ukraine published in the morning on outlet A subsequently appeared in the evening on outlet B?
The data:
Day  UkraineA  UkraineB
1    3         2
2    2         1
3    15        13   (guess what happened this day)
4    10        10
5    7         4
6    6         5
7    8         6
Correlation UkraineA with UkraineB.
Completely inadequate response!
Gene Maguin |
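As a quick check outside SPSS, the correlation Gene describes can be computed in a few lines of plain Python. The numbers are the seven rows from his example (Day 1 to 7); the function name `pearson` is just a hypothetical helper, and this is a sketch of the arithmetic rather than a substitute for the CORRELATIONS procedure.

```python
from math import sqrt

# Gene's example counts, one entry per day (Day 1..7).
ukraine_a = [3, 2, 15, 10, 7, 6, 8]   # outlet A, morning
ukraine_b = [2, 1, 13, 10, 4, 5, 6]   # outlet B, evening


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r = pearson(ukraine_a, ukraine_b)
print(round(r, 3))
```

The point of the layout is that each day is one case and each outlet-by-time combination is one variable, so there are seven observations to correlate instead of one.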
In reply to this post by dave
The comparison of cross-lagged correlations to assess the relative strength of causal influence from X to Y and Y to X was proposed some decades ago by Campbell and Kenny. It was soon discredited by critiques raised by various authors. A chapter of LINEAR PANEL ANALYSIS: MODELS OF QUANTITATIVE CHANGE by Ronald Kessler and me (Academic Press, out of print) is devoted to this issue.
David F. Greenberg, Sociology Department, New York University |
I certainly may be wrong, but I don't understand the analysis to have anything to do with cross-correlation or cross-lag, even though I said correlation earlier. I remember the question to be whether articles within topics appearing on outlet A in the morning subsequently appeared on outlet B in the evening. As I remember it, there was never the question of the relationship between outlet B in the evening and outlet A the next morning. Maybe that was to be understood but not stated. And if the interest really is as I stated it above, then the summary number is a percentage/proportion, for example: 68% of topic 1 articles appearing on outlet A in the morning subsequently appeared on outlet B in the evening of the same day.
Gene Maguin |
Administrator
|
In reply to this post by dave
https://www.youtube.com/watch?v=lj60OAh7O5U
In reply to this post by David Greenberg
@ David Greenberg: There is some criticism re CLC, that's right. However, there are also authors/papers stating that CLC with Rozelle-Campbell-baseline is a sufficient approach. So this seems to be an unsettled dispute and from what I have read so far, the approach isn't discredited. After all, it still is very frequently used, although I have to admit that SEM or Granger seem to be superior for that kind of analysis.
In general, yes, I failed. I will have a look at ways that are less sophisticated. Still, it hurts a bit that I failed with the structure. I just don't understand how so many damn social scientists can do CLC when it obviously is SO difficult to actually perform it. Thanks anyways, guys. Some of you really tried hard to save me. ;-) |
It's not so difficult to do at all. The important element is that a long data structure is never used. Visualize the following dataset. The association between alcohol use and depression symptoms is well known. Suppose N-many people report past week alcohol use (AU1-AU10) and past week depression symptoms (DS1-DS10) for 10 weeks. Wide format data.
Second, specialized software (Amos, EQS, Mplus, Lisrel, etc.). You can set the problem up as a multilevel model, but from the discussion I have read it's more complicated.
Gene Maguin |
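The wide layout described above can be illustrated with a short Python sketch that turns long records into one row per day, with one column per outlet, topic and time point (the "combined vars" asked about earlier in the thread). The records and the function name `to_wide` are hypothetical; in SPSS itself the same restructuring is what AGGREGATE followed by CASESTOVARS is meant to achieve.

```python
# Hypothetical long-format records: (day, outlet, time, topic, count).
long_rows = [
    (1, "Twitter", "T1", "Ukraine", 3),
    (1, "Spon", "T2", "Ukraine", 2),
    (2, "Twitter", "T1", "Ukraine", 2),
    (2, "Spon", "T2", "Ukraine", 1),
    (3, "Twitter", "T1", "Ukraine", 15),
    (3, "Spon", "T2", "Ukraine", 13),
]


def to_wide(rows):
    """Return {day: {"Outlet_Topic_Time": count}}, i.e. one row per day."""
    wide = {}
    for day, outlet, time, topic, count in rows:
        column = f"{outlet}_{topic}_{time}"
        wide.setdefault(day, {})[column] = count
    return wide


wide = to_wide(long_rows)
for day in sorted(wide):
    print(day, wide[day])
```

With this shape, each column is a variable that varies across days, so two columns can be correlated directly.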
In reply to this post by dave
I suggest that for evaluation purposes you enter the full data you have for two or three content topics. There is always a way to manipulate the dataset into the correct format for your requested analysis. The current database cannot be used, as there is no variation (i.e. the values are constants) within the topics. I have never heard of the Rozelle-Campbell baseline, but communication is not my field. There is certainly a way to calculate the baseline. Then there is the discussion of whether it is a valid, appropriate use or not.
/PR |