Hello all,
I am having some trouble entering panel data into SPSS.

I have a bunch of articles from two online media outlets, covering a wide range of topics. The articles were recorded on seven days, twice a day (in the morning and in the evening), so that there are 7*2 = 14 time points. Within those time points, the articles were coded in a 1/0 schema depending on the policy field they cover. My research question then is whether, for example, the reporting of outlet A at T1 is correlated with that of outlet B at T2.

My problem is that I am quite uncertain how to enter my data properly into SPSS. Should I go for a wide layout, where each row is a time point? Or should I rather go for a long layout and put the articles in the rows? Or should I enter the data in two rows only, one per media outlet, followed by total numbers for each time point and policy field?

My goal is to extract Pearson correlations between the reportings in order to check for cross-lagged correlation against the Rozelle-Campbell baseline afterwards, which is the common approach in agenda-setting studies.

I would be grateful if someone could give me a hint on how to proceed.

Thank you!
Dave
Whatever is most convenient!
See the CASESTOVARS and VARSTOCASES commands if the above is mysterious! Since you don't provide an online (available to all) RCB citation to view, I leave it to you. So, perhaps a formula or reference in the future!?
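In case those two commands are unfamiliar, here is a minimal, self-contained sketch of what they do. Everything in it is invented for illustration (ArticleID and topic_a to topic_c are not from Dave's data):

* Invented wide file: one row per article, one 0/1 column per topic.
DATA LIST FREE / ArticleID topic_a topic_b topic_c.
BEGIN DATA
1 1 0 0
2 0 1 1
3 1 1 0
END DATA.

* Wide to long: one row per article x topic.
VARSTOCASES
  /MAKE covered FROM topic_a topic_b topic_c
  /INDEX=topic(covered).
LIST.

* And back again: long to wide.
SORT CASES BY ArticleID.
CASESTOVARS
  /ID=ArticleID
  /INDEX=topic.
LIST.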
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Hello David,
thanks for your quick reply. Sorry that I wasn't clear about that: it is not about reshaping the data. As of now, I only have frequencies in an XLS file. That is, I recorded how many articles Media A and B published at T1-14 on the different topics. My question is how I should enter those frequencies into SPSS so that I can correlate, for example, Media_A_ForeignPolitics_T1 vs. Media_B_ForeignPolitics_T2 afterwards. Does my question make sense? :-)
In reply to this post by dave
You use the same rules as everyone does, for any set of data.
Think first about: "How can I enter this information so that it is fast and easy to enter, accurately entered, and easy to verify?"

If you have 5 pieces of information from each article, you should prefer to have a unique "article identifier", followed by those pieces of information. Then there is a separate file that gives specifics like time, outlet, etc., for each unique article. That avoids redundant data entry, which is both time-consuming and (often) a source of errors -- more of a concern when there is more redundancy, and when the data-entry process does not allow for auto-duplication. Files can be joined together when you need to.

-- Rich Ulrich
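A sketch of the kind of join Rich describes in his last sentence, offered with the caveat that the file names ('articles.sav', 'coding.sav') and variable names are hypothetical and only meant to show the mechanics:

* 'articles.sav': one row per article with ArticleID, Outlet, Day, Time.
* 'coding.sav': the topic codings, one or more rows per ArticleID.
GET FILE='coding.sav'.
SORT CASES BY ArticleID.
DATASET NAME coding.

GET FILE='articles.sav'.
SORT CASES BY ArticleID.
DATASET NAME articles.

* Look up Outlet, Day and Time for every coding row.
MATCH FILES
  /FILE=coding
  /TABLE=articles
  /BY ArticleID.
EXECUTE.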
Thank you for your reply, Rich. I tried several approaches but failed to include all the information so that I can indeed correlate, for example, Media_A_ForeignPolitics_T1 vs. Media_B_ForeignPolitics_T2. As I understand it, to do so I need to get ALL the data - that is: 1) time points T1-14; 2) Media 1/2; 3) policy fields - into one data set somehow. Isn't that right?

For example, I tried:

1) Each row / ID is an article (observation), while the columns (vars) include time, media A/B and the policy field covered: not working, because then I can only correlate aggregate data for the policy fields that is not differentiated by time and outlet.

2) Each row / ID is an outlet, so that there are actually only two rows, while the columns include ALL separate observations bundled (i.e. <T1_OutletA_ForeignPolicy>, <T1_OutletB_ForeignPolicy>, <T2_OutletA_ForeignPolicy>, <T2_OutletB_ForeignPolicy>, etc.): this structure lets me drag Media_A_ForeignPolitics_T1 vs. Media_B_ForeignPolitics_T2 into Pearson, BUT SPSS says that at least one var is constant.

3) And some other, ultimately similar schemata that also didn't allow me to correlate Media_A_ForeignPolitics_T1 vs. Media_B_ForeignPolitics_T2.

I guess my problem actually isn't a real problem, but that I, to be blunt, am just too stupid. Still, this is making me a bit mad, since I have been brooding over this for three days.
In reply to this post by dave
I would enter the data in a LONG file with the following variables:
ArticleID
Outlet (1-2 --> your A & B, but easier as a numeric variable with 1 and 2)
Day (1-7)
Time (1-2)

Then you can carry out whatever data management steps are required to get the data in shape for any particular analysis you have in mind. This may entail using SORT, the LAG function, VARSTOCASES, etc. HTH.
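A minimal sketch of what such a long file could look like, using a handful of invented rows (the two topic columns at the end are an assumption on top of Bruce's four variables):

DATA LIST FREE / ArticleID Outlet Day Time ForeignPol ElectCamp.
BEGIN DATA
1 1 1 1  1 0
2 1 1 1  0 1
3 2 1 1  1 1
4 2 1 2  0 1
END DATA.
VARIABLE LABELS
  Outlet '1 = outlet A, 2 = outlet B'
  /Time  '1 = morning, 2 = evening'.
LIST.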
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
In reply to this post by dave
Reread my post!
"Since you don't provide an online (available to all) RCB citation to view I leave it to you. So, perhaps a formula or reference in the future!?" A REFERENCE OR A FORMULA!!!!
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by dave
More generally: www.sscce.org.
Very useful suggestions!
In reply to this post by dave
The magic of database management lies in the fact that you
can join files, and you can re-arrange files. As Dave says, read up on some commands like LAG, CasesToVars, VarsToCases.

For ease and accuracy of entry, I think Bruce is combining into one file what I would start out with as two files. [Bruce] "I would enter the data in a LONG file with the following variables: ..." For my approach, those four variables make a relatively SHORT file, with *one* line for each ArticleID.

Then there is a second file, which may be longer if it has multiple lines for each ArticleID -- each line has a code for every relevant topic for that ID. If this study has the raters "blinded" to the source, then each rater sees only a unique ID; against that ID, he marks the columns as 1=YES / 0=NO (say) for each of the topics that appear in that article. Each line may have ArticleID, RaterID, Topic1 to Topic9 (say).

If you instead enter the topics as a list of topics, you would use SPSS to convert them to a set of TopicN variables, coded 0-1.
- You may enter Topic as a "topic number", say, 1...9.
- If you enter Topic as a label in text, you first convert the text to a "topic number" by (say) Autorecode.

Then you match the ArticleID to its information with Outlet, Day, Time.

-- Rich Ulrich
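On the Autorecode step Rich mentions, a tiny, invented example of turning topic text into topic numbers (the join to the article file would then work as in the MATCH FILES sketch further up):

* Invented coding rows with topics typed in as text.
DATA LIST FREE / ArticleID (F4) TopicText (A20).
BEGIN DATA
1 ForeignPolicy
1 ElectionCampaign
2 ElectionCampaign
END DATA.

* Convert the text to numeric topic codes; the text is kept as value labels.
AUTORECODE VARIABLES=TopicText /INTO TopicNum /PRINT.
LIST.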
Dear all,
again many thanks for your replies. You are a huge help and I see things somewhat more clearly now. I tried both approaches (Bruce's and Rich's), first with some fake data, and the structure in general is working (even the match by ID across two different data sets). It's my first experience with SPSS syntax - wow! :)

However, I am still unsure how I could now correlate the data. As mentioned, the goal is, for example, to correlate Media_1_TopicA_T1 vs. Media_2_TopicA_T2 in order to see if there might be a connection. In a perfect world, I could enter the following into my syntax file:

CORRELATIONS
  /VARIABLES = media_1_topicA_t1 WITH media_2_topicA_t2
  /PRINT = NOSIG.

But I don't have such aggregated vars yet, right? I only have the parts, that is <outlet>, <day>, <time> and <topic>. Could you give me a hint on how I could produce Media_1_TopicA_T1 etc. vars? Should I look for the AGGREGATE command, or is that not the right approach?
Perhaps this example will clarify what I had in mind. Notice that the N for the correlations is half the number of records in the (LONG-formatted) data file.
* Generate some fake data to illustrate.
NEW FILE.
DATASET CLOSE ALL.
INPUT PROGRAM.
LOOP ArticleID = 1 to 10.
LEAVE ArticleID.
LOOP Outlet = 1 to 2.
LEAVE Outlet.
LOOP Day = 1 to 7.
LEAVE Day.
LOOP Time = 1 to 2.
COMPUTE Score = rv.normal(50,10).
END CASE.
END LOOP.
END LOOP.
END LOOP.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.
FORMATS ArticleID to Time (F2.0).

* Get paired scores needed for correlations.
* Depending on the pair, resorting the data may be required.
* Some examples.

* [1] Morning vs Evening for same article, outlet & day.
* The following SORT command is not really needed in this case,
* as the data are already sorted in the correct order.
* I include it just to illustrate the necessity of having the
* data sorted in the right order.
SORT CASES BY ArticleID Outlet Day Time.
DO IF (Time EQ 2).
.  COMPUTE Morning = LAG(Score).
.  COMPUTE Evening = Score.
END IF.

* [2] Outlet A vs B for same article, day & time.
SORT CASES BY ArticleID Day Time Outlet.
DO IF (Outlet EQ 2).
.  COMPUTE OutletA = LAG(Score).
.  COMPUTE OutletB = Score.
END IF.

* Now compute the desired correlations.
CORRELATIONS Morning with Evening / OutletA with OutletB.
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
Thank you, Bruce. I tried to understand that syntax but am having some trouble.
I generated the fake data. Could you elaborate a bit on what "COMPUTE Score = rv.normal(50,10)." exactly does? Then I tried to reproduce your examples with that data, but I am only creating blank vars (OutletA, OutletB, Morning, Evening). Could you give some more advice on the examples (if this question is not too general)?

And do I see this right: in the end, I can't correlate media_1_topicA_t1 vs. media_2_topicA_t2 with this approach, but only morning against evening and outlet 1 vs. outlet 2, right?

I am terribly sorry to bother you with such questions. Until now, I have only worked with simply structured SPSS data where each ID is a person and each person has a value for the relevant vars, with no waves.
Dave, you need to become familiar with the Command Syntax Reference manual (aka. the FM). From the FM:

RV.NORMAL. RV.NORMAL(mean, stddev). Numeric. Returns a random value from a normal distribution with specified mean and standard deviation.

I generated fake data because I needed some data to illustrate how I was thinking about your problem. But you have the real data. So why are you generating fake data?

You can create whatever paired variables you want. I just illustrated two possibilities. But, you say you want the correlation of "media_1_topicA_t1 vs. media_2_topicA_t2". My fake data did not include a topic variable--I missed that. How many topics are there? How does topic relate to ArticleID? I.e., does each article ID have only one topic?

It would be MUCH easier for someone to help you if you provided the actual data, or a small subset of it that contains all of the key features. HTH.
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
You are right, Bruce. I should get more used to the syntax and indeed, it is time for the "real" data.
Therefore, I finally transferred a first part of my data from XLS to SPSS and uploaded it here: http://userpage.fu-berlin.de/abdis/spss/as-study.sav

For now I coded T1 (1st half of day 1) for OutletA and T2 (2nd half of day 1) for OutletB, plus all relevant topic vars in a binary schema. I hope that with this it'll be easier to tackle my problem (that is, how to correlate for example "outletA_topicA_t1 vs. outletB_topicA_t2").

I'll be on vacation from tomorrow until early August but would be glad to find some more hints on my return. Thank you for your support, the help of this list is tremendous!
Dear all,
I had a fresh look at this. I am still having trouble combining my standard variables into new variables so that I can run a correlation, for example, between outletA_topicA_t1 and outletB_topicA_t2. To tackle this, I read up on the IF command for computing new vars and came up with this:

IF (Outlet = 1 AND Day = 1 AND Time = 1 AND ElectCamp = 1) Twitter_D1_T1_ElectCamp = 1.
EXECUTE.
* i.e. outletA_topicA_t1.

and

IF (Outlet = 2 AND Day = 1 AND Time = 2 AND ElectCamp = 1) Spon_D1_T2_ElectCamp = 1.
EXECUTE.
* i.e. outletB_topicA_t2.

For a start, does this make sense to you?
Dear all,
I am desperate. I tried for a couple of hours but failed, and my deadline is approaching fast. I entered my data as suggested (see the link to the SPSS file above). Could someone advise on how I could run a correlation between outletA_topicA_t1 and outletB_topicA_t2? Bruce? Thank you!
Sorry...I'm on vacation for the next week or so, and don't have SPSS on this machine. Perhaps someone else will take the time to look at it. For those not reading via Nabble, here's the file link Dave referred to:
http://userpage.fu-berlin.de/abdis/spss/as-study.sav
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
In reply to this post by dave
Given your data, I do not understand how you would "correlate" outlets and times in your example. They aren't matched data. All I can think of is to do an analysis of contingency tables and estimate Cramér's V. It is a bit of a shot in the dark, though, with your current description.
***************************************.
SPSSINC GETURI DATA
  URI="http://userpage.fu-berlin.de/abdis/spss/as-study.sav"
  FILETYPE=SAV DATASET=temp.

CROSSTABS TABLE ElectCamp BY Outlet
  /STATISTICS = PHI.
***************************************.
Let's get back to basics here: What is/are the hypothesis/hypotheses being
tested? I wouldn't put too much weight on the word correlation because that can mean many different kinds of tests.
In reply to this post by Bruce Weaver
Maybe something like this, posted with hesitance since you don't provide sufficient data to actually test the code. Nor do you specify the nature of WHAT exactly you want to correlate.
This code simply sums whatever the hell is in your data columns and pairs them by outlet x time. Just enough rope to hang yourself with I suspect!
----------------------------------------------------------
DATASET COPY copydata.
DATASET ACTIVATE copydata.

* Stack the topic columns: one row per article x topic.
VARSTOCASES
  /MAKE topic_count FROM ElectCamp TO Future_of_EU
  /INDEX=content(topic_count).

* Sum the 0/1 codes per outlet x day x time x topic.
DATASET DECLARE agg.
AGGREGATE
  /OUTFILE=agg
  /BREAK=Outlet Day Time content
  /TotalCount=SUM(topic_count).
DATASET ACTIVATE agg.

* Spread the four outlet x time rows per topic x day into columns:
* TotalCount.1 = outlet 1 / time 1, TotalCount.2 = outlet 2 / time 1,
* TotalCount.3 = outlet 1 / time 2, TotalCount.4 = outlet 2 / time 2
* (assuming every outlet has at least one article at every time point).
SORT CASES BY content Day Time Outlet.
CASESTOVARS
  /ID=content Day.

SPLIT FILE BY content.
CORRELATIONS TotalCount.1 WITH TotalCount.2.
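For the cross-lagged pairing Dave originally described (outlet A at one time point against outlet B at the next), here is one possible extension of the same idea -- an untested sketch with the same caveats as above. It starts from the long copydata file created by the syntax above, builds a running time index 1-14 from Day and Time, aggregates into a new dataset (agg2, a name chosen here), and again assumes every outlet has at least one article at every time point so that no rows are missing:

DATASET ACTIVATE copydata.
DATASET DECLARE agg2.
AGGREGATE
  /OUTFILE=agg2
  /BREAK=Outlet Day Time content
  /TotalCount=SUM(topic_count).
DATASET ACTIVATE agg2.

* Running time index: 1 to 14.
COMPUTE t = (Day - 1) * 2 + Time.

* One row per topic x time point, with the two outlets side by side:
* TotalCount.1 = outlet 1, TotalCount.2 = outlet 2.
SORT CASES BY content t Outlet.
CASESTOVARS
  /ID=content t.

* Pair outlet A at the previous time point with outlet B at the current one.
* The first time point of each topic series has no predecessor and stays missing.
IF (content = LAG(content)) A_previous = LAG(TotalCount.1).

SPLIT FILE BY content.
CORRELATIONS A_previous WITH TotalCount.2.
SPLIT FILE OFF.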
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |