Dear all,
I have a data transformation problem and would greatly appreciate any suggestions on how to solve it. I am analyzing data from a rating task with multiple raters. The ratings concern audiovisual material, i.e. continuous data which had to be properly segmented (unitized) by the raters. Each segment coded by the raters has a timestamp attached to it in the "start" and "end" variables, in which the start and end times of the segment are recorded (in seconds). The data look like this (a simplified version with only two raters; in actuality there are nine):

        Rater   Start    End      Var1  Var2  ...
case1   R1       17.54   123.29    4     2
case2   R2       18.02   123.76    4     3
case3   R1      128.43   171.53    2     1
case4   R2      130.13   148.21    2     1
...

I now intend to do analyses for which the data need to be set up differently: the ratings of the separate judges on all variables should be represented in individual columns, while each row should correspond to a single observed unit. This means that a relatively simple "cases to variables" procedure is in order. However, the issue is complicated by the need to identify the units and match cases accordingly beforehand. I do not expect raters' start and end times to agree exactly for segments to be considered one unit; instead, what is expected here is agreement between raters within a range of, say, 5 seconds on both the start and the end time of the segment. That is, in the data example above, cases 1 and 2 should be counted as one unit and both ratings put into its single row, while for cases 3 and 4 the raters disagree too much on the end time of the segment; cases 3 and 4 should therefore be kept as separate units within the dataset. Consequently, this is what the data should look like in the end:

        Start    End      Var1_R1  Var1_R2  Var2_R1  Var2_R2  ...
case1    17.54   123.29    4        4        2        3
case2   128.43   171.53    2        .        1        .
case3   130.13   148.21    .        2        .        1
...

For the intended analyses it does not matter much whether the start and end times of the cases (now "units") equal those set by the first rater (as in the example matrix above) or, more elegantly, the mean of all ratings subsumed under the case/unit.

I am unsure how to go about solving this transformation task in an automated fashion in Stata - hence any help is much appreciated.

Eike
Eike,
>> I am unsure how to go about solving this transformation task in an automated fashion in Stata - hence any help is much appreciated.

By the way, this is an SPSS list, not a Stata list. Nothing anybody writes here will work in Stata. Concepts, yes; code, no. Why don't you post this on the Stata list? I know there is one, and I'm sure there are seriously competent people on that list.

Gene Maguin
In reply to this post by emrinke
Dear Gene and other SPSS listers,

Sorry, this was a nightly mishap on my side - I am indeed more interested in getting this problem solved in SPSS than in Stata, as I am currently working with SPSS on the reliability analyses of the rating data. Therefore, here is my post again, this time with the correct software name. I'd still appreciate any help with this data structure problem!

Dear all,

I have a data transformation problem and would greatly appreciate any suggestions on how to solve it. I am analyzing data from a rating task with multiple raters. The ratings concern audiovisual material, i.e. continuous data which had to be properly segmented (unitized) by the raters. Each segment coded by the raters has a timestamp attached to it in the "start" and "end" variables, in which the start and end times of the segment are recorded (in seconds). The data look like this (a simplified version with only two raters; in actuality there are nine):

        Rater   Start    End      Var1  Var2  ...
case1   R1       17.54   123.29    4     2
case2   R2       18.02   123.76    4     3
case3   R1      128.43   171.53    2     1
case4   R2      130.13   148.21    2     1
...

I now intend to do analyses for which the data need to be set up differently: the ratings of the separate judges on all variables should be represented in individual columns, while each row should correspond to a single observed unit. This means that a relatively simple "cases to variables" procedure is in order. However, the issue is complicated by the need to identify the units and match cases accordingly beforehand. I do not expect raters' start and end times to agree exactly for segments to be considered one unit; instead, what is expected here is agreement between raters within a range of, say, 5 seconds on both the start and the end time of the segment. That is, in the data example above, cases 1 and 2 should be counted as one unit and both ratings put into its single row, while for cases 3 and 4 the raters disagree too much on the end time of the segment; cases 3 and 4 should therefore be kept as separate units within the dataset. Consequently, this is what the data should look like in the end:

        Start    End      Var1_R1  Var1_R2  Var2_R1  Var2_R2  ...
case1    17.54   123.29    4        4        2        3
case2   128.43   171.53    2        .        1        .
case3   130.13   148.21    .        2        .        1
...

For the intended analyses it does not matter much whether the start and end times of the cases (now "units") equal those set by the first rater (as in the example matrix above) or, more elegantly, the mean of all ratings subsumed under the case/unit.

I am unsure how to go about solving this transformation task in an automated fashion in SPSS (!) - hence any help is much appreciated.

Eike
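[Editor's note: one possible SPSS approach, as a minimal sketch. It assumes a long-format dataset with the variables rater (a string), start, end, var1 and var2 as in the example above, and it assumes that after sorting by start time, segments belonging to the same unit are adjacent and units do not overlap within the 5-second tolerance. The names m_start and m_end are illustrative, not from the original post.]

* Sort so that segments belonging to the same unit are adjacent.
SORT CASES BY start end.

* Assign unit numbers: a case joins the previous case's unit when both
* its start and end times lie within 5 seconds of the previous case's.
DO IF ($CASENUM = 1).
  COMPUTE unit = 1.
ELSE IF (ABS(start - LAG(start)) <= 5 AND ABS(end - LAG(end)) <= 5).
  COMPUTE unit = LAG(unit).
ELSE.
  COMPUTE unit = LAG(unit) + 1.
END IF.
EXECUTE.

* Compute unit-level start and end as the mean across all raters in
* the unit (the "more elegant" variant mentioned above).
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
  /BREAK=unit
  /m_start = MEAN(start)
  /m_end   = MEAN(end).

* Restructure to one row per unit with rater-indexed columns
* (var1_R1, var1_R2, var2_R1, var2_R2, ...). The per-rater start and
* end times are dropped; the unit-level means remain as fixed columns.
SORT CASES BY unit rater.
CASESTOVARS
  /ID = unit
  /INDEX = rater
  /DROP = start end
  /SEPARATOR = "_"
  /GROUPBY = VARIABLE.
RENAME VARIABLES (m_start = start)(m_end = end).

Two caveats: the rule chains adjacency, so with nine raters a run of segments, each within 5 seconds of its neighbour, will collapse into one unit even if the first and last differ by more than 5 seconds; and if two segments from the same rater ever land in one unit, CASESTOVARS will report duplicate index values, and such cases would have to be resolved by hand.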
In reply to this post by emrinke
Eike,
How many segments per judge are you talking about? 10? 100? 1000? 5000?

OK, I don't know that I can give a complete solution, because I suspect that 'issues', maybe quite a few, are going to come up in the transformation and analysis phases and will need further fixing. I don't have a lot of confidence that what follows is going to be helpful, but here's how I'd work on it.

If you haven't already done so, run a frequencies on both start and end and see if you can group the times. My impression from your description is that each judge will identify nonoverlapping segments such that start1 < end1 < start2 < end2, etc. The problem is that any pair of judges may not agree very well on the start and end times. The killer problem is a judge who identifies as one segment what other judges identify as two adjacent segments. The simple check is whether all judges have the same number of segments. After that you might want to compute segment lengths as well as the intervals between successive segment starts and between successive segment ends. Do that by judge and look at the distributions by judge to see if you can find odd values. After all, all judges could have the same number of segments but each concatenate different ones. (A syntax sketch of these checks follows below.)

So, let's say there are no problems, or any problems are now fixed. Now redo the frequency distributions on the start and end times. Basically you are going to be somewhere on this continuum: at one end, segment start (end) times are 'tightly' grouped; at the other, they are 'loosely' grouped. You have to define 'tightly' and 'loosely' - I can't.

You mentioned the idea of defining a window of a certain width, say 5 seconds, keeping records that fall within the window and discarding those that don't. Think about this window for a minute. It has a width, but how do you decide where on the time scale to place its left edge for each segment? I have no idea what to tell you to do, nor do I know what I'd do without having worked with the data and knowing the long-term analytical goals. If the number of segments were small, I might define window boundaries in syntax for each segment and, using those boundaries, number the segments. With the segments numbered, a CASESTOVARS operation is easy. But with hundreds or thousands of segments, I'm not sure how to escape the burden of keying in the boundary times. Far worse would be groupings so loose that a segment's start time (or end time) distribution overlaps with that of an adjacent segment.

There's another angle on this. The above babbling assumes a long-form file, but a wide-form file might work better for some things. If you did a CASESTOVARS right away, so that rater IDs became case IDs, you could then run frequencies on start and end times by segment, and so on. There may be advantages to this data structure, but I'm not sure. At some point you may need a CASESTOVARS operation, if only for your analyses; the question is when the optimal time is, and that simply depends.

Gene Maguin
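[Editor's note: in SPSS syntax, the screening steps described above could look roughly like this. This is a sketch only, assuming the long-format file with the variables rater, start and end from the example data; seglen and gap are illustrative names for the derived variables.]

* Number of segments per judge - a first check that all judges
* identified the same number of segments.
FREQUENCIES VARIABLES=rater.

* Distributions of start and end times, to see how tightly they group.
FREQUENCIES VARIABLES=start end.

* Segment length and the gap to the preceding segment, within judge.
SORT CASES BY rater start.
COMPUTE seglen = end - start.
IF (rater = LAG(rater)) gap = start - LAG(end).
EXECUTE.

* Distributions by judge, to spot odd values (e.g. one long segment
* where the other judges coded two adjacent ones).
MEANS TABLES=seglen gap BY rater
  /CELLS=COUNT MEAN STDDEV MIN MAX.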