Good morning,
My dissertation has a data set that I believe is reasonably straightforward, and my hypotheses also seem basic, but for the life of me I can't figure it out in SPSS. I have 7 students. They were observed 10 times each (T1, T2, etc.) on two variables: IV Anxiety, DV Hyperactivity.

Basic question: as anxiety increases, does hyperactivity increase? The obvious part: I am trying to group students into high and low anxiety and high and low hyperactivity. The second question: does hyperactivity appear to follow from anxiety (if hyperactivity was observed within 30 seconds of anxiety behavior, then anxiety led to hyperactivity in that instance)?

Setting aside the obvious theoretical questions, here is a sample of the data in the configuration I believe it should have in SPSS to get the comparisons I need:

          Anx  Hyp
St1_T1     11    7
St1_T2      9   11
St1_T3     11   12
St1_T4      0    0
St1_T5      0    0
St1_T6      0    0
St1_T7      0    0
St1_T8      0    0
St1_T9      0    0
St1_T10     0    0
St2_T1     16    7
St2_T2     32   21
St2_T3     33   10
St2_T4     30   13
St2_T5     25   13
St2_T6     36   25
St2_T7      0    0
St2_T8      0    0
St2_T9      0    0
St2_T10     0    0
St3_T1     31   10
St3_T2     25   17
St3_T3     31   14
St3_T4     21   25
St3_T5     28   19
St3_T6     21   15
St3_T7      0    0
St3_T8      0    0
St3_T9      0    0
St3_T10     0    0
St4_T1     25   22
St4_T2     25   25
St4_T3     30   37
St4_T4     25   32
St4_T5     29   27
St4_T6     35   35
St4_T7     21   28
St4_T8     27   26
St4_T9     17   16
St4_T10    17   15
...etc.

The other way to orient the data, it seems to me, is:

          T1  T2  T3  T4  T5  T6  T7  T8  T9  T10
St_1_Anx  11   9  11   0   0   0   0   0   0    0
St_2_Anx  16  32  33  30  25  36   0   0   0    0
St_3_Anx  31  25  31  21  28  21   0   0   0    0
St_4_Anx  25  25  30  25  29  35  21  27  17   17
St_5_Anx  23  23  31  27  19  19  16  19  13   15
St_6_Anx  34  51  53  53  42  52  43  32  46    9
St_7_Anx  29  31  31  36   0   0   0   0   0    0

          T1  T2  T3  T4  T5  T6  T7  T8  T9  T10
St_1_Hyp   7  11  12   0   0   0   0   0   0    0
St_2_Hyp   7  21  10  13  13  25   0   0   0    0
St_3_Hyp  10  17  14  25  19  15   0   0   0    0
St_4_Hyp  22  25  37  32  27  35  28  26  16   15
St_5_Hyp   1  10  30  17  19  35  26  30  39   16
St_6_Hyp  16  32  19  33  22  51  40  21  23
St_7_Hyp  34  30  23  35   0   0   0   0   0    0

My questions: Which data orientation is correct, and which process should I run in order to group the students into high/low Anx and high/low Hyp? I'm grateful for any suggestions that might be offered.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/
=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L. For a list of commands to manage subscriptions, send the command INFO REFCARD.
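For concreteness, the first (long) layout above can be sketched in Python/pandas rather than SPSS syntax (in SPSS terms this corresponds roughly to a case-per-segment file plus declaring the unobserved segments missing). The data values are the first four students' posted numbers; the key step is recoding the all-zero segments as missing rather than as observed zeros, which the thread later agrees they are.

```python
import pandas as pd
import numpy as np

# Anxiety and hyperactivity per 5-minute segment, as posted (students 1-4).
anx = {1: [11, 9, 11, 0, 0, 0, 0, 0, 0, 0],
       2: [16, 32, 33, 30, 25, 36, 0, 0, 0, 0],
       3: [31, 25, 31, 21, 28, 21, 0, 0, 0, 0],
       4: [25, 25, 30, 25, 29, 35, 21, 27, 17, 17]}
hyp = {1: [7, 11, 12, 0, 0, 0, 0, 0, 0, 0],
       2: [7, 21, 10, 13, 13, 25, 0, 0, 0, 0],
       3: [10, 17, 14, 25, 19, 15, 0, 0, 0, 0],
       4: [22, 25, 37, 32, 27, 35, 28, 26, 16, 15]}

# Long format: one row per student x segment.
rows = [{"student": st, "time": t + 1, "anx": anx[st][t], "hyp": hyp[st][t]}
        for st in anx for t in range(10)]
long = pd.DataFrame(rows)

# A (0, 0) segment means "no observation", so recode both scores as missing.
unmeasured = (long["anx"] == 0) & (long["hyp"] == 0)
long.loc[unmeasured, ["anx", "hyp"]] = np.nan

print(long.shape)            # 40 rows, 4 columns
print(long.dropna().shape)   # 25 actually-observed segments for these 4 students
```

With the data in this shape, any per-student summary or within/between comparison is a group-by away, which is why most repeated-measures workflows prefer it.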
Which data layout you choose to use will depend upon the hypotheses you want to test. That being said, there is an obvious problem with the ...
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3396026/#R2
In reply to this post by PsyDStats
First reactions: The zeroes sure do look a lot like they must be "Missing" - which means that the program options will be greater when using the long format. You could, for instance, use discriminant function on seven groups (= subjects) and see how they group when plotted.

However, cursory examination suggests that subject 1, with only 3 times, barely overlaps the other subjects. Subject 1 is low, the rest high? I think you need more data.
-- Rich Ulrich

From: SPSSX(r) Discussion <[hidden email]> on behalf of PsyDStats <[hidden email]>
Sent: Sunday, March 4, 2018 12:46:24 PM
To: [hidden email]
Subject: 2 Variables, 7 cases, 10 observations -- Simple?
In reply to this post by PsyDStats
Check the archives of this list for discussions of coarsening measures.
In psychology, there is a variation in vocabulary: some speak of the invidious median split, and others speak of the nefarious median split. Even if this is only a pilot for your dissertation study, with such a tiny data set you want to throw away as little data as possible.
-----
Art Kendall
Social Research Consultants
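The cost of the median split Art is warning about is easy to demonstrate. The sketch below uses simulated data (not the dissertation data) with a built-in correlation of 0.6: dichotomizing both variables at their medians and correlating the resulting high/low codes noticeably attenuates the observed association.

```python
import numpy as np

# Simulated illustration: two continuous measures with true correlation 0.6.
rng = np.random.default_rng(1)
n = 20_000
anx = rng.normal(size=n)
hyp = 0.6 * anx + 0.8 * rng.normal(size=n)

r_continuous = np.corrcoef(anx, hyp)[0, 1]

# The median split: code each variable as high (True) vs low (False),
# then correlate the dichotomized versions.
r_split = np.corrcoef(anx > np.median(anx), hyp > np.median(hyp))[0, 1]

print(round(r_continuous, 2), round(r_split, 2))  # the split correlation is clearly smaller
```

With n = 7 subjects, throwing away that much information per variable is exactly the problem the replies keep flagging.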
In reply to this post by PsyDStats
Maybe I missed this in your description but what are we looking at? I understand 7 persons observed 10 times and anxiety (AN) and hyperactivity (HA) observed at each time point. How did person St1 at T1 get a score of 11 and 7? A multi-item checklist? You mention 'within 30 seconds'. Does this mean that people were observed at 30 second intervals?
I agree with Art about coarsening data. You don't have much data to begin with, and you're giving information away by dichotomizing. (Little rant: unless these 7 people are extraordinarily rare or special, your committee did you a disservice by signing off on an N of 7 people.)

The basic question: are you wanting to know (a) whether the within-time-point correlation between AN and HA is positive, or (b) whether the correlation between AN at t(i) and HA at t(i+1) is positive?

Question 2: This question reads like you want to treat AN and HA as absent (AN=0; HA=0) or present (AN>0; HA>0). Is this true?

Gene Maguin

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of PsyDStats
Sent: Sunday, March 4, 2018 12:46 PM
To: [hidden email]
Subject: 2 Variables, 7 cases, 10 observations -- Simple?
In reply to this post by PsyDStats
First of all, THANK you very much for everyone's responses.
Secondly, yes, an N of 7 is sparse. The 0 means no observations. I'm hesitant to remove St1 because of the already small sample size, even though it lacks most of the observations. I oriented the data vertically as in my first scenario and ran nonparametric stats comparing AN with HA. The scatter plot showed a monotonic relationship, so I ran Spearman's rho. I thought the higher specificity of Kendall's tau would be useful, because of its accuracy over Spearman with smaller sample sizes. I also ran Somers' d, since I have dependent and independent variables. All showed positive correlations at .001.

However, I'm worried that I'm missing something more essential with my data, or that I've missed assumptions that made these metrics inappropriate to begin with. From your responses, I'm even more nervous. Thank you again for the interest in my situation and your helpful insights. I wish my committee and those I approached for participation were as engaged as you are.
In reply to this post by Maguin, Eugene
Gene,
Thanks again for your response. The observations were made over the course of 5-minute intervals for 50 total minutes per student, which I aggregated into 10 segments per 50-minute period. Any given student could have multiple moments of AN and HA coded during a 5-minute segment. That's why St1 got a score of 11 (anxiety) and 7 (hyperactivity) in T1 (which is the sum of observations in the first 5-minute segment). I did this because some subjects had over 300 observed moments of AN during the 50-minute period (super anxious kids, to put it clinically), and I wasn't sure that I'd get more bang from each individual observation in my data set than from the totals for each 5-minute segment. The data set with all anxiety tallies across all students was 1074 data points.

Another reason I chose to use the aggregate 5-minute segment totals was that 1/0 coding for each observed moment results in discrete data, while the total for any given 5-minute segment is closer to continuous, and continuous data might provide more options for analysis, in spite of the fact that the data turned out not to be normally distributed.
In reply to this post by PsyDStats
One issue someone might raise is the oversampling in the sense you are getting up to 10 values for each of your seven subjects. This causes your n to be 10 times greater than it really is. To evaluate your data correctly you should be taking the mean value for anxiety and the other dependent variable and do your correlations on an n=7, not each pair of observations. Also, since some people have more zeros than others, those people are over-represented in your data.
Michael Kruger
'A True Prince'
Statistical Analyst
C.S. Mott Center
Dept. of OB/GYN
WSU School of Medicine
248-895-4728

From: SPSSX(r) Discussion <[hidden email]> on behalf of PsyDStats <[hidden email]>
Sent: Monday, March 5, 2018 10:47:46 AM
To: [hidden email]
Subject: Re: 2 Variables, 7 cases, 10 observations -- Simple?
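The "one summary score per subject" step Michael describes is a one-liner once the zeros are treated as missing. A sketch using the posted T1-T10 anxiety rows for the first four students, assuming (as the poster says) that 0 means "segment not observed" rather than "zero anxious moments":

```python
import numpy as np

# Posted anxiety rows for students 1-4; 0 = segment not observed.
anx = {
    "St_1": [11, 9, 11, 0, 0, 0, 0, 0, 0, 0],
    "St_2": [16, 32, 33, 30, 25, 36, 0, 0, 0, 0],
    "St_3": [31, 25, 31, 21, 28, 21, 0, 0, 0, 0],
    "St_4": [25, 25, 30, 25, 29, 35, 21, 27, 17, 17],
}

# One summary score per subject: mean over the segments actually observed.
means = {st: np.mean([x for x in xs if x > 0]) for st, xs in anx.items()}
print({st: round(float(m), 2) for st, m in means.items()})
# Matches the Mean_AN values the poster later reports:
# St_1 10.33, St_2 28.67, St_3 26.17, St_4 25.10
```

Any between-subject correlation then runs on those n = 7 summary scores, not on the 50-odd student-by-segment pairs.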
You make a good point about the oversampling. The nonparametric correlations I ran on the T1-T10 data set were all significant (all > .8, p = .0001), and the same metrics on the data set with just the 7 means were not significant. How best to understand this difference? Also, you mentioned that some have more zeros and that they would be overrepresented in the data set. Did you mean the subjects with fewer zeros or the subjects with more zeros? Thanks.
In reply to this post by PsyDStats
If I understand the statistics that you ran, in no case did you look at the within-subject correlation. So in no case did you have a test based on your sample size of 7, but rather on the 50-some periods of observation. That does not give a valid test. If you want a between-subject correlation, look at the r for the sample of 7 scores, one per person. "People who are higher on the one are also higher on the other" is the between-subject test.
The discriminant function that I suggested earlier will give you the "within-subjects correlation", which removes the mean levels for each subject. It will give you a valid test.
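A rough stand-in for the within-subject correlation (this is per-subject mean-centering, not the discriminant-function route Rich names): subtract each student's own mean from that student's scores, then correlate the pooled deviations. The sketch uses the observed (non-zero) segments for the first two students from the posted data.

```python
import numpy as np

# Observed segments only (zeros dropped), from the posted data.
data = {
    "St_1": ([11, 9, 11],              [7, 11, 12]),
    "St_2": ([16, 32, 33, 30, 25, 36], [7, 21, 10, 13, 13, 25]),
}

anx_dev, hyp_dev = [], []
for anx, hyp in data.values():
    # Remove each student's own mean level.
    anx_dev.extend(np.asarray(anx, float) - np.mean(anx))
    hyp_dev.extend(np.asarray(hyp, float) - np.mean(hyp))

# With mean levels removed, r reflects only within-student covariation.
r = np.corrcoef(anx_dev, hyp_dev)[0, 1]
print(round(r, 2))
```

Pooling centered deviations this way ignores the degrees-of-freedom bookkeeping a proper mixed model or the discriminant approach would handle, so treat it as exploratory.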
When I first looked at the scores, I wondered if the point-differences at the low end should be more important than the point-differences at the high end. Since these scores are described as counts, you will likely have a more robust analysis - and one where equal intervals are better respected - if you take the square-root of each count as your score to use in an analysis.
-- Rich Ulrich

From: SPSSX(r) Discussion <[hidden email]> on behalf of PsyDStats <[hidden email]>
Sent: Monday, March 5, 2018 10:47:46 AM
To: [hidden email]
Subject: Re: 2 Variables, 7 cases, 10 observations -- Simple?
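Rich's square-root suggestion in miniature: on the raw count scale the gap between 36 and 49 events looks as large as the gap from 0 to 13, but after the transform each step below is the same size, which tames the influence of the very high counts.

```python
import numpy as np

# Perfect squares make the compression easy to see.
counts = np.array([0, 1, 4, 9, 16, 25, 36, 49])
print(np.sqrt(counts))  # [0. 1. 2. 3. 4. 5. 6. 7.]
```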
In reply to this post by PsyDStats
If you are using each pair of observations as a case, then subjects with more observations will be over-represented in your analysis relative to those with fewer observations.
Michael Kruger
'A True Prince'
Statistical Analyst
C.S. Mott Center
Dept. of OB/GYN
WSU School of Medicine
248-895-4728

From: Michael Bates <[hidden email]>
Sent: Monday, March 5, 2018 11:22:35 PM
To: Michael Kruger
Cc: [hidden email]
Subject: Re: 2 Variables, 7 cases, 10 observations -- Simple?
In reply to this post by PsyDStats
Okay, so I've dabbled a little and successfully confused myself even more.
Let me boil things down. As I mentioned, I am interested to know if there's a correlation between Anxiety (AN) and Hyperactivity (HA): does HA go up when AN goes up? Aside from being a very small N, the data also are not normally distributed, so I've been working with nonparametric measures. Based on the suggestions so far, I computed the mean for each student on the two variables. I re-ran the Spearman, Kendall, and Somers measures and found that there was a very poor correlation between AN and HA. I structured the data:

       Mean_AN  Mean_HA
St_1     10.33    10.00
St_2     28.67    14.83
St_3     26.17    16.67
St_4     25.10    26.30
St_5     20.50    22.30
St_6     41.50    25.70
St_7     31.75    30.50

If there were no data points missing (those pesky 0's in my data set), and if the N were more substantial, and if I had only kept up with my piano lessons as a kid and were now conducting at the Met instead of being a doctoral student in psychology, would you say that I've chosen the correct measures for my data set and structured it properly in SPSS? Would you say that there is a chance the outcomes of the procedures I've run are useful to me in accepting or rejecting the null hypothesis, which in this case is that I should go back to piano (my mother's null hypothesis being that I would amount to Null if I gave up the piano)? Seriously, thanks for your thoughtful feedback.
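As a cross-check, Spearman's rho for the seven posted means can be computed by hand in plain NumPy (the usual no-ties formula applies, since neither column contains tied values):

```python
import numpy as np

# The seven per-student means as posted.
mean_an = np.array([10.33, 28.67, 26.17, 25.10, 20.50, 41.50, 31.75])
mean_ha = np.array([10.00, 14.83, 16.67, 26.30, 22.30, 25.70, 30.50])

def ranks(x):
    # Rank 1 = smallest; no ties occur in these data.
    r = np.empty(len(x))
    r[np.argsort(x)] = np.arange(1, len(x) + 1)
    return r

d = ranks(mean_an) - ranks(mean_ha)
n = len(mean_an)
rho = 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))
print(rho)  # 0.5
```

So the between-subject association is positive (rho = .5) but nowhere near significance with n = 7, which is consistent with what the poster reports.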
"the data also are not normally distributed"

There is no assumption that the *data* are normally distributed. For uses of the general linear model (regression, ANOVA, correlations, etc.) it is desirable that the *residuals* (aka errors in fit) are not very discrepant from normally distributed.

Check to see whether CATREG handles repeated measures. Since CATREG has actual tests of whether there is a better fit with ordinal vs. continuous assumptions, it may be a way to look at your data.
-----
Art Kendall
Social Research Consultants
In reply to this post by PsyDStats
I'd like to better understand your study design (I've re-read the initial round of posts.) In your reply, you said, you observed the target kid for a total of 50 minutes. What I'm curious about is the data collection procedures used. I think these matter in understanding what your data analysis options are. The target behaviors were anxiety (Anx) and hyperactivity (HA). For example, you could have tabulated using two hand counters or hash-marks on paper each time a target behavior occurred so that at any point in time, the HA counter or hash count would show a count of 15 and the Anx counter or hash count would show a count of 8. Alternatively, you could have recorded in a data sequence the behaviors as they occurred so now the recording sheet/data recorder shows, for example, A,H,H,A,A,H,H,H. Alternatively, you could have designated HA as the stimulus behavior and Anx as the response behavior and coded (HA,Anx) if Anx followed HA within 'x' seconds; otherwise, coded (HA,NoAnx). In one of your replies, you mentioned something about 30 seconds. What's the story with that?
Gene Maguin -----Original Message----- From: SPSSX(r) Discussion <[hidden email]> On Behalf Of PsyDStats Sent: Sunday, March 11, 2018 2:47 PM To: [hidden email] Subject: Re: 2 Variables, 7 cases, 10 observations -- Simple? Okay, so I've dabbled a little and successfully confused myself even more. Let me boil things down. As I mentioned, I am interested to know if there's a correlation between Anxiety (AN) and Hyperactivity (HA). Does HA go up when AN goes up? Aside from being a very small N, the data also are not normally distributed, so I've been working with nonparametric measures. Based on the suggestions so far, I ran the means for each student for the two variables. I re-ran the Spearman, Kendall's, and Somer's and found that there was a very poor correlation between AN and HA. I structured the data: Mean_AN Mean_HA St_1 10.33 10.00 St_2 28.67 14.83 St_3 26.17 16.67 St_4 25.10 26.30 St_5 20.50 22.30 St_6 41.50 25.70 St_7 31.75 30.50 If there were no data points missing (those pesky 0's in my data set), and if the N was more substantial, and if I had only kept up with my piano lessons as a kid and I were now conducting at the Met instead of a doctoral student in psychology, would you say that I've chosen the correct measures for my data set and structured it properly in SPSS? Would you say that there is a chance the outcomes of the procedures I've run are useful to me in accepting or rejecting the Null Hypothesis, which in this case is that I should go back to piano (My mother's null hypothesis that I would amount to Null if I gave up the piano)? Seriously, thanks for your thoughtful feedback. -- Sent from: http://spssx-discussion.1045642.n5.nabble.com/ ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. 
To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
In reply to this post by Art Kendall
The widespread belief that the data need to be normally distributed is one of the biggest statistical misconceptions people have, I reckon. And I think it comes down to not having a clear grasp of the distinctions between 3 different distributions:

1) The population distribution
2) The distributions for given samples from that population
3) The sampling distribution for some statistic

Here is a figure I saw recently that I think illustrates these distinctions very nicely: http://slideplayer.com/slide/8557877/26/images/8/Population+Distributions+vs.+Sampling+Distributions.jpg It uses a binary variable to illustrate, but I think one can easily imagine a corresponding figure where the population and sample distributions are continuous, and where the sampling distribution is the sampling distribution of the mean rather than the sampling distribution of a proportion.

If we use the single-sample z-test to illustrate, it is not the sample distribution that needs to be normal, nor is it the population distribution. Rather, it is the /sampling distribution of the mean/ that needs to be (approximately) normal for the z-test to be valid. (The scores need to be independent too, of course.)

If we shift the context to linear regression, I contend that it is the /sampling distributions of the regression parameters/ that need to be approximately normal in order for the t-tests on them to be valid, and for the F-test for the overall model to be valid. Again, the observations need to be independent, and the errors need to be uncorrelated with the explanatory variables. But the (approximate) normality requirement applies to the /sampling distributions of the parameters/.

I know that the normality assumption for OLS regression is often said to apply to the errors. I have said that myself many times. But of late, I have come to the view that normality of the errors is a /sufficient/, but not a /necessary/ condition. The necessary condition is approximate normality of the sampling distributions of the parameters. And as n increases, that condition will be met, even if the errors are not normally distributed.

Some of my thinking on this is due to reading what Jeffrey Wooldridge says about the assumptions for OLS regression in his popular econometrics textbook. I've attached a small set of slides in which I've summarized his main points: OLS_regression_assumptions_Wooldridge.pdf <http://spssx-discussion.1045642.n5.nabble.com/file/t7186/OLS_regression_assumptions_Wooldridge.pdf> Members who do not read the list via Nabble can find a link to download the file by viewing the thread here: http://spssx-discussion.1045642.n5.nabble.com/2-Variables-7-cases-10-observations-Simple-td5735614.html

Finally, I always try to say approximate normality rather than normality, because I believe George Box was right when he commented that in the real world, normal distributions (and straight lines) don't really exist. Nevertheless, they can serve as useful approximations (or models) of real-world phenomena. See section 2.5 of this famous article: http://mkweb.bcgsc.ca/pointsofsignificance/img/Boxonmaths.pdf

Cheers,
Bruce
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
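Bruce's point is easy to see by simulation. The sketch below (illustrative, not part of the original thread's analysis) draws strongly skewed regression errors (exponential, shifted to mean 0), fits an OLS slope repeatedly, and shows that the sampling distribution of the slope estimates centres on the true value with only mild skewness at a moderate n.

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps, true_slope = 100, 5000, 2.0

slopes = np.empty(reps)
for i in range(reps):
    x = rng.uniform(0.0, 1.0, n)
    err = rng.exponential(1.0, n) - 1.0    # skewness 2 in the error distribution
    y = true_slope * x + err
    slopes[i] = np.polyfit(x, y, 1)[0]     # OLS slope estimate

z = (slopes - slopes.mean()) / slopes.std()
print(round(slopes.mean(), 2))             # centres on the true slope of 2.0
print(round(abs((z ** 3).mean()), 2))      # skewness of the slope estimates: small
```

The error distribution itself is far from normal, yet the sampling distribution of the slope is already close to normal, which is what licenses the usual t-test on the coefficient as n grows.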
In reply to this post by Maguin, Eugene
The raters (more than one, for inter-rater reliability reasons) observed videos of children in a classroom setting for 50 minutes. Coding tracked defined anxiety behavior and defined hyperactivity behavior. The coders had coding sheets with columns for Time (in seconds), AN, and HA. They put a tally "1" in the variable column at the noted second if either of the behaviors was observed. So, for a given subject, the coding sheet had:

Time      AN  HA
00:00:00   1   1
00:00:01   1
00:00:02   1
...etc.

The tallies of both variables were totaled. In addition to the individual observations, time was aggregated into 5-minute blocks. If the child left the observation session (doctor's appointment, nurse's office, whatever), the totals for a given 5-minute block ended up totaled as 0. Each 5-minute aggregate was labeled T1, T2, etc. Only 7 children agreed to participate. Don't get me started....

My first research question related to the amount of anxiety and hyperactivity: do highly anxious children exhibit a higher level of HA than low-anxiety children? Originally, I thought to run frequencies/correlations on the second-by-second data for each student. Then I thought to run the 5-minute block aggregates. I found that my data were not normally distributed, and I decided that Pearson's r was not appropriate to determine the correlation between the two variables. I ran a scatter plot to determine monotonicity and found an adequate relationship. So I ran Spearman and Kendall on the 5-minute blocks, then Somers' d (because of the IV/DV nature of the data). I found a strong positive correlation in each output.

As the comments came back from the extremely helpful SPSSX-L responders, I understood that my correlations might be overstated, and I ran the same procedures on the means. This yielded no significant correlation in either direction. (Poop! Technical term?) That's the setup, in a nutshell.
On Mon, Mar 12, 2018 at 9:13 AM, Maguin, Eugene <[hidden email]> wrote:

I'd like to better understand your study design (I've re-read the initial round of posts). In your reply, you said you observed the target kid for a total of 50 minutes. What I'm curious about is the data collection procedures used; I think these matter in understanding what your data analysis options are. The target behaviors were anxiety (Anx) and hyperactivity (HA). For example, you could have tabulated, using two hand counters or hash marks on paper, each time a target behavior occurred, so that at any point in time the HA counter or hash count would show a count of 15 and the Anx counter or hash count would show a count of 8. Alternatively, you could have recorded the behaviors in sequence as they occurred, so the recording sheet/data recorder shows, for example, A,H,H,A,A,H,H,H. Alternatively, you could have designated HA as the stimulus behavior and Anx as the response behavior and coded (HA,Anx) if Anx followed HA within 'x' seconds; otherwise, coded (HA,NoAnx). In one of your replies, you mentioned something about 30 seconds. What's the story with that?
Ok, that's very helpful, I think. So, at any given second there can be four code pairs (AN, HA): (0,0); (0,1); (1,0); (1,1). Except for kids leaving the setting, there would be 3000 (50 × 60) records per kid, although many or even all records might be (0,0) for a given kid. I understand that it is extremely likely that your data is not in this form, but if your data records were kidid, observation, Anx, HA, where observation equals time coded in seconds, you could split the file by kidid and crosstab Anx by HA.
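In syntax, the split-and-crosstab idea would look something like the sketch below, assuming long-format data with one record per kid per second and 0/1 variables AN and HA (the variable names are assumptions):

```spss
SORT CASES BY kidid.
SPLIT FILE SEPARATE BY kidid.

* One 2x2 table of AN by HA per kid.
CROSSTABS
  /TABLES=AN BY HA
  /CELLS=COUNT ROW
  /STATISTICS=CHISQ PHI.

SPLIT FILE OFF.
```

Note that, as Gene points out later in the thread, the chi-square and phi significance tests assume independent observations, so with second-by-second data these outputs are best read descriptively.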
Gene Maguin

From: Michael Bates <[hidden email]>

[...]

Another research question related to the "causal" relationship between AN and HA. So, I coded any HA that followed within 30 seconds of an AN observation (a theoretically defensible time frame) to indicate that, for that pair of observations, HA followed from AN. This I ran as the total of AN against the total of HA within 30 seconds of AN, by student (of the total AN observed for each student, how many were followed within 30 seconds by HA?). Interestingly, I found a strong positive relationship between the variables.

That's the setup, in a nutshell.
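The 30-second "HA follows AN" flag described above could be coded in syntax roughly as follows. This is only an illustration, assuming one record per second sorted by kid and time, with variables kidid, time, AN, and HA; the names last_an and ha_after_an are made up for this example.

```spss
SORT CASES BY kidid time.

* Carry forward the time of the most recent AN observation within each kid.
IF (AN = 1) last_an = time.
IF (AN = 0 AND kidid = LAG(kidid)) last_an = LAG(last_an).

* Flag HA seconds occurring within 30 seconds of the most recent AN.
COMPUTE ha_after_an = (HA = 1 AND MISSING(last_an) = 0 AND time - last_an <= 30).
EXECUTE.
```

Summing ha_after_an by kid (e.g., with AGGREGATE) would then give the per-student count of HA observations that followed AN within the 30-second window.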
That's exactly right. It was #,# for 3000 data points per kid. Most were 0,0. In fact, I found 1074 individual seconds where one kid or another exhibited anxiety, and fewer seconds of hyperactivity for any given child. My data files are as you describe: KidID, Time, AN, HA.

You're saying I can create individual data sets for each child and run a crosstab of AN by HA. It wouldn't matter whether the data were normal or not in that case; I could scrap the correlation metrics completely and rely on the crosstab alone. I would then have to describe the results by child in my results/discussion sections rather than having a statistic for the overall relationship between variables, though my discussion would make that general statement based on the crosstabs. Have I got all that right?

On Mon, Mar 12, 2018 at 11:13 AM, Maguin, Eugene <[hidden email]> wrote:
|
I'll say Yes and No.

The within-kid crosstabs will summarize the four outcomes, and you could report the percentages for each outcome for each kid. The key word is "summarize". (That's the Yes part.) Of course, you can compute a chi-square value or a phi correlation, but the chi-square and phi significance tests are based on the observations being independent, and yours are not. (That's the No part.) If summarizing is acceptable to your committee, then you're done. That's the key question.

I want to point out that within your dataset, you can make up many different relationships to summarize. Right now, you are focusing on the percentage of observations in which AN and HA both occurred at a given time point.

Gene Maguin
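The per-kid outcome percentages could be computed with indicator variables and AGGREGATE, along these lines. This is a sketch: AN and HA are assumed to be coded 0/1, and the indicator and output variable names are made up for this example.

```spss
* Indicator for each of the four (AN, HA) outcomes at a given second.
COMPUTE both    = (AN = 1 AND HA = 1).
COMPUTE an_only = (AN = 1 AND HA = 0).
COMPUTE ha_only = (AN = 0 AND HA = 1).
COMPUTE neither = (AN = 0 AND HA = 0).

* The mean of a 0/1 indicator within each kid is the proportion of
* seconds in that outcome; multiply by 100 for percentages.
AGGREGATE
  /OUTFILE='kid_summary.sav'
  /BREAK=kidid
  /p_both = MEAN(both)
  /p_an_only = MEAN(an_only)
  /p_ha_only = MEAN(ha_only)
  /p_neither = MEAN(neither).
```

The four proportions sum to 1 within each kid, which makes the per-kid summaries easy to compare side by side.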
Eugene, thanks. So helpful. My only concern at this point is whether HA is up when AN is up, and the converse. Also, do I need to include the 0's in my data set, or only the hits? I assume yes, but I thought I'd ask, since many of my assumptions have turned out wrong. Here's the output from the crosstab for the first kid.
On Mon, Mar 12, 2018 at 1:29 PM, Maguin, Eugene <[hidden email]> wrote: