SPSSX Discussion

2 Variables, 7 cases, 10 observations -- Simple?

PsyDStats

Mar 04, 2018; 5:46pm

2 Variables, 7 cases, 10 observations -- Simple?

8 posts

Mike

Mar 05, 2018; 2:06am

Re: 2 Variables, 7 cases, 10 observations -- Simple?

385 posts

Which data layout you choose to use will depend upon the hypotheses
you want to test. That being said, there is an obvious problem with the
second layout. I suggest you take a look at the following article in order
to understand the trip you have begun. Remember to follow the
yellow brick road.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3396026/#R2

-Mike Palij

New York University

[hidden email]

On Sun, Mar 4, 2018 at 12:46 PM, PsyDStats <[hidden email]> wrote:

Good morning,

My dissertation has a data set that I believe is reasonably straightforward.
My hypotheses also seem basic. For the life of me, I can't figure it out in
SPSS.

I have 7 students. They were observed 10 times each T1...T2...etc. They were
observed for two variables: IV Anxiety, DV Hyperactivity. Basic question: as
anxiety increases, does hyperactivity increase. The obvious: I am trying to
group students into high and low anxiety and high and low hyperactivity. The
second question: does hyperactivity appear to follow from anxiety (if
hyperactivity was observed within 30 seconds of anxiety behavior = anxiety
led to hyperactivity in that instance). Forgetting the obvious theoretical
questions, here is a sample of the data in the configuration I believe
should be in SPSS to get the comparisons I need.

         Anx  Hyp
St1_T1    11    7
St1_T2     9   11
St1_T3    11   12
St1_T4     0    0
St1_T5     0    0
St1_T6     0    0
St1_T7     0    0
St1_T8     0    0
St1_T9     0    0
St1_T10    0    0
St2_T1    16    7
St2_T2    32   21
St2_T3    33   10
St2_T4    30   13
St2_T5    25   13
St2_T6    36   25
St2_T7     0    0
St2_T8     0    0
St2_T9     0    0
St2_T10    0    0
St3_T1    31   10
St3_T2    25   17
St3_T3    31   14
St3_T4    21   25
St3_T5    28   19
St3_T6    21   15
St3_T7     0    0
St3_T8     0    0
St3_T9     0    0
St3_T10    0    0
St4_T1    25   22
St4_T2    25   25
St4_T3    30   37
St4_T4    25   32
St4_T5    29   27
St4_T6    35   35
St4_T7    21   28
St4_T8    27   26
St4_T9    17   16
St4_T10   17   15
...etc

The other way to orient the data, it seems to me, is:

             T1   T2   T3   T4   T5   T6   T7   T8   T9  T10
St_1_Anx     11    9   11    0    0    0    0    0    0    0
St_2_Anx     16   32   33   30   25   36    0    0    0    0
St_3_Anx     31   25   31   21   28   21    0    0    0    0
St_4_Anx     25   25   30   25   29   35   21   27   17   17
St_5_Anx     23   23   31   27   19   19   16   19   13   15
St_6_Anx     34   51   53   53   42   52   43   32   46    9
St_7_Anx     29   31   31   36    0    0    0    0    0    0

             T1   T2   T3   T4   T5   T6   T7   T8   T9  T10
St_1_Hyp      7   11   12    0    0    0    0    0    0    0
St_2_Hyp      7   21   10   13   13   25    0    0    0    0
St_3_Hyp     10   17   14   25   19   15    0    0    0    0
St_4_Hyp     22   25   37   32   27   35   28   26   16   15
St_5_Hyp      1   10   30   17   19   35   26   30   39   16
St_6_Hyp     16   32   19   33   22   51   40   21   23
St_7_Hyp     34   30   23   35    0    0    0    0    0    0

My questions: Which data orientation is correct, and which process should I
run in order to group the students in high/low anx and high/low hyp?

I'm grateful for any suggestions that might be offered.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Rich Ulrich

Mar 05, 2018; 3:34am

Re: 2 Variables, 7 cases, 10 observations -- Simple?

1067 posts

In reply to this post by PsyDStats

First reactions: The zeroes sure do look a lot like they must be "Missing" - which means that the program options will be greater when using the long format. You could, for instance, use discriminant function on seven Groups (= subjects) and see how they group when plotted.

However, cursory examination suggests that subject 1, with only 3 Times, barely overlaps the other subjects. Subject 1 is low, the rest, high? I think you need more data.
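
To make the "long format with zeros as missing" idea concrete, here is a minimal sketch in Python (used only for illustration, not SPSS syntax). The function name `wide_to_long` and the choice of `None` as the missing marker are assumptions made for this example. The rule that a period with both counts at 0 was not observed is likewise an assumption about the data. The rows for students 1 and 2 are copied from the wide tables in the original post.

```python
# Wide rows for students 1 and 2 (T1..T10), copied from the original post.
anx = {
    1: [11, 9, 11, 0, 0, 0, 0, 0, 0, 0],
    2: [16, 32, 33, 30, 25, 36, 0, 0, 0, 0],
}
hyp = {
    1: [7, 11, 12, 0, 0, 0, 0, 0, 0, 0],
    2: [7, 21, 10, 13, 13, 25, 0, 0, 0, 0],
}


def wide_to_long(anx, hyp):
    """Return one record per student per time point.

    Assumption: a period where BOTH counts are 0 was not observed,
    so both values become None (missing) rather than a real zero.
    """
    rows = []
    for student in sorted(anx):
        for t, (a, h) in enumerate(zip(anx[student], hyp[student]), start=1):
            if a == 0 and h == 0:
                a, h = None, None
            rows.append({"student": student, "time": t, "anx": a, "hyp": h})
    return rows


long_rows = wide_to_long(anx, hyp)
observed = [r for r in long_rows if r["anx"] is not None]
print(len(long_rows), len(observed))  # 20 9
```

In the long layout each record carries its own student and time identifiers, so dropping the missing periods does not disturb the remaining rows.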

Rich Ulrich

Art Kendall

Mar 05, 2018; 12:47pm

Re: 2 Variables, 7 cases, 10 observations -- Simple?

2500 posts

In reply to this post by PsyDStats

Check the archives of this list for discussions of coarsening measures.

In psychology, there is a variation in vocabulary. Some speak of the
invidious median split and others speak of the nefarious median split.

Even if this is only a pilot for your dissertation study, with such a tiny
data set you want to throw away as little data as possible.

-----
Art Kendall
Social Research Consultants

Maguin, Eugene

Mar 05, 2018; 2:24pm

Re: 2 Variables, 7 cases, 10 observations -- Simple?

1973 posts

In reply to this post by PsyDStats

Maybe I missed this in your description but what are we looking at? I understand 7 persons observed 10 times and anxiety (AN) and hyperactivity (HA) observed at each time point. How did person St1 at T1 get a score of 11 and 7? A multi-item checklist? You mention 'within 30 seconds'. Does this mean that people were observed at 30 second intervals?

I agree with Art about coarsening data. You don't have much data to begin with and you're giving information away by dichotomizing. (Little rant: Unless these 7 people are extraordinarily rare or special, your committee did you a disservice by signing off on an N of 7 people.)

The basic question: Are you wanting to know if (a) the within time point correlation between AN and HA is positive or (b) are you wanting to know if the correlation between AN at t(i) and HA at t(i+1) is positive?

Question 2: This question reads like you want to treat AN and HA as absent (AN=0; HA=0) or present (AN>0; HA>0)? Is this true?
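
The difference between options (a) and (b) comes down to how the pairs are formed. A minimal Python sketch follows; the helper name `lagged_pairs` is hypothetical, and the values are student 4's rows from the original post. A lag of one pairs anxiety in one observation period with hyperactivity in the next period, so it can only be as fine-grained as the spacing of the periods.

```python
def lagged_pairs(anx, hyp, lag=1):
    """Pair anxiety at period t with hyperactivity at period t + lag,
    within one student. Pairs with a missing (None) value are dropped."""
    pairs = []
    for t in range(len(anx) - lag):
        a, h = anx[t], hyp[t + lag]
        if a is not None and h is not None:
            pairs.append((a, h))
    return pairs


# Student 4, T1..T10, copied from the original post.
st4_anx = [25, 25, 30, 25, 29, 35, 21, 27, 17, 17]
st4_hyp = [22, 25, 37, 32, 27, 35, 28, 26, 16, 15]

same_period = lagged_pairs(st4_anx, st4_hyp, lag=0)  # option (a)
next_period = lagged_pairs(st4_anx, st4_hyp, lag=1)  # option (b)
print(len(same_period), len(next_period))  # 10 9
print(next_period[:3])  # [(25, 25), (25, 37), (30, 32)]
```

Pairs are built separately for each student so that one student's last period is never matched with another student's first.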

Gene Maguin

PsyDStats

Mar 05, 2018; 3:47pm

Re: 2 Variables, 7 cases, 10 observations -- Simple?

8 posts

In reply to this post by PsyDStats

First of all, THANK you very much for everyone's responses.

Secondly, yes, an N of 7 is sparse. The 0 means no observations. I'm
hesitant to remove St1 because of the already small sample size, even though
it lacks . I oriented the data vertically as in my first scenario and ran
nonparametric stats comparing AN with HA. The scatter plot showed a
monotonic relationship, so I ran Spearman's rho. I thought the higher
specificity of Kendall's tau would be useful, because of its accuracy over
Spearman with smaller sample sizes. I also ran Somers' d, since I have
dependent and independent variables. All showed positive correlations at
.001. However, I'm worried that I'm missing something more essential with my
data or that I've missed assumptions that made these metrics inappropriate
to begin with. From your responses, I'm even more nervous.

Thank you again for the interest in my situation and your helpful insights.
I wish my committee and those I approached for participation were as engaged
as you are.

PsyDStats

Mar 05, 2018; 4:21pm

Re: 2 Variables, 7 cases, 10 observations -- Simple?

8 posts

In reply to this post by Maguin, Eugene

Gene,

Thanks again for your response.

The observations were made over the course of 5 minute intervals for 50
total minutes per student, which I aggregated to 10 segments per 50 minute
period. Any given student could have multiple moments of AN and HA during
the 5 minute segment coded. That's why St1 got a score of 11 (anxiety) and
7 (hyperactivity) in T1 (which is the summed observations in the first 5
minute segment). I did this because some subjects had over 300 observed
moments of AN during the 50 minute period (super anxious kids, to put it
clinically) and I wasn't sure that I'd get more bang from each individual
observation in my data set over the totals for each 5 minute segment. The
data set with all anxiety tallies across all students was 1074 data points.
Another reason I chose to use the aggregate 5 minute segment totals was that
it seemed to me that 1/0 coding for each moment observed resulted in
discrete data, while the total for any given 5 minute segment was
continuous, and continuous data might provide more options for analysis, in
spite of the fact that the data turned out not to be normally distributed.

Kruger, Michael

Mar 06, 2018; 3:16am

Re: 2 Variables, 7 cases, 10 observations -- Simple?

5 posts

In reply to this post by PsyDStats

One issue someone might raise is the oversampling in the sense you are getting up to 10 values for each of your seven subjects. This causes your n to be 10 times greater than it really is. To evaluate your data correctly you should be taking the mean value for anxiety and the other dependent variable and do your correlations on an n=7, not each pair of observations. Also, since some people have more zeros than others, those people are over-represented in your data.
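
As a rough illustration of collapsing to one row per subject, here is a Python sketch using the rows for students 1 to 4 from the original post. The helper name `subject_means` is hypothetical. Skipping (0, 0) periods as "not observed" is an assumption about how the zeros should be handled, not something settled in this reply.

```python
from statistics import mean

# (anx, hyp) pairs per student, copied from the long table in the first post.
data = {
    1: [(11, 7), (9, 11), (11, 12)] + [(0, 0)] * 7,
    2: [(16, 7), (32, 21), (33, 10), (30, 13), (25, 13), (36, 25)]
       + [(0, 0)] * 4,
    3: [(31, 10), (25, 17), (31, 14), (21, 25), (28, 19), (21, 15)]
       + [(0, 0)] * 4,
    4: [(25, 22), (25, 25), (30, 37), (25, 32), (29, 27),
        (35, 35), (21, 28), (27, 26), (17, 16), (17, 15)],
}


def subject_means(pairs):
    """Mean anxiety and hyperactivity over observed periods only.

    Assumption: a (0, 0) period means "not observed" and is skipped.
    """
    seen = [(a, h) for a, h in pairs if (a, h) != (0, 0)]
    return mean(a for a, _ in seen), mean(h for _, h in seen)


means = {s: subject_means(p) for s, p in data.items()}
for s, (m_anx, m_hyp) in means.items():
    print(s, round(m_anx, 2), round(m_hyp, 2))
```

Each student then contributes exactly one pair of values, regardless of how many periods were observed.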

Michael Kruger
'A True Prince'
Statistical Analyst
C.S. Mott Center
Dept. of OB/GYN
WSU School of Medicine
248-895-4728

PsyDStats

Mar 06, 2018; 4:22am

Re: 2 Variables, 7 cases, 10 observations -- Simple?

8 posts

You make a good point about the oversampling. The nonparametric correlations I ran on the T1-T10 data set were all significant (coefficients > .8, p = .0001), while the same metrics on the data set with just the 7 means were not significant. How best to understand this difference? Also, you mentioned that some have more zeros and that they would be over-represented in the data set. Did you mean the subjects with fewer zeros or the subjects with more zeros? Thanks.

Rich Ulrich

Mar 06, 2018; 6:37am

Re: 2 Variables, 7 cases, 10 observations -- Simple?

1067 posts

In reply to this post by PsyDStats

If I understand how you ran the statistics that you ran, in no case did you look at the within-subject correlation. So, in no case did you have a test based on your sample size of 7, but, rather, on the 50-some periods of observation. That does not give a valid test. If you want a between-subject correlation, look at the r for the sample of 7 scores, one per person. "People who are higher on the one are also higher on the other" is the between-subject test.

The discriminant function that I suggested earlier will give you the "within-subjects correlation", which removes the mean levels for each subject. It will give you a valid test.
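
The idea of "removing the mean levels for each subject" can be sketched without running a discriminant analysis. The Python below is a simpler stand-in, not the discriminant function procedure itself: it centers each student's scores on that student's own means and pools the centered values into one Pearson correlation. The function name `within_subject_r` is hypothetical; the data are the observed periods for students 1 to 4 from the original post, with the zero periods left out on the assumption that they are missing.

```python
from math import sqrt


def within_subject_r(groups):
    """Pooled within-subject Pearson r.

    Each subject's x and y values are centered on that subject's own
    means, then the centered values are pooled across subjects.
    """
    sxy = sxx = syy = 0.0
    for pairs in groups:
        mx = sum(x for x, _ in pairs) / len(pairs)
        my = sum(y for _, y in pairs) / len(pairs)
        for x, y in pairs:
            sxy += (x - mx) * (y - my)
            sxx += (x - mx) ** 2
            syy += (y - my) ** 2
    return sxy / sqrt(sxx * syy)


# (anx, hyp) for observed periods only, students 1 to 4.
groups = [
    [(11, 7), (9, 11), (11, 12)],
    [(16, 7), (32, 21), (33, 10), (30, 13), (25, 13), (36, 25)],
    [(31, 10), (25, 17), (31, 14), (21, 25), (28, 19), (21, 15)],
    [(25, 22), (25, 25), (30, 37), (25, 32), (29, 27),
     (35, 35), (21, 28), (27, 26), (17, 16), (17, 15)],
]
r_within = within_subject_r(groups)
print(round(r_within, 3))
```

Because every student's mean is subtracted first, a student who is simply high on both measures does not inflate this coefficient.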

When I first looked at the scores, I wondered if the point-differences at the low end should be more important than the point-differences at the high end. Since these scores are described as counts, you will likely have a more robust analysis - and one where equal intervals are better respected - if you take the square-root of each count as your score to use in an analysis.
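
A small Python sketch of the square-root idea, using student 4's anxiety counts from the original post. The variable names are made up for the example. It also shows how the transform stretches differences at the low end relative to the high end: a gap of 4 counts is worth far more near zero than near 40.

```python
from math import sqrt

# Student 4 anxiety counts, T1..T10, copied from the original post.
st4_anx = [25, 25, 30, 25, 29, 35, 21, 27, 17, 17]
st4_anx_sqrt = [sqrt(c) for c in st4_anx]
print([round(v, 2) for v in st4_anx_sqrt])

# The same 4-count difference at the low end and at the high end.
low_gap = sqrt(4) - sqrt(0)     # 2.0
high_gap = sqrt(40) - sqrt(36)  # roughly 0.32
print(low_gap, round(high_gap, 2))
```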

Rich Ulrich

Kruger, Michael

Mar 06, 2018; 11:25am

Re: 2 Variables, 7 cases, 10 observations -- Simple?

5 posts

In reply to this post by PsyDStats

If you are using each pair of observations as a case, then subjects with more observations will be more heavily represented in your analysis than those with fewer observations.

Michael Kruger
'A True Prince'
Statistical Analyst
C.S. Mott Center
Dept. of OB/GYN
WSU School of Medicine
248-895-4728

PsyDStats

Mar 11, 2018; 6:47pm

Re: 2 Variables, 7 cases, 10 observations -- Simple?

8 posts

In reply to this post by PsyDStats

Okay, so I've dabbled a little and successfully confused myself even more.
Let me boil things down.

As I mentioned, I am interested to know if there's a correlation between
Anxiety (AN) and Hyperactivity (HA). Does HA go up when AN goes up? Aside
from being a very small N, the data also are not normally distributed, so
I've been working with nonparametric measures.

Based on the suggestions so far, I ran the means for each student for the
two variables. I re-ran Spearman's rho, Kendall's tau, and Somers' d and found that
there was a very poor correlation between AN and HA.

I structured the data:

      Mean_AN  Mean_HA
St_1    10.33    10.00
St_2    28.67    14.83
St_3    26.17    16.67
St_4    25.10    26.30
St_5    20.50    22.30
St_6    41.50    25.70
St_7    31.75    30.50
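
For reference, the between-subject Spearman correlation on these seven means can be reproduced by hand. The Python below is a minimal sketch with hypothetical helper names (`ranks`, `spearman_rho`). It uses the textbook no-ties formula, which is adequate here only because none of the seven means are tied.

```python
def ranks(values):
    """Rank from 1 (smallest); assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman_rho(x, y):
    """Spearman's rho via the no-ties formula 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Per-student means copied from the table above (St_1..St_7).
mean_an = [10.33, 28.67, 26.17, 25.10, 20.50, 41.50, 31.75]
mean_ha = [10.00, 14.83, 16.67, 26.30, 22.30, 25.70, 30.50]

rho = spearman_rho(mean_an, mean_ha)
print(rho)  # 0.5
```

With only seven subjects, a rho of 0.5 is well short of conventional significance; published tables put the two-tailed .05 critical value for n = 7 near .79.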

If there were no data points missing (those pesky 0's in my data set), and
if the N was more substantial, and if I had only kept up with my piano
lessons as a kid and I were now conducting at the Met instead of a doctoral
student in psychology, would you say that I've chosen the correct measures
for my data set and structured it properly in SPSS? Would you say that there
is a chance the outcomes of the procedures I've run are useful to me in
accepting or rejecting the Null Hypothesis, which in this case is that I
should go back to piano (My mother's null hypothesis that I would amount to
Null if I gave up the piano)?

Seriously, thanks for your thoughtful feedback.

Art Kendall

Mar 12, 2018; 12:59pm

Re: 2 Variables, 7 cases, 10 observations -- Simple?

2500 posts

"the data also are not normally distributed,"

There is no assumption that the *data* are normally distributed.

For uses of the general linear model (regression, anova, correlations, etc.)
it is desirable that the *residuals* (aka errors in fit) are not very
discrepant from normally distributed.

Check to see whether CATREG handles repeated measures.
Since CATREG has actual tests of whether there is a better fit with ordinal
vs continuous assumptions it may be a way to look at your data.

-----
Art Kendall
Social Research Consultants

Maguin, Eugene

Mar 12, 2018; 2:13pm

Re: 2 Variables, 7 cases, 10 observations -- Simple?

1973 posts

In reply to this post by PsyDStats

I'd like to better understand your study design (I've re-read the initial round of posts.) In your reply, you said, you observed the target kid for a total of 50 minutes. What I'm curious about is the data collection procedures used. I think these matter in understanding what your data analysis options are. The target behaviors were anxiety (Anx) and hyperactivity (HA). For example, you could have tabulated using two hand counters or hash-marks on paper each time a target behavior occurred so that at any point in time, the HA counter or hash count would show a count of 15 and the Anx counter or hash count would show a count of 8. Alternatively, you could have recorded in a data sequence the behaviors as they occurred so now the recording sheet/data recorder shows, for example, A,H,H,A,A,H,H,H. Alternatively, you could have designated HA as the stimulus behavior and Anx as the response behavior and coded (HA,Anx) if Anx followed HA within 'x' seconds; otherwise, coded (HA,NoAnx). In one of your replies, you mentioned something about 30 seconds. What's the story with that?

Gene Maguin

Bruce Weaver

Mar 12, 2018; 2:34pm

Re: 2 Variables, 7 cases, 10 observations -- Simple?

Administrator

3512 posts

In reply to this post by Art Kendall

The widespread belief that the data need to be normally distributed is one of
the biggest statistical misconceptions people have, I reckon. And I think
it comes down to not having a clear grasp of the distinctions between 3
different distributions:

1) The population distribution
2) The distributions for given samples from that population
3) The sampling distribution for some statistic

Here is a figure I saw recently. I think it illustrates these distinctions
very nicely.

http://slideplayer.com/slide/8557877/26/images/8/Population+Distributions+vs.+Sampling+Distributions.jpg

It uses a binary variable to illustrate, but I think one can easily imagine
a corresponding figure where the Population and Sample distributions are
continuous and where the sampling distribution is the sampling distribution
of the mean rather than the sampling distribution of a proportion.

If we use the single sample z-test to illustrate, it is not the sample
distribution that needs to be normal, nor is it the population distribution.
Rather, it is the /sampling distribution of the mean/ that needs to be
(approximately) normal for the z-test to be valid. (The scores need to be
independent too, of course.)

If we shift the context to linear regression, I contend that it is the
/sampling distributions of the regression parameters/ that need to be
approximately normal in order for the t-tests on them to be valid, and for
the F-test for the overall model to be valid. Again, the observations need
to be independent, and the errors need to be uncorrelated with the
explanatory variables. But the (approximate) normality requirement applies
to the /sampling distributions of the parameters/.

I know that the normality assumption for OLS regression is often said to
apply to the errors. I have said that myself many times. But of late, I
have come to the view that normality of the errors is a /sufficient/, but
not a /necessary/ condition. The necessary normality condition is
approximate normality of the sampling distributions of the parameters. And
as n increases, that condition will be met, even if the errors are not
normally distributed.
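
The same point can be sketched for a regression slope. The Python simulation below is only an illustration under stated assumptions: the errors are centred exponential draws (skewed, mean zero), the x values are deliberately unevenly spaced, and the intercept, slope, sample sizes and seed are made up. It compares the skewness of the OLS slope estimates for a small and a larger n.

```python
import random
import statistics


def skewness(values):
    """Third standardized moment of a list of numbers."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def ols_slope(x, y):
    """Ordinary least squares slope for a single predictor."""
    xbar = statistics.fmean(x)
    ybar = statistics.fmean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


def simulate_slopes(n, reps, rng, intercept=1.0, slope=0.5):
    """Slope estimates from `reps` data sets with skewed, mean-zero errors."""
    x = [float(i * i) for i in range(n)]  # uneven spacing (assumed design)
    slopes = []
    for _ in range(reps):
        # expovariate(1.0) has mean 1, so subtracting 1 centres the errors.
        y = [intercept + slope * xi + (rng.expovariate(1.0) - 1.0) for xi in x]
        slopes.append(ols_slope(x, y))
    return slopes


rng = random.Random(2018)
slopes_n5 = simulate_slopes(5, 4000, rng)
slopes_n100 = simulate_slopes(100, 4000, rng)

print("mean slope, n = 5:        ", round(statistics.fmean(slopes_n5), 3))
print("skewness of slopes, n = 5:  ", round(skewness(slopes_n5), 2))
print("skewness of slopes, n = 100:", round(skewness(slopes_n100), 2))
```

The errors are equally non-normal in both runs; only the sampling distribution of the slope changes with n.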

Some of my thinking on this is due to reading what Jeffrey Wooldridge says
about the assumptions for OLS regression in his popular econometrics
textbook. I've attached a small set of slides in which I've summarized his
main points.

OLS_regression_assumptions_Wooldridge.pdf
<http://spssx-discussion.1045642.n5.nabble.com/file/t7186/OLS_regression_assumptions_Wooldridge.pdf>

Members who do not read the list via Nabble can find a link to download the
file by viewing the thread here:

http://spssx-discussion.1045642.n5.nabble.com/2-Variables-7-cases-10-observations-Simple-td5735614.html

Finally, I always try to say approximate normality rather than normality,
because I believe George Box was right when he commented that in the real
world, normal distributions (and straight lines) don't really exist.
Nevertheless, they can serve as useful approximations (or models) of
real-world phenomena. See section 2.5 of this famous article:

http://mkweb.bcgsc.ca/pointsofsignificance/img/Boxonmaths.pdf

Cheers,
Bruce

Art Kendall wrote

> "the data also are not normally distributed,"
>
> There is no assumption that the *data* are normally distributed.
>
> For uses of the general linear model (regression, anova, correlations,
> etc., etc.) it is desirable that the *residuals* (aka errors in fit) are
> not very discrepant from normally distributed.
>
> Check to see whether CATREG handles repeated measures.
> Since CATREG has actual tests of whether there is a better fit with
> ordinal
> vs continuous assumptions it may be a way to look at your data.
>
>
>
> -----
> Art Kendall
> Social Research Consultants

-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

PsyDStats

Mar 12, 2018; 3:42pm

Re: 2 Variables, 7 cases, 10 observations -- Simple?

8 posts

In reply to this post by Maguin, Eugene

The raters (more than one for inter-rater reliability reasons) observed videos of children in a classroom setting for 50 minutes. Coding tracked defined anxiety behavior and defined hyperactivity behavior. The coders had coding sheets that had columns for Time (in seconds), AN and HA. They put a tally "1" in the variable column at the noted second if either of the behaviors was observed. So, for a given subject, the coding sheet had:

Time AN HA

00:00:00 1 1

00:00:01 1

00:00:02 1

...etc

The tallies of both variables were totaled. In addition to the individual observations, time was aggregated in 5 minute blocks. If the child left the observation session (doctor's appointment, nurse's office, whatever), the totals for a given 5 minute block ended up totaled as 0. Each 5 minute aggregate was labeled T1, T2, etc. Only 7 children agreed to participate. Don't get me started....
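
The aggregation step described above can be sketched in Python as follows. The record layout `(second, an, ha)` and the function name are assumptions for illustration. One deliberate difference from the coding described in the post: a block with no records at all is reported as `None` rather than 0, so that "child absent" can be told apart from "child present, no behavior observed".

```python
def aggregate_blocks(records, n_blocks=10, block_seconds=300):
    """Total AN and HA tallies per 5-minute block.

    records: iterable of (second, an, ha) tuples, second counted from 0.
    Returns {"T1": (an_total, ha_total), ...}; a block with no records
    maps to None (an assumption of this sketch, see the note above).
    Records falling beyond n_blocks are ignored.
    """
    totals = {}
    for second, an, ha in records:
        block = second // block_seconds + 1  # seconds 0-299 -> block 1, etc.
        an_total, ha_total = totals.get(block, (0, 0))
        totals[block] = (an_total + an, ha_total + ha)
    return {f"T{b}": totals.get(b) for b in range(1, n_blocks + 1)}


# Made-up records for one child who was present only for the first 6 minutes.
example = [(0, 1, 1), (1, 1, 0), (299, 0, 1), (300, 1, 0), (359, 0, 0)]
print(aggregate_blocks(example, n_blocks=3))
```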

My first research question related to the amount of anxiety and hyperactivity: Do highly anxious children exhibit a higher level of HA than low anxiety children? Originally, I thought to run frequencies/correlations on the second-by-second data for each student. Then I thought to run the 5 minute block aggregates. I found my data to be non-normally distributed, and I decided that Pearson's r was not appropriate to determine the correlation between the two variables. I ran a scatter plot to check monotonicity and found an adequate relationship. So, I ran Spearman and Kendall on the 5 minute blocks, then Somers' d (because of the IV/DV nature of the data). I found a strong positive correlation in each output. As the comments came back from the SPSSX Extremely! helpful responders, I understood that my results might be overstated, and I ran the same procedures on the means. This yielded no significant correlation in either direction. (Poop! Technical term?)

Another research question related to the "causal" relationship between AN and HA. So, I coded any HA that followed within 30 seconds of an AN observation (a theoretically defensible time frame) to indicate that for that pair of observations, HA followed from AN. This I ran as the total of AN against the total HA within 30 seconds of AN by student (of the total AN observed for each student, how many were followed within 30 seconds by HA?). Interestingly, I found a strong positive relationship between the variables.
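
A minimal Python sketch of the 30-second coding rule, for one child. It assumes the data are available as two lists of the seconds at which AN and HA were tallied, and it reads "followed within 30 seconds" as "at least 1 and at most 30 seconds later"; whether an HA tally in the same second should count is a coding decision the post does not settle. The function name is hypothetical.

```python
from bisect import bisect_right


def count_an_followed_by_ha(an_seconds, ha_seconds, window=30):
    """Return (followed, total_an) for one child.

    An AN tally counts as 'followed' if some HA tally occurs strictly
    after it and no more than `window` seconds later (assumed reading).
    """
    ha_sorted = sorted(ha_seconds)
    followed = 0
    for t in an_seconds:
        i = bisect_right(ha_sorted, t)  # index of the first HA strictly after t
        if i < len(ha_sorted) and ha_sorted[i] - t <= window:
            followed += 1
    return followed, len(an_seconds)


# Made-up times in seconds: the AN at 100 is followed by HA only 31 s later.
followed, total = count_an_followed_by_ha([10, 100, 200], [10, 35, 131, 220])
print(f"{followed} of {total} AN observations were followed by HA within 30 s")
```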

That's the set up, in a nutshell.

On Mon, Mar 12, 2018 at 9:13 AM, Maguin, Eugene <[hidden email]> wrote:

I'd like to better understand your study design (I've re-read the initial round of posts.) In your reply, you said, you observed the target kid for a total of 50 minutes. What I'm curious about is the data collection procedures used. I think these matter in understanding what your data analysis options are. The target behaviors were anxiety (Anx) and hyperactivity (HA). For example, you could have tabulated using two hand counters or hash-marks on paper each time a target behavior occurred so that at any point in time, the HA counter or hash count would show a count of 15 and the Anx counter or hash count would show a count of 8. Alternatively, you could have recorded in a data sequence the behaviors as they occurred so now the recording sheet/data recorder shows, for example, A,H,H,A,A,H,H,H. Alternatively, you could have designated HA as the stimulus behavior and Anx as the response behavior and coded (HA,Anx) if Anx followed HA within 'x' seconds; otherwise, coded (HA,NoAnx). In one of your replies, you mentioned something about 30 seconds. What's the story with that?

Gene Maguin

-----Original Message-----
From: SPSSX(r) Discussion <[hidden email]> On Behalf Of PsyDStats
Sent: Sunday, March 11, 2018 2:47 PM
To: [hidden email]
Subject: Re: 2 Variables, 7 cases, 10 observations -- Simple?

Okay, so I've dabbled a little and successfully confused myself even more.
Let me boil things down.

As I mentioned, I am interested to know if there's a correlation between Anxiety (AN) and Hyperactivity (HA). Does HA go up when AN goes up? Aside from being a very small N, the data also are not normally distributed, so I've been working with nonparametric measures.

Based on the suggestions so far, I ran the means for each student for the two variables. I re-ran the Spearman, Kendall's, and Somer's and found that there was a very poor correlation between AN and HA.

I structured the data:

Mean_AN Mean_HA
St_1 10.33 10.00
St_2 28.67 14.83
St_3 26.17 16.67
St_4 25.10 26.30
St_5 20.50 22.30
St_6 41.50 25.70
St_7 31.75 30.50
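
For reference, Spearman's rho for the seven pairs of means above can be recomputed by hand. The Python sketch below ranks each variable (average ranks for ties) and takes the Pearson correlation of the ranks; the function names are illustrative, and no significance test is attempted.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Student means as posted above (St_1 .. St_7).
mean_an = [10.33, 28.67, 26.17, 25.10, 20.50, 41.50, 31.75]
mean_ha = [10.00, 14.83, 16.67, 26.30, 22.30, 25.70, 30.50]

rho = spearman_rho(mean_an, mean_ha)
print(round(rho, 3))  # prints 0.5 for these seven pairs
```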

If there were no data points missing (those pesky 0's in my data set), and if the N was more substantial, and if I had only kept up with my piano lessons as a kid and I were now conducting at the Met instead of a doctoral student in psychology, would you say that I've chosen the correct measures for my data set and structured it properly in SPSS? Would you say that there is a chance the outcomes of the procedures I've run are useful to me in accepting or rejecting the Null Hypothesis, which in this case is that I should go back to piano (My mother's null hypothesis that I would amount to Null if I gave up the piano)?

Seriously, thanks for your thoughtful feedback.


Maguin, Eugene

Mar 12, 2018; 4:13pm

Re: 2 Variables, 7 cases, 10 observations -- Simple?

1973 posts

Ok, that’s very helpful, I think. So, at any given second there can be four code pairs (AN, HA): (0,0); (0,1); (1,0); (1,1). Except for kids leaving the setting, there would be 3000 (50*60) records per kid; although many or even all records might be (0,0) for a given kid. I understand that it is extremely likely that your data is not in this form but if your data records were Kidid, observation, Anx, HA, where observation equals time coded in seconds, you could split the file by kidid and crosstab Anx by HA.
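
The split-by-kid crosstab can be sketched outside SPSS as well. The Python below assumes records of the form `(kid_id, second, an, ha)`, matching the layout described above; the names and the example rows are made up for illustration.

```python
from collections import Counter, defaultdict


def crosstab_by_kid(records):
    """Count the (AN, HA) code pairs separately for each child.

    records: iterable of (kid_id, second, an, ha) tuples.
    Returns {kid_id: Counter({(an, ha): count, ...})}.
    """
    tables = defaultdict(Counter)
    for kid_id, _second, an, ha in records:
        tables[kid_id][(an, ha)] += 1
    return tables


# Made-up records for two children.
records = [
    (1, 0, 1, 1), (1, 1, 1, 0), (1, 2, 0, 0), (1, 3, 0, 0),
    (2, 0, 0, 1), (2, 1, 0, 0),
]
for kid_id, table in sorted(crosstab_by_kid(records).items()):
    for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(kid_id, pair, table[pair])
```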

Gene Maguin


PsyDStats

Mar 12, 2018; 4:33pm

Re: 2 Variables, 7 cases, 10 observations -- Simple?

8 posts

That's exactly right. It was #,# for 3000 data points per kid. Most were 0,0. In fact, I found 1074 individual seconds where one kid or another exhibited anxiety, and fewer seconds of hyperactivity for any given child.

My data files are as you describe:

KidID Time AN HA

You're saying I can create individual data sets for each child and run a cross tab of AN by HA. In that case it wouldn't matter whether the data follow a normal curve, and I could scrap the correlation metrics completely and rely on the cross tab alone. I would then have to describe the results by child in my results/discussion sections rather than report a single statistic for the overall relationship between the variables, though my discussion would make that general statement based on the cross tabs. Have I got all that right?


Maguin, Eugene

Mar 12, 2018; 6:29pm

Re: 2 Variables, 7 cases, 10 observations -- Simple?

1973 posts

I’ll say Yes and No.

The within-kid crosstabs will summarize the four outcomes, and you could report the percentages for each outcome for each kid. The key word is “summarize”. (That’s the Yes part.) Of course, you can compute a chi-square value or a phi correlation, but the chi-square and phi significance tests are based on the observations being independent. Yours are not. (That’s the No part.) If summarizing is acceptable to your committee, then you’re done. That’s the key question.
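
As a sketch of the "summarize" option, the Python below turns one child's 2x2 counts into percentages and a phi coefficient. The counts are invented, the function name is hypothetical, and phi is used here purely as a descriptive index: no p-value is computed, because the second-by-second observations are not independent.

```python
import math


def summarize_2x2(n00, n01, n10, n11):
    """Percentages and phi for one child's AN-by-HA table.

    nXY = number of seconds with AN == X and HA == Y.
    Returns (percentages, phi); phi is None if any margin is zero.
    """
    n = n00 + n01 + n10 + n11
    percentages = {
        (0, 0): 100.0 * n00 / n,
        (0, 1): 100.0 * n01 / n,
        (1, 0): 100.0 * n10 / n,
        (1, 1): 100.0 * n11 / n,
    }
    margins = (n10 + n11) * (n00 + n01) * (n01 + n11) * (n00 + n10)
    phi = None if margins == 0 else (n11 * n00 - n10 * n01) / math.sqrt(margins)
    return percentages, phi


# Made-up counts for one child.
percentages, phi = summarize_2x2(n00=90, n01=5, n10=3, n11=2)
for pair, pct in percentages.items():
    print(pair, f"{pct:.1f}%")
print("phi (descriptive only):", round(phi, 3))
```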

I want to point out that within your dataset, you can make up many different relationships to summarize. Right now, you are focusing on the percentage of observations in which AN and HA both occurred at a given time point.

Gene Maguin


PsyDStats

Mar 12, 2018; 6:56pm

Fwd: 2 Variables, 7 cases, 10 observations -- Simple?

8 posts

Eugene,

Thanks. So helpful. My only concern at this point is whether HA is up when AN is up and the converse. Also, do I need to include the 0's in my data set or only the hits? I assume yes, but I thought I'd ask, since many of my assumptions have turned out wrong. Here's the output from the cross tab for the first kid.

AN_St1 * HA_St1 Crosstabulation
Count
                 HA_St1
                 0        1        Total
AN_St1    0      2942     29       2971
          1      26       1        27
          2      2        0        2
Total            2970     30       3000

Chi-Square Tests
                                Value      df    Asymptotic Significance (2-sided)
Pearson Chi-Square              2.031^a    2     .362
Likelihood Ratio                1.236      2     .539
Linear-by-Linear Association    1.386      1     .239
N of Valid Cases                3000
a. 3 cells (50.0%) have expected count less than 5. The minimum expected count is .02.

Directional Measures (Ordinal by Ordinal, Somers' d)
                    Value    Asymptotic Standardized Error^a    Approximate T^b    Approximate Significance
Symmetric           .024     .033                               .722               .470
AN_St1 Dependent    .024     .033                               .722               .470
HA_St1 Dependent    .025     .034                               .722               .470
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
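
The Pearson chi-square and the footnote about small expected counts can be reproduced from the counts in the crosstab above. The Python sketch below does only that arithmetic (the function name is illustrative). It does not compute a p-value, and the independence caveat raised earlier in the thread still applies to any test built on this statistic.

```python
def pearson_chi_square(table):
    """Pearson chi-square, degrees of freedom and expected counts for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# Rows are AN_St1 = 0, 1, 2; columns are HA_St1 = 0, 1 (counts as posted).
observed = [[2942, 29], [26, 1], [2, 0]]
chi2, df, expected = pearson_chi_square(observed)

small_cells = sum(e < 5 for row in expected for e in row)
min_expected = min(e for row in expected for e in row)

print(round(chi2, 3), df)                  # 2.031 2, matching the SPSS output
print(small_cells, round(min_expected, 2)) # 3 0.02, matching footnote a
```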

On Mon, Mar 12, 2018 at 1:29 PM, Maguin, Eugene <[hidden email]> wrote:

I’ll say Yes and No.

The within kid crosstabs will summarize the four outcomes and you could report the percentages for each outcome for each kid. The key word is “summarize”. (That’s the Yes part). Of course, you can compute a chi-square value or a phi correlation but the chi square and phi significance test is based on the observations being independent. Yours are not. (That’s the No part). If summarizing is acceptable to your committee; then, you’re done. That’s the key question.

I want to point out that within your dataset, you can make up many different relationships to summarize. Right now, you are focusing on the percentage of observations in which AN and HA both occurred at a given time point.

Gene Maguin

From: Michael Bates <[hidden email]>
Sent: Monday, March 12, 2018 12:33 PM

To: Maguin, Eugene <[hidden email]>
Cc: [hidden email]
Subject: Re: 2 Variables, 7 cases, 10 observations -- Simple?

That's exactly right. It was #,# for 3000 data points per kid. Most were 0,0, In fact, I found 1074 individual seconds where one kids or another exhibited anxiety and fewer seconds of hyperactivity for any given child.

My data files are as you describe:

KidID Time AN HA

You're saying I can create individual data sets for each child and run a cross tab AN by HA. It wouldn't matter about normal curve or not in that case, I could scrap the correlation metrics completely and rely on the cross tab alone. I would have to then describe the results by child in my results/discussion sections rather than having a statistic for overall relationship between variables, though my discussion would make that general statement based on the cross tabs. Have I got all that right?

On Mon, Mar 12, 2018 at 11:13 AM, Maguin, Eugene <[hidden email]> wrote:

Ok, that’s very helpful, I think. So, at any given second there can be four code pairs (AN, HA): (0,0); (0,1); (1,0); (1,1). Except for kids leaving the setting, there would be 3000 (50*60) records per kid; although many or even all records might be (0,0) for a given kid. I understand that it is extremely likely that your data is not in this form but if your data records were Kidid, observation, Anx, HA, where observation equals time coded in seconds, you could split the file by kidid and crosstab Anx by HA.

Gene Maguin

From: Michael Bates <[hidden email]>
Sent: Monday, March 12, 2018 11:42 AM
To: Maguin, Eugene <[hidden email]>
Cc: [hidden email]

Subject: Re: 2 Variables, 7 cases, 10 observations -- Simple?

The raters (more than one for inter-rater reliability reasons) observed videos of children in a classroom setting for 50 minutes. Coding tracked defined anxiety behavior and defined hyperactivity behavior. The coders had coding sheets that had columns for Time (in seconds), AN and HA. They put a tally "1" in the variable column at the noted second if either of the behaviors was observed. So, for a given subject, the coding sheet had:

Time AN HA

00:00:00 1 1

00:00:01 1

00:00:02 1

...etc

The tallies of both variables were totaled. In addition to the individual observations, time was aggregated in 5 minute blocks. If the child left the observation session (doctor's appointment, nurse's office, whatever), the totals for a given 5 minute block ended up totaled as 0. Each 5 minute aggregate was labeled T1, T2, etc. Only 7 children agreed to participate. Don't get me started....
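For readers following along, the 5 minute aggregation can be sketched roughly as below. This is an illustration in Python under stated assumptions, not the procedure actually used: the coding sheet is assumed to be a mapping from second to an (AN, HA) pair of 0/1 tallies, and the sample values and the name `block_totals` are hypothetical.

```python
BLOCK_SECONDS = 5 * 60  # 5 minute blocks, as described above

def block_totals(seconds_coded, session_seconds=50 * 60):
    """Sum per-second 0/1 tallies into consecutive 5 minute blocks.

    seconds_coded maps second -> (AN, HA); seconds with no entry count as (0, 0).
    Returns a list of (label, AN_total, HA_total), labelled T1, T2, ...
    """
    n_blocks = session_seconds // BLOCK_SECONDS
    totals = [[0, 0] for _ in range(n_blocks)]
    for second, (an, ha) in seconds_coded.items():
        if 0 <= second < session_seconds:
            block = second // BLOCK_SECONDS
            totals[block][0] += an
            totals[block][1] += ha
    return [(f"T{i + 1}", an, ha) for i, (an, ha) in enumerate(totals)]

# Hypothetical coding sheet for one child (made-up values).
sheet = {0: (1, 1), 1: (1, 0), 2: (1, 0), 310: (0, 1), 2999: (1, 1)}
blocks = block_totals(sheet)
for label, an_total, ha_total in blocks:
    print(label, an_total, ha_total)
```

Note that in this sketch, as in the totals described above, a block in which the child was absent and a block in which the child was present but showed neither behavior both come out as 0 0.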

My first research question related to the amount of anxiety and hyperactivity: Do highly anxious children exhibit a higher level of HA than low anxiety children? Originally, I thought to run frequencies/correlations on the second-by-second data for each student. Then I thought to run the 5 minute block aggregates. I found my data to be non-normally distributed, and I decided that Pearson's r was not appropriate to determine the correlation between the two variables. I ran a scatter plot to check monotonicity and found an adequate relationship. So, I ran Spearman and Kendall on the 5 minute blocks, then Somers' d (because of the IV/DV nature of the data). I found a strong positive correlation in each output. As the comments came back from the SPSSX Extremely! helpful responders, I understood that my data might be overstated and I ran the same procedures on the means. This yielded no significant correlation in either direction. (Poop! Technical term?)

Another research question related to the "causal" relationship between AN and HA. So, I coded any HA that followed within 30 seconds of an AN observation (a theoretically defensible time frame) to indicate that, for that pair of observations, HA followed from AN. This I ran as the total of AN against the total HA within 30 seconds of AN by student (of the total AN observed for each student, how many were followed within 30 seconds by HA?). Interestingly, I found a strong positive relationship between the variables.
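The 30 second coding rule can be sketched as follows. This is a hedged Python illustration, not the coding actually done: it assumes the AN and HA observations for one student are lists of seconds, and it assumes that an HA in the very same second as the AN does not count as "following" (that boundary choice is mine). The function name and sample times are hypothetical.

```python
from bisect import bisect_right

WINDOW = 30  # seconds, the time frame named above

def count_followed(an_seconds, ha_seconds, window=WINDOW):
    """Count AN seconds t with at least one HA second in (t, t + window]."""
    ha_sorted = sorted(ha_seconds)
    followed = 0
    for t in an_seconds:
        i = bisect_right(ha_sorted, t)  # index of the first HA strictly after t
        if i < len(ha_sorted) and ha_sorted[i] - t <= window:
            followed += 1
    return followed

# Made-up observation times (in seconds) for one student.
an = [5, 40, 100, 200]
ha = [5, 20, 131, 220]
followed = count_followed(an, ha)
print(f"{followed} of {len(an)} AN observations were followed by HA within {WINDOW} s")
```

Running this per student gives the pair (total AN, AN followed by HA) described above. Whether a same-second HA should count is a coding decision the sketch only assumes.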

That's the set up, in a nutshell.

On Mon, Mar 12, 2018 at 9:13 AM, Maguin, Eugene <[hidden email]> wrote:

I'd like to better understand your study design (I've re-read the initial round of posts.) In your reply, you said, you observed the target kid for a total of 50 minutes. What I'm curious about is the data collection procedures used. I think these matter in understanding what your data analysis options are. The target behaviors were anxiety (Anx) and hyperactivity (HA). For example, you could have tabulated using two hand counters or hash-marks on paper each time a target behavior occurred so that at any point in time, the HA counter or hash count would show a count of 15 and the Anx counter or hash count would show a count of 8. Alternatively, you could have recorded in a data sequence the behaviors as they occurred so now the recording sheet/data recorder shows, for example, A,H,H,A,A,H,H,H. Alternatively, you could have designated HA as the stimulus behavior and Anx as the response behavior and coded (HA,Anx) if Anx followed HA within 'x' seconds; otherwise, coded (HA,NoAnx). In one of your replies, you mentioned something about 30 seconds. What's the story with that?

Gene Maguin

-----Original Message-----
From: SPSSX(r) Discussion <[hidden email]> On Behalf Of PsyDStats
Sent: Sunday, March 11, 2018 2:47 PM
To: [hidden email]
Subject: Re: 2 Variables, 7 cases, 10 observations -- Simple?

Okay, so I've dabbled a little and successfully confused myself even more.
Let me boil things down.

As I mentioned, I am interested to know if there's a correlation between Anxiety (AN) and Hyperactivity (HA). Does HA go up when AN goes up? Aside from being a very small N, the data also are not normally distributed, so I've been working with nonparametric measures.

Based on the suggestions so far, I ran the means for each student for the two variables. I re-ran the Spearman, Kendall's, and Somers' d and found that there was a very poor correlation between AN and HA.

I structured the data:

Student  Mean_AN  Mean_HA
St_1       10.33    10.00
St_2       28.67    14.83
St_3       26.17    16.67
St_4       25.10    26.30
St_5       20.50    22.30
St_6       41.50    25.70
St_7       31.75    30.50
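As a cross-check outside SPSS, Spearman's rho on these seven pairs of means can be computed by hand. The sketch below is plain Python using the textbook no-ties formula, which applies here because no mean is repeated within a column. It gives only the coefficient, not a p-value, and the function names are my own.

```python
def ranks(values):
    """Rank from 1 (smallest); assumes no ties, which holds for the means above."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman_rho(x, y):
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)); valid only without ties."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Student means copied from the table above (St_1 ... St_7).
mean_an = [10.33, 28.67, 26.17, 25.10, 20.50, 41.50, 31.75]
mean_ha = [10.00, 14.83, 16.67, 26.30, 22.30, 25.70, 30.50]

rho = spearman_rho(mean_an, mean_ha)
print(round(rho, 3))
```

With only seven pairs, a single swapped rank moves the coefficient noticeably, so the value is best read alongside the scatter plot rather than on its own.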

If there were no data points missing (those pesky 0's in my data set), and if the N was more substantial, and if I had only kept up with my piano lessons as a kid and I were now conducting at the Met instead of a doctoral student in psychology, would you say that I've chosen the correct measures for my data set and structured it properly in SPSS? Would you say that there is a chance the outcomes of the procedures I've run are useful to me in accepting or rejecting the Null Hypothesis, which in this case is that I should go back to piano (My mother's null hypothesis that I would amount to Null if I gave up the piano)?

Seriously, thanks for your thoughtful feedback.

