How to do a Ladder graph


How to do a Ladder graph

Art Kendall
I do not have access to my notes and files at the moment.

I have often found that constructing a ladder graph helps clients understand a pre-post difference.

This graph has two vertical axes with the same scaling, e.g., percentiles, z-scores, or, as in the example below, the mean of a summative score whose items use a 1-to-7 response scale.  There is a rung for each case;
the rung connects the value for the pretest to the value for the posttest for that case.

data list list/PreTest PostTest (2f3.1).
begin data
4.1 4.3
2.5 5.3
4.6 3.3
4.1 4.8
3.3 5.6
3.6 3.8
3.0 5.7
4.1 1.7
4.5 4.8
5.2 3.5
4.6 4.3
3.6 4.5
3.6 5.0
4.1 4.8
4.3 4.1
5.3 2.5
3.3 4.6
4.8 4.1
5.6 3.3
3.8 3.6
5.7 3.0
1.7 4.1
4.8 4.5
3.5 5.2
4.3 4.6
4.5 3.6
5.0 3.6
4.8 4.1
5.3 5.3
5.5 5.5
6.1 6.1
end data.
descriptives variables = PreTest PostTest.

This works well when there is only a small n.
Art Kendall
Social Research Consultants

Re: How to do a Ladder graph

Mike
I'm not familiar with ladder graphs, but it sounds to me like you
want something like the plot produced in GLM when
pre and post are treated as two levels of a within-subject variable
and a new variable "group" provides a single line for each subject
in the graph.  Add the following to your program and skip to the plot:

compute group=$casenum.
GLM PreTest PostTest by group/
 wsfactor=pre_post(2)/
 WSDESIGN=pre_post/
 PLOT=PROFILE (pre_post*group).
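(Here COMPUTE group=$casenum just gives each case a distinct value, so that PLOT=PROFILE draws a separate line for each subject.)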

It's a pretty ugly graph, and it doesn't provide the dual y-axes you're
requesting, but is this a crude approximation of what you want?

I'd be interested in learning how to create a "correct" version.

-Mike Palij
New York University
[hidden email]



Re: How to do a Ladder graph

Andy W
In reply to this post by Art Kendall
Is this what you mean, Art?

GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=PreTest PostTest MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: PreTest=col(source(s), name("PreTest"))
  DATA: PostTest=col(source(s), name("PostTest"))
  TRANS: pre = eval("Pre")
  TRANS: pos = eval("Post")
  GUIDE: axis(dim(1))
  GUIDE: axis(dim(2), label("Test Scores"))
  ELEMENT: edge(position((pre*PreTest)+(pos*PostTest)))
END GPL.
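(The edge element draws one segment per case, from (pre, PreTest) to (post, PostTest); those segments are the rungs of the ladder.)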



I wrote an article about why I don't like these types of slopegraphs (http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2410875); a scatterplot is almost always better. For example, in the scatterplot here the scores have an obvious negative correlation, and the three cases with equal pre and post scores are outliers.

GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=PreTest PostTest MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: PreTest=col(source(s), name("PreTest"))
  DATA: PostTest=col(source(s), name("PostTest"))
  GUIDE: axis(dim(1), label("PreTest"))
  GUIDE: axis(dim(2), label("PostTest"))
  ELEMENT: line(position(PreTest*PreTest))
  ELEMENT: point(position(PreTest*PostTest))
END GPL.
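(The line element plots PreTest against itself, tracing the 45-degree line of no change; points above the line improved from pre to post, and points below it declined.)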

Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Re: How to do a Ladder graph

Art Kendall
Thanks to both list members who responded.
I concur that ladder graphs are sub-optimal when only one visualization is used (e.g., in a publication).

Both what you suggested and what Michael Palij suggested produce visualizations that emphasize that there are differences among individuals.

However, in a repeated measures situation, such as a pre-test and post-test around a treatment/lesson/etc., or test-retest reliability, it has been my experience that three visualizations help clients/students understand: a scatter plot, a ladder graph, and box plots (a sketch for the box plots follows).
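A minimal sketch for the box-plot view (untested; it restructures the wide PreTest/PostTest data above to long format, and the names "wide", Score, and Occasion are just my placeholders):

DATASET COPY wide.
VARSTOCASES /MAKE Score FROM PreTest PostTest /INDEX=Occasion(Score).
EXAMINE VARIABLES=Score BY Occasion
  /PLOT=BOXPLOT /STATISTICS=NONE /NOTOTAL.

DATASET COPY keeps an untouched copy of the wide file, since VARSTOCASES replaces the active dataset.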

Sometimes it helps understanding to randomly shuffle the post-test values a few times and show how the different plots change.
----
One of my soapboxes is that it is fruitful to look at data in multiple ways.
---
P.S. Regarding parallel coordinate plots: the older type of profile plot, which is a specific type of parallel coordinate plot, has been very useful in interpreting profiles derived via clustering.

The profile plot Michael Palij suggested produces a ladder graph, which is useful as an additional perspective.
However, in other situations (e.g., cluster profiles) a profile plot without the repeated vertical axes can be affected by seeming vertical drift, since in clustering the information is the shape (also elevation and scatter) of the profile, rather than trend as in a time series.
Art Kendall
Social Research Consultants

Re: How to do a Ladder graph

Bruce Weaver
I agree with Art's soapbox point above.

When teaching one-way repeated measures ANOVA, I use a plot similar to the ladder graph Andy generated above (but with 3 conditions/treatments, not 2) to talk about the Treatment x Subject interaction term, which is the error term for that model.  As for any other interaction, the more the lines depart from parallel, the greater the interaction effect--and in that particular context, the larger the MS_error, so the smaller the F-ratio (all other things being equal).  


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: How to do a Ladder graph

Mike
Just to add to Bruce's comment above, specifically "all other
things being equal", let's consider the classic Reaction Time (RT)
experiment by Donders, which uses a one-way, three-level
within-subject design where the levels refer to different RT tasks:

(1) Simple RT: press a button/key on a keyboard as soon as a
colored visual stimulus (e.g., a square) or auditory stimulus is
presented
(2) Go-No Go RT: press a button to one specific color/sound but
do not respond to any other color/sound
(3) Choice RT: press one button if one specific color/sound is
presented, press another button if another color/sound is presented
(both versions of this experiment are available on the http://opl.apa.org
website; look under the studies tab on the homepage).

In general, we refer to the analysis of such a design as a
one-way repeated measures ANOVA, but some sources refer
to the analysis as a "two-way ANOVA without replication". Glass
and Hopkins point this out in the different editions of their stat text,
but perhaps more importantly, if one wanted to do a one-way
repeated measures ANOVA in Excel (after activating the Analysis ToolPak),
the analysis that one would select from the list of ANOVA analyses
is "two-way ANOVA without replication".

At first glance, this might seem strange until one understands the logic
of the repeated measures ANOVA.  The analysis assumes that
there are two sources that produce systematic variance in the
dependent variable (in the Donders case, variance
in RT performance):

(A) Differences among Subjects: since subjects/participants have
multiple measurements made on them, we can calculate a mean
for each subject, and we expect that subjects will systematically
differ from each other. This results in a reduction in what would
otherwise have been part of the error term.  In old versions of SAS one had
to specify which error terms to use in GLM, and some folks treated
within-subject designs as between-subjects designs, leaving out
the subject effect, which was then absorbed into a single error term (one
dissertation I was a reader on did this, and the analyses had to
be redone). SAS implemented the "repeated" keyword/command,
which I think should have helped stop this problem from
occurring (SPSS and BMDP inferred the structure of the design
from how the data were specified).
NOTE: If subjects DO NOT DIFFER, then an independent
groups ANOVA should be done, because the subject variance
will still be subtracted from the error term (the subject by treatment
interaction), an unwarranted reduction in the error term.

(B) Differences among levels of the independent variable: in
the Donders example, this represents differences among the
mean RT for each task.

In Excel, the "two-way ANOVA without replication" provides
the following output:
(a) Subject means/desc stats: the mean/variance across the levels
of the independent variable for each subject
(b) Treatment means/desc stats: the mean/variance for each level
of the independent variable (in the Donders task, the mean RT
for each task).

The (classic) simplified ANOVA table produced in Excel has three
sources of variance:
(i) Rows: differences among subjects; this should be statistically
significant but most people just ignore this entry
(ii) Columns: differences among treatment levels (in the Donders
task, the means for the Simple, Go-No Go, and Choice RT task)
(iii) Error Term: the subject by treatment interaction (this is what
Bruce refers to above).

The key point that I want to make is that the differences among subjects,
as reflected in the Row entry above or the Between-Subjects table
in SPSS, have a greater effect on the error term than the degree of
interaction does.  The larger the differences among subjects/participants,
the smaller the error term, no matter how big the interaction is.

An additional point is the degree of correlation between values
at different levels of the independent variable -- higher correlations
will produce stronger subject effects. To better understand this point,
consider the correlated groups t-test; one form of the formula is:

t = (M1 - M2) /sqrt(VarErr1 + VarErr2 - 2*r*SE1*SE2)

The denominator/error term consists of:
the variance errors (squared standard errors) for each group (i.e., VarErr1, VarErr2),
the standard errors for each group (i.e., SE1, SE2),
the Pearson r between pairs of values (i.e., r),
and the constant 2.
The larger the correlation is between values for the two levels
of the independent variable, the smaller the error term.
If the Pearson r = 0, the test becomes the independent groups
t-test.
A similar logic holds for ANOVA:  the higher the correlation
between values for different levels of the independent variable,
the smaller the error term, regardless of the size of the
interaction.
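For a made-up illustration: with M1 - M2 = 0.5 and SE1 = SE2 = 1 (so VarErr1 = VarErr2 = 1), at r = 0 the denominator is sqrt(1 + 1) = 1.41, giving t = 0.35; at r = .5 it is sqrt(1 + 1 - 2*.5*1*1) = 1.00, giving t = 0.50. The same mean difference yields a larger t as the correlation rises.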

Reaction Time studies typically show big subject effects,
especially if there is a large range of ages (e.g., 18-50).
With other dependent measures, YMMV.

HTH.

-Mike Palij
New York University
[hidden email]






Re: How to do a Ladder graph

Rich Ulrich
Mostly, good and useful comments, but ...

On Mon, 1 Feb 2016, Mike Palij wrote, in part:

> NOTE: If subjects DO NOT DIFFER, then an independent
> groups ANOVA should be done because the subject variance
> will still be subtracted from the error term (subject by treatment
> interaction), an unwarranted reduction in the error term.

My reaction to the Note:  NO!  - "DO NOT DIFFER"  is apparently a description of
a test result, since the next phrases describe subtracting variances...

Someone is claiming that what must be described as analyzing "according to
the actual design"  would provide "an unwarranted reduction in the error term"?

I admit that the common examples won't show much difference.  But I recall
an extreme example: consider a paired t-test with very similar means and
a highly negative correlation.  (Fish were faced with a forced choice, target A
vs. target B, and the time spent on A+B was always 10 or close to it.)
 - The test with versus without the subject effect changes both the sum of squares
(minimally, in this example) and the d.f. (a lot, in this example).

Ignoring pairing in the paired t-test, which is the simplest Repeated Measure,
gives a test that is too weak when the correlation is positive and too strong
when it is negative (ignoring cases where tiny d.f. affects the power).

I can't agree that it is ever a mistake to analyze according to the design;
though the observation that a variance is exactly ZERO raises the question
of "What does describe this design?" 

--
Rich Ulrich


Re: How to do a Ladder graph

Bruce Weaver
Thanks Rich.  I had a similar reaction to that NOTE, but didn't have time today to articulate it.



Re: How to do a Ladder graph

Mike
In reply to this post by Rich Ulrich
On Monday, February 01, 2016, 8:44 PM, Rich Ulrich wrote:
>
>Mostly, good and useful comments, but ...

Thank you.  However, some of the comments you make below
are unclear to me, and perhaps you can clarify them a little.

On Mon, 1 Feb 2016 13:21:54 -0500, Mike Palij wrote:
>> (A) Differences among Subjects: since subjects/participants have
>> multiple measurement made on them, we can calculate a mean
>> for each subject and we expect that subjects will systematically
>> differ from each other.

If one has access to Glass & Hopkins, 1st edition, see Table 19.6, p. 463;
if one has access to Glass & Hopkins, 3rd edition, see Table 20.1, p. 574
(the latter resembles the output from Excel more than the former).
I will use their terms and indexes in the equations I present below.

In the decomposition of the sum of squares in the repeated measures
ANOVA, the total sum of squares (SS-total) is
divided into the following components:

SS-total = SS-Treatment(T) + SS-Subjects(P) + SS-Error(TxP)

NOTE: Glass & Hopkins use T to refer to the Treatment variable and
t to index its levels, t = 1, ..., T, where T is the total number of treatments.

P is used to refer to the total number of persons or subjects, and
p is used to refer to a specific person/subject, p = 1, ..., P.

This makes the presentation consistent with that
of Glass & Hopkins, 1st edition, Table 19.6, p. 463, and
Glass & Hopkins, 3rd edition, Table 20.1, p. 574.

SS-Subjects = T * sum over p of (Mean for Subject p - Grand Mean)^2
To simplify notation, let (Mean for Subject p - Grand Mean) = alpha-p,
which gives
SS-Subjects = T * sum(alpha-p^2)

df-Subjects, or DF-P, = P - 1, the number of subjects/persons minus 1

MS-Subjects = SS-Subjects / df-Subjects

SS-Treatment = P * sum over t of (Mean for Treatment level t - Grand Mean)^2
To simplify notation, let (Mean for Treatment level t - Grand Mean) = beta-t,
which gives
SS-Treatment = P * sum(beta-t^2)

df-Treatment, or DF-T, = T - 1, the number of treatment levels minus 1

The calculation of SS-Error(TxP), the interaction between subjects and
treatments, is laborious by hand, so a simple formula to obtain it (under the
assumption that all the math done so far is correct) is:

NOTE: It is assumed that both SS-Subjects and SS-Treatment represent
systematic sources of variance, so subtracting them from SS-total
should provide a sum of squares that reflects random error.  But if
SS-Subjects(P) is NOT A SOURCE OF SIGNIFICANT VARIANCE,
THEN SUBTRACTING IT FROM SS-Total will cause SS-Error to be
underestimated.

So, it should be obvious that if there are no significant differences
among subjects in a within-subject design, it is inappropriate to calculate
SS-Subjects/Persons and subtract it from SS-Total to get
SS-Error -- one should do an independent groups ANOVA instead.

Things might be clearer in the correlated groups t-test. Recall that the
formula for it is:

t = (M1 - M2) /sqrt(VarErr1 + VarErr2 - 2*r*SE1*SE2)

Now, according to Hoyle, if the Pearson r is not significant,
then the entire term -2*r*SE1*SE2 should be set to zero and
one is left with a version of the independent groups t-test.
But if one leaves that term in while r is not significant, it will reduce the
error variance (i.e., the sum VarErr1 + VarErr2) and make the
denominator inappropriately smaller.  How much of an effect
this has on the t-test depends upon how large the nonsignificant
-2*r*SE1*SE2 is, but it could provide seriously misleading
results.

Why not simply do the correct test for the situation one has?
Is some statistical religious law being violated if one does so?

>> [snip -- see point (A) and the NOTE above]
>
>My reaction to the Note:  NO!  - "DO NOT DIFFER"  is apparently a
>description of
>a test result, since the next phrases describe subtracting variances...

Yes, see above:

SS-Error(TxP) = SS-total - SS-Subjects(P) - SS-Treatment

>Someone is claiming that what must be described as analyzing "according
>to
>the actual design"  would provide "an unwarranted reduction in the
>error term"?

If one asserts that one is using a within-subject design, but (when T > 2) the
variance-covariance matrix is an identity matrix (that is, the responses are
not correlated, as one would expect and as a repeated-measures ANOVA
assumes), then one really shouldn't do a repeated measures ANOVA, because
SS-Subjects will be calculated and, as a sample value, is unlikely to be exactly
equal to zero, thus giving rise to an inappropriate SS-Error(TxP) if one is
using the formula.  (The alternative is to take the cell mean for the combination
Tt by Pp, subtract the Treatment mean for that t level, subtract the Subject mean
for that p level, and add the grand mean -- a hand operation that probably drove
many students mad.  Using the following is so much easier if one has been doing
the math correctly:)

SS-Error(TxP) = SS-total - SS-Subjects(P) - SS-Treatment

>I admit that the common examples won't show much difference.  But I
>recall
>an extreme example:  Consider a paired t-test with very similar means
>and
>a highly negative correlation.

We've done this dance before, and simply put, I doubt that Gosset/Student or
Sir Ronald ever considered the case where measuring the same individual
twice would lead to a negative correlation, because this occurs only in
rare situations.  If the negative correlation is "real", then I think one has a
couple of options:
(1) take the absolute value of the negative Pearson r, do the appropriate
calculation, and see if it makes sense,
or
(2) make up your own damned test.  If you have a pair of values X and Y on
the same individual and X is above the X mean, then Y would have to be below
the Y mean in order to obtain a negative correlation.  Given that Gosset's
concern was with making sure that the contents of vats of ale and stout did
not vary significantly from vat to vat, I think that Guinness would have fired
his ass if he ever obtained a negative correlation between two measures
that should be the same.

If you are studying fish that give rise to strange patterns that are not
appropriate for existing tests, then this is a wonderful opportunity to come
up with a new test -- after all, they might name it after you, especially if
someone else created the test before; see:
https://en.wikipedia.org/wiki/Stigler's_law_of_eponymy

[snip]
>I can't agree that it is ever a mistake to analyze according to the
>design;
>though the observation that a variance is exactly ZERO raises the
>question
>of "What does describe this design?"

I think you are unclear here.  If you do a two-level within-subject design
and obtain a nonsignificant Pearson r of 0.10 between your two measures,
are you saying that you should still use a correlated groups t-test -- even if
you have no evidence of a correlation between your pairs?  If indeed you have
evidence that the sample Pearson r should be set to zero, then the term
-2*r*SE1*SE2 in the denominator should be set to zero.  If you do not set it
to zero because you think that, even without evidence of r > 0.00, you are
"entitled" to do that version of the test, I'd really like to hear the
justification and why it makes any sense instead of a misleading result.

-Mike Palij
New York University
[hidden email]


Re: How to do a Ladder graph

Bruce Weaver
I don't have time to get fully engaged in this discussion right now, but I do have the 3rd Edition of Glass & Hopkins, so I decided to run their one-way repeated measures ANOVA example.  Here's the syntax:

* One-way repeated measures ANOVA from Table 20.1
* in 3rd Edition of Glass & Hopkins (p. 574).

DATA LIST list / ID T1 to T5 (6F2.0).
BEGIN DATA
1 1 2 4 5 6
2 1 1 3 5 6
3 1 2 5 7 6
4 2 1 4 6 7
5 3 3 5 8 8
6 2 2 4 7 8
7 1 3 4 6 7
8 0 2 5 7 8
9 3 3 6 8 9
10 2 2 4 7 8
END DATA.

GLM T1 T2 T3 T4 T5
  /WSFACTOR=Treatment 5 Polynomial
  /METHOD=SSTYPE(3)
  /PLOT=PROFILE(Treatment)
  /EMMEANS=TABLES(Treatment)
  /CRITERIA=ALPHA(.05)
  /WSDESIGN=Treatment.

* Results match those shown by G&H.
* SS_subjects = SS_error in the "Tests of Between
* Subjects Effects" table.
* SS_T and SS_T*S are in the sphericity assumed
* rows in the "Tests of Within Subjects Effects" table.
* Results for polynomial contrasts also match G&H results.

CORRELATIONS T1 to T5.


Here are the SS from that model (using sphericity assumed results):

SS_Between-Ss = 26.8
SS_Within-Ss = 277.2
  SS_T = 263.8
  SS_TxS = 13.4
SS_Total = 304

These results match those shown by G&H in their Table 20.1.
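(As a quick arithmetic check, the pieces add up: SS_Within-Ss = 263.8 + 13.4 = 277.2, and SS_Total = 26.8 + 277.2 = 304.)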

And here is the correlation matrix (Pearson r with two-tailed sig.; N = 10 throughout; view in fixed font):

          T1     T2     T3     T4     T5
T1  r   1.000   .374   .355   .578   .565
    p           .287   .315   .080   .089
T2  r    .374  1.000   .643   .616   .526
    p    .287          .045   .058   .118
T3  r    .355   .643  1.000   .809   .597
    p    .315   .045          .005   .068
T4  r    .578   .616   .809  1.000   .800
    p    .080   .058   .005          .005
T5  r    .565   .526   .597   .800  1.000
    p    .089   .118   .068   .005


I know I said I didn't have time to get engaged too deeply in this, but now I'm curious.  In this example, SS_Between-Ss is pretty small relative to SS_Total (8.8% of the total SS, approximately).  

And only 3 of the 10 correlations achieve statistical significance at the .05 level (r_23, r_34 and r_45).

From your point of view, Mike, how would this affect the way you analyze the data?  Or would it?  

Cheers,
Bruce



Re: How to do a Ladder graph

Mike
Okay, I'll try to keep this brief:

(1) Don't use N=10 in real life if you're doing research,
unless you are sure that the effects are huge; otherwise
your tests will be severely underpowered.  An obtained
correlation would have to be greater than r(8) = 0.632
in order to be significant at the .05 level (two-tailed).
Cohen (1992, "A Power Primer," Psych Bull) gives
r = .50 as a "large" effect, so any correlation would have
to be extra large to be significant.  If you run the
variables T1 to T5 through the Reliability procedure, you'll
see that the mean correlation, r = 0.586, is greater than large
by Cohen's criterion; the minimum, r = .355, would
require about N=45 to be significant at alpha = .05, two-tailed.
NOTE: r(8) = 0.632 and the other values are obtained from
Table 8.6 of critical r for different df in Gravetter & Wallnau's
text, but I'm sure there's a similar one in Glass & Hopkins.
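(As a check on that critical value: t = r*sqrt(df)/sqrt(1 - r^2) = .632*sqrt(8)/sqrt(1 - .632^2) = 2.31, which matches the two-tailed critical t(8) of 2.306.)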

Moral:  If you're constructing a dataset for
pedagogical purposes, then keeping the sample size
small makes for a good example or homework problem,
but bad research.

(2) I'm not sure what your point is with the correlation
matrix, given what I say in (1) above, but I think that
maybe what you really wanted to do is test the hypothesis
that the correlation matrix is equal to an identity matrix, that
is, that none of the correlations is significant.  However,
for N=10, lack of power would probably make the test
inconclusive.

(3) As important as the correlations are in this situation,
the question of whether there are significant differences
among subjects is more important, because of how the
error term is calculated.  Recall:

SS-Error(TxP) = SS-total - SS-Subjects(P) - SS-Treatment(T)

Unfortunately, the SPSS output does not provide the F for
differences among subjects, so one does not really know whether
there are significant differences among subjects (one could
do hand calculations).  Excel does provide this value, and
it is F(9,36) = 8.00, p < .001, which means that calculating
the SS-Error using the formula above is appropriate.  It
should be obvious that if this F were not significant, one would
have an inappropriate error term (as long as lack of power
were not the reason). One really should do this check.
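(One can reproduce that F from Bruce's sums of squares: MS-Subjects = 26.8/9 = 2.98, MS-Error = 13.4/36 = 0.372, so F = 2.98/0.372 = 8.00 on (9, 36) df.)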

(4) Finally, the Mauchly test for sphericity is nonsignificant,
so you can assume that you have sphericity, though this test is
probably underpowered as well. Note that we don't need
"compound symmetry" (equal variances and equal correlations);
instead, a pattern of different variances and/or correlations
can meet the criterion for sphericity.  But let me point out
that if one views this as a two-way ANOVA without replication,
it should be obvious that we have a mixed design: the
trials/treatments are a fixed effect and subjects are a random
effect.  I believe one could use the MIXED procedure in SPSS
and try out various variance-covariance structures to do a more
appropriate analysis, but I'll leave that to others who are familiar
with the procedure.
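A bare-bones starting point might look like this (an untested sketch using Bruce's data; the long-format names Score and Treatment, and the compound-symmetry structure, are just my placeholders):

VARSTOCASES /MAKE Score FROM T1 TO T5 /INDEX=Treatment.
MIXED Score BY Treatment
  /FIXED=Treatment
  /REPEATED=Treatment | SUBJECT(ID) COVTYPE(CS)
  /PRINT=SOLUTION TESTCOV.

Refitting with other COVTYPEs (e.g., UN) and comparing the information criteria would be one way to choose among covariance structures.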

I hope that I've answered your questions.

-Mike Palij
New York University
[hidden email]


----- Original Message -----
From: "Bruce Weaver" <[hidden email]>
To: <[hidden email]>
Sent: Tuesday, February 02, 2016 6:27 PM
Subject: Re: How to do a Ladder graph


>I don't have time to get fully engaged in this discussion right now,
>but I do
> have the 3rd Edition of Glass & Hopkins, so decided to run their
> one-way
> repeated measures ANOVA example.  Here's the syntax:
>
> * One-way repeated measures ANOVA from Table 20.1
> * in 3rd Edition of Glass & Hopkins (p. 574).
>
> DATA LIST list / ID T1 to T5 (6F2.0).
> BEGIN DATA
> 1 1 2 4 5 6
> 2 1 1 3 5 6
> 3 1 2 5 7 6
> 4 2 1 4 6 7
> 5 3 3 5 8 8
> 6 2 2 4 7 8
> 7 1 3 4 6 7
> 8 0 2 5 7 8
> 9 3 3 6 8 9
> 10 2 2 4 7 8
> END DATA.
>
> GLM T1 T2 T3 T4 T5
>  /WSFACTOR=Treatment 5 Polynomial
>  /METHOD=SSTYPE(3)
>  /PLOT=PROFILE(Treatment)
>  /EMMEANS=TABLES(Treatment)
>  /CRITERIA=ALPHA(.05)
>  /WSDESIGN=Treatment.
>
> * Results match those shown by G&H.
> * SS_subjects = SS_error in the "Tests of Between
> * Subjects Effects" table.
> * SS_T and SS_T*S are in the sphericity assumed
> * rows in the "Tests of Within Subjects Effects" table.
> * Results for polynomial contrasts also match G&H results.
>
> CORRELATIONS T1 to T5.
>
>
> Here are the SS from that model (using sphericity assumed results):
>
> SS_Between-Ss = 26.8
> SS_Within-Ss = 277.2
>  SS_T = 263.8
>  SS_TxS = 13.4
> SS_Total = 304
>
> These results match those shown by G&H in their Table 20.1.
>
> And here is the correlation matrix (view in fixed font):
>
> Correlations
> T1 T2 T3 T4 T5
> T1 Pearson Correlation 1 .374 .355 .578 .565
> Sig. (2-tailed) .287 .315 .080 .089
> N 10 10 10 10 10
> T2 Pearson Correlation .374 1 .643 .616 .526
> Sig. (2-tailed) .287 .045 .058 .118
> N 10 10 10 10 10
> T3 Pearson Correlation .355 .643 1 .809 .597
> Sig. (2-tailed) .315 .045 .005 .068
> N 10 10 10 10 10
> T4 Pearson Correlation .578 .616 .809 1 .800
> Sig. (2-tailed) .080 .058 .005 .005
> N 10 10 10 10 10
> T5 Pearson Correlation .565 .526 .597 .800 1
> Sig. (2-tailed) .089 .118 .068 .005
> N 10 10 10 10 10
>
>
> I know I said I didn't have time to get engaged too deeply in this,
> but now
> I'm curious.  In this example, SS_Between-Ss is pretty small relative
> to
> SS_Total (8.8% of the total SS, approximately).
>
> And only 3 of the 10 correlations achieve statistical significance at
> the
> .05 level (r_23, r_34 and r_45).
>
> From your point of view, Mike, how would this affect the way you
> analyze the
> data?  Or would it?
>
> Cheers,
> Bruce
>
>
>
> Mike wrote
>> ----- Original Message -----
>> On Monday, February 01, 2016 8:44 PM,Rich Ulrich wortel
>>>
>>>Mostly, good and useful comments, but ...
>>
>> Thank you.  However, some of the comments you make below
>> are unclear to me and perhaps you can clarify them a little
>>
>> On Mon, 1 Feb 2016 13:21:54 -0500, Mike Palij wrote:
>>>> (A) Differences among Subjects: since subjects/participants have
>>>> multiple measurement made on them, we can calculate a mean
>>>> for each subject and we expect that subjects will systematically
>>>> differ from each other.
>>
>> If one has access to Glass & Hopkins, 1st edition, see Table 19.6,
>> p463
>> or
>> If one has access to Glass & Hopkins, 3rd edition, see Table 20.1,
>> p575
>> (the latter resembles the output from Excel more than the former).
>> I will use their terms and indexes in the equations I present below:
>>
>> In the decomposition of the Sum of Squares in the repeated measures
>> ANOVA, the equation for the Total sum of square (SS-total) is
>> divided into the following components::
>>
>> SS-total = SS-Treatment(T) + SS-Subjects(P) + SS-Error(TxP)
>>
>> NOTE: Glass & Hiopkins use T to refer to the Treatement variable
>> and t for the level of T; sum of t = T, the total number of
>> treatments.
>>
>> P is used to refer to the total number of Persons or Subjects and
>> p is used to refer to a specific person/subject; sum(p) = P, or
>> the number of subjects.
>>
>> This makes the presentation consistent with that
>> of Glass & Hopkins, 1st edition, Table 19.6, p 463 and
>> Glass & Hopkins, 3rd Ed, Table 20.1, p 574
>>
>> SS-Subjects  = sum(Mean for Subject p - Grand Mean)^2
>> To simply notion, let (Mean for Subject p -Grand Mean) = alpha-p
>> and let (mean for Subject p - Grand mean) ^2 = alpha-p^2
>> which provides
>> SS-Subjects = sum(alpha-p^2
>>
>> df-subjects or DF-P = P -1 or the number subjects/persons - 1
>>
>> MS-Subjects = SS-Subject / df-subjests
>>
>> SS-Treatment = sum(Mean for Treatment level t - Grand Mean^2
>> To simplify notation, llet (Mean for Treatment level t - Grand Mean)
>> =
>> beta-t
>> and (mean for Treatment level t - Grand Mean)^2 = beta-t^2
>> SS-Treatments= sum(beta-t^2)
>>
>> df-Treatment or DF-T = T -1 or the number of treatment levels -1
>>
>> The calculation of SS-Error(TxP) or the interaction between subjects
>> and
>> treaments is laborious by hand so simple formula to obtain it (under
>> the
>> assumption all the math done so far is correct) is:
>>
>> SS-Error(TxP) = SS-total - SS-Subjects(P) - SS-Treatment
>>
>> NOTE: It is assumed that both SS-Subject and SS-Treatment represent
>> systematic sources of variance, so substracting them from SS-total
>> should provide a sum of squares that reflects random error.  But if
>> SS-Subjects(P) is NOT A SOURCE OF SIGNIFICANT VARIANCE
>> THEN SUBTRACTING IT FROM SS-Total will cause SS-Error to be
>> underestimated.
>>
>> So, it should be obvious that if there are no significant differences
>> among subjects in a within-subject design, it is inappropriate to
>> calcualte
>> the SS-Subjects/Persons and substract it from SS-Total to get the
>> the SS-error -- one should do an independnet groups ANOVA instead.
>>
>> Things might be clearer in the correlated group t-test. Recall the
>> formula for it is:
>>
>> t = (M1 - M2) /sqrt(VarErr1 + VarErr2 - 2*r*SE1*SE2)
>>
>> Now, according to Hoyle, if the Pearson f is not significant
>> then the entire term -2*r*SE1*SE2 should be set to zero and
>> one is left with a version of the independent groups t-test.
>> But if one leaves that term while r is not significant, it will
>> reduce
>> the
>> error variance (i.e, sum of VarErr1 + VarErr2)) and make the
>> donominator inappropriately smaller.  How much of an effect
>> this has on the t-test depends upon how large the nonsignificant
>> - 2*r*SE1*SE2 is but it could provide seriously misleading
>> results.
>>
>> Why not simply do the correct test for the situation one has?
>> Is some statistical religious law being violated if one does so?
>>
>>>>This results in a reduction in what would
>>>> have been part if the error term -- in old version of SAS one had
>>>> to specify what error terms to use in GLM and some folks treated
>>>> within-subject designs as between-subjects design, leaving out
>>>> the subject effect which was added to a single error term (one
>>>> dissertation I was a reader on did this and the analyses had to
>>>> be redone). SAS implemented the "repeated" keyword/command
>>>> which I think should have helped stopped this problem from
>>>> occurring (SPSS and BMDP inferred the structure of the design
>>>> from how the data was specified).
>>>> NOTE: If subjects DO NOT DIFFER, then an independent
>>>> groups ANOVA should be done because the subject variance
>>>> will still be subtracted from the error term (subject by treatment
>>>> interaction), an unwarranted reduction in the error term.
>>>
>>>My reaction to the Note:  NO!  - "DO NOT DIFFER"  is apparently a
>>>description of
>>>a test result, since the next phrases describe subtracting
>>>variances...
>>
>> Yes, see above:
>>
>> SS-Error(TxP) = SS-total - SS-Subjects(P) - SS-Treatment
>>
>>>Someone is claiming that what must be described as analyzing
>>>"according
>>>to
>>>the actual design"  would provide "an unwarranted reduction in the
>>>error term"?
>>
>> If one asserts that one is using a within-subject design but (when T
>>  >
>> 2) and the
>> variance-covariance matrix is an idtentity matrix (that is, the
>> responses are
>> not correlated as one would expect and which a repeated-measures
>> ANOVA
>> assumes), then one really shouldn't do a repeated measure ANOVA
>> because
>> SS-Subjects will be calcualted and as a sample value is unlikely to
>> be
>> exactly
>> equal to zero, thus giving rise to an inappropriate SS-Error(TxP) if
>> one
>> is
>> using the formula (the alternative is to take the cell mean for the
>> combination Tt by Pp
>> and subtract the Treatment mean for that t level + subtract the
>> Subject
>> mean for the
>> p level +the grand mean -- a hand operation that probably drove many
>> students
>> mad, using the following is so much easier if one has been doing the
>> math correctly:
>>
>> SS-Error(TxP) = SS-total - SS-Subjects(P) - SS-Treatment
>>
>>>I admit that the common examples won't show much difference.  But I
>>>recall
>>>an extreme example:  Consider a paired t-test with very similar means
>>>and
>>>a highly negative correlation.
>>
>> We've done this dance before and simply put I doubt that
>> Gosset/Student
>> or
>> Sir Ronald ever considered the case where measuring the same
>> individual
>> twice would lead to a  negative correlation because this occurs only
>> in
>> rare situations.  If the negative correlation is "real", then I think
>> one has a couple
>> of options:
>> (1) take the absolute value off the negative pearson r, do the
>> appropriate calculation,
>> and see if it make sense,
>> or
>> (2) Make up you own damned test.  If you have a pair of values x and
>> y
>> on the same
>> individual, then Y would have to be blow the Y mean in order to
>> obtain a
>> negative
>> correlation.  Given that Gosset's concerns was with making sure that
>> contents of
>> vats of ale and stout did not vary significantly from vat to vat, I
>> think that Guinness
>> would have fired his ass if he ever obtain a negative correlation
>> between two measures
>> that should be the same.
>>
>> If you are studying fish that give rise to strange patterns that are
>> not
>> appropriate
>> for existing tests, then this is a wonderful opportunity for one to
>> come
>> up with
>> a new test -- afterall they might name after you, especially if some
>> else created
>> the test before; see:
>> https://en.wikipedia.org/wiki/Stigler's_law_of_eponymy
>>
>> [snip]
>>>I can't agree that it is ever a mistake to analyze according to the
>>>design;
>>>though the observation that a variance is exactly ZERO raises the
>>>question
>>>of "What does describe this design?"
>>
>> I think you are unclear here.  If you do a two-level within-subject
>> design and obtain a nonsignificant Pearson r of 0.10 between your two
>> measures, are you saying that you should still use a correlated-groups
>> t-test -- even if you have no evidence of a correlation between your
>> pairs?  If indeed you have evidence that the sample Pearson r should
>> be treated as zero, then the term -2*r*SE1*SE2 in the denominator
>> should be set to zero.  If you do not set it to zero because you think
>> that, even without evidence of r > 0.00, you are "entitled" to do that
>> version of the test, I'd really like to hear the justification and why
>> it makes any sense rather than producing a misleading result.
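>>
>> For reference, the statistic in question (textbook form) is
>>
>> t = (Mean1 - Mean2) / sqrt( SE1**2 + SE2**2 - 2*r*SE1*SE2 )
>>
>> so with r set to zero, the denominator is the same as in the
>> independent-groups test (the degrees of freedom differ, of course).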
>>
>> -Mike Palij
>> New York University
>
>
>
>
>
> -----
> --
> Bruce Weaver
> [hidden email]
> http://sites.google.com/a/lakeheadu.ca/bweaver/
>
> "When all else fails, RTFM."
>
> NOTE: My Hotmail account is not monitored regularly.
> To send me an e-mail, please use the address shown above.
>


Re: How to do a Ladder graph

Rich Ulrich
In reply to this post by Mike

I think you have two central problems.  Your laxness on hypothesis testing.
Your ignorance about intraclass correlations.  I don't know if your cited text
has the same problems, or if you are mis-reading.

I haven't looked for your reference.  If you are using him right,
your source, Hoyle, is apparently much laxer than I am about the
proper conclusions from "hypothesis testing."  By my standard, the
failure-to-reject does NOT affirm that the null is TRUE; rather, it
states that there is insufficient evidence. 

I do agree that when you know that the elements are independent,
it might be better to say that it is simply WRONG to set it up as a
repeated-measure in the first place.  I do not accept that "significance"
is a proper guide for deciding on independence.  (What I have read
about the Mauchly test for sphericity stated that its power is so poor
for small and medium n that -- instead of being guided by "0.05" -- one
should probably use "0.5" unless your n's are really huge.)
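
(For anyone who wants to look at Mauchly's test in SPSS: GLM prints it by
default for a within-subject factor with three or more levels.  A sketch,
with variable names t1 to t3 assumed:

GLM t1 t2 t3
  /WSFACTOR=time 3
  /WSDESIGN=time.

One can then judge the result against the lenient cutoff just described.)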

 - see more comments inserted into the [stripped] text, below -

> Date: Mon, 1 Feb 2016 23:21:56 -0500

> From: [hidden email]
> Subject: Re: How to do a Ladder graph
> To: [hidden email]
>
> ----- Original Message -----
> On Monday, February 01, 2016 8:44 PM, Rich Ulrich wrote:
> >
> >Mostly, good and useful comments, but ...
>
> Thank you. However, some of the comments you make below
> are unclear to me and perhaps you can clarify them a little
>
[snip, much detail]


> Now, according to Hoyle, if the Pearson r is not significant
> then the entire term -2*r*SE1*SE2 should be set to zero and
> one is left with a version of the independent groups t-test.
> But if one leaves that term in while r is not significant, it will
> reduce the error variance (i.e., the sum of VarErr1 + VarErr2) and
> make the denominator inappropriately smaller.  How much of an effect
> this has on the t-test depends upon how large the nonsignificant
> -2*r*SE1*SE2 is, but it could provide seriously misleading
> results.

I can believe, "seriously /different/ results"; doubling the N changes
the power of the t-test.  The change in Mean-square matters more than
the change in Sum-of-squares, if you are not otherwise noticing the d.f.

I don't know if you are using Hoyle wrong; it sounds like he has it wrong.

The paired t-test with n-1 d.f.  is to be replaced by an independent-
samples t-test with 2n-2  d.f.   The difference in d.f.  says that with
the independent test, you are pretending to have twice the sample size;
the error in this was implicit in my example of the "forced choice" with
r nearly 1.0; you can prove that you have "too much power" if you
let the sums total to 10, so that the correlation is perfect; and the
independent test still pretends that the sample is twice as big as it
actually is (that is, definitely "too much power" and the wrong test).
 - For r= 1.0, the one-sample test is, incontrovertibly, a proper test of the
proper size. 


It is well known that a paired test, with a strong positive correlation,
has much better power than the independent-samples test.  It is less
well recognized that a negative correlation has less power than the
independent test; that certainly does NOT make it the preferred test -
it makes it an improper test.
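
A quick way to put the two error terms side by side in SPSS (a sketch;
paired variables named PreTest and PostTest are assumed, and the
independent-groups run is for comparison only, not a proper analysis of
paired data):

T-TEST PAIRS=PreTest WITH PostTest (PAIRED).

VARSTOCASES /MAKE score FROM PreTest PostTest /INDEX=time.
T-TEST GROUPS=time(1 2)
  /VARIABLES=score.

With a strongly positive r the paired standard error shrinks; with a
negative r it inflates -- the power loss described above.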


[snip, more]

> We've done this dance before
??
                                     and simply put I doubt that Gosset/Student
> or
> Sir Ronald ever considered the case where measuring the same individual
> twice would lead to a negative correlation because this occurs only in
> rare situations. If the negative correlation is "real", then I think
> one has a couple
> of options:
> (1) take the absolute value of the negative Pearson r, do the
> appropriate calculation,
> and see if it makes sense,

 - Now THAT is absolute nonsense.

Negative correlations arise naturally for anything resembling an analysis
of "compositions" that come to a fixed total (or near to one).
 - See my example of forced choice, time spent, analyzing (A, B) and dropping
"indeterminate".  I think Sir Ronald would not be surprised or amazed.
 - Any analysis performed on rank-data (the sum is fixed).
 - Geological assays show compositions that sum to 100% (or nearly).
 - One early example that I read of was the weights of a litter of certain lab-bred
rats.  The pups differ in weight, down to the "runt"; the litters have exactly the
same number of pups, and much more similarity in total weight than you would
expect from the within-litter variance.
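
A minimal sketch of the fixed-total mechanism in SPSS (my toy example,
not data from the thread): if each pair of scores is forced to sum to 10,
the correlation between them must be -1.

INPUT PROGRAM.
LOOP #i = 1 TO 100.
COMPUTE a = RV.UNIFORM(0,10).
COMPUTE b = 10 - a.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.
CORRELATIONS /VARIABLES=a b.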

The basic formula for intraclass correlation clearly allows for negative values; and
they make good sense, both logically and analytically.  Now, not all textbooks are
always right.  I heard a guest lecturer on campus a dozen years ago on Hierarchical
Linear Models.  He presented a formula for ICC which had to be positive.

I raised the question about negative ICCs at the point where the last example
clearly had a forcing situation where the ordinary ICC would be negative ... and the
notion seemed new to the lecturer (whose name I forget). 


> or
> (2) Make up your own damned test. If you have a pair of values x and y
> on the same
> individual, then when x is above the X mean, y would have to be below
> the Y mean in order to obtain a negative
> correlation. Given that Gosset's concern was with making sure that
> contents of
> vats of ale and stout did not vary significantly from vat to vat, I
> think that Guinness
> would have fired his ass if he ever obtained a negative correlation
> between two measures
> that should be the same.

But I doubt that Gosset assumed that his laboratory represented the whole
possible world of reasonable designs and hypotheses. 

--
Rich Ulrich



Re: How to do a Ladder graph

Mike
Rich,
Just a couple of points:
 
(1) When one says "according to Hoyle" one is using an old saying
that refers back to Edmond Hoyle's rules for the playing of card games.
See: https://en.wikipedia.org/wiki/Edmond_Hoyle
 
Quoting from the above source:
|The phrase "according to Hoyle" came into the language as a
|reflection of his generally perceived authority on the subject
 
So, when used in conversation, as I did below, it means "doing
things according to a recognized authority", not some statistician
named Hoyle, as you seem to have misunderstood. See also:
 
I apologize for using pop culture refs that you are unfamiliar with.
 
(2) Since Gosset worked for a Guinness brewery, I don't believe
he worked in a lab; instead, he probably worked on measurements
taken by workers during the usual course of brewing.  If Guinness
was really concerned with lab work rather than brewing beer, ale,
and stout, I don't think that Guinness would have prevented its employees
from publishing papers under their real names (hence "Student", because Gosset
considered himself a lowly student compared to his friend Sir Ronald).
 
Regarding negative correlations in t-tests and other statistics that
typically assume positive correlations, I have to admit that I have
never come across a situation in actual data where this has occurred
without it being some sort of error.  But I am glad that you find the
topic interesting and I'm sure you are an excellent resource on it.
But if I want advice about ICCs, I'll ask Pat Shrout.
 
NOTE:  There are such things as negative probabilities -- Feynman has
a chapter on them in "Quantum Implications: Essays in Honour of David Bohm",
and Wikipedia has a kiddie version of the explanation; see:
https://en.wikipedia.org/wiki/Negative_probability
Also, there is such a thing as "negative temperature", that is,
temperature below absolute zero on the Kelvin and related scales;
for the kiddie version of the explanation see the Wikipedia entry:
https://en.wikipedia.org/wiki/Negative_temperature
 
I point these out just to say that just because these negative entities exist,
I don't have to think about them -- I'm sure there are others intensely
interested in them.  Similarly, for negative correlations in situations where
they should be positive, I will leave it up to you to worry about such
things because the phenomena I study do not demonstrate them.
So, knock yourself out.  And we'll have to agree to disagree on the
other points (NOTE: a possible reference to "Men in Black 3").
 
-Mike Palij
New York University
 
 
----- Original Message -----
Sent: Wednesday, February 03, 2016 11:00 PM
Subject: RE: How to do a Ladder graph


[snip, Rich's message quoted in full above]

Re: How to do a Ladder graph

Ryan
This has been discussed already, at length. 

The explanation I already provided a long time ago is below.

*************************************
All,
 
A comment was made regarding the possibility of fitting an incorrect variance-covariance structure, which could result in a negative variance component. While it is true that an incorrect variance-covariance structure could result in a negative variance component, it should be noted that, within a linear mixed model, often when this occurs, the variance-covariance structure is overly complex given the data at hand. In fact, it is so common that some multilevel textbooks recommend that, when a negative variance component is observed, simplification of the variance-covariance model be considered as the next logical step. 
 
However, if one fits a simple random intercept model, assuming a reasonably balanced design with adequate sample sizes at both levels, then the possibility does exist that the negative variance component is not only accurate, but interpretable. Moreover, if one forces the valid negative variance component to be zero (e.g., the MIXED procedure in SPSS forces a random-intercept variance to be non-negative), simulation studies have shown that the Type I error rate is inflated. OTOH, if one allows the variance component to be negative, the Type I error rate is maintained (reference: SAS for Mixed Models, 2nd ed., by Littell et al.).
 
So, the question some might be pondering is: What type of naturally occurring environment would lead to a negative variance component? Littell et al., in SAS for Mixed Models (2nd ed.), discuss how a naturally competitive environment within a plot can result in a negative variance component. 
 
Competitive Environment Example: Suppose there are 50 cages, and 3 adult dogs of the same breed, health status, gender, age, and weight are placed in each of the 50 cages on day 1. A single, large bowl of food of the same amount is placed in each cage every day for 30 days. Given that some dogs tend to be dominant while others tend to be submissive, would we be surprised to see a great deal of within-cage weight variability at the end of 30 days? Probably not. On the other hand, the means across the cages would be expected to be quite similar since the same amount of food had been placed in each cage every day. This could easily result in greater within-cage variability as compared to between-cage variability. Under such a scenario, it would not be surprising to obtain a negative between-cage variance component, and ultimately a negative intraclass correlation coefficient. (Note that this example was adapted from an example provided on SAS-L.)
 
Using the simulated example I provide BELOW, the between-cage variance component is -.310601.
 
Interpretation: The between-cage variance is smaller than would be expected, by a value of .310601. 
 
How do we calculate this value using data that is simulated from the code below? In addition to reading the output from the VARCOMP procedure or MIXED procedure (changing the structure from CSR to CS), we could subtract the between-cage variability that would be expected (within-subject variability / number of dogs per cage = 1.266169/3 = .422056) from the observed between cage variance (variance of cage means = .111455):
 
between-cage variance component = .111455 - .422056 = -.310601
 
Note: If we wanted to obtain the within-subject variability estimate (1.266169) without employing VARCOMP or MIXED, we could also use the MEANS procedure.
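 
For instance (a sketch; the dataset name cagemeans is mine): MEANS gives
the per-cage variances (with equal cage sizes, their average is the
within-cage estimate), and AGGREGATE plus DESCRIPTIVES gives the variance
of the cage means:
 
MEANS TABLES=y BY plot /CELLS=MEAN COUNT VARIANCE.
 
DATASET DECLARE cagemeans.
AGGREGATE /OUTFILE=cagemeans /BREAK=plot /mean_y=MEAN(y).
DATASET ACTIVATE cagemeans.
DESCRIPTIVES VARIABLES=mean_y /STATISTICS=VARIANCE.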
 
Let's go a step further now and calculate the ICC: (between-cage variance component) / (between-cage variance component + within-cage variance component)
 
ICC = -.310601 / (-.310601 + 1.266169) = -.325043
 
Interpretation: This negative ICC reflects a within-cage correlation of -.325043. Again, under a naturally competitive environment with a mix of dominant and submissive animals, would we be surprised to observe a negative within-cage correlation with respect to dog weight? Of course not! It makes perfect sense, and the model is deriving estimates that are interpretable.
 
The information provided above comes directly from various sources, including messages posted on SAS-L over the years, along with a section in the book, SAS for Mixed Models, 2nd Edition, dedicated to interpreting negative variance components. I will admit that I was stuck trying to figure out the simulation code for quite a while, which delayed my posting this message. As some of you can tell, I prefer to provide simulated data to illustrate (and to some extent, validate) what I state. I was fortunate to obtain help creating the simulation code from someone off-list. I am grateful to that person. This can, at times, slow me down from responding to a message, but it's a preference of mine.
 
Before I provide the code, I'd like to make one final comment. I understand how a negative variance component is disconcerting, to say the least. However, when interpreted correctly, it no longer seems problematic [to me]. For those who are uncomfortable observing negative variance components, there is an alternative parameterization that directly estimates the within-subject correlation! As you will see, the MIXED code below estimates the within-subject correlation under the compound-symmetric residual correlation structure. After employing the MIXED model, I take advantage of the VARCOMP procedure to estimate the negative variance component. Note that one could also simply change the residual variance-covariance structure in the MIXED code from CSR to CS to obtain the negative variance component. 
 
I will stop at this point and simply provide the code to estimate the model.
 
Hope this is of interest to others.
 
Ryan
--
 
set seed 98734523.
 
new file.

 input program.
 compute plot=-99.
 compute subject = -99.
 compute x1 = -99.
 compute x2 = -99.
 compute x3 = -99.
 compute e1 = -99.
 compute e2 = -99.
 compute e3 = -99.
 compute sigma = 1.
 compute rho = -0.35.
 compute a11 = 1.
 compute a21 = rho.
 compute a31 = rho.
 compute a22 = sqrt(1 - rho**2).
 compute a32 = (rho/(1+rho))*sqrt(1 - rho**2).
 compute a33 = sqrt(((1 - rho)*(1 + 2*rho))/(1 + rho)).
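 * a11 to a33 above are the Cholesky factor of a 3x3 compound-symmetric
   correlation matrix with common correlation rho, so e1, e2, e3 computed
   below are standard normals with pairwise correlation rho = -0.35.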
 
 leave plot to a33.
 
   loop plot= 1 to 50.
   compute x1 = rv.normal(0,1).
   compute x2 = rv.normal(0,1).
   compute x3 = rv.normal(0,1).
   compute e1 = sigma * a11*x1.
   compute e2 = sigma * (a21*x1 + a22*x2).
   compute e3 = sigma * (a31*x1 + a32*x2 + a33*x3).
 
       loop subject = 1 to 3.
       compute y = e1*(subject=1) + e2*(subject=2) + e3*(subject=3).
       end case.
    end loop.
  end loop.
 end file.
 end input program.
 execute.
 
delete variables x1 x2 x3 sigma rho a11 a21 a31 a22 a32 a33 e1 e2 e3.
 
MIXED y 
   /FIXED= | SSTYPE(3)
   /METHOD=REML
   /PRINT=R SOLUTION
   /REPEATED=subject | SUBJECT(plot) COVTYPE(CSR).
 
VARCOMP y BY plot 
  /RANDOM=plot
  /METHOD=SSTYPE(3)
  /PRINT=SS
  /PRINT=EMS
  /DESIGN
  /INTERCEPT=INCLUDE.
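
The CS variant mentioned above, for completeness -- identical except for
the covariance type, and it reports the negative variance component
directly:

MIXED y
   /FIXED= | SSTYPE(3)
   /METHOD=REML
   /PRINT=R SOLUTION
   /REPEATED=subject | SUBJECT(plot) COVTYPE(CS).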

As you can see above, arriving at a negative ICC simply requires an understanding of how a variance component and an ICC are calculated. 

*************************************

I will not respond back to this post since all the proof necessary is right here, and this has already been discussed.

Ryan

On Feb 4, 2016, at 12:00 AM, Mike Palij <[hidden email]> wrote:

[snip, Mike's message quoted in full above]

Re: How to do a Ladder graph

Rich Ulrich
In reply to this post by Mike
Mike,

When I was writing three or four answers per day in the Usenet stats groups
some  years back, Reef Fish Bob asked me why I so often packaged my advice
as, "in the applications that I know of"  or "in biostatistics".  My answer was
something to the effect that I had already suffered falls from hubris a couple
of times, when folks threw real examples at me that contradicted my too-narrow
perspective.  [RFBob was prone to make flat pronouncements that occasionally
were wrong; he avoided embarrassment by never admitting he was wrong.]

I do assume that "small data sets used in textbooks" can have a real-world
application, somewhere, even before I run into it.  The questions about the
limits are real questions to be aware of (and, sometimes, wary of).

My "fish" data is not my invention; it was in a question that was posed (~2003)
and I answered it.  I was already familiar with the concept of "negative r" in a
paired t.  I remember, in particular, a question on an exam in my first year of
grad school:  1) construct a small set of data for which the paired t will reject,
and the independent t will not;  2) construct a set for which the independent t
will reject and the paired test will not.  Your formula for the error term (using r)
gives the answer away.  Our class had not been given that form of the formula,
so it baffled everyone else, and the professor omitted that question from the
scoring of the test.  But I had read it in Fisher's book on experimental design,
so it was obvious.
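
For concreteness, one pair of answers of the sort that question wanted
(my values, not the exam's):

1) Paired t rejects, independent t does not (r near +1):
   pre:  1, 2, 3, 4, 5      post: 2.0, 3.1, 3.9, 5.0, 6.0
   The differences are all close to 1 (SD about 0.07), so the paired t
   is over 30 on 4 d.f., while the independent t is about 1.0 on 8 d.f.

2) Independent t rejects, paired t does not (r = -1):
   pre:  1, 2, 3, 4, 5      post: 8, 7, 6, 5, 4
   The means differ by 3; the independent t is 3.0 on 8 d.f. (rejects
   at .05), but the differences (7, 5, 3, 1, -1) have an SD of about
   3.16, so the paired t is about 2.1 on 4 d.f. (n.s.).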

Negative r, or negative ICC.
Why does this rare-data circumstance matter? - Well, I think that if you intend
to become a "real statistician" who understands his work, you will always want
to figure out what makes the difference between any two competing tests.
T-tests come pretty early in the curriculum.

Sure, I'm familiar with "according to Hoyle."  He was talking about card games,
for which he documented the arbitrary rules.  Your allusion was just too silly
-- after all, this is science, and you are wrong -- for me to think that you didn't
also have some authority: one who is wrong or who you persistently misread.
I thought it was a clever word play ... not, what you seem to claim, that you
were trying to push through your error by force of personality.

After all, statistics are logical, and I had given you the logic where negative r's
and ICCs make sense.  Now, Ryan has, too, following the same not-so-unthinkable
model of scores adding to a similar total. 

[Mike:  But if I want advice about ICC's I'll ask Pat Shrout about them.]

Please do that, if Pat is someone who can sort you out.

--
Rich Ulrich


Date: Thu, 4 Feb 2016 00:00:30 -0500
From: [hidden email]
Subject: Re: How to do a Ladder graph
To: [hidden email]

Rich,
Just a couple of points:
 
(1) When one says "according to Hoyle" one is using an old saying
that refers back to Edmond Hoyle's rules for the playing of card games.
See:
[ snip, irrelevancies; previous notes]

Re: How to do a Ladder graph

Rich Ulrich
In reply to this post by Mike
P.S.
Mike,
I hold the same attitude towards negative /variances/ as what you suggest
by your NOTE:  on negative probabilities and negative absolute temperatures
 -- something exotic that I don't need to worry about.

However, I think you are attributing the weirdness of "negative variance" 
to "negative correlations" and covariance.   That is a huge error.  Your own
rejection of "real, negative r's" has already led you to deliver bad advice in
this thread, so it should be a concern to you.  IMO, "agree to disagree" properly
applies to matters of opinion, not matters of fact.

I have illustrated exactly how negative ICCs arise in several examples:  by nature,
by experimental design, and as a consequence of data manipulation (scores
that total 100%  or a nearly-fixed sum). 

--
Rich Ulrich


Date: Thu, 4 Feb 2016 00:00:30 -0500
From: [hidden email]
Subject: Re: How to do a Ladder graph
To: [hidden email]

Rich,
Just a couple of points:
[snip, much]

[snip; the NOTE on negative probabilities and negative temperatures appears in full above]
