paired t-test for unmatched subjects impossible to compute, correct?

paired t-test for unmatched subjects impossible to compute, correct?

Zdaniuk, Bozena-3

Hello everyone,

I have been asked to see if I can do something with the data where 30 students were tested before and after an intervention but, unfortunately, the students were not identified and their pre-post answers cannot be matched. Am I correct that a paired sample t-test cannot be computed on these data? I think the best I can do is treat the two sets of scores as separate samples and compare the two means with an independent-samples t-test. And such a test would not be very valid.

Is there anything else I can do to get any valid estimates of how the students' scores changed?

thanks so much,

bozena


Re: paired t-test for unmatched subjects impossible to compute, correct?

Marta Garcia-Granero
Hi

You could try to compute it by hand if you have a good estimate (educated guess?) of the correlation between the pre-post answers.

The variance of the paired differences can be computed as s^2(pre) + s^2(post) - 2*s(pre)*s(post)*corr(pre, post)

I hope I am remembering the formula correctly, because it's almost midnight in Spain, and I'm sending this from my phone, not using my computer, where I keep that kind of information.

Regards
Marta GG
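
For what it's worth, here is a minimal SPSS sketch of that hand calculation for the case where only summary statistics are available. The means, SDs, n and r_guess below are made-up placeholders; r_guess stands for the assumed pre-post correlation.

* Paired t from summary statistics plus an assumed pre-post correlation.
* All numbers below are hypothetical placeholders.
DATA LIST FREE / m_pre sd_pre m_post sd_post n r_guess.
BEGIN DATA
50 10 55 10 30 0.5
END DATA.
* Variance of the paired differences: s^2(pre) + s^2(post) - 2*s(pre)*s(post)*r.
COMPUTE var_diff = sd_pre**2 + sd_post**2 - 2*r_guess*sd_pre*sd_post.
* Standard error of the mean difference, then t, df and the two-tailed p.
COMPUTE se_mdiff = SQRT(var_diff / n).
COMPUTE t = (m_post - m_pre) / se_mdiff.
COMPUTE df = n - 1.
COMPUTE p = 2 * (1 - CDF.T(ABS(t), df)).
FORMATS var_diff se_mdiff t p (F8.4) / df (F4.0).
LIST.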


On Sat., 14 Nov. 2020, 20:51, Zdaniuk, Bozena <[hidden email]> wrote:

< snip >

Re: paired t-test for unmatched subjects impossible to compute, correct?

Kirill Orlov
In reply to this post by Zdaniuk, Bozena-3
When you do an independent test instead of a paired one you don't do anything _illegal_. You just lose power.


14.11.2020 22:51, Zdaniuk, Bozena wrote:

< snip >

Re: paired t-test for unmatched subjects impossible to compute, correct?

Mike
In reply to this post by Marta Garcia-Granero
Marta's formula is close but ambiguous.  Consider the following:

(1) The denominator of the two sample t-test is called the standard error of
the difference between means or SsubM1-M2 for the independent groups case
and the standard error of the mean difference or SsubM-Difference for the
correlated groups case.

(2)  The independent groups SsubM1-M2 is equal to
sqrt(VarianceError_Mean1 + VarianceError_Mean2) or sqrt(VE_M1 + VE_M2).
In words, it is the square root of the sum of variance errors of the means for the two groups.
In Marta's example below assume s^2(pre) = VarianceError_Mean1 and
s^2(post) = VarianceError_Mean2.

(3) The correlated groups SsubM-Difference is equal to
sqrt(VarianceError_Mean1 + VarianceError_Mean2  -  2*r*StandardError_Mean1*StandardError_Mean2)
or
sqrt(VE_M1 + VE_M2 - 2*r*SE_M1*SE_M2)
In words, the square root of the sum of variance errors of the means MINUS the product of 2 times
Pearson r between the values of group1(pre) and group2(post), times the standard error of Mean1, times the
standard error of Mean2.
In Marta's example, let s(pre) = SE_M1 and s(post) = SE_M2; r = corr(pre,post)

(4) If r = 0.00 then - 2*r*SE_M1*SE_M2 becomes zero and we simply have the sum
of the variance errors of the means, that is, the sum used in the independent groups t-test.

For N=30, r would have to be equal to about 0.35 to be statistically significant -- if you
know what the population correlation rho is, then you can use that value even if it is
less than 0.35. 

If r is small or close to zero, then the independent groups t-test and the correlated groups t-test
will be similar because the term - 2*r*SE_M1*SE_M2 is small because of the small r and
subtracting it from the sum of the variance errors of the mean has a small impact.

(5) If r is large or close to 1.00, then the denominator of the correlated groups t-test will be
very small.  If we assume r = 0.99 and homogeneity of variance (i.e., SE_M1 = SE_M2 or SE1 = SE2),
then SE1*SE2 = VE, that is the common variance error of the mean -- the denominator is now
sqrt(VE + VE - 1.98*VE) or sqrt(2VE - 1.98*VE) or sqrt(VE*(2 - 1.98)) or sqrt(VE * .02).

In summary, if the Pearson r between pre and post is very small, then the independent groups
t-test and the correlated groups t-test will be close in value.  If the Pearson r is equal to 0.35
or somewhat larger, the independent groups t-test will be less than the correlated groups t-test
(the independent groups t-test might be nonsignificant while the correlated groups t-test
may be statistically significant).

If the Pearson r for pre and post is very large, then the two t-tests results will be very
different because the correlated groups t-tests denominator is much smaller.

So, this raises the question of what is a reasonable value for the Pearson r for pre and post.

By the way, if the correlation is negative, then you can't use the correlated groups t-test
at all.

-Mike Palij
New York University

P.S.  Why r=0.35?  Examination of a table of significant r's shows that for df=30 and
alpha = 0.05, two-tailed, one needs to have at least r=0.35.  Unfortunately, the table
did not have the value for df=28, the actual degrees of freedom in this case.
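
In case it helps, the cutoff can be computed directly rather than looked up. A small SPSS sketch using the standard identity r = t/sqrt(t^2 + df), run for the two df values mentioned above:

* Critical Pearson r for alpha = .05, two-tailed, at df = 28 and df = 30.
DATA LIST FREE / df.
BEGIN DATA
28 30
END DATA.
COMPUTE tcrit = IDF.T(0.975, df).
* |r| is statistically significant when it reaches tcrit / sqrt(tcrit^2 + df).
COMPUTE rcrit = tcrit / SQRT(tcrit**2 + df).
FORMATS tcrit rcrit (F8.3).
LIST.

This gives roughly 0.361 for df = 28 and 0.349 for df = 30, so the 0.35 rule of thumb above is essentially unchanged.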




On Sat, Nov 14, 2020 at 5:37 PM Marta Garcia-Granero Márquez <[hidden email]> wrote:
< snip >

Re: paired t-test for unmatched subjects impossible to compute, correct?

Rich Ulrich
Michael,
I disagree with you /totally/ on the proposition that you can't use
the paired t-test when the correlation is negative.  In fact, WHEN
the data exists as pairs, the PROPER testing will take into account
the correlation - be it negative or positive.  (If r is close enough to zero,
the difference is generally irrelevant in practice; but it is more legitimate
to use it than to ignore it.)

It is IMPROPER to ignore the negative correlation "merely" because
it creates a bigger error term and a less significant test.  Or, heaven
forbid, to do what you just described and call the test itself improper.

The general intra-class correlation has a formula that readily
allows it to be negative; and it makes sense when it occurs. 
For instance, lab-bred white mice of a certain type produce
litters of exactly 5 pups, ranging from "large" to "runt", and the
total weight of each litter will be almost exactly the same.
"Within variation" is much larger than "between variation" and
the  r  is negative to reflect that.

--
Rich Ulrich




From: SPSSX(r) Discussion <[hidden email]> on behalf of Michael Palij <[hidden email]>
Sent: Saturday, November 14, 2020 10:16 PM
To: [hidden email] <[hidden email]>
Subject: Re: paired t-test for unmatched subjects impossible to compute, correct?
 
Marta's formula is close but ambiguous.  Consider the following:
< snip >

By the way, if the correlation is negative, then you can't use the correlated groups t-test
at all.

-Mike Palij
New York University
< snip, earlier >

Re: paired t-test for unmatched subjects impossible to compute, correct?

Mike
Hi Rich,

We covered some of this ground a few years ago and I think we agreed
to disagree.  But, just to clarify my side a bit, if one is trying to get an
estimate of random error in the denominator of the correlated groups t-test,
then the formula I provided and the direct difference method give this
estimate only when one has a positive Pearson r.  When it is negative,
one is adding a systematic component to the error variance, that is,
adding 2*|r|*SE_M1*SE_M2 to the pure error variance VE_M1 + VE_M2.
One consequence of this is that the t-test will not be statistically
significant unless the difference between means is freakishly huge,
so large that one can use the intraocular trauma test instead (i.e.,
the difference is so large it hits you between the eyes).

There are also deeper issues involved in this situation such as
why there is a negative correlation in a situation where one expects
there to be a positive correlation (this may involve the measurement
model being used for the data, but a more concerning situation is when
the data is actually the outcome of a process that can be modelled
with structural equations -- the correlated groups t-test in this situation
is inappropriate because one has omitted relevant variables
that should be controlled for; this is most likely to happen in research
that is observational or nonexperimental and pairing is "arbitrary" such
as husband-wife, siblings, etc.).

The question as to why one has a negative correlation is a relatively
deep one and though it has been noted in the case of the correlated
groups t-test there has been no actual or generally agreed upon
procedure to deal with it.  Consider a little bit of history:

"Student" (i.e., William Gosset) (1908) is credited with developing the
t-test but the 1908 paper actually uses a single sample z-test and
Gosset attempted to determine what the proper distribution would be if one
used sample statistics (i.e., variance error, standard error)
instead of the population parameters that a valid z-test uses.  In
situations where one has very large sample sizes (Gosset refers to
research in astronomy as an example) then using the sample statistics
in the z-test introduces only a small amount of error.  Gosset, however,
was concerned with tests involving small samples.  Although Gosset
made a good start, he did not have the mathematical background to
develop the t-distribution.  This would lead to his collaboration with
a student named Ronald Fisher who did have the skills and would work
out the mathematical basis for the t-distribution.  For the story of
this collaboration see the following:
Eisenhart, C. (1979). On the Transition from “Student's” z to “Student's” t.
The American Statistician, 33(1), 6-12.

What is of additional interest is that Fisher and Gosset were working
with the single sample situation (again, one example is the correlated
groups t-test expressed through the direct-difference method) and
it would be Fisher who would work out the independent groups t-test
formula which was presented in a 1925 Metron article and in the
1925 edition of Fisher's "Statistical Methods for Research Workers"
(this led some authors at the time to call the independent groups t-test
"Fisher's t-test" but Fisher would grant the credit to "Student").
What is most relevant and interesting is Fisher's treatment of the one sample
t-test based on the direct difference method (i.e., correlated groups).
Both Gosset and Fisher knew that the t-test based on correlated groups
would have smaller error variances (i.e., denominator) relative to the
independent groups situations but they did not provide the explicit
formula for how this was done (i.e., subtracting 2*r*SE_M1*SE_M2 from
the sum of the error variances) -- it would be Eisenhart who would
work out the math and publish the results several years later.
What is most surprising is that though Fisher knew that the correlation
was involved in calculating the error variance, he also knew that this
was only true for positive correlations.  His advice in his 1925 "Research
Methods" text was simply not to use the t-test in correlated groups
unless it was positive.  After the first few editions of "Research Methods"
this piece of advice was removed though other authors would sometimes
repeat this advice.  Again, to date, no simple procedure has been developed
for the correlated groups t-test with negative Pearson r.

If I may be so bold, let me suggest an exercise that illustrates some
of the issues involved.  Create a data set with two variables Y1 and Y2
which has a negative correlation of, say, -0.50 (any large value will do).
Calculate the correlated groups t-test with the standard formula (i.e.,
subtracting 2*r*SE_M1*SE_M2 from the sum of the error variance).
Note that the denominator now contains a random error component
and a systematic component (i.e.,  2*r*SE_M1*SE_M2).
Next, take the Y2 values and reflect them/recode them so the sign
of their deviations from the mean of Y2 is reversed -- this will make
the correlation positive while keeping the sample mean and variance
the same.  Calculate the correlated groups t-test for Y1 and the reflected
values of Y2.  The denominator now consists of pure error and is smaller
than the denominator using the negative correlation. 

The t-test with the reflected data captures the spirit of the two sample t-test:
Is the difference between means relative to random variation so large
that it is unlikely to be due to chance?
The t-test with the unreflected data compares the difference between
means relative to the sum of random variation and systematic variation
due to the relationship between Y1 and Y2.

The first t-test above uses a procedure comparable to the independent
groups t-test:  mean difference/random variation.
This seems reasonable, but how does one justify using it, especially
if the negative correlation is of theoretical interest?  We need a version
of the t-test that explicitly takes into account the negative correlation
without ad hoc modifications like reflecting the data or only using the
absolute value of the Pearson r and so on.  If we can keep the
formula simple or similar to what we have already been using, that
would be a neat thing to do.  If Rich wants to show how the intraclass
correlation can be incorporated into the standard t-test formulas,
I would like to see it.  Or what would be the new formula that allows this.

-Mike Palij
New York University


On Sun, Nov 15, 2020 at 1:50 AM Rich Ulrich <[hidden email]> wrote:
< snip >

Re: paired t-test for unmatched subjects impossible to compute, correct?

Zdaniuk, Bozena-3
In reply to this post by Mike

Thanks so much to everyone who responded (Michael, thanks so much for a thorough explanation!).

I managed to match eight IP addresses between pre and post responses, most likely indicating the same person responding, so this may give me a good idea of what the r is between the pre and post measures.

Best regards,

Bozena

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Michael Palij
Sent: November 14, 2020 7:17 PM
To: [hidden email]
Subject: Re: paired t-test for unmatched subjects impossible to compute, correct?

 

< snip >

Re: paired t-test for unmatched subjects impossible to compute, correct?

Mike
Let me throw out a suggestion for discussion (I admit lacking knowledge
in the following area):

Wouldn't it be possible to bootstrap, say, 1000 pairs of samples from the
original data and calculate the Pearson r for each pair?
One could then calculate the median Pearson r and the interquartile range
(or some other set of limits) for the Pearson r.  Wouldn't this provide
a (somewhat) more accurate estimate of the correlation?

It has been a while since I looked at the literature in this area but
I vaguely recall that this situation might be treated as a case of
missing data:  matched data are complete but mixed data pairs
are instances of missing data.  I could be completely wrong on
this point, so I would appreciate getting a correction.

-Mike Palij
New York University




On Sun, Nov 15, 2020 at 1:06 PM Zdaniuk, Bozena <[hidden email]> wrote:

< snip >

Re: paired t-test for unmatched subjects impossible to compute, correct?

Baker, Harley

This is an interesting discussion. As I understand the issue, the data were obtained from the same participants on two separate occasions (standard pre-post situation). However, for whatever reason, matching participants to scores is not possible. We, as a community, have been asked to weigh in on what type of statistical analysis would be appropriate under these circumstances. Because matching is not possible here, the default is to use an independent t-test. As Kirill Orlov indicated, this approach is appropriate but has lower power. Others (e.g., Mike and Marta) have suggested ways to increase power by estimating the pre-post correlation that is necessary to adjust the error term.

 

As originally derived, the t-test error term always includes the term -2*r*SE(1)*SE(2), where SE(1) and SE(2) are the standard errors of the two groups/testing occasions. In the independent case, r is always 0: under random sampling, the expected value of the correlation between the elements selected for Group 1 and those selected for Group 2 is 0. Formulas for the independent t-test leave this entire term out, for obvious reasons, as it is always 0. For paired/dependent/matched t-tests, however, the expected value of the correlation is > 0. There is no way to determine the magnitude of the expected value (expected values are always population parameters). In its place, we calculate the sample value of r, just as we use sample-based estimates of the population means, variances, standard errors, etc.

 

So, the issue becomes whether or not there are reasonable and defensible alternative ways to estimate the population correlation. Both Mike and Marta have suggested potential ways to do this. Here is another approach.

 

First, calculate a standard independent t-test. If the difference is statistically significant, no worries. Calculate the effect size and most likely that is the end of it. (BTW, the effect size will be the same regardless of which t-test is conducted.) Case closed.

 

Second, Bozena matched a small sample (n = 8) based on IP addresses. Accepting the assumption that these do represent real pairs, the r can be calculated for these pairs. I then would calculate both the maximum and minimum values of Pearson's r based on rank ordering all of the cases in each group. I would adjust the t-test error term using each and see what happens. If they both yield the same final result, again, case closed. If they do not agree, Mike's suggestion becomes quite useful, I think. But rather than bootstrap pairs, I would bootstrap 1,000 values of the r using a t or normal distribution with the min/max values of the r calculated from the sample data as the interval endpoints and the sample r as the mean parameter. This will yield 1,000 t-values, and I would take the median value as the final t-test result.
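
A rough SPSS sketch of the min/max step only, assuming the unmatched scores sit in two variables named pre and post in the active dataset (hypothetical names). Pairing the two columns after sorting both the same way gives the largest r any matching could produce; sorting them in opposite directions gives the smallest.

* Hypothetical sketch: bounds on the pre-post Pearson r when the true pairing is unknown.
DATASET NAME raw.
* Copy 1: pre sorted ascending.
DATASET COPY s1.
DATASET ACTIVATE s1.
SORT CASES BY pre (A).
DELETE VARIABLES post.
* Copy 2: post sorted ascending.
DATASET ACTIVATE raw.
DATASET COPY s2.
DATASET ACTIVATE s2.
SORT CASES BY post (A).
DELETE VARIABLES pre.
RENAME VARIABLES (post = post_asc).
* Copy 3: post sorted descending.
DATASET ACTIVATE raw.
DATASET COPY s3.
DATASET ACTIVATE s3.
SORT CASES BY post (D).
DELETE VARIABLES pre.
RENAME VARIABLES (post = post_desc).
* Parallel match (no BY keyword): row 1 with row 1, row 2 with row 2, and so on.
MATCH FILES /FILE=s1 /FILE=s2 /FILE=s3.
DATASET NAME bounds.
* r(pre, post_asc) is the maximum attainable r; r(pre, post_desc) is the minimum.
CORRELATIONS /VARIABLES=pre post_asc post_desc.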

 

Of course, the real issue here is the degree to which whatever is done can be both successfully explained and then defended as leading to an accurate conclusion for the question asked and the purpose of the study.

 

If this is headed toward something academic, I’d just do the independent t-test and focus on the effect size and say the t-test results are under-powered by an unspecified amount. If this is headed in a different direction (evaluation report, not for publication, etc.) I would try various scenarios and see what happens.

 

Just my $0.02 worth.

 

Harley Baker

Professor Emeritus

California State University Channel Islands

 

 

From: "SPSSX(r) Discussion" <[hidden email]> on behalf of Michael Palij <[hidden email]>
Reply-To: Michael Palij <[hidden email]>
Date: Sunday, November 15, 2020 at 10:25 AM
To: "[hidden email]" <[hidden email]>
Subject: Re: paired t-test for unmatched subjects impossible to compute, correct?

 

CAUTION: This email originated from outside of CSUCI. Do not click links or open attachments unless you validate the sender and know the content is safe. Please forward this email to [hidden email] if you believe this email is suspicious. For more information on how to detect Phishing scams, please visit https://www.csuci.edu/its/security/phishing.htm

 

Let me throw out a suggestion for discussion (I admit lacking knowledge

in the following area):

 

Wouldn't it be possible to bootstrap, say, 1000 pairs of samples from the

original data and calculate the Pearson r for each pair?

One could then calculate the median Pearson r and the interquartile range

(or some other set of limits) for the Pearson r.  Wouldn't this provide

a (somewhat) more accurate estimate of the correlation?

 

It has been a while since I looked the literature in this area but

I vaguely recall that this situation might be treated as a case of

missing data:  matched data are complete but mixed data pairs

are instances of missing data.  I could be completely wrong on

this point, so I would appreciate getting a correction.

 

-Mike Palij

New York University

 

 

 

 

On Sun, Nov 15, 2020 at 1:06 PM Zdaniuk, Bozena <[hidden email]> wrote:

Thanks so much to everyone who responded (Michael thanks so much for a thorough explanation!).

I managed to match eight IP addresses between pre and post responses, most likely indicating the same person responding, so this may give me a good idea of what the r is between the pre and post measures.

Best regards,

Bozena

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Michael Palij
Sent: November 14, 2020 7:17 PM
To: [hidden email]
Subject: Re: paired t-test for unmatched subjects impossible to compute, correct?

 

[CAUTION: Non-UBC Email]

Marta's formula is close but ambiguous.  Consider the following:

 

(1) The denominator of the two sample t-test is called the standard error of

the difference between means or SsubM1-M2 for the independent groups case

and the standard error of the mean difference or SsubM-Difference for the

correlated groups case.

 

(2)  The independent groups SsubM1-M2 is equal to

sqrt(VarianceError_Mean1 + VarianceError_Mean2) or sqrt(VE_M1 + VE_M2).

In words, it is the square root of the sum of variance errors of the means for the two groups.

In Marta's example below assume s^2(pre) = VarianceError_Mean1 and

s^2(post) = VarianceError_Mean2.

 

(3) The correlated groups SsubM-Difference is equal to

sqrt(VarianceError_Mean1 + VarianceError_Mean2  -  2*r*StandardError_Mean1*StandardError_Mean2)

or

sqrt(VE_M1 + VE_M2 - 2*r*SE_M1*SE_M2)

In words, the square root of the sum of variance errors of the means MINUS the product of 2 times

Pearson r between the values of group1(pre) and group2(post), times the standard error of Mean1, times the

standard error of Mean2.

In Marta's example, let s(pre) = SE_M1 and s(post) = SE_M2; r = corr(pre,post)

 

(4) If r = 0.00 then - 2*r*SE_M1*SE_M2 becomes zero and we simply have the sum of the sum

of the variance errors of the mean, that is the sum used in the independent groups t-test.

 

For N=30, r would have to be equal to about 0.35 to be statistically significant -- if you

know what the population correlation rho is, then you can use that value even if it is

less than 0.35. 

 

If r is small or close to zero, then the independent groups t-test and the correlated groups t-test

will be similar because the term - 2*r*SE_M1*SE_M2 is small because of the small r and

subtracting it from the sum of the variance errors of the mean has a small impact.

 

(5) If r is large or close to 1.00, then the denominator of the correlated groups t-test will be

very small.  If we assume r = 0.99 and homogeneity of variance (i.e., SE_M1 = SE_M2 or SE1 = SE2),

then SE1*SE2 = VE, that is the common variance error of the mean -- the denominator is now

sqrt(VE + VE - 1.98*VE) or sqrt(2VE - 1.98*VE) or sqrt(VE*(2 - 1.98)) or sqrt(VE * .02)

 

In summary, if the Pearson r between pre and post is very small, then the independent groups

t-test and the correlated groups t-test will be close in value.  If the Pearson r is equal to 0.35

or somewhat larger, the independent groups t-test will be less than the correlated groups t-test

(the independent groups t-test might be nonsignificant while the correlated groups t-test

may be statistically significant).

 

If the Pearson r for pre and post is very large, then the two t-tests results will be very

different because the correlated groups t-tests denominator is much smaller.

 

So, this raises the question of what is a reasonable value for the Pearson r for pre and post.

 

By the way, if the correlation is negative, then you can't use the correlated groups t-test

at all.

 

-Mike Palij

New York University

 

P.S.  Why r=0.35?  Examination of a table of significant r's shows that for df=30 and

alpha = 0.05, two-tailed, one needs to have at least r=0.35.  Unfortunately, the table

did not have the value for df=28, the actual degrees of freedom in this case.

 

 

 

 

On Sat, Nov 14, 2020 at 5:37 PM Marta Garcia-Granero Márquez <[hidden email]> wrote:

Hi

 

You could try to compute it by hand if you have a good estimate (educated guess?) of the correlation between the pre-post answers.

 

The variance of the paired differences csn be computed as s^2(pre) +s^2(post) - 2*s(pre)*s(post)*corr(pre, post)

 

I hope I am remembering the formula correctly, because it's almost midnight in Spain, and I'm sending this from my phone, not using my computer, where I keep that kind of information.

 

Regards

Marta GG

 

El sáb., 14 nov. 2020 20:51, Zdaniuk, Bozena <[hidden email]> escribió:

Hello everyone,

I have been asked to see if I can do something with the data where 30 students were tested before and after an intervention but, unfortunately, the students were not identified and their pre-post answers cannot be matched. Am I correct that a paired sample t-test cannot be computed on these data? I think the best I can do is treat the two sets of scores as separate samples and compare the two means with a independent sample t-test. And such a test would not be very valid.

Is there anything else I can do to get any valid estimates of how the students' scores changed?

thanks so much,

bozena

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Re: paired t-test for unmatched subjects impossible to compute, correct?

Rich Ulrich
In reply to this post by Mike
Michael, (my comments interspersed)

M<<
We covered some of this ground a few years ago and I think we agreed
to disagree.  >>

My only recollection of the topic is from 15 or 20 years ago, on the sci.stat.*
groups, where no one (I think) disagreed with me.  I recall then speaking
of a specific case, not generalities, though the generalization seemed obvious.

M<<       But, just to clarify my side a bit, if one is trying to get an
estimate of random error in the denominator of the correlated groups t-test,
then the formula I provided and the direct difference method gives this
estimate only when one has a positive Pearson r.  When it is negative,
one is adding a systematic component to the error variance, that is,
adding 2*|r|*SE_M1*SE_M2 to the pure error variance VE_M1 + VE_M2. >>

The paired t-test uses the formula for the variance of the difference of two
correlated terms.  Negative or positive correlation, the formula does not care.


M <<  One consequence of this is that the t-test will not be statistically
significant unless the difference between means is freakishly huge,
so large that one can use the intraocular trauma test instead (i.e.,
the difference is so large it hits you between the eyes).  >>

Doubling the SD (compared to independent) is not an impossible penalty,
and I offer an example, below ...

M << There are also deeper issues involved in this situation such as
why there is a negative correlation in a situation where one expects
there to be a positive correlation (this may involve the measurement
model being used for data ... >>

"where one expects ... a positive correlation" asks questions of the data
that are possibly severe one.  Yes, look into those problems, SERIOUSLY.
Expecting a positive correlation and not getting it can be bad. Look into it.

I have been concerned with the negative correlation that makes sense.
If it is not part of the test, increasing the error, the test is simply wrong.

The case in hand, 20 years ago, was a forced choice for a goldfish, turn
Left or Right, for 25 trials. That can be a binomial test if the answers are
all Left or Right.  But the fish sometimes stopped or reversed course.

You can score that (-1, 0, 1)  for Left, Neither, Right.  Or you can make two
variables, Left and Right, and score them (Yes, No = 1,0) and do a t-test.
The paired t-test will give the same result as scoring (-1, 0, 1) -- since, as
a matter of algorithm, you /can/  do the subtraction and obtain exactly
those values.  But setting up a separate-samples t-test will give a test that
is more powerful -- and wrong.
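
A tiny SPSS sketch of that equivalence, with ten made-up trials. The variables left and right are the hypothetical Yes/No (1, 0) indicators, and diff is the three-valued scoring obtained by subtraction.

* Hypothetical forced-choice trials: left = 1 if the fish turned left, right = 1 if it
* turned right, both 0 if it did neither. The values are invented for illustration.
DATA LIST FREE / left right.
BEGIN DATA
1 0
1 0
0 1
1 0
0 0
1 0
0 1
1 0
1 0
0 0
END DATA.
COMPUTE diff = left - right.
EXECUTE.
* The paired t-test of left vs right ...
T-TEST PAIRS=left WITH right (PAIRED).
* ... gives the same t as the one-sample t-test on the subtraction scores, while a
* separate-samples comparison of left and right would ignore the pairing of the trials.
T-TEST /TESTVAL=0 /VARIABLES=diff.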



M << (much further down) ...
If I may be so bold, let me suggest an exercise that illustrates some
of the issues involved. ...
Next, take the Y2 values and reflect them/recode them so the sign
of their deviations from the mean of Y2 is reversed -- this will make
the correlation positive while keeping the sample mean and variance
the same. >>

I have to admit that I am absolutely baffled at what this "suggestion"
is about.

--
Rich Ulrich


Re: paired t-test for unmatched subjects impossible to compute, correct?

Bruce Weaver
Rich Ulrich wrote

> --- snip ---
>
> M << (much further down) ...
> If I may be so bold, let me suggest an exercise that illustrates some
> of the issues involved. ...
> Next, take the Y2 values and reflect them/recode them so the sign
> of their deviations from the mean of Y2 is reversed -- this will make
> the correlation positive while keeping the sample mean and variance
> the same. >>
>
> I have to admit that I am absolutely baffled at what this "suggestion"
> is about.
>
> --
> Rich Ulrich

I too am a bit puzzled by it.  Reflecting one variable to "fix" the sign of
the correlation seems rather arbitrary.  Nevertheless, if anyone wants to
try it, here is a dataset I generated using Stata's corr2data command
(https://www.stata.com/manuals/dcorr2data.pdf).  And before anyone objects,
yes, I know there are ways to generate raw data with a specified correlation
structure using SPSS, but they seem to me far more clunky than corr2data in
Stata!  I went with n=30, r=-0.35, means of 50 and 55, and both SDs = 10.

NEW FILE.
DATASET CLOSE ALL.
DATA LIST LIST /  y1  y2  y2ref  .
BEGIN DATA
    32.41188   60.37415   49.62585  
    49.68332   70.84409   39.15591  
    66.41528   51.90503   58.09497  
    47.82475   76.95493   33.04507  
     44.8241   41.83441   68.16559  
    61.41996   38.85472   71.14528  
    44.52678   57.93159   52.06841  
    33.66391   57.98293   52.01707  
    44.93646   51.85727   58.14273  
    55.71753   49.38914   60.61086  
    62.45727   49.95849   60.04151  
    36.19563   61.77552   48.22448  
    58.91556   54.26168   55.73832  
    55.07874   48.92782   61.07218  
    53.59081   64.05428   45.94572  
    40.19928   51.59528   58.40472  
    60.98631   52.33216   57.66784  
     31.9488   56.81583   53.18417  
    65.61869   40.08631    69.9137  
    45.37857   45.60527   64.39474  
    39.70753   64.58192   45.41808  
      54.422   44.32498   65.67502  
    45.68801   70.57954   39.42046  
    51.34895   63.01077   46.98923  
    57.26944   60.79189   49.20811  
    59.89955   66.11801   43.88199  
    49.98519   55.51571   54.48429  
    45.44726   43.04454   66.95546  
    62.34001   38.49995   71.50005  
    42.09842   60.19178   49.80822  
END DATA.

DESCRIPTIVES ALL.

T-TEST PAIRS=y1 y1 WITH y2 y2ref (PAIRED)
  /CRITERIA=CI(.9500)
  /MISSING=ANALYSIS.


In case anyone cares, here's the Stata code.

clear
matrix C = (1, -0.35 \ -0.35, 1)
corr2data y1 y2, n(30) corr(C) means(50 55) sds(10 10)
quietly summarize y2
generate y2ref = r(mean)-1*(y2-r(mean))
summarize
correlate y1 y2 y2ref
ttest y1 == y2
ttest y1 == y2ref
list, clean noobs

* Show that reflecting y2 has no effect when
* one uses an unpaired t-test
ttest y1 == y2, unpaired
ttest y1 == y2ref, unpaired




-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.



Re: paired t-test for unmatched subjects impossible to compute, correct?

Mike
Bruce and Rich,

This may sound odd but I don't understand what you don't understand.
Let me step through the points I was making and see where the problems are:

(1)  The denominator of the t-test is supposed to contain a pure error term
and the denominator for the correlated groups t-test can be expressed as
sqrt(VE_M1 + VE_M2 - 2*r*SE_M1*SE_M2)
where VE_M1 = the variance error of the mean for the first group
VE_M2 = the variance error of the mean for the second group
2 = constant
r = Pearson r between values in groups 1 and 2
SE_M1 = the standard error of the mean for the first group
SE_M2 = the standard error of the mean for the second group

IF the Pearson r is negative, the denominator becomes
sqrt(VE_M1 + VE_M2 + 2*|r|*SE_M1*SE_M2)
or
sqrt(random error variance + systematic error variance)
But we want a measure of only the random error variance. 
How do we get an estimate of this?

(2)  In questionnaire construction there are various ways to deal with response
bias.  Consider the Right Wing Authoritarian (RWA) scale by Bob Altemeyer.
There are 32 items on this scale, each using a 9-point Likert-type scale
ranging from -4 (very strongly disagree) to +4 (very strongly agree).
These items can be divided into two groups:
Protrait items:  -4 is the lowest level of RWA and +4 is the highest level of RWA
Contrait items: -4 is the highest level of RWA and +4 is the lowest level of RWA

In a correlation matrix of RWA items, Protrait items should correlate negatively
with Contrait items if a person has a significant degree of RWA.  However, negative
correlations will have a bad effect on multivariate procedures like factor analysis --
all items are measures of RWA but Contrait items are scaled to prevent a
person selecting all -4 or +4 as a response (if such a pattern is obtained,
the data is removed).  To remove the negative correlations -- only the sign
is changed, not the value -- the responses for the contrait items are reflected
or reverse coded.  This produces an all-positive correlation matrix which can then
be factor analyzed or submitted to other analyses.
NOTE:  with structural equation modeling, one can include latent variables
for the pro/contrait status to determine if the difference in type of item has
a systematic effect of responses.
For more on the RWA and its analysis I suggest the following reference:

Altemeyer, B. (1998). The other “authoritarian personality”. In Advances in
experimental social psychology (Vol. 30, pp. 47-92). Academic Press.

To be clear, using a reflection transformation or reverse coding of the values
changes the sign of the Pearson r but not its magnitude -- the mean and
standard deviation also remain the same.

(3) Getting back to the issue in (1), is there a way to get a pure measure of
error variance in the denominator of the correlated groups t-test if the
Pearson r is negative?  If the measures are Y1 and Y2, then reflecting or
reverse coding either Y1 or Y2 will change the sign of the Pearson r for Y1 and Y2
and one can use the standard formula for the denominator, namely,
sqrt(VE_M1 + VE_M2 - 2*r*SE_M1*SE_M2)
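
A minimal SPSS sketch of that reflection step, assuming variables y1 and y2 such as those in Bruce's example dataset above. The reflection is taken around the sample mean of y2, so the mean and SD of the reflected copy are unchanged.

* Reflect y2 around its own mean: y2ref = 2*mean(y2) - y2.
* This flips the sign of r(y1, y2) while leaving the mean and SD of y2 intact.
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /m_y2 = MEAN(y2).
COMPUTE y2ref = 2*m_y2 - y2.
EXECUTE.
CORRELATIONS /VARIABLES=y1 y2 y2ref.
* Same mean difference in both pairings, but the y1-y2ref test has the smaller,
* "pure error" denominator described above.
T-TEST PAIRS=y1 y1 WITH y2 y2ref (PAIRED).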

I believe that I originally said that this will provide an appropriate denominator,
that is, a measure of pure random error, but the justification for doing
this might not, obviously, be acceptable to all.

So, in summary, using reflection will allow one to get the appropriate denominator
for the correlated groups t-test.  Is this the point that some have difficulty with?

-Mike Palij
New York University




On Sun, Nov 15, 2020 at 3:51 PM Bruce Weaver <[hidden email]> wrote:
Rich Ulrich wrote
> --- snip ---
>
> M << (much further down) ...
> If I may be so bold, let me suggest an exercise that illustrates some
> of the issues involved. ...
> Next, take the Y2 values and reflect them/recode them so the sign
> of their deviations from the mean of Y2 is reversed -- this will make
> the correlation positive while keeping the sample mean and variance
> the same. >>
>
> I have to admit that I am absolutely baffled at what this "suggestion"
> is about.
>
> --
> Rich Ulrich

I too am a bit puzzled by it.  Reflecting one variable to "fix" the sign of
the correlation seems rather arbitrary.  Nevertheless, if anyone wants to
try it, here is a dataset I generated using Stata's corr2data command
(https://urldefense.proofpoint.com/v2/url?u=https-3A__www.stata.com_manuals_dcorr2data.pdf&d=DwICAg&c=slrrB7dE8n7gBJbeO0g-IQ&r=A8kXUln5f-BYIUaapBvbXA&m=vzD70wsCVRuWDRx7WJektpBK4tMrAuC7IN4joK2MnZ4&s=GG6XTVWki8NbmMicdXRP_p0swIAL2VZoJcDB-sGivNQ&e= ).  And before anyone objects,
yes, I know there are ways to generate raw data with a specified correlation
structure using SPSS, but they seem to me far more clunky than corr2data in
Stata!  I went with n=30, r=-0.35, means of 50 and 55, and both SDs = 10.

NEW FILE.
DATASET CLOSE ALL.
DATA LIST LIST /  y1  y2  y2ref  .
BEGIN DATA
    32.41188   60.37415   49.62585 
    49.68332   70.84409   39.15591 
    66.41528   51.90503   58.09497 
    47.82475   76.95493   33.04507 
     44.8241   41.83441   68.16559 
    61.41996   38.85472   71.14528 
    44.52678   57.93159   52.06841 
    33.66391   57.98293   52.01707 
    44.93646   51.85727   58.14273 
    55.71753   49.38914   60.61086 
    62.45727   49.95849   60.04151 
    36.19563   61.77552   48.22448 
    58.91556   54.26168   55.73832 
    55.07874   48.92782   61.07218 
    53.59081   64.05428   45.94572 
    40.19928   51.59528   58.40472 
    60.98631   52.33216   57.66784 
     31.9488   56.81583   53.18417 
    65.61869   40.08631    69.9137 
    45.37857   45.60527   64.39474 
    39.70753   64.58192   45.41808 
      54.422   44.32498   65.67502 
    45.68801   70.57954   39.42046 
    51.34895   63.01077   46.98923 
    57.26944   60.79189   49.20811 
    59.89955   66.11801   43.88199 
    49.98519   55.51571   54.48429 
    45.44726   43.04454   66.95546 
    62.34001   38.49995   71.50005 
    42.09842   60.19178   49.80822 
END DATA.

DESCRIPTIVES ALL.

T-TEST PAIRS=y1 y1 WITH y2 y2ref (PAIRED)
  /CRITERIA=CI(.9500)
  /MISSING=ANALYSIS.
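
If anyone also wants the correlations from the SPSS run (the Stata code below gets
them from correlate), one extra command will do it:

CORRELATIONS /VARIABLES=y1 y2 y2ref.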


In case anyone cares, here's the Stata code.

clear
matrix C = (1, -0.35 \ -0.35, 1)
corr2data y1 y2, n(30) corr(C) means(50 55) sds(10 10)
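* reflect y2 about its own mean: y2ref = 2*mean(y2) - y2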
quietly summarize y2
generate y2ref = r(mean)-1*(y2-r(mean))
summarize
correlate y1 y2 y2ref
ttest y1 == y2
ttest y1 == y2ref
list, clean noobs

* Show that reflecting y2 has no effect when
* one uses an unpaired t-test
ttest y1 == y2, unpaired
ttest y1 == y2ref, unpaired




-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: paired t-test for unmatched subjects impossible to compute, correct?

Rich Ulrich
Comments interspersed -

MP<<
Bruce and Rich,

This may sound odd but I don't understand what you don't understand.
Let me step through the points I was making and see where the problems are:

(1)  The denominator of the t-test is supposed to contain a pure error term
and the denominator for the correlated groups t-test can be expressed as
sqrt(VE_M1 + VE_M2 - 2*r*SE_M1*SE_M2)
where VE_M1 = the variance error of the mean for the first group
VE_M2 = the variance error of the mean for the second group
2 = constant
r = Pearson r between values in groups 1 and 2
SE_M1 = the standard error of the mean for the first group
SE_M2 = the standard error of the mean for the second group

IF the Pearson r is negative (-r), the denominator becomes
sqrt(VE_M1 + VE_M2 + 2*r*SE_M1*SE_M2)
or
sqrt(random error variance + systematic error variance)
But we want a measure of only the random error variance.
>> 

No, I never heard of that.  We compare a difference to its own
error, and that's that.  (Who talks about a "pure error term"?)

<< snip, details, RWA scale >>
MP<<
In a correlation matrix of RWA items Protrait items should correlate negatively
with Contrait if a person has a significant degree of RWA.  However, negative
correlations will have a bad effect on multivariate procedures like factor analysis --
>>

Well, I suppose "bad effect" may be in the eye of the viewer.  A factor analysis
is 100% immune to linear scaling effects, including sign reversal, except that the
loadings reflect the reversed algebraic signs.  I don't see the virtue of this
digression into factor analyses, which have nothing to do with t-tests.

MP<< all items are measures of RWA but Contrait items are scaled to prevent a
person selecting all -4 or +4 as a response (if such a pattern is obtained,
the data is removed).  To remove the negative correlations -- only the sign
is changed, not the magnitude -- the responses for the contrait items are reflected
or reverse coded.  This produces an all-positive correlation matrix which can then
be factor analyzed or submitted to other analyses. >>

The factor analysis with or without reversed items comes out exactly the same,
except for the convenience of the investigator who would otherwise see minus signs.
For RWA, the first Principal Component probably contains loadings for every variable:
do you want to see them with no minus signs, or with mixed signs?  The second Principal
Component may well measure "Response Bias" -- the direction of its loadings, again,
depending on whether half the items were reversed.  Nothing to do with t-tests
or variance terms.


MP <<
(3) Getting back to the issue in (1), is there a way to get a pure measure of
error variance
>>

"pure measure of error variance"?  Are you inventing terms? 
I'm assuming that there usually /is/ an apparent reason for the negative
correlation; and the negative r is required for the formula to make it a "t-test".



MP<<
          in the denominator of the correlated groups t-test if the
Pearson r is negative?  If the measures are Y1 and Y2, then reflecting or
reverse coding either Y1 or Y2 will change the sign of the Pearson r for Y1 and Y2
and one can use the standard formula for the denominator, namely,
sqrt(VE_M1 + VE_M2 - 2*r*SE_M1*SE_M2)
>>

Before, you were testing the hypothesis that (Y1 - Y2 = 0).  As best I can see, you
propose to replace that with a test of (Y1 + Y2 = 0).  Or, you will use the error from
Y1 + Y2 because it happens to be smaller.  That does not sound rational.
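
To spell out the algebra: if y2ref = 2*M2 - Y2 (reflection about the mean M2 of Y2),
then each paired difference becomes Y1 - y2ref = (Y1 + Y2) - 2*M2.  The mean of those
differences is still M1 - M2, but their variance is the variance of Y1 + Y2, which --
when the covariance of Y1 and Y2 is negative -- is the smaller of the two error terms.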

MP <<
So, in summary, using reflection will allow one to get the appropriate denominator
for the correlated groups t-test.  Is this the point that some have difficulty with?
>> 

You say "appropriate" where I say "does not sound rational".

--
Rich Ulrich

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: paired t-test for unmatched subjects impossible to compute, correct?

Bruce Weaver
Administrator
I don't have time to re-read it right now, but I bet that Zimmerman's (1997)
article, "A Note on Interpretation of the Paired-Samples t Test", will be
relevant to this discussion.  In his simulations, Zimmerman compared unpaired and paired
t-tests with correlations ranging from -0.5 to 0.5.  Those with access to
JSTOR can get the article here:

https://www.jstor.org/stable/1165289?seq=1





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
Reply | Threaded
Open this post in threaded view
|

Re: paired t-test for unmatched subjects impossible to compute, correct?

Rich Ulrich
Zimmerman does say that the situations where you think you
have cleverly increased your power might be ones where you
are doing invalid analyses by ignoring correlations. 

Even small correlations can have an effect, he warns.

--
Rich Ulrich

From: SPSSX(r) Discussion <[hidden email]> on behalf of Bruce Weaver <[hidden email]>
Sent: Sunday, November 15, 2020 9:06 PM
To: [hidden email] <[hidden email]>
Subject: Re: paired t-test for unmatched subjects impossible to compute, correct?
 
I don't have time to re-read it right now, but I bet that Zimmerman's (1997)
article, "A Note on Interpretation of the Paired-Samples t Test", will be
relevant to this discussion.  In his simulations, Zimmerman compared unpaired and paired
t-tests with correlations ranging from -0.5 to 0.5.  Those with access to
JSTOR can get the article here:

https://www.jstor.org/stable/1165289?seq=1





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD