What test to use for unmatched pre- and post-test of the same sample?

What test to use for unmatched pre- and post-test of the same sample?

VATX123
I recently did an education presentation and administered a pre-test and post-test to the same group of participants. However, I did not label the tests so I have no way of matching the pre- and post-test scores to any particular participant. I am now concerned since I cannot use the paired t-test. What test can I use to analyze my data? Any help is greatly appreciated.
Re: What test to use for unmatched pre- and post-test of the same sample?

Art Kendall
I suggest that you place a caveat in your write-up, and do an ordinary t-test as if you had 2 independent groups.

Take a look at the formula for a t-test for independent groups and the formula for repeated (paired) measures.

The numerators are the same.

The error term (denominator) of the paired version has an extra part that shrinks it to take into account that the measures are repeated. If it is plausible that the pre-test and post-test scores are positively correlated, then omitting the correlation term will make the t-test less powerful.

Take a look at the boxplots for the two "groups".
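
A minimal sketch of what that might look like in SPSS syntax, assuming the pre and post scores are stacked in a single variable with a grouping variable identifying the testing occasion; the variable names score and phase (1 = pre, 2 = post) are hypothetical placeholders, not from the original post:

* Hedged sketch; score and phase are placeholder variable names.
* Independent-groups t-test treating pre and post as two separate groups.
T-TEST GROUPS=phase(1 2)
  /VARIABLES=score.
* Side-by-side boxplots for the two "groups".
EXAMINE VARIABLES=score BY phase
  /PLOT=BOXPLOT
  /STATISTICS=NONE.

The T-TEST output shows both the pooled ("equal variances assumed") and separate variance ("equal variances not assumed") versions of the test.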
Art Kendall
Social Research Consultants
Re: What test to use for unmatched pre- and post-test of the same sample?

Mike
The original post has not made it to the SPSS mailing list but Art's
response did.  I looked at Nabble and read the original post there.
Let me make a few points:

(1) It's a good idea to tell us how many people's responses you
have in the "pre" condition as well as the "post" condition.
If this were a real one-way, two-level within-subjects design,
you would have N-pre = N-post.  But I have the feeling that
this may not be true in your situation (the fact that you cannot
match pre to post responses is the basis for this).

(2) As Art says, do an independent groups 2-sample
t-test.  Your obtained t-value comes from the following formula:

t = (M-pre - M-post)/sqrt(VarErr-pre + VarErr-post)

where
M-pre is the mean of the pre responses
M-post is the mean of the post responses
VarErr-pre is the error variance (squared standard error of the mean) of the pre responses
VarErr-post is the error variance (squared standard error of the mean) of the post responses
The version of the t-test above is also called the separate
variance version because it does not pool the Sums of
Squares from the two groups.  It can be used with
either homogeneous variances (with df = N-total - 2)
or heterogeneous variances (where either the Welch
or Brown-Forsythe correction to the df is used to
correct for the heterogeneous variances).

If this t-test is statistically significant, you can breathe a
sigh of relief.
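
For reference, the Welch-Satterthwaite approximation to the degrees of freedom in the heterogeneous-variance case can be written in the same notation (this is the standard textbook formula, not something from the original posts):

df = (VarErr-pre + VarErr-post)^2 /
     [ VarErr-pre^2 / (N-pre - 1)  +  VarErr-post^2 / (N-post - 1) ]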

(3) If the independent groups t-test is not significant, it
could still be significant by a correlated groups t-test (I avoid
the term "repeated measures t-test" because that limits it to
within-subjects designs and excludes between-subjects
designs where each subject in one group is matched to a
subject in the other group -- the matching induces a correlation
between the responses of the matched pairs).
The denominator of the correlated groups t-test is

sqrt(VarErr-pre + VarErr-post - 2*r*StdErr-pre*StdErr-post)
Where
VarErr-pre and VarErr-post are as defined before
2 is just the number two.
r is the Pearson r between the matched pairs in the two groups
StdErr-pre is the Standard Error for the pre responses
StdErr-post is the Standard Error for the post responses

The key value in the denominator is the Pearson r:  in the
independent groups situation, it is assumed that r = 0 and the
whole subtraction term disappears, while in the correlated
groups situation r is not equal to zero, typically greater than
zero (but occasionally negative), which means that the sum of
the error variances is reduced by the subtraction term.
If r is very large (say r = .80), then even a small
difference between the pre and post means could be
statistically significant.
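
A quick worked example with made-up numbers (not from the original post): suppose VarErr-pre = VarErr-post = 4, so StdErr-pre = StdErr-post = 2.

Independent groups denominator:           sqrt(4 + 4)             = sqrt(8.0) = 2.83
Correlated groups denominator (r = .80):  sqrt(4 + 4 - 2*.80*2*2) = sqrt(1.6) = 1.26

The same mean difference therefore yields a t-value roughly 2.2 times larger when the positive correlation is taken into account.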

(4) An alternative analysis, which might be more appropriate
(though I don't know whether it can be done in SPSS; perhaps one
of the SPSS wizards can answer), is to run a simulation where
values in the pre group are randomly paired with values in
the post group and the Pearson r is calculated for each
pairing.  Do this, say, about 10,000 times and then examine
the distribution of the Pearson r's.  If there really is a
significant positive correlation between pre and post, then
you should have a distribution whose mean and median
will give you an estimate of the Pearson r you might have
gotten in your sample (it will be an estimate of the population
rho).  It is likely that the mean and median will not be
the same because I expect the distribution of Pearson r's
will be skewed but, in any case, you can plug these values of r
into the correlated groups t-test formula above to see if
the difference between the pre and post means becomes
significant.  To the SPSS gurus:  would this fall under bootstrapping?

(5) If N-pre does not equal N-post, then you have a missing
values situation.  This would complicate what I suggest in (4)
but there may be ways to get around this (I'm not sure imputation
would be a good idea but this would require additional information
about the people who provided the data).

So, to the rest of the list, whaddyathink?

-Mike Palij
New York University
[hidden email]

Re: What test to use for unmatched pre- and post-test of the same sample?

Bruce Weaver
In reply to this post by Art Kendall
In the Nabble archive, the post Art is responding to is flagged as "not yet delivered to the mailing list".  So folks who are not using Nabble may not have seen it.  If you are interested, the original post can be viewed here:

http://spssx-discussion.1045642.n5.nabble.com/What-test-to-use-for-unmatched-pre-and-post-test-of-the-same-sample-td5730581.html


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
Re: What test to use for unmatched pre- and post-test of the same sample?

Thomas MacFarland
In reply to this post by Mike
Everyone:

It may be helpful to also consider use of the Kolmogorov-Smirnov (K-S) Two-Sample Test.

We have limited knowledge of the methods used, which may not have been ideal, so a nonparametric approach should at least be considered.
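
A minimal sketch of that test in SPSS syntax, again assuming the scores are stacked in a single variable with a grouping variable; the variable names score and phase (1 = pre, 2 = post) are hypothetical placeholders:

* Hedged sketch; score and phase are placeholder variable names.
* Two-sample Kolmogorov-Smirnov test comparing the pre and post score distributions.
NPAR TESTS
  /K-S=score BY phase(1 2).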

Best wishes.

Tom MacFarland

----------
Thomas W. MacFarland, Ed.D.
Senior Research Associate; Institutional Effectiveness and Associate Professor
Nova Southeastern University
Voice 954-262-5395 [hidden email]


Re: What test to use for unmatched pre- and post-test of the same sample?

Bruce Weaver
In reply to this post by Mike
Mike's first point made me think about the possibility of selective attrition.  E.g., suppose that people who scored high at time 1 are less likely to stick around for time 2.  If there is the usual positive correlation between the paired scores, that means the time 2 scores will be lower than they would be if everyone were contributing data at both time points.

So if N_post < N_pre, the OP is in a real pickle, I think.  

For those who like demos, here is a simple demo of what happens to the time 2 mean when Ss scoring high at time 1 are excluded (when there is a positive correlation between times 1 and 2).


NEW FILE.
DATASET CLOSE all.

INPUT PROGRAM.
LOOP ID = 1 to 25.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.

SET RNG=MT.
SET MTINDEX=20150916.

* Simulate positively correlated time 1 (Y1) and time 2 (Y2) scores.
COMPUTE Y1 = RV.NORMAL(50,10).
COMPUTE Y2 = 5 + .8*Y1 + RV.NORMAL(0,2).
GRAPH /SCATTERPLOT(BIVAR)=Y1 WITH Y2.
DESCRIPTIVES Y1 Y2.

* Suppose the folks with the highest scores at time 1 dropped out.
* This would lower the mean at time 2.

COMPUTE HighY1 = Y1 GT 60.
FORMATS HighY1(F1).
MEANS Y1 Y2 by HighY1.

* All Ss (N = 25)
* Y1.mean = 49.6336
* Y2.Mean = 44.5128
* Exclude Ss with Y1 > 60.
* Y2.Mean = 42.5526
* Y1.Mean - Y2.Mean with complete data: 5.12
* Again, excluding high Y1 Ss at time 2:   7.08 -- LARGER THAN IT SHOULD BE!
.



--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
Re: What test to use for unmatched pre- and post-test of the same sample?

David Marso
In reply to this post by Mike
(4) could be very easily programmed in the MATRIX language using T(X)*Y as the operation after construction of the vectors.  Note that since we already know Sum(X), Sum(Y), Sum(X^2), and Sum(Y^2), all one needs to do is build the random pairings, calculate Sum(XY), and then plug these values into the Pearson r using the already-known terms.  I'm not programming it right this minute, but it would really be quite trivial.
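
For concreteness, the computational form of the Pearson r that those sums plug into (the standard formula, with n = the number of pairs; added here for reference, not part of David's post):

r = (n*Sum(XY) - Sum(X)*Sum(Y)) /
    sqrt( (n*Sum(X^2) - Sum(X)^2) * (n*Sum(Y^2) - Sum(Y)^2) )

Only Sum(XY) changes from one random pairing to the next, so it is the only quantity that needs to be recomputed inside the loop.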

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Re: What test to use for unmatched pre- and post-test of the same sample?

David Marso
Here's a start, unfinished, untested. Homework for OP?
MATRIX.
* Read the pre (x) and post (y) scores from the active file.
GET x / FILE=* / VARIABLES=x.
GET y / FILE=* / VARIABLES=y.
COMPUTE n=NROW(x).
* Fixed terms that do not change across pairings (see the note above).
COMPUTE Sxy=SSCP({MSUM(X),MSUM(Y)}).
COMPUTE SSxy=SSCP({x,y}).
COMPUTE randY=MAKE(n,1,0).
* xy will hold Sum(XY) for each of the 10000 random pairings.
COMPUTE xy=MAKE(10000,1,0).
COMPUTE Tx=T(x).
LOOP #=1 TO 10000.
* Random permutation of the y values, paired with x in original order.
COMPUTE ord=GRADE(UNIFORM(n,1)).
LOOP ##=1 TO n.
COMPUTE randy(##)=y(ord(##)).
END LOOP.
COMPUTE xy(#) = Tx * randy.
END LOOP.
END MATRIX.



Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Re: What test to use for unmatched pre- and post-test of the same sample?

Mike
Thanks, David, for starting this.  But it would be nice if the OP
would speak up and provide more information about the data.
I would hate to have the OP post in a day or two saying
thanks for the suggestions but that they decided to do something
else, or not respond at all.

-Mike Palij
New York University
[hidden email]


