|
A peer has a data set, and we're not sure whether it can be analyzed as repeated measures. The observations were collected from subjects at different times, but time is not the factor of interest. An observation was taken whenever the subject reached a certain threshold value of the independent factor. It is a gambling game, so, for instance, when the subject lost $200, an observation was taken. When the subject won $50, another observation was taken.
My friend wants to know if there is a correlation between the amount lost and the dependent variable. A simple Pearson's correlation was not significant, but these observations are not independent. Each subject's observations are correlated. So my initial thought was to do a Linear Mixed Model for repeated measures. But since time of each observation is not the factor of interest, would this apply?
There are no additional groups or factors they are considering at this time...only if there is an association between amount lost or won and their dependent variable, which is continuous.
Is there a better method?
Thanks.
Karen R. Harker, MLS, MPH
Biostatistical Consultant
Adolescent Mood and Addictive Disorders Research Program UT Southwestern Medical Center
5323 Harry Hines Blvd. Dallas, TX 75390- 214-648-5391 Yahoo IM: karenharker |
|
Karen,
I may not be understanding you correctly, but it seems as if each person's data would consist of two variables, like this in a long format file.

Id  trial  lost  dv
1   1      200   3
1   2      -50   4
1   3      200   4
1   4      200   6
1   5      -50   2
1   6      -50   8
2   1      -50   6
2   2      -50   5
2   3      200   1
3   1      200   3
3   2      200   3
4   1      -50   4

If this is all true, it seems that amount lost is irrelevant in that it is a fixed amount, either 200 or -50. So the variable is really won or lost, i.e., 0 or 1. Call this variable wonlost.

I think I would do this using Mixed. I haven't done this for a while but I think the syntax would be

Mixed dv with wonlost/fixed=wonlost/random=INTERCEPT wonlost | SUBJECT(id) COVTYPE(ID).

So what you are really doing here is a multilevel regression where trials is nested within person. What you are interested in is the average value of the regression of dv on lost (and, possibly, whether there is variance in the regression coefficient). In running this start out with wonlost not being a random effect and then add it and see what happens by looking at the likelihood change.

You also need numbers for this analysis. Generally, 20+ trials per person and maybe 20+ persons. It probably will run with fewer.

By the way, it might be a useful first step to look at the within person correlation between wonlost and dv. That will give you an idea about between person variability.

Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD |
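Gene's suggested first step, the within-person correlation between wonlost and dv, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows are the toy values from Gene's long-format example, the coding 1 = lost and 0 = won is an assumption, and the helper names `pearson` and `within_person_r` are hypothetical.

```python
from math import sqrt

# Toy rows copied from the long-format example in Gene's reply: (id, trial, lost, dv).
rows = [
    (1, 1, 200, 3), (1, 2, -50, 4), (1, 3, 200, 4),
    (1, 4, 200, 6), (1, 5, -50, 2), (1, 6, -50, 8),
    (2, 1, -50, 6), (2, 2, -50, 5), (2, 3, 200, 1),
    (3, 1, 200, 3), (3, 2, 200, 3), (4, 1, -50, 4),
]


def pearson(xs, ys):
    """Pearson r, or None when it cannot be estimated (n < 2 or no variance)."""
    n = len(xs)
    if n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def within_person_r(data):
    """Correlate wonlost with dv separately for each subject id."""
    by_id = {}
    for pid, _trial, lost, dv in data:
        wonlost = 1 if lost > 0 else 0  # assumed coding: 1 = lost, 0 = won
        xs, ys = by_id.setdefault(pid, ([], []))
        xs.append(wonlost)
        ys.append(dv)
    return {pid: pearson(xs, ys) for pid, (xs, ys) in by_id.items()}


result = within_person_r(rows)
for pid, r in result.items():
    print(pid, "not estimable" if r is None else round(r, 3))
```

Subjects with a single trial, or with only wins or only losses, give no estimate, which is one practical reason behind Gene's remark about needing enough trials per person.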
|
This is very helpful...yes, I was considering making the recommendation of dichotomizing the independent variable into won or lost. I also thought that perhaps it could be curvilinear, like a U-shape (less response near 0, more response further from 0).
Thanks for your response, which I will forward to my co-worker.
|
|
Karen,
Might you expect the correlation [within persons] between observation A and observation B to be the same as the correlation between observation A and observation C? You might want to test this assumption by replacing the RANDOM statement suggested by another poster with a REPEATED statement. Specifically, you could fit the linear mixed model assuming a compound symmetric (CS) var-cov matrix using this statement:

REPEATED = TIME | SUBJECT(ID) COVTYPE (CS).

and then you could replace "(CS)" with "(UN)" and refit the linear mixed model. "UN" stands for "unstructured," which relaxes the assumption that the correlations are the same.

You could then conduct a statistical test to help determine which model better fits the data. The difference in -2LLs between models generally follows a Chi-Square distribution with degrees of freedom equal to the difference in number of estimated parameters. Note that the model with a CS specification is nested within the model with a UN specification.

Another var-cov matrix you might consider is AR. When considering other var-cov matrices, it is important to keep in mind that the Chi-Square test is only appropriate when comparing nested models.

HTH,
Ryan
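The test Ryan describes, the difference in -2LL referred to a Chi-Square distribution, can be sketched as follows. This is a minimal illustration, not output from any real fit: the -2LL values and the four time points are made up, and the function names `chi2_sf`, `n_cov_params` and `lrt` are hypothetical. Under REML, the comparison is only meaningful when both models have the same fixed effects, as they do here.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    df = int(df)
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{i=0}^{df/2-1} y^i / i!
        return exp(-y) * sum(y ** i / gamma(i + 1) for i in range(df // 2))
    # Odd df: erfc(sqrt(y)) + exp(-y) * sum_{i=1}^{(df-1)/2} y^(i-1/2) / Gamma(i+1/2)
    tail = sum(y ** (i - 0.5) / gamma(i + 0.5)
               for i in range(1, (df - 1) // 2 + 1))
    return erfc(sqrt(y)) + exp(-y) * tail


def n_cov_params(covtype, n_times):
    """Number of covariance parameters for CS (2) or UN (t(t+1)/2)."""
    if covtype == "CS":
        return 2
    if covtype == "UN":
        return n_times * (n_times + 1) // 2
    raise ValueError("only CS and UN are handled in this sketch")


def lrt(neg2ll_restricted, neg2ll_general, k_restricted, k_general):
    """Likelihood ratio test for nested models, from their -2LL values."""
    stat = neg2ll_restricted - neg2ll_general
    df = k_general - k_restricted
    return stat, df, chi2_sf(stat, df)


# Hypothetical -2LL values for a design with 4 time points (illustration only).
stat, df, p = lrt(512.4, 498.1, n_cov_params("CS", 4), n_cov_params("UN", 4))
print(f"chi-square = {stat:.2f}, df = {df}, p = {p:.4f}")
```

With many time points the UN structure estimates a large number of parameters, so the test may have little power, or the model may not converge, in small samples.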
|
|
Well, that's what I wasn't sure about...whether these observations should be treated as repeated. While there is a set of observations from the same subject taken within a single session, for the purposes of this analysis, time is not of interest. In a scatterplot, the X-axis is the amount won/lost, not time or sequence, even though the observations within each subject will, of course, be correlated.

|
| x
| x x
| o x x
| o o xox
| oo x o x
| o x xo x
| o o
|_________________________
-200         0         +200
x = subject 1
o = subject 2
What they want to know is if the overall observations are correlated with amounts won/lost. At first, she just took the means of the observations at each dollar value and calculated a correlation coefficient r, which was not significant. So then it was suggested to use LMM. This is where I get confused. Should these data be treated as repeated observations, or should they be treated as hierarchical, with one level at the subject?
Thanks.
-- View this message in context: http://old.nabble.com/Repeated-measures--tp28246403p28262302.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com. |
|
Hello,
I've only been scanning this thread, so can you please say again which two variables are being correlated?
Also, I believe this is repeated measures. In social science we usually think of repeated measures in terms of taking the same measure over time. But repeated measures can also be multiple measures on the same object at the same time (e.g., height & weight). So it might be more useful for you to think of your model in terms of a more traditional hierarchical design, e.g., children nested in a classroom. I might consider controlling for time, as some time-related effect might be present, such as fatigue or practice.
HTH,
John
From: Karen Harker <[hidden email]>
To: [hidden email]
Sent: Fri, April 16, 2010 10:11:27 AM
Subject: Re: Repeated measures? |
|
In reply to this post by Karen Harker
Karen,
Assuming your data set is structured as Gene suggested in a previous post, the following two linear mixed models should yield identical fixed effects results, including the standard errors, p-values, etc. What does this mean? Under special circumstances where you assume compound symmetry, it might not matter if you view it as a random intercept model or a repeated measures model.

Now, if you really do think that the correlations between observations may not be the same, then you might consider (1) only including the REPEATED statement, allowing for a more flexible var-cov matrix (e.g. unstructured), or (2) including both statements, again allowing for an unstructured var-cov matrix for the REPEATED statement. You might also want to consider allowing for random slopes in your model. I'm also wondering if other variables should be considered (e.g. sequence), but I'll stop here for now. See code below.

-Ryan

MIXED y BY x
  /FIXED=x | SSTYPE(3)
  /METHOD=REML
  /PRINT=SOLUTION
  /REPEATED=time | SUBJECT(id) COVTYPE(CS).

MIXED y BY x
  /FIXED=x | SSTYPE(3)
  /METHOD=REML
  /PRINT=SOLUTION
  /RANDOM=INTERCEPT | SUBJECT(id) COVTYPE(VC).

[End of Post]
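The equivalence Ryan describes can be seen from the covariance that a random intercept implies. A short derivation follows, with notation that is an assumption and does not appear in the posts: u_i is the subject-level intercept with variance \tau^2, and e_{ij} is the residual with variance \sigma^2.

```latex
\begin{align*}
y_{ij} &= \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_i + e_{ij},
  \qquad u_i \sim N(0,\tau^2),\quad e_{ij} \sim N(0,\sigma^2)
  \text{ mutually independent},\\
\operatorname{Var}(y_{ij}) &= \tau^2 + \sigma^2,\\
\operatorname{Cov}(y_{ij}, y_{ik}) &= \tau^2 \qquad (j \neq k),\\
\operatorname{Corr}(y_{ij}, y_{ik}) &= \frac{\tau^2}{\tau^2 + \sigma^2}.
\end{align*}
```

Every pair of observations within a subject shares the same covariance, which is exactly the compound symmetry pattern. One caveat: a random intercept forces that shared covariance to be non-negative, whereas a CS structure on REPEATED can also represent a negative one, so the two fits agree only when the estimated common covariance is not negative.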
|
|
As we are already on the subject...

A colleague has run a survey twice for one of our constituent groups; once in 2007 and again in 2010. In each year about 300 people took the survey. It appears that about half of the people who took the survey in 2007 also took it in 2010. The constituents would like to know if the item scores (4-point Likert: Strongly Agree to Strongly Disagree) have changed between years. They have indicated that they really don't want a measure of 'personal change'; they simply want to know if the proportion of 'Strongly Agree', 'Agree', etc. shown for items in the 2007 sample differs from those of the 2010 sample.

We have indicated that the situation is not so simple. I don't think it is appropriate to simply pool the samples into 2007/2010 groups and compare them as though they were independent samples. Then again, not all of the sample is paired. Ultimately, we just want to make some very simple summaries and comparisons. I really don't want to present two sets of reports; one using the paired sample and one using the independent samples. Does anyone have any ideas about a way to combine and/or weight the data so that any possible effect of the pairing for some of the cases is mitigated? Do I even need to worry about the intercorrelation of the paired scores?

My preference for this analysis is SPSS but I also have SAS at my disposal.

Thanks,
Mark |
|
Mark,
I agree that it is likely incorrect to assume that observations within the same person are not correlated (i.e. independent measures t test). Can you assume that the data are missing completely at random or at least missing at random? If yes, then you can account for the variation within a linear mixed model. There are issues around dealing with Likert-scale type items worth considering. Anyway, you need to set up your data set in long format:

ID  Year  Y
1   2007  3
1   2010  4
2   2007  5
3   2007  3
3   2010  2
4   2007  3
4   2010  4
5   2010  1
6   2010  2
7   2007  5
.
.
.
N

Note in the data set above that some people have provided only one observation while others have provided two. Based on what you wrote, I do not think you have people who gave multiple responses in one year. I also do not think you have more than two years. If I'm wrong, then what I'm about to show you is not appropriate.

You could code this up in MIXED as follows:

MIXED y BY year
  /FIXED=year | SSTYPE(3)
  /METHOD=REML
  /PRINT=SOLUTION
  /REPEATED=year | SUBJECT(id) COVTYPE(CS).

and as I mentioned in a previous post, the following code would work equally as well:

MIXED y BY year
  /FIXED=year | SSTYPE(3)
  /METHOD=REML
  /PRINT=SOLUTION
  /RANDOM=INTERCEPT | SUBJECT(id) COVTYPE(VC).

Both pieces of code are taking into account the covariation issue, and both should yield the same fixed effects results. Feel free to write back if you think I've missed something key to your design.

Ryan
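If the survey data currently sit one row per person, the long format Ryan describes can be produced with a small reshaping step. The sketch below is only an illustration: the wide layout with keys `y2007` and `y2010` and the function name `wide_to_long` are hypothetical, and the values are the ones from Ryan's illustrative table. People who answered in only one year contribute a single row.

```python
# Hypothetical wide layout: one record per person, None where no survey was taken.
wide = [
    {"id": 1, "y2007": 3, "y2010": 4},
    {"id": 2, "y2007": 5, "y2010": None},
    {"id": 3, "y2007": 3, "y2010": 2},
    {"id": 4, "y2007": 3, "y2010": 4},
    {"id": 5, "y2007": None, "y2010": 1},
    {"id": 6, "y2007": None, "y2010": 2},
    {"id": 7, "y2007": 5, "y2010": None},
]


def wide_to_long(records, year_columns=(("y2007", 2007), ("y2010", 2010))):
    """Return (id, year, y) rows, skipping years with no response."""
    long_rows = []
    for rec in records:
        for column, year in year_columns:
            score = rec.get(column)
            if score is not None:
                long_rows.append((rec["id"], year, score))
    return long_rows


long_rows = wide_to_long(wide)
print("ID  Year  Y")
for pid, year, y in long_rows:
    print(f"{pid:<3} {year}  {y}")
```

The resulting rows are what a MIXED run with SUBJECT(id) expects: one line per person-year, with unpaired respondents kept in the file.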
|
|
I hadn't thought of that. I think in many cases the non-paired responses are due to people simply not being here--hired since 2007, left after 2007. I should be able to find this out. In the remaining cases, I can only guess as to why they responded in one year but not the other.

Mark A. Davenport Ph.D.
Senior Research Analyst
Office of Institutional Research
The University of North Carolina at Greensboro
336.256.0395
[hidden email]

'An approximate answer to the right question is worth a good deal more than an exact answer to an approximate question.' --a paraphrase of J. W. Tukey (1962)
-- View this message in context: http://old.nabble.com/Repeated-measures--tp28246403p28271346.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com. |
|
Keep in mind also that if you are not doing statistical testing it's really a moot point since the non-independence issue only impacts the standard error & statistical test--the proportion or means are the same no matter how you model the data.
Personally, I would run it as Ryan suggests. But if you are worried that the non-matching responses are biasing the sample, you can compare the means of the respective sub-groups to see if they differ. E.g., for 2007, compare those who have a survey in 2010 to those who do not have a 2010 survey. Then do the same for the 2010 group.
best,
John
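John's check, comparing within each year the people who also answered in the other year against those who did not, can be sketched like this. It is a descriptive illustration only: the rows are the toy values from Ryan's long-format example, the function name `split_means` is hypothetical, and no significance test is performed.

```python
from statistics import mean

# Toy (id, year, y) rows taken from the long-format example earlier in the thread.
long_rows = [
    (1, 2007, 3), (1, 2010, 4), (2, 2007, 5), (3, 2007, 3), (3, 2010, 2),
    (4, 2007, 3), (4, 2010, 4), (5, 2010, 1), (6, 2010, 2), (7, 2007, 5),
]


def split_means(rows, year, other_year):
    """Mean score in `year` for people with and without a response in `other_year`."""
    ids_other = {pid for pid, yr, _ in rows if yr == other_year}
    matched = [y for pid, yr, y in rows if yr == year and pid in ids_other]
    unmatched = [y for pid, yr, y in rows if yr == year and pid not in ids_other]
    return (mean(matched) if matched else None,
            mean(unmatched) if unmatched else None)


for year, other in ((2007, 2010), (2010, 2007)):
    matched_mean, unmatched_mean = split_means(long_rows, year, other)
    print(year, "matched:", matched_mean, "unmatched:", unmatched_mean)
```

A large gap between the matched and unmatched means in either year would suggest that the people who responded only once differ from the repeat respondents, which bears on the missing-at-random question Ryan raised.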
From: Mark A Davenport MADAVENP <[hidden email]>
To: [hidden email]
Sent: Fri, April 16, 2010 4:51:32 PM
Subject: Re: Another repeated measures question |
