Hi,
My data set contains several cases for each subject and two other variables:

case  subject  var1  var2
1     1        1     7
2     1        2     4
3     1        4     5
4     2        5     7
5     2        5     2
6     3        9     18
7     3        3     4
8     3        12    1
9     3        56    5
10    4        3     5
11    4        4     6

As you can see, the number of cases varies from subject to subject. I would like to calculate the correlation of var1 with var2 while controlling for the varying number of cases per subject, that is, neutralizing the effect one "heavy" subject may have on the overall correlation. What should I do? 10x!! Uri. |
Administrator
|
For a few hours now, Nabble has been showing that "This post has NOT been accepted by the mailing list yet." So I'm just giving it a bump.
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
I already demonstrated how to estimate a correlation in the presence of repeated measurements on "x" and "y" using a linear mixed modeling approach. It happened to match the Pearson correlation for a fully balanced design; whether or not that is true for an unbalanced design I cannot say, but I am confident in the mixed modeling approach I provided.
This would be a nice exercise for the OP to complete and write back the findings to the list. I searched the archives and found the message I posted: https://listserv.uga.edu/cgi-bin/wa?A2=ind1406&L=SPSSX-L&P=R24300

Ryan

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command.
To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
In reply to this post by Bruce Weaver
I am going to re-state the problem, because I do not know which "correlation" is the object of the question, Within or Between. I assume that "neutralizing the effect" of one 'heavy' subject is a reference to the number of data points -- and not to the scores.

There are 2 to p cases for each Subject. From these, one can describe "within-subject correlation," which is available as an option from Discriminant function. From these data, one could also speak of "between-subject correlation," which usually goes by other names.

What is usually sufficient is the F-test from the one-way ANOVA table. Also, (I have never wanted to use it, but ...) I think that "eta-squared" is a correlation measure that has been used for unbalanced tables, and it is computed from terms of the ANOVA table.

The ANOVA terms can also yield an estimate of the intra-class correlation, which is another within-Subject (or Class) measure, but the proper formula is not used much.

Whether you look at the ANOVA table for Between or Within, or look at the within-subject r from the discriminant function procedure, the statistic that you get by default is going to weight most heavily the Subject with the most data.
- An ad-hoc solution to this might be to use a Weighting variable. I *think* that this would be set proportional to 1/(p-1), the reciprocal of the degrees of freedom for each of the Subjects. (When using weights, be careful that the procedure you are using will accept fractions and use them properly.)

--
Rich Ulrich |
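Rich's weighting idea can be sketched in a few lines. What follows is a minimal illustration in Python rather than SPSS syntax, and it is not code from the thread: it pools the within-subject sums of squares and cross-products, once as they come and once with each subject weighted by 1/(p-1). The function name and the data layout are assumptions made for the example; the numbers are the illustrative rows from the original post.

```python
import math
from collections import defaultdict

# Illustrative rows from the original post: (subject, var1, var2).
ROWS = [
    (1, 1, 7), (1, 2, 4), (1, 4, 5),
    (2, 5, 7), (2, 5, 2),
    (3, 9, 18), (3, 3, 4), (3, 12, 1), (3, 56, 5),
    (4, 3, 5), (4, 4, 6),
]


def pooled_within_r(rows, equalize_subjects=False):
    """Pooled within-subject correlation of x and y.

    With equalize_subjects=True, each subject's sums are multiplied by
    1/(p - 1), where p is that subject's number of cases. This is a
    hypothetical reading of the ad-hoc weight suggested above.
    """
    by_subject = defaultdict(list)
    for subject, x, y in rows:
        by_subject[subject].append((x, y))

    sxx = syy = sxy = 0.0
    for pairs in by_subject.values():
        p = len(pairs)
        if p < 2:
            continue  # a single case carries no within-subject information
        mx = sum(x for x, _ in pairs) / p
        my = sum(y for _, y in pairs) / p
        w = 1.0 / (p - 1) if equalize_subjects else 1.0
        # Deviations from the subject's own means remove between-subject differences.
        sxx += w * sum((x - mx) ** 2 for x, _ in pairs)
        syy += w * sum((y - my) ** 2 for _, y in pairs)
        sxy += w * sum((x - mx) * (y - my) for x, y in pairs)
    return sxy / math.sqrt(sxx * syy)


r_default = pooled_within_r(ROWS)
r_equalized = pooled_within_r(ROWS, equalize_subjects=True)
print(round(r_default, 3), round(r_equalized, 3))
```

With the default weighting, subject 3 (four cases, one of them the 56) supplies most of the pooled sums. The 1/(p-1) weights amount to averaging each subject's own variances and covariance, so the count of cases no longer matters, although a subject with extreme scores can still dominate.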
To be clear, if the goal is to estimate the correlation between x and y given that there are dependent pairs and the number of pairs varies across subjects, the linear mixed model will answer that question. However, subjects who provide more data will certainly be weighted more heavily than those who provide fewer pairs in the estimation of the correlation between x and y.

Ryan |
Administrator
|
But more weight for subjects with more pairs of data points is what the OP is trying to avoid. They wrote: 'controlling for the changing quantity of cases for each subject, meaning neutralizing the effect one "heavy" subject may have on the overall correlation'.
That suggests to me something like computing the correlation within each subject, then computing a pooled estimate--e.g., using meta-analytic methods. In standard meta-analysis, one would weight by the inverse of the variance, so subjects with more data points would be weighted more heavily. But one could change the weight variable to 1, thus weighting each subject's estimate equally. I'm not sure why one would want to do this, but I think it might achieve what the OP seems to be after. (The formulae for standard meta-analysis can be found in plenty of sources, including this one: http://www.ncbi.nlm.nih.gov/pubmed/8261254.) HTH.
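A rough sketch of that idea, in Python rather than SPSS and under stated assumptions: each subject's correlation is moved to Fisher's z scale, the z values are averaged with either inverse-variance weights (n - 3) or equal weights, and the result is transformed back. This is one common fixed-effect recipe for correlations; whether it matches the formulae in the source cited above is an assumption. The data and function names are made up for illustration.

```python
import math


def pearson_r(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pooled_r(per_subject, equal_weights=False):
    """Pool (r, n) pairs on Fisher's z scale and transform back to r."""
    num = den = 0.0
    for r, n in per_subject:
        if n <= 3 or abs(r) >= 1.0:
            continue  # z, or its variance 1/(n - 3), is undefined here
        w = 1.0 if equal_weights else float(n - 3)
        num += w * math.atanh(r)
        den += w
    return math.tanh(num / den)


# Hypothetical data: subject -> (x values, y values). Not from the thread.
SUBJECTS = {
    "A": ([1, 2, 3, 4, 5], [2, 1, 4, 3, 6]),
    "B": ([1, 2, 3, 4, 5, 6, 7, 8], [8, 6, 7, 4, 5, 2, 3, 1]),
}

per_subject = [(pearson_r(x, y), len(x)) for x, y in SUBJECTS.values()]
r_inverse_variance = pooled_r(per_subject)
r_equal = pooled_r(per_subject, equal_weights=True)
print(round(r_inverse_variance, 3), round(r_equal, 3))
```

Note the guard in `pooled_r`: subjects with three or fewer pairs, or with a perfect within-subject correlation, cannot be placed on the z scale with a usable variance and are skipped. In the illustrative data from the original post most subjects have only two or three cases, so this approach would need more data per subject than that example shows.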
--
Bruce Weaver |
Mixed models.
1. As the correlation among observations at level 1 increases, the level 1 observations will provide less unique information at level 2.
2. Level 2 units with more level 1 units will generally be weighted more heavily in the parameter estimation than level 2 units with fewer level 1 units.

Why would the OP want to ignore these points? Yes, there are ways around this (e.g., applying weights), of course, but why does the **OP** want to do so?

Ryan |
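Point 1 in Ryan's list can be illustrated with the usual design-effect approximation for a cluster of size m with intraclass correlation rho: the cluster carries roughly m / (1 + (m - 1) * rho) independent observations' worth of information. This is a textbook approximation added here for illustration, not something computed in the thread, and the function name is hypothetical.

```python
def effective_n(m, icc):
    """Approximate number of independent observations in one cluster of size m.

    Uses the design-effect approximation 1 + (m - 1) * icc; this is an
    illustrative assumption, not the mixed model itself.
    """
    return m / (1.0 + (m - 1) * icc)


# Ten segments from one interview, under increasing within-interview correlation.
for icc in (0.0, 0.2, 0.5, 1.0):
    print(icc, round(effective_n(10, icc), 2))
```

When the segments within an interview are uncorrelated, ten segments count as ten observations; when they are perfectly correlated, they count as one.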
Thank you all!
|
Administrator
|
In reply to this post by Ryan
Hi Ryan. I don't know why the OP wants to ignore those points. But I do see a parallel to the old "unweighted means" method (or "equally weighted means", as Dave Howell would say) for estimating an unbalanced factorial ANOVA model.
By the way, the OP has posted again (it shows up in Nabble) thanking everyone for their replies, but giving no indication of why they want to do this. Bruce
--
Bruce Weaver |
Hi Bruce, It should be possible to estimate weighted and unweighted means with standard errors, confidence intervals, and p-values directly using the TEST sub-command of MIXED from the traditional parameterization, but I don't have time to work it out. Ryan |
In reply to this post by uri1616
Hi All,
Again, thanks. I'm afraid that by trying to simplify the description of the problem and using inaccurate terms I may have misled those who tried to help, so I will restate it in as much detail as possible.

My data include about 300 interviews, each conducted with one specific subject. Each interview is divided into segments, each represented by one case. Each case holds data about the content of the segment. So the first two variables are "subject" (identifying each interview) and "segs" (identifying each segment within an interview). Other variables describe the content of each segment, e.g. 'supp', the number of supportive utterances in a segment (made by the interviewer), and 'rlct', the number of reluctant responses (made by the interviewee).

The structure of the file is:

subject  segs  supp  rlct
1        1     1     7
1        2     2     4
1        3     4     5
2        1     5     7
2        2     5     2
3        1     9     18
3        2     3     4
3        3     12    1
3        4     56    5
4        1     3     5
4        2     4     6

What I am looking for is a way to identify the correlation (I am not sure which kind: perhaps Pearson, perhaps a regression model, or something else) between supp and rlct. The problem, as I understand it, is that I can't simply estimate correlations because my cases (each representing a segment) are not independent but are grouped within interviews.

I hope I have made myself clearer this time, and I would greatly appreciate your help,
Uri. |
Uri,
Are you wanting to restructure the data like this?

subject  segs1  supp1  rlct1  segs2  supp2  rlct2  segs3  supp3  rlct3
1        1      1      7      2      2      4      3      4      5

That way you can correlate SUPP1 with RLCT1, SUPP2 with RLCT2, etc. If so, look into CASESTOVARS using subject as the ID and segs as the index.

Melissa |
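To make the long-to-wide restructuring concrete, here is a small Python sketch of what such a reshaping does to the illustrative rows. It is not SPSS syntax and does not claim to reproduce the exact output of CASESTOVARS; the function name and the dictionary layout are assumptions for the example.

```python
from collections import defaultdict

# Illustrative rows from the restated question: (subject, segs, supp, rlct).
LONG_ROWS = [
    (1, 1, 1, 7), (1, 2, 2, 4), (1, 3, 4, 5),
    (2, 1, 5, 7), (2, 2, 5, 2),
    (3, 1, 9, 18), (3, 2, 3, 4), (3, 3, 12, 1), (3, 4, 56, 5),
    (4, 1, 3, 5), (4, 2, 4, 6),
]


def cases_to_vars(rows):
    """Turn one row per segment into one record per subject.

    Hypothetical helper: the segment number becomes a suffix on the
    variable name, e.g. supp1, rlct1, supp2, rlct2, ...
    """
    wide = defaultdict(dict)
    for subject, seg, supp, rlct in rows:
        wide[subject][f"supp{seg}"] = supp
        wide[subject][f"rlct{seg}"] = rlct
    return dict(wide)


wide = cases_to_vars(LONG_ROWS)
for subject, record in wide.items():
    print(subject, record)
```

Subjects with fewer segments simply lack the later columns, so a correlation between, say, supp3 and rlct3 would be based only on the interviews that have at least three segments.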
Administrator
|
In reply to this post by uri1616
After reading that, I now think Ryan guessed correctly that you need to use a multilevel model that takes into account the clustering of observations within subjects. See his post(s) earlier in this thread.
--
Bruce Weaver |
I am sure that Ryan's model can get *some* r, and maybe it can get
both the within-subject correlation and the between-subject correlation -- but those remain two distinct features. (I thought that in some post I saw labels for the Segs, which showed them to be distinct areas rather than replications; that would further change the problem, but I can't find that post.)

If these are real data listed, then I note that ID=3 is evidence of some pretty bad scaling for the prospect of simple correlations (9 and 18; 56 and 5). "Square root" was my first thought, but that 56 produces another outlier. Winsorize to 10?

--
Rich Ulrich |
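The two rescaling options Rich mentions can be compared directly on subject 3's 'supp' values from the illustration. This is a minimal Python sketch; reading "Winsorize to 10" as an upper cap at 10 is an assumption, and the helper name is hypothetical.

```python
import math


def winsorize_upper(values, cap):
    """Replace every value above `cap` with `cap` (upper-tail capping only)."""
    return [min(v, cap) for v in values]


supp_subject3 = [9, 3, 12, 56]  # illustrative values for subject 3

capped = winsorize_upper(supp_subject3, 10)
rooted = [round(math.sqrt(v), 2) for v in supp_subject3]

print(capped)
print(rooted)
```

After the square root the largest value is still more than twice the next largest, which is the sense in which the 56 remains an outlier; capping pulls it level with the 12.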
Hi Rich, Those aren't real data, just an illustration, because the file holds thousands of cases...
|