Hi all, I have the following problem:
I have a dataset containing the age at screening of each individual. An individual can have many screenings, so each row has the ID of the individual, the screening number (1st screening, 2nd, etc.) and the age at that screening. There are males and females in the file. I also have a "no. of samples" variable, which is the total number of screenings an individual had; e.g. for an individual with 5 screenings, the total number of samples is 5. Each value of this variable is one sample stratum.

Eventually I want to compare the mean age of males and females across all sample strata, using weighted sums across strata (as described below). Is a 2-way ANOVA through the GLM procedure the right test? Is there a simpler way to do this in SPSS?

So far I have calculated everything separately in SPSS. Within each sample stratum i (e.g. all individuals with 3 screenings), I calculated agediff(i) = mean age females - mean age males, and the respective standard errors se(i). I then set
x(i) = 1/se(i)**2 (the inverse of the squared se for sample stratum i), and
w(i) = x(i)/(sum of x(i) over all sample strata).
Then I calculated
pooled agediff = sum over all i of (w(i)*agediff(i)) (A).

These calculations gave me a file with one row per sample stratum, containing the age difference, the mean age in females, the mean age in males, and the x and w for that stratum. Would it be correct to run an independent-samples t-test to examine the age difference between males and females pooled across all the sample strata? However, I don't know how to incorporate the weights w described above into the ANOVA or the t-test procedure. That is, do the SPSS procedures calculate the pooled age difference as the sum over all i of (w(i)*agediff(i)), or do I need to change something in the syntax or write a macro to get (A)? I appreciate your help, thank you. |
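A minimal sketch of the calculation (A) described in the question, written in Python rather than SPSS syntax so the arithmetic is explicit. It takes the per-stratum age differences and their standard errors and returns the inverse-variance weighted pooled difference. The function name `pooled_difference` and the example numbers are hypothetical. The pooled standard error sqrt(1/sum of x(i)) and the normal-approximation z test are the usual fixed-effect pooling formulas; they assume the strata are independent and each se(i) is treated as known, which the original post does not state.

```python
import math


def pooled_difference(diffs, ses):
    """Inverse-variance weighted pooled difference across strata.

    diffs: per-stratum agediff(i) = mean age females - mean age males
    ses:   per-stratum standard errors se(i) of those differences
    Returns (pooled, se_pooled, z, p_two_sided).
    """
    if not diffs or len(diffs) != len(ses):
        raise ValueError("diffs and ses must be non-empty and of equal length")
    if any(se <= 0 for se in ses):
        raise ValueError("standard errors must be positive")

    x = [1.0 / se ** 2 for se in ses]                  # x(i) = 1/se(i)**2
    total = sum(x)
    w = [xi / total for xi in x]                       # w(i) = x(i) / sum of x(i)
    pooled = sum(wi * d for wi, d in zip(w, diffs))    # (A)

    # Assumption: independent strata, se(i) treated as known.
    se_pooled = math.sqrt(1.0 / total)
    z = pooled / se_pooled
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))   # normal approximation
    return pooled, se_pooled, z, p_two_sided


# Hypothetical summary file: one entry per sample stratum.
example_diffs = [2.0, 0.0]
example_ses = [1.0, 2.0]
print(pooled_difference(example_diffs, example_ses))
```

Because the weights depend only on se(i), the stratum with the smallest standard error dominates the pooled estimate.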
Others with more experience may have a better understanding, but I think this is just a multilevel problem. You have screening age nested within person, with no level 1 predictors and one level 2 predictor: gender. I think you can use MIXED with the EMMEANS subcommand.
Gene Maguin
-----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of xenia Sent: Thursday, November 08, 2012 9:58 AM To: [hidden email] Subject: comparison of means across strata
-- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/comparison-of-means-across-strata-tp5716113.html Sent from the SPSSX Discussion mailing list archive at Nabble.com.
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
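One way to write down the multilevel model suggested in this reply, as a sketch only: a random-intercept model with screenings i nested within persons j and gender as the single person-level predictor. The symbols and the 0/1 coding of gender are notation assumed here, not taken from the post.

```latex
\begin{aligned}
\text{age}_{ij} &= \beta_0 + \beta_1\,\text{female}_j + u_j + e_{ij},\\
u_j &\sim N(0, \sigma_u^2), \qquad e_{ij} \sim N(0, \sigma_e^2),
\end{aligned}
```

Under this coding, \beta_1 is the difference in mean age between females and males, and the estimated marginal means for gender are \beta_0 (males) and \beta_0 + \beta_1 (females).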
Thank you,
I want to compare the mean age for females with the mean age for males, across all sample strata. Each stratum represents the total number of screenings someone had, so in stratum 1 I have 10 individuals, in stratum 2 I have 5 and so on. So, shouldn't the stratum variable be included as a predictor? |
In reply to this post by xenia
The question is, what is your hypothesis? Or is there more than one?

It seems reasonable to me that you would compare several variables that describe aspects of the study, using obvious names: InitialAge, FinalAge, Duration, AverageAge. Duration uses StartDate and EndDate. Only AverageAge makes use of more than one Age. I don't see any reason to use anything other than the average of the several ages that have been recorded, though other aspects of the study might suggest something else. It looks like t-tests to me.
-- Rich Ulrich
> Date: Thu, 8 Nov 2012 14:34:51 -0800 > From: [hidden email] > Subject: Re: comparison of means across strata > To: [hidden email] |
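A rough sketch of the "average age per person, then a t-test" idea from this reply, in Python with the standard library only. The row layout `(person_id, sex, age)`, the function names, and the example rows are hypothetical. The reply does not say which t-test variant is meant; Welch's unequal-variance form is used here as an assumption.

```python
import math
from collections import defaultdict
from statistics import mean, variance


def person_means(rows):
    """Collapse screening-level rows to one average age per person.

    rows: iterable of (person_id, sex, age), one row per screening.
    Returns {sex: [average age of each person of that sex]}.
    """
    ages = defaultdict(list)
    sex_of = {}
    for person_id, sex, age in rows:
        ages[person_id].append(age)
        sex_of[person_id] = sex
    by_sex = defaultdict(list)
    for person_id, person_ages in ages.items():
        by_sex[sex_of[person_id]].append(mean(person_ages))
    return dict(by_sex)


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Each sample needs at least two values so its variance is defined.
    """
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical screening-level data: (person_id, sex, age at screening).
example_rows = [
    (1, "F", 50.0), (1, "F", 52.0),
    (2, "F", 60.0),
    (3, "M", 40.0), (3, "M", 42.0), (3, "M", 44.0),
    (4, "M", 50.0),
]
groups = person_means(example_rows)
print(groups)
print(welch_t(groups["F"], groups["M"]))
```

Averaging within person first gives every individual one observation, so people with many screenings do not count more than people with few.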
Thank you,
the hypothesis is: the mean age in females is equal to the mean age in males, pooled across the strata. These are t-tests, but I obviously don't want to perform multiple t-tests, which is why I was thinking that the right test would be ANOVA. I also think that the strata variable should be included, because there could be a relationship with the number of screenings a person has; the number of screenings is implied by the strata variable, as I explained in my initial query. |
In reply to this post by xenia
Okay, that statement of "what the hypothesis is" would not satisfy me if I were in your audience. What do you mean by "pooled"? You are writing a narrative, and *that* answer will not satisfy anyone about anything.

Why don't you want to perform multiple t-tests? This is not a sort of test that introduces a "problem of multiple testing" because, I am pretty sure, the average age is not an *outcome* or otherwise of vital interest. When you are trying to validate, to show that your sample is matched, you owe it to your audience to be complete by testing everything. The "cheating" occurs when you avoid the separate tests.

Perhaps you do not want to *report* multiple t-tests because of the clutter. In that case, you perform all the tests that might be of interest, to test whether the samples are equivalent by age, and then (for instance) report the most and least divergent and explain what they are. Or just show that the most divergent is not at all divergent.

If "number of strata" is important, you might look separately at means for various counts of followups.
-- Rich Ulrich
> Date: Thu, 8 Nov 2012 21:40:48 -0800 > From: [hidden email] > Subject: Re: comparison of means across strata > To: [hidden email] |
"Pooled" means: the overall test across the strata, not within.
What you describe can be done more efficiently and in less time using the ANOVA. I don't see the point of doing many t-tests when I can do an ANOVA, especially if the number of tests is e.g. 1000. I think a general linear model is more efficient than the t-test. Can you state the reasons why you think doing t-tests is better than doing a GLM? |
[Apparently not posted on first try. Paragraph 2 added.]

Keep the distinction between tests of outcome and tests for study validation. Finding a difference in age (I presume) is something that undermines the study. It is not an "outcome" that encourages publication as an important study. The proper attitude towards validation tests is, "Bring 'em on! We'll test everything that any critic may worry about." And, as I said, your write-up can use an honest summary instead of detailing everything.

This is all basic description that you ought to want to know for your own good confidence in going ahead, if age is any real concern. If you just want the crudest comparison, why not use the average for each person?

I'm a great advocate of an "overall test" when I have a sensible "overall hypothesis" that it will answer. In this case, as described, I would want to know if the subjects were recruited at the same age; if they stayed in for the same amount of time (and the same count of followups... how consistent are the gaps?); and if they are the same age at the end. Those are separate tests. If they show big differences, you have new concerns for your later tests. This is validation for the study design, where those things are (probably) assumed to be equal. And the separate assumptions deserve separate tests. If I see only one fuzzy, overall test, I will be led to imagine bad things, such as that the analyst is either careless or hiding something.
-- Rich Ulrich |
In reply to this post by Maguin, Eugene
Thank you for this.
-- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/comparison-of-means-across-strata-tp5716113p5716146.html Sent from the SPSSX Discussion mailing list archive at Nabble.com. |
In reply to this post by Rich Ulrich
Your reply was posted to my email, and I have already replied:
I'm doing an exploratory analysis of some existing data; this is not a study that is recruiting and will be published. So maybe you're talking in terms of a clinical study, but I'm not. I have two groups and right now I'm interested in whether their mean ages differ, nothing else. This is not a survival or other time-to-event analysis, so whether they are the same age at the end or when they were recruited is not relevant. The things you mention are relevant to the setting-up of a trial, which is not what I'm doing, i.e. I'm not designing a study. I asked something very specific: how to compare the means of two groups across strata, method-wise, when I have age, sex and number of samples. If I were running a trial I would have decided a priori which variables to examine and which primary outcomes I'm interested in, but all of this is quite different from the point of my question. I did not mention anything regarding study set-up or publication in my initial query, so this conversation is not to the point. However, thank you for taking the time to write and offer your help. |