My comment is that in my field, psychology, there is a question about when to use ANOVA vs. ANCOVA. Ostensibly, ANOVA tests the differences in means among groups, while ANCOVA adds a covariate to that analysis for statistical control. When we gather data, we often collect demographic variables along with our predictor and outcome variables. If the hypothesis is to test the difference in means among three groups, and the selected analysis is ANOVA, the question arises: why not use an ANCOVA and include the covariates for statistical control? If that is the case, then every analysis where the intent is to compare group means would become an ANCOVA, because why wouldn't you want to include covariates for statistical control? Granted, including the covariates means giving up statistical power, but conserving statistical power shouldn't be the reason for choosing an ANOVA over an ANCOVA. In other words, assuming a sufficient sample size, why not conduct an ANCOVA every time instead of an ANOVA whenever you have covariates, especially demographics, that you can enter for statistical control? That would mean we would conduct ANCOVAs every chance we get, i.e., whenever we have covariates. I would rather have the decision to use ANOVA vs. ANCOVA be based on conceptual and/or statistical grounds, but I can't seem to find such a justification. Thanks for your comments, and any articles or website references are appreciated.
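For concreteness, one way to see the mechanical relationship between the two: ANOVA and ANCOVA are the same general linear model, differing only in whether the covariate appears in the design matrix, and the group test in each is a comparison of nested models. A minimal numpy sketch with simulated, made-up data (variable names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: 3 groups, a covariate (say, age), and an outcome.
n_per = 30
group = np.repeat([0, 1, 2], n_per)
age = rng.normal(40, 10, group.size)
y = 0.5 * group + 0.1 * age + rng.normal(0, 1, group.size)

def f_test(X_full, X_reduced, y):
    """F statistic for the terms in X_full that are absent from X_reduced."""
    def sse(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return resid @ resid, X.shape[1]
    sse_f, p_f = sse(X_full)
    sse_r, p_r = sse(X_reduced)
    df1, df2 = p_f - p_r, len(y) - p_f
    return ((sse_r - sse_f) / df1) / (sse_f / df2)

ones = np.ones_like(age)
dummies = np.column_stack([group == 1, group == 2]).astype(float)

# ANOVA: group dummies vs. intercept-only model.
F_anova = f_test(np.column_stack([ones, dummies]), ones[:, None], y)
# ANCOVA: same group contrast, but the covariate sits in BOTH models.
F_ancova = f_test(np.column_stack([ones, age, dummies]),
                  np.column_stack([ones, age]), y)
print(F_anova, F_ancova)
```

Here the covariate absorbs part of the error variance, so the F for the same group effect is larger in the ANCOVA than in the ANOVA.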
Another comment is that my peers tend to think a regression, rather than correlations, should always be conducted whenever you have several continuous variables. The idea is that if you have several predictor variables, why not throw them all together and see which ones have the stronger associations with the outcome variable? It is often, if not always, the case that we are collecting data on several predictor variables and one outcome variable. Once my colleagues see a correlation matrix, they are quick to ask: why not just run a regression with all of the independent variables and see what happens? It's as if my colleagues think of regression as a "sophisticated" analysis compared to simple correlations. If I follow my colleagues' logic, then why not run a regression every time you have multiple independent variables, and why bother with correlations at all? I maintain that you shouldn't use a regression "just to see what happens"; you need a conceptual reason for wanting to examine the independent variables simultaneously. My question is: what is the conceptual difference, or rationale, for conducting correlations among variables rather than just using regression? Thanks for your comments, and any articles or website references are appreciated.

Peter Ji
Adler University
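One concrete way to frame the correlation-vs-regression question: a correlation answers the marginal question (is x associated with y at all?), while a regression coefficient answers the conditional question (does x add anything once the other predictors are held fixed?). A small numpy sketch with fabricated data, where x1 correlates strongly with y but contributes nothing beyond x2:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x2 = rng.normal(size=n)
x1 = x2 + 0.3 * rng.normal(size=n)   # x1 is mostly a noisy copy of x2
y = x2 + rng.normal(size=n)          # only x2 actually drives y

r_x1y = np.corrcoef(x1, y)[0, 1]     # marginal association: large

# Multiple regression: x1's coefficient collapses toward zero
# once x2 is held fixed.
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(r_x1y, beta[1], beta[2])
```

Neither number is wrong; they answer different questions, which is exactly why "just run the regression" is not a substitute for looking at the correlations.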
Okay, off-topic for SPSS. Here's a recommendation -
Harrell FE. Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis. 2nd ed. New York: Springer; 2015.
I Googled on < Frank Harrell Jr "stepwise regression" >, and I
found his 2017 tweet,
Statistical quote of the day. Stepwise variable selection has
done incredible damage to science. How did we statisticians
let this happen?
Other hits will direct you to his set of guidelines about stepwise;
I promoted those comments in the Usenet stats-groups, umpteen years ago.
Short summary: If you start out with 100 variables, you expect, by
chance alone, that 5% of them will hit the 5% test criterion. That's
especially of concern for the variables that are "truly unrelated" to
the outcome, but you also get an exaggerated look at effects that
are small and useless. With huge samples, you can use 100 covariates,
but "see" trivial effects if you stick with a fixed, 5% cut-off. The
multi-test problem, combined with big samples, has brought more
focus on "effect size".
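The 5%-by-chance arithmetic is easy to check by simulation. Here, a hypothetical screen of 100 pure-noise predictors against an unrelated outcome at the usual .05 cutoff; on average about 5 of them "pass":

```python
import numpy as np

rng = np.random.default_rng(42)
n, k = 200, 100
X = rng.normal(size=(n, k))          # 100 predictors, all pure noise
y = rng.normal(size=n)               # outcome unrelated to every one

# Correlation of each predictor with y, and the usual t test of r.
r = (X - X.mean(0)).T @ (y - y.mean()) / (n * X.std(0) * y.std())
t = r * np.sqrt((n - 2) / (1 - r ** 2))
hits = int(np.sum(np.abs(t) > 1.972))  # two-sided .05 critical t, df = 198
print(hits)
```

Any stepwise procedure run on these data would happily "select" those chance hits.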
OTOH, one also should bear in mind the way that correlated predictors confound each
other. If you have 5 variables that are highly correlated with "sex"
(for instance), then the potential effect of sex will be spread across
all five -- you do not get one /measure/ of the effect of sex, since each
suppresses the others. With multiple such measures, you don't know what
you /have/ for most of them, because you haven't inspected /all/ the
correlations.
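Roughly how that spreading looks in a toy simulation (all numbers fabricated): five noisy stand-ins for one underlying factor, each strong on its own, each much weaker once the other four are in the model:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300
latent = rng.normal(size=n)                  # the one real factor, e.g. "sex"-linked
proxies = latent[:, None] + 0.4 * rng.normal(size=(n, 5))  # 5 correlated stand-ins
y = latent + rng.normal(size=n)

# Each proxy alone looks strong...
marginal_r = [np.corrcoef(proxies[:, j], y)[0, 1] for j in range(5)]

# ...but jointly the one effect is divided across all five coefficients.
X = np.column_stack([np.ones(n), proxies])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(marginal_r, 2), np.round(beta[1:], 2))
```

No single coefficient recovers the effect; the simple correlations are needed to see that the five are measuring one thing.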
Use reason to reduce the pool of variables. Use factor analysis to reduce
the remaining variables to meaningful (in terms of outcome) composites.
There are related possibilities and concerns, including concern for the
"scaling" of measures, i.e., what transformations should be made of raw
scores to make them properly "linear" with respect to the outcome.
If you want to "try out" a large pool of potential covariates, keep /firmly/
in mind that the results are /exploratory/.
--
Rich Ulrich
From: SPSSX(r) Discussion <[hidden email]> on behalf of Ji, Peter <[hidden email]>
Sent: Friday, December 27, 2019 5:55 PM
To: [hidden email] <[hidden email]>
Subject: conceptual difference for conducting ANOVA vs. ANCOVA, and correlation vs. Regression.
In reply to this post by pji
> Granted including the covariates means giving up statistical power, but
> conserving statistical power shouldn't be the reason for using an ANOVA versus
> an ANCOVA. In other words, assuming a sufficient sample size, why not conduct
> an ANCOVA every time instead of an ANOVA if you have covariates, especially
> demographics, where you can enter the covariates for statistical control?

When you include covariates, you actually gain power for your focal prediction, because you are removing noise from it -- if my understanding is correct, this is just the case Cohen covers in his exposition of f2.
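Cohen's f2 for a focal effect with a covariate partialled out is f2 = (R2_full - R2_reduced) / (1 - R2_full). A toy simulation (made-up effect sizes) shows the covariate shrinking the error term and thereby inflating f2 for the same group effect:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 600
group = rng.integers(0, 2, n)            # two groups: the focal predictor
cov = rng.normal(size=n)                 # covariate, unrelated to group
y = 0.3 * group + 1.0 * cov + rng.normal(size=n)

def r2(X):
    """R-squared of an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

ones = np.ones(n)
r2_g = r2(np.column_stack([ones, group]))            # group alone (ANOVA-style)
r2_c = r2(np.column_stack([ones, cov]))              # covariate alone
r2_gc = r2(np.column_stack([ones, group, cov]))      # both (ANCOVA-style)

f2_anova = r2_g / (1 - r2_g)                         # f2 for group, no covariate
f2_ancova = (r2_gc - r2_c) / (1 - r2_gc)             # f2 for group, covariate partialled
print(round(f2_anova, 3), round(f2_ancova, 3))
```

The group effect explains the same raw variance either way, but the covariate removes much of the remaining error variance from the denominator, so f2 for the focal test grows.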
In reply to this post by Rich Ulrich
I didn't read the OP's question as being about fishing for covariates. After all, the group definitions could be "fished for", too. If there are good grounds for including particular covariates, it makes sense to me to include those in the ANCOVA/regression analysis unless the data have been constructed to balance them. On Sun, Dec 29, 2019 at 12:10 PM Rich Ulrich <[hidden email]> wrote:
In reply to this post by pji
Peter.... In my 20 years' experience of medical statistics, I have found that psychologists are very proficient at it. If need be, feel free to join the Google Group "MedStats"; it hosts many of the most notable medical statisticians. Anyone -- literally anyone -- can view the exchanges and the discussion, but to post a query you do need to join. If you do join, you will have access to the archives of all previous exchanges. You might agree it is an excellent resource. I'm biased: I am the founder of MedStats.

Kind regards,
Martin P. Holt
Freelance Medical Statistician
"If you can't explain it simply, you don't understand it well enough." -- Einstein
Concise Encyclopedia of Biostatistics for Medical Professionals
LinkedIn: https://www.linkedin.com/in/martin-holt-3b800b48?trk=nav_responsive_tab_profile
On Sunday, 29 December 2019, 18:20:19 GMT, Ji, Peter <[hidden email]> wrote: