conceptual difference for conducting ANOVA vs. ANCOVA, and correlation vs. Regression.


pji

In my field, psychology, there is a question about when to use ANOVA vs. ANCOVA. Ostensibly, ANOVA tests the differences in means among groups, and ANCOVA adds a covariate to this analysis for statistical control. When we gather data, we often gather demographic variables along with our predictor and outcome variables. If the hypothesis concerns the difference in means among three groups, and the selected analysis is ANOVA, the question is raised: why not use an ANCOVA and include the covariates for statistical control? If that is the case, then every analysis where the intent is to compare group means would become an ANCOVA, because why wouldn't you want to include covariates for statistical control? Granted including the covariates means giving up statistical power, but conserving statistical power shouldn't be the reason for using an ANOVA versus an ANCOVA. In other words, assuming a sufficient sample size, why not conduct an ANCOVA every time instead of an ANOVA if you have covariates, especially demographics, where you can enter the covariates for statistical control? That would mean we would conduct ANCOVAs every chance we get, i.e., whenever we have covariates. I would rather have the decision to use ANOVA vs. ANCOVA be based on conceptual and/or statistical grounds, but I can't seem to find such a justification. Thanks for your comments; any articles or website references are appreciated.

 

Another comment is that my peers tend to think that a regression, rather than correlations, should always be conducted whenever you have several continuous variables. The idea is that if you have several predictor variables, why not throw them all together and see which ones have the stronger associations with the outcome variable? It is often, if not always, the case that we are collecting data on several predictor variables and one outcome variable. Once my colleagues see a correlation matrix, they are quick to ask: why not just run a regression with all of the independent variables and see what happens? It's as if my colleagues think of a regression as a "sophisticated" analysis compared to simple correlations. If I follow my colleagues' logic, then why not run a regression every time you have multiple independent variables, and why bother with correlations? I maintain that you shouldn't use a regression "just to see what happens"; you need a conceptual reason for wanting to examine the independent variables simultaneously. My question is, what is the conceptual difference or rationale for conducting correlations among variables, rather than just using regression? Thanks for your comments; any articles or website references are appreciated.

 

Peter Ji

Adler University

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Re: conceptual difference for conducting ANOVA vs. ANCOVA, and correlation vs. Regression.

Rich Ulrich
Okay, off-topic for SPSS.   Here's a recommendation -

I Googled on < Frank Harrell Jr "stepwise regression" >, and I
found his 2017 tweet,
   Statistical quote of the day. Stepwise variable selection has
   done incredible damage to science. How did we statisticians
   let this happen?

Other hits will direct you to his set of guidelines about stepwise;
I promoted those comments in the Usenet stats-groups, umpteen years ago.

Short summary: If you start out with 100 variables, you expect, by
chance alone, that 5% of them will hit the 5% test criterion. That's
especially of concern for the variables that are "truly unrelated" to
the outcome, but you also get an exaggerated look at effects that
are small and useless.  With huge samples, you can use 100 covariates,
but "see" trivial effects if you stick with a fixed, 5% cut-off. The
multi-test problem, combined with big samples, has brought more
focus on "effect size".
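The "5% by chance alone" point can be checked with a small simulation. This is only a sketch under assumed conditions: 100 predictors that are pure noise, 200 cases, and the Fisher z approximation for testing each correlation against zero. The function name and all settings are made up for illustration and are not from any SPSS procedure.

```python
import numpy as np

def count_chance_hits(n_obs=200, n_vars=100, seed=0):
    """Count noise predictors whose correlation with y passes a 5% test."""
    rng = np.random.default_rng(seed)
    y = rng.standard_normal(n_obs)            # outcome, unrelated to everything
    X = rng.standard_normal((n_obs, n_vars))  # predictors, all pure noise

    # Pearson correlation of each predictor column with y.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc.T @ yc) / (np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum()))

    # Fisher z approximation: atanh(r) * sqrt(n - 3) is roughly standard normal
    # when the true correlation is zero, so |z| > 1.96 is a two-sided 5% test.
    z = np.arctanh(r) * np.sqrt(n_obs - 3)
    return int((np.abs(z) > 1.96).sum())

# Repeat over many simulated data sets and average the number of false hits.
hits = [count_chance_hits(seed=s) for s in range(200)]
mean_hits = float(np.mean(hits))
print("average number of 'significant' noise predictors:", mean_hits)
```

On average about 5 of the 100 useless predictors should pass the test, which is why a screen of many candidates has to be read as exploratory.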

OTOH, one also should bear in mind the way that they confound each
other.  If you have 5 variables that are highly correlated with "sex"
(for instance), then the potential effect of sex will be spread across
all five -- You do not get one /measure/ of the effect of sex, since each
suppresses the others. With multiple measures, you don't know what
you /have/ for most of them, because you haven't inspected /all/ the
correlations.
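The way correlated measures split an effect among themselves can also be sketched in a few lines. Everything here is an assumption made for illustration: a binary "sex" variable that truly drives the outcome, and five hypothetical proxy measures that are each sex plus a little noise. The helper `ols_slopes` is not a library function, just ordinary least squares via NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Assumed data-generating model: only sex affects the outcome.
sex = rng.integers(0, 2, size=n).astype(float)
proxies = sex[:, None] + 0.3 * rng.standard_normal((n, 5))  # 5 noisy copies of sex
y = 1.0 * sex + 0.5 * rng.standard_normal(n)

def ols_slopes(X, y):
    """Least-squares slopes of y on the columns of X (intercept fitted, then dropped)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1:]

# One proxy at a time: each looks like a strong predictor on its own.
simple = np.array([ols_slopes(proxies[:, [j]], y)[0] for j in range(5)])

# All five together: the same effect is divided among them.
joint = ols_slopes(proxies, y)

print("slopes, one proxy at a time:", np.round(simple, 2))
print("slopes, all five together:  ", np.round(joint, 2))
print("sum of joint slopes:        ", round(float(joint.sum()), 2))
```

Each proxy alone carries a large slope, while in the joint model each slope is much smaller and only their sum approximates the effect of sex. No single coefficient in the joint model is a measure of "the effect of sex".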

Use reason to reduce the pool of variables. Use factor analysis to reduce
the remaining variables to meaningful (in terms of outcome) composites.
There are related possibilities and concerns, including concern for the
"scaling" of measures, i.e., what transformations should be made of raw
scores to make them properly "linear" with respect to the outcome.

If you want to "try out" a large pool of potential covariates, keep /firmly/
in mind that the results are /exploratory/.

--
Rich Ulrich


From: SPSSX(r) Discussion <[hidden email]> on behalf of Ji, Peter <[hidden email]>
Sent: Friday, December 27, 2019 5:55 PM
To: [hidden email] <[hidden email]>
Subject: conceptual difference for conducting ANOVA vs. ANCOVA, and correlation vs. Regression.
 


Re: conceptual difference for conducting ANOVA vs. ANCOVA, and correlation vs. Regression.

J.D. Haltigan
In reply to this post by pji
>>>Granted including the covariates means giving up statistical power, but
conserving statistical power shouldn't be the reason for using an ANOVA
versus an ANCOVA. In other words, assuming a sufficient sample size, why not
conduct an ANCOVA every time instead of an ANOVA if you have covariates,
especially demographics, where you can enter the covariates for statistical
control?<<<<<

When you include covariates, you actually gain power for your focal
prediction because you are removing noise from it, if my understanding of
Cohen's exposition of just this case is correct. This is the case of
Cohen's f² (f-squared) effect size.
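A rough simulation can illustrate the noise-removal argument. It is a sketch under stated assumptions, not a general result: three randomized groups of 30, one covariate that is unrelated to group membership but related to the outcome, and made-up effect sizes. The helpers `rss` and `group_f` are hypothetical names; the F statistic for the group effect is computed by comparing a model with and without the group dummies.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from a least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def group_f(y, group_dummies, covariates):
    """F statistic for the group effect, adjusting for the given covariates."""
    n = len(y)
    reduced = np.hstack([np.ones((n, 1)), covariates])  # intercept + covariates
    full = np.hstack([reduced, group_dummies])          # ... + group dummies
    df_group = group_dummies.shape[1]
    df_resid = n - full.shape[1]
    rss_full = rss(full, y)
    return ((rss(reduced, y) - rss_full) / df_group) / (rss_full / df_resid)

rng = np.random.default_rng(1)
n_per_group, reps = 30, 500
group = np.repeat([0, 1, 2], n_per_group)
dummies = np.column_stack([group == 1, group == 2]).astype(float)
effects = np.array([0.0, 0.3, 0.6])        # assumed true group means
no_covariates = np.empty((group.size, 0))  # ANOVA: nothing to adjust for

f_anova, f_ancova = [], []
for _ in range(reps):
    x = rng.standard_normal(group.size)    # covariate, independent of group
    y = effects[group] + 1.0 * x + rng.standard_normal(group.size)
    f_anova.append(group_f(y, dummies, no_covariates))
    f_ancova.append(group_f(y, dummies, x[:, None]))

mean_f_anova = float(np.mean(f_anova))
mean_f_ancova = float(np.mean(f_ancova))
print("mean F for group, ANOVA: ", round(mean_f_anova, 2))
print("mean F for group, ANCOVA:", round(mean_f_ancova, 2))
```

Under these assumptions the covariate soaks up outcome variance that would otherwise sit in the error term, so the F for the group effect is larger on average in the ANCOVA. A covariate unrelated to the outcome would not help and would cost a degree of freedom.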




Re: conceptual difference for conducting ANOVA vs. ANCOVA, and correlation vs. Regression.

Jon Peck
In reply to this post by Rich Ulrich
I didn't read the OP's question as being about fishing for covariates.  After all, the group definitions could be "fished for", too.  If there are good grounds for including particular covariates, it makes sense to me to include those in the ANCOVA/regression analysis unless the data have been constructed to balance them.

On Sun, Dec 29, 2019 at 12:10 PM Rich Ulrich <[hidden email]> wrote:


--
Jon K Peck
[hidden email]


Re: conceptual difference for conducting ANOVA vs. ANCOVA, and correlation vs. Regression.

Martin Holt-3
In reply to this post by pji
Peter....

In my 20 years' experience of Medical Statistics I have found that Psychologists are very proficient at Medical Statistics... 

If needs be..feel free to join the Google Group "MedStats".....it hosts most of the most notable Medical Statisticians. 

Anyone...literally "Anyone"...can view the exchanges...the discussion....but to post your query you do need to join.

If you do join you will have access to the Archives of all previous exchanges. You might agree..an excellent resource.

I'm biased..I am the founder of MedStats

Kind Regards

Martin P. Holt


Freelance Medical Statistician

If you can't explain it simply, you don't understand it well enough.....Einstein


Concise Encyclopedia of Biostatistics for Medical Professionals


Martin P. Holt

https://www.crcpress.com/Concise-Encyclopedia-of-Biostatistics-for-Medical-Professionals/Indrayan-Holt/9781482243871


Linked In: 
https://www.linkedin.com/in/martin-holt-3b800b48?trk=nav_responsive_tab_profile


On Sunday, 29 December 2019, 18:20:19 GMT, Ji, Peter <[hidden email]> wrote:

