My comment is that in my field, psychology, there is a question about when to use ANOVA vs. ANCOVA. Ostensibly, ANOVA tests the differences in means among groups, while ANCOVA adds a covariate to that analysis for statistical control. When we gather data, we often collect demographic variables along with our predictor and outcome variables. If the hypothesis is to test the difference in means among three groups, and the selected analysis is ANOVA, the question arises: why not use an ANCOVA and include the covariates for statistical control? If that is the case, then every analysis where the intent is to compare group means would become an ANCOVA, because why wouldn't you want to include covariates for statistical control? Granted, including the covariates means giving up statistical power, but conserving statistical power shouldn't be the reason for choosing an ANOVA over an ANCOVA. In other words, assuming a sufficient sample size, why not conduct an ANCOVA every time instead of an ANOVA whenever you have covariates, especially demographics, that you can enter for statistical control? That would mean we would conduct ANCOVAs every chance we get, i.e., whenever we have covariates. I would rather have the decision to use ANOVA vs. ANCOVA be based on conceptual and/or statistical grounds, but I can't seem to find such a justification. Thanks for your comments, and any articles or website references are appreciated.
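For concreteness, one way to see the mechanical relationship between the two: ANOVA and ANCOVA are the same general linear model, differing only in whether the covariate appears in the design matrix, and the group test in each is a comparison of nested models. A minimal numpy sketch with simulated, made-up data (variable names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: 3 groups, a covariate (say, age), and an outcome.
n_per = 30
group = np.repeat([0, 1, 2], n_per)
age = rng.normal(40, 10, group.size)
y = 0.5 * group + 0.1 * age + rng.normal(0, 1, group.size)

def f_test(X_full, X_reduced, y):
    """F statistic for the terms in X_full that are absent from X_reduced."""
    def sse(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return resid @ resid, X.shape[1]
    sse_f, p_f = sse(X_full)
    sse_r, p_r = sse(X_reduced)
    df1, df2 = p_f - p_r, len(y) - p_f
    return ((sse_r - sse_f) / df1) / (sse_f / df2)

ones = np.ones_like(age)
dummies = np.column_stack([group == 1, group == 2]).astype(float)

# ANOVA: group dummies vs. intercept-only model.
F_anova = f_test(np.column_stack([ones, dummies]), ones[:, None], y)
# ANCOVA: same group contrast, but the covariate sits in BOTH models.
F_ancova = f_test(np.column_stack([ones, age, dummies]),
                  np.column_stack([ones, age]), y)
print(F_anova, F_ancova)
```

Here the covariate absorbs part of the error variance, so the F for the same group effect is larger in the ANCOVA than in the ANOVA.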
Another comment is that my peers tend to think a regression, rather than correlations, should always be conducted whenever you have several continuous variables. The idea is that if you have several predictor variables, why not throw them all together and see which ones have the stronger associations with the outcome variable? It is often, if not always, the case that we are collecting data on several predictor variables and one outcome variable. Once my colleagues see a correlation matrix, they are quick to ask: why not just run a regression with all of the independent variables and see what happens? It's as if my colleagues think of regression as a "sophisticated" analysis compared to simple correlations. If I follow my colleagues' logic, then why not run a regression every time you have multiple independent variables, and why bother with correlations at all? I maintain that you shouldn't use a regression "just to see what happens"; you need a conceptual reason for wanting to examine the independent variables simultaneously. My question is: what is the conceptual difference, or rationale, for conducting correlations among variables rather than just using regression? Thanks for your comments, and any articles or website references are appreciated.

Peter Ji
Adler University
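One concrete way to frame the correlation-vs-regression question: a correlation answers the marginal question (is x associated with y at all?), while a regression coefficient answers the conditional question (does x add anything once the other predictors are held fixed?). A small numpy sketch with fabricated data, where x1 correlates strongly with y but contributes nothing beyond x2:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x2 = rng.normal(size=n)
x1 = x2 + 0.3 * rng.normal(size=n)   # x1 is mostly a noisy copy of x2
y = x2 + rng.normal(size=n)          # only x2 actually drives y

r_x1y = np.corrcoef(x1, y)[0, 1]     # marginal association: large

# Multiple regression: x1's coefficient collapses toward zero
# once x2 is held fixed.
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(r_x1y, beta[1], beta[2])
```

Neither number is wrong; they answer different questions, which is exactly why "just run the regression" is not a substitute for looking at the correlations.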
Okay, off-topic for SPSS. Here's a recommendation -
Harrell FE. Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis. 2nd ed. New York: Springer; 2015.
I Googled on < Frank Harrell Jr "stepwise regression" >, and I
found his 2017 tweet,
Statistical quote of the day. Stepwise variable selection has
done incredible damage to science. How did we statisticians
let this happen?
Other hits will direct you to his set of guidelines about stepwise;
I promoted those comments in the Usenet stats-groups, umpteen years ago.
Short summary: If you start out with 100 variables, you expect, by
chance alone, that 5% of them will hit the 5% test criterion. That's
especially of concern for the variables that are "truly unrelated" to
the outcome, but you also get an exaggerated look at effects that
are small and useless. With huge samples, you can use 100 covariates,
but "see" trivial effects if you stick with a fixed, 5% cut-off. The
multi-test problem, combined with big samples, has brought more
focus on "effect size".
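The 5%-by-chance arithmetic is easy to check by simulation. Here, a hypothetical screen of 100 pure-noise predictors against an unrelated outcome at the usual .05 cutoff; on average about 5 of them "pass":

```python
import numpy as np

rng = np.random.default_rng(42)
n, k = 200, 100
X = rng.normal(size=(n, k))          # 100 predictors, all pure noise
y = rng.normal(size=n)               # outcome unrelated to every one

# Correlation of each predictor with y, and the usual t test of r.
r = (X - X.mean(0)).T @ (y - y.mean()) / (n * X.std(0) * y.std())
t = r * np.sqrt((n - 2) / (1 - r ** 2))
hits = int(np.sum(np.abs(t) > 1.972))  # two-sided .05 critical t, df = 198
print(hits)
```

Any stepwise procedure run on these data would happily "select" those chance hits.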
OTOH, one also should bear in mind the way that correlated predictors confound each
other. If you have 5 variables that are highly correlated with "sex"
(for instance), then the potential effect of sex will be spread across
all five -- you do not get one /measure/ of the effect of sex, since each
suppresses the others. With multiple such measures, you don't know what
you /have/ for most of them, because you haven't inspected /all/ the
correlations.
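Roughly how that spreading looks in a toy simulation (all numbers fabricated): five noisy stand-ins for one underlying factor, each strong on its own, each much weaker once the other four are in the model:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300
latent = rng.normal(size=n)                  # the one real factor, e.g. "sex"-linked
proxies = latent[:, None] + 0.4 * rng.normal(size=(n, 5))  # 5 correlated stand-ins
y = latent + rng.normal(size=n)

# Each proxy alone looks strong...
marginal_r = [np.corrcoef(proxies[:, j], y)[0, 1] for j in range(5)]

# ...but jointly the one effect is divided across all five coefficients.
X = np.column_stack([np.ones(n), proxies])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(marginal_r, 2), np.round(beta[1:], 2))
```

No single coefficient recovers the effect; the simple correlations are needed to see that the five are measuring one thing.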
Use reason to reduce the pool of variables. Use factor analysis to reduce
the remaining variables to meaningful (in terms of outcome) composites.
There are related possibilities and concerns, including concern for the
"scaling" of measures, i.e., what transformations should be made of raw
scores to make them properly "linear" with respect to the outcome.
If you want to "try out" a large pool of potential covariates, keep /firmly/
in mind that the results are /exploratory/.
--
Rich Ulrich
From: SPSSX(r) Discussion <[hidden email]> on behalf of Ji, Peter <[hidden email]>
Sent: Friday, December 27, 2019 5:55 PM
To: [hidden email] <[hidden email]>
Subject: conceptual difference for conducting ANOVA vs. ANCOVA, and correlation vs. Regression.
In reply to this post by pji
> Granted including the covariates means giving up statistical power, but
> conserving statistical power shouldn't be the reason for using an ANOVA versus
> an ANCOVA. In other words, assuming a sufficient sample size, why not conduct
> an ANCOVA every time instead of an ANOVA if you have covariates, especially
> demographics, where you can enter the covariates for statistical control?

When you include covariates, you actually gain power for your focal prediction, because you are removing noise from it -- if my understanding is correct, this is just the case Cohen covers in his exposition of f2.
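Cohen's f2 for a focal effect with a covariate partialled out is f2 = (R2_full - R2_reduced) / (1 - R2_full). A toy simulation (made-up effect sizes) shows the covariate shrinking the error term and thereby inflating f2 for the same group effect:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 600
group = rng.integers(0, 2, n)            # two groups: the focal predictor
cov = rng.normal(size=n)                 # covariate, unrelated to group
y = 0.3 * group + 1.0 * cov + rng.normal(size=n)

def r2(X):
    """R-squared of an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

ones = np.ones(n)
r2_g = r2(np.column_stack([ones, group]))            # group alone (ANOVA-style)
r2_c = r2(np.column_stack([ones, cov]))              # covariate alone
r2_gc = r2(np.column_stack([ones, group, cov]))      # both (ANCOVA-style)

f2_anova = r2_g / (1 - r2_g)                         # f2 for group, no covariate
f2_ancova = (r2_gc - r2_c) / (1 - r2_gc)             # f2 for group, covariate partialled
print(round(f2_anova, 3), round(f2_ancova, 3))
```

The group effect explains the same raw variance either way, but the covariate removes much of the remaining error variance from the denominator, so f2 for the focal test grows.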
In reply to this post by Rich Ulrich
I didn't read the OP's question as being about fishing for covariates. After all, the group definitions could be "fished for", too. If there are good grounds for including particular covariates, it makes sense to me to include those in the ANCOVA/regression analysis unless the data have been constructed to balance them. On Sun, Dec 29, 2019 at 12:10 PM Rich Ulrich <[hidden email]> wrote:
In reply to this post by pji
Peter.... In my 20 years' experience of medical statistics, I have found that psychologists are very proficient at it. If need be, feel free to join the Google Group "MedStats"; it hosts many of the most notable medical statisticians. Anyone -- literally anyone -- can view the exchanges and the discussion, but to post a query you do need to join. If you do join, you will have access to the archives of all previous exchanges. You might agree it is an excellent resource. I'm biased: I am the founder of MedStats.

Kind regards,
Martin P. Holt
Freelance Medical Statistician
"If you can't explain it simply, you don't understand it well enough." -- Einstein
Concise Encyclopedia of Biostatistics for Medical Professionals
LinkedIn: https://www.linkedin.com/in/martin-holt-3b800b48?trk=nav_responsive_tab_profile
On Sunday, 29 December 2019, 18:20:19 GMT, Ji, Peter <[hidden email]> wrote: