EFA and CFA on the same data set

Tanja Gabriele Baudson
Hi all,

I found in several articles that it makes sense to split a sample in
two to derive the (yet unknown) factor structure from the first half
using EFA and then to test this structure using CFA with the second
half. I would consider it bad practice to use both strategies on the
entire sample; however, a colleague of mine disagrees. I have been
searching for arguments/references in favor of my assumption but was
not too successful (except for a lecture script, which does not
mention any references, I did not find anything specific). Any hints
would be appreciated.

Regards
Tanja
--
Tanja Gabriele Baudson
Universität Trier
FB I Psychologie
Hochbegabtenforschung und -förderung
54286 Trier
Fon 0651/201-4558
Fax 0651/201-4578
Email [hidden email]
Web http://www.uni-trier.de/index.php?id=9492

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: EFA and CFA on the same data set

Christopher Stride
Hi Tanja
The logic (which applies to any model building and testing situation,
not just measurement model construction) is that a model is likely to
fit the data set it was created from better than it fits any other random
sample from the same population. On that basis, if you want an honest test
of how well your model fits the data, you need to formulate and test it on
different data sets. Building and testing on the same sample will bias
your assessment of model fit upwards. Split-half validation is one way
around this.
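
As a rough illustration of the split-half idea, here is a minimal Python sketch of the random split itself. The function name split_half, the seed, and the sample size of 401 are made up for the example; the EFA and CFA steps are not shown and would be run in whatever software you normally use.

```python
import numpy as np

def split_half(n_cases, seed=0):
    """Randomly partition case indices 0..n_cases-1 into two disjoint halves."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_cases)
    half = n_cases // 2
    # First part: exploration sample (e.g. for EFA).
    # Second part: confirmation sample (e.g. for CFA).
    return np.sort(shuffled[:half]), np.sort(shuffled[half:])

# Hypothetical sample of 401 cases; with an odd n the second half gets the extra case.
explore_idx, confirm_idx = split_half(n_cases=401, seed=42)
```

Fixing the seed makes the split reproducible, so the same cases end up in the same half if the analysis is rerun.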
cheers
Chris

Re: EFA and CFA on the same data set

Bruce Weaver
Administrator
I agree with Chris.  Here is a nice quote from Dave Howell's book Statistical Methods for Psychology, in which he makes the same point in the context of multiple linear regression.  

"Essentially, we have an equation that does its best to fit every bump and wiggle (including sampling error) in the data.  We should not be surprised when it does not do as well in accounting for different bumps and wiggles in a different set of data.  However, substantial differences between R^2 and R^2_cv are an indication that our solution lacks appreciable validity."  (Howell 2007, p. 524).

To understand what R^2_cv is referring to, imagine fitting a model on half of the data.  Now use that model to compute fitted values in the other half of the data (i.e., in the cross-validation data set).  R^2_cv = the squared correlation between Y and Y-prime in the cross-validation data set.  R^2, on the other hand, is just the R^2 value from the model in the original data set.
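
For anyone who wants to see this in numbers, here is a small Python sketch of the comparison just described. It is only an illustration under stated assumptions: the data are simulated, the function name r2_and_r2_cv is invented for the example, and ordinary least squares via NumPy stands in for whatever regression routine you would actually use.

```python
import numpy as np

def r2_and_r2_cv(X, y, seed=0):
    """Fit OLS on a random half; return (R^2 in that half, R^2_cv in the other half)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    half = len(y) // 2
    train, test = order[:half], order[half:]

    def design(rows):
        # Design matrix with an intercept column for the selected rows.
        return np.column_stack([np.ones(len(rows)), X[rows]])

    # Coefficients are estimated from the first half only.
    beta, *_ = np.linalg.lstsq(design(train), y[train], rcond=None)

    # R^2: squared correlation between Y and fitted Y in the estimation half.
    r2 = np.corrcoef(y[train], design(train) @ beta)[0, 1] ** 2
    # R^2_cv: same coefficients applied to the holdout half.
    r2_cv = np.corrcoef(y[test], design(test) @ beta)[0, 1] ** 2
    return r2, r2_cv

# Simulated data (an assumption for illustration): 10 predictors, only the first matters.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 10))
y = 0.5 * X[:, 0] + rng.normal(size=60)
r2, r2_cv = r2_and_r2_cv(X, y)
```

With many predictors and few cases, R^2 in the estimation half will typically come out noticeably larger than R^2_cv, which is the shrinkage Howell is describing.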

HTH.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: EFA and CFA on the same data set

Mike
In reply to this post by Tanja Gabriele Baudson
Kenneth Bollen's (1989) "Structural Equations with Latent Variables"
describes the split-sample cross-validation procedure on page 278
in the model evaluation section of the chapter on confirmatory factor
analysis.  He cites Cudeck and Browne (1983) as the source for his
presentation and notes that they conducted simulations on the
procedure.  On the next page Bollen identifies the information measures
(Akaike's and Schwarz's modification) as an alternative to conducting
a split sample analysis.  This is old stuff so I'd assume that there is
more recent thinking and research on the procedures -- a citation
analysis of Cudeck and Browne might reveal relevant current publications.

Cudeck, R. & M.W. Browne (1983) Cross-validation of covariance
structures.  Multivariate Behavioral Research, 18, 147-167.

-Mike Palij
New York University
[hidden email]

