
Re: EFA and CFA on the same data set

Posted by Bruce Weaver on Nov 22, 2010; 8:45pm
URL: http://spssx-discussion.165.s1.nabble.com/EFA-and-CFA-on-the-same-data-set-tp3275897p3275960.html

I agree with Chris.  Here is a nice quote from Dave Howell's book Statistical Methods for Psychology, in which he makes the same point in the context of multiple linear regression.  

"Essentially, we have an equation that does its best to fit every bump and wiggle (including sampling error) in the data.  We should not be surprised when it does not do as well in accounting for different bumps and wiggles in a different set of data.  However, substantial differences between R^2 and R^2_cv are an indication that our solution lacks appreciable validity."  (Howell 2007, p. 524).

To understand what R^2_cv refers to, imagine fitting a model on half of the data.  Now use that model to compute fitted values in the other half of the data (i.e., in the cross-validation data set).  R^2_cv is the squared correlation between Y and Y-prime (the fitted values) in the cross-validation data set.  R^2, on the other hand, is just the R^2 value from the model in the half of the data it was fitted to.
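
For anyone who wants to see the arithmetic, here is a minimal sketch of that procedure in Python with NumPy (not SPSS syntax, and not taken from Howell).  The simulated data, the function name split_half_r2, the random 50/50 split and the seeds are all my own assumptions for illustration.

```python
import numpy as np


def split_half_r2(X, y, seed=0):
    """Return (R^2 in the fitting half, R^2_cv in the holdout half).

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))      # shuffle the case numbers
    half = len(y) // 2
    fit, hold = idx[:half], idx[half:]

    # Add an intercept column; estimate OLS coefficients from the first half only.
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd[fit], y[fit], rcond=None)

    def sq_corr(rows):
        # Squared correlation between Y and Y-prime (fitted values) in these rows.
        y_hat = Xd[rows] @ beta
        return np.corrcoef(y[rows], y_hat)[0, 1] ** 2

    return sq_corr(fit), sq_corr(hold)


# Simulated example (an assumption): 60 cases, 10 predictors, only the first matters.
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 10))
y = 0.5 * X[:, 0] + rng.normal(size=60)

r2, r2_cv = split_half_r2(X, y)
print(f"R^2 (fitting half) = {r2:.3f}")
print(f"R^2_cv (other half) = {r2_cv:.3f}")
```

With many predictors and few cases, R^2 in the fitting half will typically come out larger than R^2_cv, which is the shrinkage Howell is describing.  The size of the gap depends on the particular split, so try a few different seeds before reading much into one result.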

HTH.

Dr C B Stride wrote
Hi Tanja
The logic (which applies to any model building and testing situation,
not just measurement model construction) is that a model is likely to
fit the data set it was created from better than any other random sample
from the same population - on that basis, if you want an honest test of
how well your model fits the data, you need to formulate and test it on
different data sets. Building and testing on the same sample will bias
your assessment of model fit upwards. Split-half validation is one way
around this.
cheers
Chris

On 22/11/2010 20:09, Tanja Gabriele Baudson wrote:
> Hi all,
>
> I found in several articles that it makes sense to split a sample in
> two to derive the (yet unknown) factor structure from the first half
> using EFA and then to test this structure using CFA with the second
> half. I would consider it bad practice to use both strategies on the
> entire sample; however, a colleague of mine disagrees. I have been
> searching for arguments/references in favor of my assumption but was
> not too successful (except for a lecture script, which does not
> mention any references, I did not find anything specific). Any hints
> would be appreciated.
>
> Regards
> Tanja
> --
> Tanja Gabriele Baudson
> Universität Trier
> FB I Psychologie
> Hochbegabtenforschung und -förderung
> 54286 Trier
> Fon 0651/201-4558
> Fax 0651/201-4578
> Email baudson@uni-trier.de
> Web http://www.uni-trier.de/index.php?id=9492
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> LISTSERV@LISTSERV.UGA.EDU (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).