Hi all,
I found in several articles that it makes sense to split a sample in two to derive the (yet unknown) factor structure from the first half using EFA and then to test this structure using CFA with the second half. I would consider it bad practice to use both strategies on the entire sample; however, a colleague of mine disagrees. I have been searching for arguments/references in favor of my assumption but was not too successful (except for a lecture script, which does not mention any references, I did not find anything specific). Any hints would be appreciated.

Regards
Tanja
--
Tanja Gabriele Baudson
Universität Trier
FB I Psychologie
Hochbegabtenforschung und -förderung
54286 Trier
Fon 0651/201-4558
Fax 0651/201-4578
Email [hidden email]
Web http://www.uni-trier.de/index.php?id=9492

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Hi Tanja
The logic (which applies to any model building and testing situation, not just measurement model construction) is that a model is likely to fit the data set it was created from better than it fits any other random sample from the same population. On that basis, if you want an honest test of how well your model fits the data, you need to formulate and test it on different data sets. Building and testing on the same sample will bias your assessment of model fit upwards. Split-half validation is one way around this.

cheers
Chris
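The random split itself is simple to do in any package. As a minimal sketch, assuming cases are identified by their row index, the following assigns each case to exactly one of two halves (one for the EFA, one for the CFA). The function name `split_half`, the seed value, and the sample size of 401 are hypothetical and chosen only for illustration.

```python
import random


def split_half(n_cases, seed=2010):
    """Randomly assign case indices 0..n_cases-1 to two disjoint halves.

    The seed is fixed only so that the split can be reproduced later.
    """
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)  # random order, reproducible
    middle = n_cases // 2
    # Sorting is cosmetic: it makes the two index lists easier to read.
    return sorted(indices[:middle]), sorted(indices[middle:])


# Hypothetical sample of 401 cases: explore on one half, confirm on the other.
efa_cases, cfa_cases = split_half(401)
print(len(efa_cases), len(cfa_cases))
```

With an odd number of cases the second half receives the extra case. The important property is that no case appears in both halves, so the CFA is run on data that played no part in deriving the factor structure.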
I agree with Chris. Here is a nice quote from Dave Howell's book Statistical Methods for Psychology, in which he makes the same point in the context of multiple linear regression.

"Essentially, we have an equation that does its best to fit every bump and wiggle (including sampling error) in the data. We should not be surprised when it does not do as well in accounting for different bumps and wiggles in a different set of data. However, substantial differences between R^2 and R^2_cv are an indication that our solution lacks appreciable validity." (Howell 2007, p. 524).

To understand what R^2_cv is referring to, imagine fitting a model on half of the data. Now use that model to compute fitted values in the other half of the data (i.e., in the cross-validation data set). R^2_cv = the squared correlation between Y and Y-prime in the cross-validation data set. R^2, on the other hand, is just the R^2 value from the model in the original data set.

HTH.
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
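The R^2 versus R^2_cv comparison described in the post above can be tried out on simulated data. The following is only a sketch under stated assumptions: the data are simulated (8 predictors of which only the first is related to Y, 60 cases), the regression is fitted with a small hand-written least-squares routine so that nothing beyond the Python standard library is needed, and all function names are hypothetical. It fits the model on the first half, then computes the squared correlation between Y and the fitted values in each half.

```python
import random


def solve(a, b):
    """Solve the linear system a * coef = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef


def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, rows):
    """Fitted values (Y-prime) for the given predictor rows."""
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]


def squared_correlation(a, b):
    """Squared Pearson correlation between two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((x - mean_b) ** 2 for x in b)
    sab = sum((x - mean_a) * (z - mean_b) for x, z in zip(a, b))
    return sab * sab / (saa * sbb)


def simulate(n_cases, n_predictors, rng):
    """Simulated data (an assumption of this sketch): only predictor 0 matters."""
    rows = [[rng.gauss(0, 1) for _ in range(n_predictors)] for _ in range(n_cases)]
    y = [r[0] + rng.gauss(0, 1) for r in rows]
    return rows, y


def r2_and_r2_cv(rows, y):
    """Fit on the first half; return (R^2 in that half, R^2_cv in the other half).

    The simulated cases are already in random order, so the first/second half
    split is a random split here. Real data should be shuffled first.
    """
    half = len(y) // 2
    coef = fit_ols(rows[:half], y[:half])
    r2 = squared_correlation(y[:half], predict(coef, rows[:half]))
    r2_cv = squared_correlation(y[half:], predict(coef, rows[half:]))
    return r2, r2_cv


if __name__ == "__main__":
    rng = random.Random(0)
    results = [r2_and_r2_cv(*simulate(60, 8, rng)) for _ in range(200)]
    print("mean R^2    :", sum(r for r, _ in results) / len(results))
    print("mean R^2_cv :", sum(c for _, c in results) / len(results))
```

Several predictors are used on purpose. With a single predictor the fitted values are just a linear function of X, so the squared correlation in the cross-validation half would not depend on the coefficients estimated in the first half, and the comparison would show nothing about overfitting.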
Kenneth Bollen's (1989) "Structural Equations with Latent Variables" describes the split-sample cross-validation procedure on page 278, in the model evaluation section of the chapter on confirmatory factor analysis. He cites Cudeck and Browne (1983) as the source for his presentation and notes that they conducted simulations on the procedure. On the next page Bollen identifies the information measures (Akaike's and Schwarz's modification) as an alternative to conducting a split-sample analysis. This is old stuff, so I'd assume that there is more recent thinking and research on the procedures -- a citation analysis of Cudeck and Browne might reveal relevant current publications.

Cudeck, R. & M.W. Browne (1983) Cross-validation of covariance structures. Multivariate Behavioral Research, 18, 147-167.

-Mike Palij
New York University
[hidden email]

----- Original Message -----
From: "Tanja Gabriele Baudson" <[hidden email]>
To: <[hidden email]>
Sent: Monday, November 22, 2010 3:09 PM
Subject: EFA and CFA on the same data set
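For readers who have not met the information measures mentioned in the post above, here is a minimal sketch of the general textbook definitions of Akaike's criterion (AIC) and Schwarz's criterion (BIC), computed from a model's maximized log-likelihood. This is an assumption about which form is meant: SEM texts and software often report chi-square-based variants, and the exact expressions on the cited page of Bollen (1989) are not reproduced here. The function names and the numbers in the usage lines are hypothetical.

```python
import math


def aic(log_likelihood, n_parameters):
    """Akaike's information criterion: -2 ln L + 2k (smaller is better)."""
    return -2.0 * log_likelihood + 2.0 * n_parameters


def bic(log_likelihood, n_parameters, n_cases):
    """Schwarz's criterion: -2 ln L + k ln(n) (smaller is better)."""
    return -2.0 * log_likelihood + n_parameters * math.log(n_cases)


# Hypothetical comparison of two candidate models fitted to the same 300 cases.
print(aic(-1520.4, 13), bic(-1520.4, 13, 300))
print(aic(-1512.9, 19), bic(-1512.9, 19, 300))
```

Both criteria penalize the fit for the number of estimated parameters, which is why they can serve as an alternative to holding out half of the sample: the penalty stands in for the loss of fit that would be expected in new data.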