Hello there. First of all, this is my first contribution, so I hope the formatting works out!

*Background*: Short-term cohort study with several test dates. One of the focuses was quality of life, with different facets and items, mostly batteries from WHO questionnaires with Likert scales. This being a clinical study, regular participants for the whole duration were few for this part of it: 10 in the test group and 10 in the control group, and 2 more dropped out for the next step (the one with changes to the items). They represent the only cases that qualified, so I'd argue they are a population rather than a sample. Questionnaire items were changed slightly and a few of them were dropped. I was instructed to verify consistency, to ensure the measures before and after were somewhat comparable. Dimensionality and reliability are what I chose to analyse.

*Methods*: The facet at hand had 6 items; one was dropped, leaving 5. I attributed the highest importance to changes between the different test values and prayed they were about the same. Cronbach's alpha did not change much (0.62 -> 0.61; knowing how it overestimates with a high number of variables, I was surprised it stayed constant with one item less), so the attention went to performing a PCA to verify that the variance explained by the items is still dominated by one main factor (implying one dimension, as the WHO had confirmed during questionnaire creation). This was ok, stable from 43% to 41%.

What was not ok, however, were the changes in KMO and adequacy. I didn't think they would be high to begin with (expecting one known underlying factor beforehand), and I didn't pay them much attention since I wasn't about to extract and use factors anyway. First question here: how important are communalities and reporting them (all over 0.6 except one at 0.57 for the case below)?

Compared to what I had before, the change was pretty dramatic: KMO went from 0.58 to a bit over 0.1, and I went from 4/5 adequate variables to 0. The variables do not inter-correlate very highly, which Field tends to mention might ruin an FA from the get-go.
Instead, the partial correlations shown off the diagonal of the anti-image correlation matrix are very high.

Corr Matrix
         V1      V2      V3      V4      V5
V1    1.000   0.435   0.150   0.518  -0.107
V2    0.435   1.000   0.361   0.302   0.541
V3    0.150   0.361   1.000   0.290  -0.387
V4    0.518   0.302   0.290   1.000   0.138
V5   -0.107   0.541  -0.387   0.138   1.000

AIC Matrix (off-diagonal entries are the partial correlations with the sign reversed; the diagonal holds the per-item sampling adequacy)
         V1      V2      V3      V4      V5
V1    0.143  -0.891   0.843  -0.819   0.882
V2   -0.891   0.185  -0.924   0.732  -0.953
V3    0.843  -0.924   0.113  -0.774   0.934
V4   -0.819   0.732  -0.774   0.161  -0.772
V5    0.882  -0.953   0.934  -0.772   0.130

Both measures being about sampling adequacy, can I just discard them as unimportant and take the figure I needed (the first factor's % of variance), or would this not be advisable? Can I just put this down as a strange one-time thing because the number of cases is so low?

To add to this, what does a very high and, conversely, a very low KMO actually mean? Could I conclude that a PCA is not recommended, as there probably is one underlying factor and no reduction of dimension seems advisable (the scree plots are pretty straightforward as well: almost a line in this case as well as in the original survey version)? Or are there other tests (parallel analysis) more appropriate to verify unidimensionality (especially given the low number of participants)?

Thanks a lot,
statcat
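For readers who want to check the "first factor variance-%" figure outside SPSS, here is a minimal Python/NumPy sketch. It assumes the rounded 5-item correlation matrix posted above is all that is available, and the function name `variance_shares` is made up for this illustration. Because the input is rounded to three decimals, the result may differ slightly from the SPSS output.

```python
import numpy as np

# Correlation matrix as posted in the question (5 items, rounded to 3 decimals).
R = np.array([
    [ 1.000, 0.435,  0.150, 0.518, -0.107],
    [ 0.435, 1.000,  0.361, 0.302,  0.541],
    [ 0.150, 0.361,  1.000, 0.290, -0.387],
    [ 0.518, 0.302,  0.290, 1.000,  0.138],
    [-0.107, 0.541, -0.387, 0.138,  1.000],
])

def variance_shares(corr):
    """Share of total variance per principal component, largest first.

    Hypothetical helper: in a PCA on a correlation matrix, each eigenvalue
    divided by the number of variables is that component's variance share.
    """
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order
    return eigenvalues / eigenvalues.sum()

shares = variance_shares(R)
for i, share in enumerate(shares, start=1):
    print(f"component {i}: {share:.1%} of variance")
```

The first line of the printout is the quantity the question tracks across questionnaire versions; the remaining lines are what a scree plot would display.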
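KMO is the omnibus indicator of how small the partial correlations
(which are the off-diagonal entries of the anti-image correlation
matrix with the sign reversed) are relative to the ordinary
correlations. In common-factor-model FA, we usually want partial
correlations to be low because we want a factor to load on more than
two variables: a high partial correlation between two items means
they share something that the remaining items do not account for.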
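To make the definition concrete, here is a small Python/NumPy sketch that computes the overall KMO and the per-item adequacy (MSA) directly from the two matrices posted in the question. The function name `kmo` is made up for this illustration; the formula is the usual ratio of summed squared correlations to summed squared correlations plus summed squared partial correlations, and the sign of the partial correlations does not matter because they are squared.

```python
import numpy as np

# Correlation matrix as posted in the question.
R = np.array([
    [ 1.000, 0.435,  0.150, 0.518, -0.107],
    [ 0.435, 1.000,  0.361, 0.302,  0.541],
    [ 0.150, 0.361,  1.000, 0.290, -0.387],
    [ 0.518, 0.302,  0.290, 1.000,  0.138],
    [-0.107, 0.541, -0.387, 0.138,  1.000],
])

# Off-diagonal entries of the posted anti-image correlation matrix.
# The diagonal is set to 0 here: only off-diagonal entries enter the sums.
A = np.array([
    [ 0.000, -0.891,  0.843, -0.819,  0.882],
    [-0.891,  0.000, -0.924,  0.732, -0.953],
    [ 0.843, -0.924,  0.000, -0.774,  0.934],
    [-0.819,  0.732, -0.774,  0.000, -0.772],
    [ 0.882, -0.953,  0.934, -0.772,  0.000],
])

def kmo(corr, anti_image):
    """Return (overall KMO, per-item MSA) from the two matrices."""
    r2 = np.asarray(corr, dtype=float) ** 2
    p2 = np.asarray(anti_image, dtype=float) ** 2
    np.fill_diagonal(r2, 0.0)  # exclude the unit diagonal of the correlations
    np.fill_diagonal(p2, 0.0)  # exclude whatever sits on the anti-image diagonal
    msa = r2.sum(axis=1) / (r2.sum(axis=1) + p2.sum(axis=1))
    overall = r2.sum() / (r2.sum() + p2.sum())
    return overall, msa

overall, msa = kmo(R, A)
print(f"overall KMO: {overall:.3f}")     # about 0.147
print("per-item MSA:", np.round(msa, 3))
```

The per-item values reproduce the diagonal of the posted anti-image matrix (0.143, 0.185, 0.113, 0.161, 0.130), and the overall value matches the "bit over 0.1" reported in the question: the squared partial correlations are several times larger than the squared correlations, which is exactly what drives KMO down.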
Partial correlations (and therefore KMO) are sensitive to sample size and to the number of items (variables) in the analysis. So, for FA, always aim to (1) have a big enough sample; (2) have a sufficiently large number of items, so that they "cover" the whole field of interest and each common factor operating in that field can load on 3+ items; (3) have a sample size at least 3 times the number of items.

If you have a low KMO under those conditions, drop "bad" items to improve it. But if you have a low KMO with too small a sample and too few items, that is natural. Get more respondents and invent more items then.

Also, KMO is sensitive to the way you treat missing data. KMO is usually lower with pairwise deletion than with mean substitution. However, neither is recommended (well, if you don't think in terms of a population and don't intend any inference to it, mean substitution is tolerable). Use imputation instead.

(In reply to statcat's message of 29.11.2013 22:13.)
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Interpreting-very-high-and-very-low-KMO-adequacy-values-Dimensionality-tp5723371.html
Thank you, Kirill, for your response.

I suppose much of what I'm asking boils down to the following. You said "In common-model FA, we usually want partial correlations to be low because we want a factor to load on more than two variables". Does this, conversely, mean that if sampling adequacy is low and/or the partial correlations are very high, as in my example, the strongest factor can be singled out because it loads onto all variables? Alternatively, would parallel analysis confirm that one factor is the way to go?

As mentioned at first, the sample is limited to a specific set of people, the majority of whom took part in the clinical study; there are no other people who would fit the sampling, so to speak. I would not like to manipulate the data set: dropping expendable variables would give a good KMO and would mean that factoring is possible (implying multidimensionality over very few items), which is not the purpose of the analysis.
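Since parallel analysis comes up twice in this thread, here is a rough Python/NumPy sketch of what it does. It assumes the PCA-eigenvalue variant with normally distributed random data, which is only an approximation for Likert items, and it assumes 18 cases, which is a guess based on the 20 participants minus 2 dropouts mentioned in the question. The function name `parallel_analysis` and its parameters are made up for this illustration.

```python
import numpy as np

def parallel_analysis(corr, n_cases, n_sims=1000, percentile=95, seed=0):
    """Count leading eigenvalues that exceed those of random data.

    Hypothetical helper: simulates uncorrelated normal data with the same
    number of cases and variables, and compares eigenvalues position by position.
    """
    corr = np.asarray(corr, dtype=float)
    p = corr.shape[0]
    observed = np.linalg.eigvalsh(corr)[::-1]  # largest first
    rng = np.random.default_rng(seed)
    simulated = np.empty((n_sims, p))
    for s in range(n_sims):
        data = rng.standard_normal((n_cases, p))
        simulated[s] = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    thresholds = np.percentile(simulated, percentile, axis=0)
    exceeds = observed > thresholds
    # Retain components only up to the first one that fails the comparison.
    n_retained = p if exceeds.all() else int(np.argmin(exceeds))
    return n_retained, observed, thresholds

# Correlation matrix as posted in the question.
R = np.array([
    [ 1.000, 0.435,  0.150, 0.518, -0.107],
    [ 0.435, 1.000,  0.361, 0.302,  0.541],
    [ 0.150, 0.361,  1.000, 0.290, -0.387],
    [ 0.518, 0.302,  0.290, 1.000,  0.138],
    [-0.107, 0.541, -0.387, 0.138,  1.000],
])

# n_cases=18 is an assumption, not a figure confirmed in the thread.
n_retained, observed, thresholds = parallel_analysis(R, n_cases=18)
print("observed eigenvalues:", np.round(observed, 3))
print("random-data thresholds:", np.round(thresholds, 3))
print("components retained:", n_retained)
```

With so few cases the random-data thresholds are wide, so the outcome should be read as a rough indication rather than a confirmation of unidimensionality.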