Interpreting very high and very low KMO/adequacy values. Dimensionality?

statcat
Hello there! First of all, this is my first contribution, so I hope the
formatting works out.

*Background*: A short-term cohort study with several test dates. One focus
was quality of life, measured with different facets and items, mostly
batteries from WHO questionnaires with Likert scales. Because this is a
clinical study, only a few participants took part regularly for the whole
duration of this part: 10 in the test group and 10 in the control group,
and 2 more dropped out before the next step (which had changes to the
items). They are the only cases that qualified, so I'd argue they are a
population rather than a sample. Questionnaire items were changed slightly
and a few of them were dropped. I was instructed to verify consistency, to
ensure that the measures before and after were somewhat comparable. I chose
to analyse dimensionality and reliability.
*Methods*: The facet at hand had 6 items; one was dropped, leaving 5. What
mattered most to me was how the statistics changed between the two versions,
and I hoped they would stay about the same. Cronbach's alpha did not change
much (0.62 -> 0.61; knowing that it tends to be inflated by a high number of
variables, I was surprised that it stayed constant with one item fewer), so
my attention went to a PCA, to check whether the share of variance explained
still points to one main factor (implying one dimension, as the WHO had
confirmed when the questionnaire was created). This was ok: the first factor
was stable, going from 43% to 41% of the variance. What was not ok, however,
were the changes in KMO and in the per-variable adequacy (MSA). I didn't
expect them to be high to begin with (given one known underlying factor),
and I didn't pay them much attention since I wasn't going to extract and use
factors anyway. A first question here: how important are the communalities,
and reporting them (all are over 0.6, except one at 0.57, for the case
below)?

Compared with what I had before, the change in KMO was pretty dramatic: from
0.58 to a bit over 0.1, and from 4/5 adequate variables to 0. The variables
are not very highly inter-correlated, which Field mentions might ruin an FA
from the get-go. Instead, it is the partial correlations, shown off the
diagonal of the anti-image correlation matrix, that are very high.

Correlation matrix
        V1      V2      V3      V4      V5
V1   1.000   0.435   0.150   0.518  -0.107
V2   0.435   1.000   0.361   0.302   0.541
V3   0.150   0.361   1.000   0.290  -0.387
V4   0.518   0.302   0.290   1.000   0.138
V5  -0.107   0.541  -0.387   0.138   1.000
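
The correlation matrix above is enough to recompute two of the figures discussed in this post. The sketch below assumes the matrix belongs to the 5-item version; it takes the eigenvalues of the matrix (the PCA variances) and a standardized Cronbach's alpha from the mean inter-item correlation. The standardized alpha is a swapped-in stand-in for the raw alpha reported above, which needs the item variances, so the two need not agree exactly. Variable names are mine.

```python
import numpy as np

# Correlation matrix as posted (rounded to three decimals).
R = np.array([
    [ 1.000, 0.435,  0.150, 0.518, -0.107],
    [ 0.435, 1.000,  0.361, 0.302,  0.541],
    [ 0.150, 0.361,  1.000, 0.290, -0.387],
    [ 0.518, 0.302,  0.290, 1.000,  0.138],
    [-0.107, 0.541, -0.387, 0.138,  1.000],
])

# PCA on a correlation matrix: the eigenvalues are the component variances.
# eigvalsh returns them in ascending order, so reverse to get the scree order.
eigenvalues = np.linalg.eigvalsh(R)[::-1]
share = eigenvalues / eigenvalues.sum()

# Standardized alpha from the mean off-diagonal correlation.
k = R.shape[0]
mean_r = (R.sum() - k) / (k * (k - 1))
alpha_std = k * mean_r / (1 + (k - 1) * mean_r)

print("eigenvalues (scree):", np.round(eigenvalues, 3))
print("share of variance  :", np.round(share, 3))
print("standardized alpha :", round(alpha_std, 3))
```

The first share should land close to the 41% reported above. The smallest eigenvalue is worth a look too: a value near zero means the matrix is close to singular, which is the situation in which partial correlations become very large.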

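Anti-image correlation matrix (off-diagonal entries with the opposite sign
are the partial correlations; diagonal entries are the per-variable MSA)
        V1      V2      V3      V4      V5
V1   0.143  -0.891   0.843  -0.819   0.882
V2  -0.891   0.185  -0.924   0.732  -0.953
V3   0.843  -0.924   0.113  -0.774   0.934
V4  -0.819   0.732  -0.774   0.161  -0.772
V5   0.882  -0.953   0.934  -0.772   0.130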
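
As a rough cross-check, the overall KMO and the per-variable MSA can be recomputed by hand from the two matrices above. The sketch assumes the usual definition: the sum of squared correlations divided by that sum plus the sum of squared partial correlations. It uses only the rounded values as posted, and the names are mine.

```python
# Off-diagonal correlations, keyed by variable pair (rounded, as posted).
corr = {
    (1, 2): 0.435, (1, 3): 0.150, (1, 4): 0.518, (1, 5): -0.107,
    (2, 3): 0.361, (2, 4): 0.302, (2, 5): 0.541,
    (3, 4): 0.290, (3, 5): -0.387, (4, 5): 0.138,
}

# Partial correlations: anti-image off-diagonal entries with the sign flipped.
partial = {
    (1, 2): 0.891, (1, 3): -0.843, (1, 4): 0.819, (1, 5): -0.882,
    (2, 3): 0.924, (2, 4): -0.732, (2, 5): 0.953,
    (3, 4): 0.774, (3, 5): -0.934, (4, 5): 0.772,
}


def msa(var):
    """Measure of sampling adequacy for one variable (1-based index)."""
    r2 = sum(r * r for pair, r in corr.items() if var in pair)
    p2 = sum(p * p for pair, p in partial.items() if var in pair)
    return r2 / (r2 + p2)


def kmo():
    """Overall KMO across all variable pairs."""
    r2 = sum(r * r for r in corr.values())
    p2 = sum(p * p for p in partial.values())
    return r2 / (r2 + p2)


print("overall KMO:", round(kmo(), 3))  # about 0.147 with these rounded inputs
for v in range(1, 6):
    print(f"MSA V{v}:", round(msa(v), 3))
```

With these inputs the overall value comes out at about 0.147, in line with "a bit over 0.1", and the per-variable values agree with the diagonal of the anti-image matrix up to rounding. The squared partial correlations sum to roughly six times the squared correlations, which is what pulls the ratio so low.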

Since both measures concern sampling adequacy, can I just set them aside as
unimportant and use the figure I actually needed (the first factor's share
of variance), or would that be inadvisable? Can I put this down to a strange
one-off caused by the very low number of cases? On top of that, what do a
very high and, conversely, a very low KMO actually mean? Could I conclude
that a PCA is not recommended because there is probably one underlying
factor and no reduction of dimension seems advisable (the scree plots are
pretty straightforward as well: almost a straight line, both in this case
and in the original survey version)? Or are there other tests (parallel
analysis) that are more appropriate for verifying unidimensionality,
especially given the low number of participants?

Thanks a lot,
statcat
Re: Interpreting very high and very low KMO/adequacy values. Dimensionality?

Kirill Orlov
KMO is an omnibus indicator of how low the partial correlations are (these are the off-diagonal entries of the anti-image correlation matrix with the sign reversed). In common-factor FA, we usually want partial correlations to be low because we want a factor to load on more than two variables. A high partial correlation means that two variables share something that the remaining variables do not account for.
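
To make the definition concrete, here is a minimal sketch of how KMO can be computed from a correlation matrix. It assumes the standard route: partial correlations are taken from the inverse of the correlation matrix, and KMO is the sum of squared correlations over that sum plus the sum of squared partial correlations. The function names and the equicorrelated toy matrices are mine, for illustration only.

```python
import numpy as np


def partial_correlations(R):
    """Partial correlation of each pair, controlling for all other variables."""
    Q = np.linalg.inv(R)
    d = np.sqrt(np.diag(Q))
    P = -Q / np.outer(d, d)
    np.fill_diagonal(P, 0.0)  # only the off-diagonal entries are of interest
    return P


def kmo(R):
    """Return (overall KMO, per-variable MSA) for a correlation matrix R."""
    R = np.asarray(R, dtype=float)
    r2 = R ** 2
    np.fill_diagonal(r2, 0.0)
    p2 = partial_correlations(R) ** 2
    per_variable = r2.sum(axis=0) / (r2.sum(axis=0) + p2.sum(axis=0))
    overall = r2.sum() / (r2.sum() + p2.sum())
    return overall, per_variable


def equicorrelated(n_items, r):
    """Toy matrix: every pair of items correlates r (one common factor)."""
    return np.full((n_items, n_items), r) + (1.0 - r) * np.eye(n_items)


for n_items in (3, 10):
    overall, _ = kmo(equicorrelated(n_items, 0.5))
    print(n_items, "items, all r = 0.5 -> KMO =", round(overall, 3))
```

In this toy setting the same pairwise correlation of 0.5 gives a KMO of about 0.69 with 3 items and about 0.96 with 10, because each added item leaves less for any single pair to share on its own. That is the dependence on the number of items described in the next paragraph.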

Partial correlations (and therefore KMO) are sensitive to the sample size and to the number of items (variables) in the analysis. So, for FA, always aim to have (1) a big enough sample; (2) a sufficiently large number of items, so that they "cover" the whole field of interest and each common factor at play in that field can load on 3+ items; (3) a sample size at least 3 times the number of items.

If KMO is low even when these conditions are met, drop "bad" items to improve it. But if KMO is low because the sample is too small and the items are too few, that is natural. In that case, get more respondents and write more items.

Also, KMO is sensitive to the way you treat missing data. It is usually lower with pairwise deletion than with mean substitution. Neither is recommended, though (well, if you don't think in terms of a population and don't intend any inference to one, mean substitution is tolerable). Use imputation instead.


On 29.11.2013 22:13, statcat wrote:
--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Interpreting-very-high-and-very-low-KMO-adequacy-values-Dimensionality-tp5723371.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

Re: Interpreting very high and very low KMO/adequacy values. Dimensionality?

statcat
Thank you Kirill for your response.

I suppose much of what I'm asking boils down to the following.
You said "In common-model FA, we usually want partial correlations to be low because we want a factor to load on more than two variables". Does this mean, conversely, that if sampling adequacy is low and/or the partial correlations are very high, as in my example, the strongest factor can be singled out because it loads on all variables? Alternatively, would parallel analysis confirm that one factor is the way to go?
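
For reference, a minimal sketch of what parallel analysis does, assuming the common PCA-based variant: the eigenvalues of the observed correlation matrix are compared with those from random, uncorrelated normal data with the same number of cases and items, and a component is kept only while its eigenvalue exceeds the chosen percentile of the random ones. The function name, the 95th percentile, the number of simulations, and the synthetic demo data are all my own choices, not anything taken from the study.

```python
import numpy as np


def parallel_analysis(data, n_sims=500, percentile=95, seed=0):
    """Compare observed eigenvalues with those of random normal data."""
    rng = np.random.default_rng(seed)
    n_cases, n_items = data.shape

    def sorted_eigenvalues(x):
        # Eigenvalues of the correlation matrix, largest first.
        return np.linalg.eigvalsh(np.corrcoef(x, rowvar=False))[::-1]

    observed = sorted_eigenvalues(data)
    simulated = np.empty((n_sims, n_items))
    for i in range(n_sims):
        simulated[i] = sorted_eigenvalues(rng.standard_normal((n_cases, n_items)))
    threshold = np.percentile(simulated, percentile, axis=0)

    # Count the leading components whose eigenvalue beats the random threshold.
    exceeds = observed > threshold
    n_retain = n_items if exceeds.all() else int(np.argmin(exceeds))
    return observed, threshold, n_retain


# Synthetic demo only: 200 cases, 5 items driven by a single common factor.
rng = np.random.default_rng(1)
factor = rng.standard_normal((200, 1))
items = 0.7 * factor + 0.7 * rng.standard_normal((200, 5))
observed, threshold, n_retain = parallel_analysis(items, n_sims=200)
print("observed  :", np.round(observed, 2))
print("threshold :", np.round(threshold, 2))
print("components retained:", n_retain)
```

With only about 20 cases, the random eigenvalues vary a lot, so the threshold for the first component is high and the answer should be read with caution.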

As I mentioned at the start, the sample is limited to a specific set of people, most of whom took part in the clinical study; there are no other people who fit the sampling criteria, so to speak. I would not like to manipulate the data set: dropping expendable variables would give a good KMO and would mean factoring is possible (implying multidimensionality over very few items), which is not the purpose of the analysis.