How to calculate new variables from factor analysis?

How to calculate new variables from factor analysis?

danl
Dear friends,

Thanks for the help. I see the procedure to calculate and save the factors, but I want to create a new variable from 6 others.
My new variable will be about farm infrastructure, and I don't know how to create it. Will the first factor be the new variable (farm infrastructure)?
What do I need to do?

thanks!


infrastructura agricola.sav

Re: new variables from factor analysis?

Hector Maletta
Dani,
I have been working on a similar problem recently, only my problem was
household infrastructure. Which factor scores enter your new variable
depends on your theory about the underlying concept of farm infrastructure.
Remember that your various factors (unless obliquely rotated, which is not
necessary here) are independent of each other, i.e. uncorrelated. This
means that each factor explains a separate portion of the total variance
of the set of observed variables. In traditional (psychological)
applications of factor analysis the first factor (the one explaining the
most variance) is supposed to represent the main psychological trait being
measured by the observed answers (say, intelligence or proclivity to
aggression), and the remaining factors are supposed to measure other traits
that are INDEPENDENT of the first (e.g. better understanding of the test
language).
If you think that "farm infrastructure" is a concept encompassing CORRELATED
aspects, so that farms with high amounts of one aspect will tend to have
high amounts of the other aspects as well (farms with longer internal
irrigation canals will also tend to have more sheds, more silos and more of
everything relevant), then you should check whether the relevant variables
are (a) positively correlated to each other and (b) highly loaded on the
first factor. If so, the first factor is an adequate index of all those
variables.

Of course, when I say "positively correlated to each other" I assume the
variables have the "correct" sign, so that a higher value always means
better infrastructure; otherwise you should expect negative correlations.
For instance, if one variable is the degree of silting, less is good and
more is bad, and that variable should have a negative correlation with
longer canals and more silos, but that is obvious.
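As a rough illustration of checks (a) and (b), here is a minimal Python sketch. It is not SPSS output: the data are synthetic, the extraction is a plain principal-components one computed from the correlation matrix (an assumption, other extraction methods give somewhat different loadings), and the 0.5 cut-off for a "high" loading is an arbitrary choice of mine.

```python
import numpy as np

# Synthetic stand-in for six farm infrastructure variables (hypothetical data):
# each one is a shared latent "infrastructure" level plus independent noise.
rng = np.random.default_rng(0)
n_farms, n_vars = 500, 6
latent = rng.normal(size=(n_farms, 1))
data = latent + 0.5 * rng.normal(size=(n_farms, n_vars))

# A variable where MORE is WORSE (e.g. degree of silting) would be reversed
# first, so that a higher value always means better infrastructure:
# data[:, k] = -data[:, k]

# (a) Are the variables positively correlated to each other?
corr = np.corrcoef(data, rowvar=False)
off_diagonal = corr[~np.eye(n_vars, dtype=bool)]
all_positive = bool((off_diagonal > 0).all())

# (b) Are they highly loaded on the first factor?
# eigh returns eigenvalues in ascending order, so the last one is the largest.
eigenvalues, eigenvectors = np.linalg.eigh(corr)
first_value = eigenvalues[-1]
first_vector = eigenvectors[:, -1]
if first_vector.sum() < 0:  # the sign of an eigenvector is arbitrary
    first_vector = -first_vector
loadings = first_vector * np.sqrt(first_value)

# Share of total variance explained by the first component.
share_explained = first_value / n_vars

MIN_LOADING = 0.5  # assumed cut-off, not a standard
highly_loaded = bool((loadings >= MIN_LOADING).all())

print("all correlations positive:", all_positive)
print("first-factor loadings:", np.round(loadings, 2))
print("share of variance explained:", round(float(share_explained), 2))
print("all variables highly loaded:", highly_loaded)
```

If both checks come out true on your real data, saving the first factor score as the index is consistent with the reasoning above.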
It may also be that "farm infrastructure" is a complex concept with several
dimensions not necessarily correlated to each other. This may easily happen
when you have a heterogeneous farm sector with farms of several types, say
cattle ranches, dairy farms, extensive cereal farms and intensive
horticultural farms, with or without irrigation. Infrastructures relevant in
one type will be absent or irrelevant in another, and thus uncorrelated. In
that case you'll find some variables associated with the first factor, some
with the second, and so on, so you may end up with several INDEPENDENT
factors related to farm infrastructure. A certain farm may be low on Factors
1 and 2, but high on Factor 3.
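To see which variables go with which factor, one simple approach is to assign each variable to the factor on which it has the largest absolute loading. The sketch below does only that; the variable names and the loading values are invented for illustration and do not come from any real farm data.

```python
# Hypothetical loading matrix: one row per variable, one column per factor.
loadings = {
    "irrigation_canals_km": [0.82, 0.10, 0.05],
    "silos":                [0.75, 0.20, -0.02],
    "milking_sheds":        [0.08, 0.88, 0.12],
    "cattle_fences_km":     [0.15, 0.71, 0.30],
    "greenhouses":          [-0.03, 0.09, 0.79],
    "cold_storage":         [0.22, 0.05, 0.68],
}


def dominant_factor(row):
    """Return the 1-based number of the factor with the largest absolute loading."""
    return max(range(len(row)), key=lambda j: abs(row[j])) + 1


# Group variable names by their dominant factor.
groups = {}
for name, row in loadings.items():
    groups.setdefault(dominant_factor(row), []).append(name)

for factor in sorted(groups):
    print(f"Factor {factor}: {', '.join(groups[factor])}")
```

A pattern like this one, with different variables clustering on different factors, is what you would expect from a heterogeneous farm sector.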
In this situation you may take several analytical roads. Two of them are as
follows:
1. Rotation and second-order factor extraction. The factor structure may be
made clearer by an oblique rotation, which allows your factors to be
correlated (to some extent) with each other. Probably SOME characteristics
most prevalent in cereal farms are correlated with some characteristics more
abundant in dairy farms. Once you have done so, you may apply factor
analysis a second time, now to the oblique (rotated and correlated) factor
scores, including only the main factors extracted in the first round; the
first factor of this second analysis will then probably explain most of the
variance in the oblique factors.
2. Factor score aggregation. You may construct a scale by aggregating the
scores on Factors 1, 2, 3... (up to the number of factors you retain), each
weighted by its share of the total variance explained by the retained
factors. That is, if the first factor explains 30% of total variance, and
you are using 5 factors that together explain 75%, the weight of the first
factor in the scale will be 30/75 = 0.4. This aggregation applies only to
orthogonal (independent, not obliquely rotated) factors. For obliquely
rotated (correlated) factors you must take their correlation into account
when constructing the weights. But I surmise the oblique rotation is not
necessary in this case.
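The weighting in option 2 can be sketched in a few lines of Python. Only the 30% for the first factor and the 75% total come from the example above; how the remaining 45% is split among the other four factors, and the factor scores of the example farm, are made up for illustration. The scores are assumed to be orthogonal factor scores such as those saved by the factor procedure.

```python
# Percent of total variance explained by each of the 5 retained factors.
# Factor 1 = 30 and the total = 75 follow the example; the rest is invented.
variance_explained = [30.0, 20.0, 12.0, 8.0, 5.0]

total_explained = sum(variance_explained)  # 75
weights = [v / total_explained for v in variance_explained]


def infrastructure_index(factor_scores, weights):
    """Weighted sum of one farm's orthogonal factor scores."""
    if len(factor_scores) != len(weights):
        raise ValueError("need exactly one score per retained factor")
    return sum(w * s for w, s in zip(weights, factor_scores))


# Hypothetical farm: low on Factors 1 and 2, high on Factor 3.
farm_scores = [-0.8, -0.3, 1.5, 0.0, 0.2]
index = infrastructure_index(farm_scores, weights)

print("weights:", [round(w, 3) for w in weights])
print("index for this farm:", round(index, 3))
```

The weights sum to 1, so the index stays on roughly the same scale as the factor scores themselves. This sketch does not cover the oblique case, where the factor correlations would have to enter the weights.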
These solutions rest on the theoretical assumption that farm infrastructure
is the argument of a production function with possibly independent inputs:
either something like a Leontief function with zero substitutability, which
is adequate for the orthogonal factor aggregation solution described above,
or (for oblique factors) a production function with a certain degree of
substitutability given by the factor correlation matrix. Farm output would
then be a function of farm infrastructure (thus measured) plus other inputs
(labor, equipment, current or expendable inputs like seed or fertilizer,
etc.) to be measured separately. If your theory about farm technology
contradicts your scale-construction choice, you should reconsider the scale
choice (even if your purpose is NOT to analyze farm production functions).
Hector