SPSSX Discussion

SPSS Principal Components Analysis with Horn’s parallel analysis

Johnny Amora

SPSS Principal Components Analysis with Horn’s parallel analysis

Hello everyone,

SPSS syntax for a Principal Components Analysis with Horn's parallel analysis,
to determine which eigenvalues are significant, would be greatly appreciated.

Thank you.
J. Amora
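For readers who want the logic rather than ready-made SPSS syntax, here is a minimal sketch of Horn's parallel analysis in Python with NumPy. It is not SPSS syntax and not any published macro. It assumes complete data in a cases-by-variables array. It compares each observed eigenvalue of the correlation matrix with the 95th percentile of eigenvalues from random normal data of the same size (100 replications), and it retains components up to the first one that fails. The function name `parallel_analysis` and its defaults are illustrative choices.

```python
import numpy as np

def parallel_analysis(data, n_iter=100, percentile=95, seed=0):
    """Sketch of Horn's parallel analysis for principal components.

    data: (n_cases, n_vars) array with no missing values.
    Returns (observed eigenvalues, random-data thresholds, number to retain).
    """
    data = np.asarray(data, dtype=float)
    n, p = data.shape
    rng = np.random.default_rng(seed)

    # Eigenvalues of the observed correlation matrix, largest first.
    observed = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]

    # Eigenvalues of correlation matrices from uncorrelated normal data
    # with the same number of cases and variables.
    random_eigs = np.empty((n_iter, p))
    for i in range(n_iter):
        sim = rng.standard_normal((n, p))
        random_eigs[i] = np.linalg.eigvalsh(np.corrcoef(sim, rowvar=False))[::-1]
    threshold = np.percentile(random_eigs, percentile, axis=0)

    # Retain components while each observed eigenvalue exceeds its random counterpart.
    n_keep = 0
    for obs, thr in zip(observed, threshold):
        if obs <= thr:
            break
        n_keep += 1
    return observed, threshold, n_keep
```

Horn's original version compared each eigenvalue with the mean random eigenvalue. Passing `percentile=50` gives a median-based comparison, and a higher percentile is more conservative.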

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Kylie

RE: SPSS Principal Components Analysis with Horn's parallel analysis

Hi Johnny,

This site may help you: http://flash.lakeheadu.ca/~boconno2/nfactors.html

Kylie.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Johnny Amora
Sent: Wednesday, 9 July 2008 10:40 am
To: [hidden email]
Subject: SPSS Principal Components Analysis with Horns parallel analysis

Zdaniuk, Bozena-2

insufficient N for factor analysis

Hello, I was asked to do a factor analysis of 40 variables, but I only have 70 cases. Needless to say, I had to increase iterations to 100 to get the program to converge, and I still believe that it makes no sense to do a factor analysis with fewer than 2 cases per variable. I was then asked to provide a citation for that. Could someone point me to a source discussing the minimum cases-per-variable requirement for factor analysis that I can cite? Thanks a lot.
Bozena

Bozena Zdaniuk, Ph.D.
University of Pittsburgh
UCSUR, 6th Fl.
121 University Place
Pittsburgh, PA 15260
Ph.: 412-624-5736
Fax: 412-624-4810
Email: [hidden email]

SR Millis-3

Re: insufficient N for factor analysis

Comrey & Lee (1992, A first course in factor analysis) give as a guide sample sizes of:

50 as very poor
100 as poor
200 as fair
300 as good
500 as very good
1000 as excellent for factor analysis.

Tabachnick & Fidell (Using multivariate statistics, 4th ed) recommend at least 300 cases.
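As a quick sketch, the Comrey & Lee guide points and the Tabachnick & Fidell minimum can be applied to a given N as below. Treating each guide point as the lower edge of a band (so 70 cases count as "very poor") is an assumption, because the guide above lists single values, not bands. The function names are illustrative.

```python
# Comrey & Lee (1992) guide points, largest first.
COMREY_LEE = [
    (1000, "excellent"),
    (500, "very good"),
    (300, "good"),
    (200, "fair"),
    (100, "poor"),
    (50, "very poor"),
]

def comrey_lee_rating(n_cases):
    """Label of the highest guide point that n_cases reaches (bands are an assumption)."""
    for threshold, label in COMREY_LEE:
        if n_cases >= threshold:
            return label
    return "below the lowest guide point (50)"

def meets_tabachnick_fidell(n_cases, minimum=300):
    """True if N reaches the 'at least 300 cases' recommendation."""
    return n_cases >= minimum
```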

Scott R Millis, PhD, MEd, ABPP (CN,CL,RP), CStat
Professor & Director of Research
Dept of Physical Medicine & Rehabilitation
Wayne State University School of Medicine
261 Mack Blvd
Detroit, MI 48201
Email: [hidden email]
Tel: 313-993-8085
Fax: 313-966-7682

--- On Wed, 7/9/08, Zdaniuk, Bozena <[hidden email]> wrote:

Robert Marshall-7

Re: insufficient N for factor analysis

In reply to this post by Zdaniuk, Bozena-2

Perhaps not the most authoritative citation, but the APA publication edited by Grimm and Yarnold, Reading and Understanding Multivariate Statistics, 8th Ed. 2003, Washington DC, page 100, refers to the subjects-to-variables ratio (STV): "the minimum number of observations in one's sample should be at least five times the number of variables."

--
Robert A. Marshall, PhD, PMP
Atlanta, GA 30030
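The STV rule quoted above reduces to simple arithmetic. These hypothetical helpers compute the ratio and the minimum N for a chosen ratio. With Bozena's 40 variables and 70 cases, the ratio is 1.75, and the five-to-one rule would call for 200 cases.

```python
def cases_per_variable(n_cases, n_vars):
    """Subjects-to-variables (STV) ratio."""
    return n_cases / n_vars

def min_cases(n_vars, ratio=5):
    """Smallest N satisfying a rule of `ratio` cases per variable."""
    return ratio * n_vars

def meets_stv(n_cases, n_vars, ratio=5):
    """True if the sample satisfies the chosen STV rule."""
    return n_cases >= min_cases(n_vars, ratio)

print(cases_per_variable(70, 40))   # 1.75
print(min_cases(40))                # 200
print(min_cases(40, ratio=10))      # 400
```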

-------------- Original message --------------
From: "Zdaniuk, Bozena" <[hidden email]>

news

Re: insufficient N for factor analysis

In reply to this post by SR Millis-3

And how do you find a structure when you have only 50 cases because the
institution analysed consists of 50 services? Besides common-sense
guessing, what form of pattern detection can be used in this case?

Regards
Frank Thomas

SR Millis wrote:

Dale Glaser

Re: insufficient N for factor analysis

In reply to this post by Robert Marshall-7

I would look at the article by MacCallum et al. in Psychological Methods, as well as some in MBR (Multivariate Behavioral Research), that show problems with rules of thumb for EFA. One needs to take into account scaling issues, over-/underdetermination of factors, communalities/saturation, etc.

Dale Glaser, Ph.D.
Principal--Glaser Consulting
Lecturer/Adjunct Faculty--SDSU/USD/AIU
President, San Diego Chapter of
American Statistical Association
3115 4th Avenue
San Diego, CA 92103
phone: 619-220-0602
fax: 619-220-0412
email: [hidden email]
website: www.glaserconsult.com

Hector Maletta

Re: insufficient N for factor analysis

In reply to this post by Zdaniuk, Bozena-2

I do not remember a specific citation, but the general idea is that factor
analysis is a derivation of regression, and regression rests on the normal
distribution of estimation errors. Estimation errors tend toward a normal
distribution as N gets larger and larger (the central limit theorem); more
exactly, as the "degrees of freedom" get larger. The degrees of freedom equal
the number of cases minus the number of variables minus one, N-k-1, which in
your case is quite small. With so few cases, the margin of error of your
estimates will be very wide, and you could not be sure of their probable true
value in the universe or population, especially for minor factors after the
first or second one, where the coefficients or loadings will be close to zero
(so it may be difficult to tell whether they differ from zero in the population).

An old rule of thumb says you need at the very least 10 cases per variable,
but this is "the very least". With fewer than 30-50 cases, experimental error
distributions rarely resemble a normal curve.
So my advice is to try a model with fewer variables, possibly one
underlying factor if your 40 variables are mostly explained by one
overarching factor, or to abandon factor analysis altogether and try more
modest approaches like a simple summated scale, simple regression, two- or
three-way cross-tabulations, and the like. Next time, go bigger in your sample
design. And then again, do you really have a theory so complex that it
requires no fewer than 40 independent factors? Isaac Newton
explained the universe with only two or three variables, and did very well
indeed, thank you.
Hector
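To make the degrees-of-freedom point concrete, the N-k-1 formula gives 70 - 40 - 1 = 29 here. The second function swaps in a standard illustration that the post does not mention, Fisher's z interval for a single correlation, to show how wide the margin of error is at N = 70 compared with N = 300. The helper names are hypothetical, and z_crit = 1.96 assumes a 95% interval.

```python
import math

def residual_df(n_cases, n_predictors):
    """Residual degrees of freedom in a regression: N - k - 1."""
    return n_cases - n_predictors - 1

def correlation_ci(r, n_cases, z_crit=1.96):
    """Approximate confidence interval for a Pearson r via Fisher's z transform."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n_cases - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

print(residual_df(70, 40))   # 29
for n in (70, 300):
    low, high = correlation_ci(0.30, n)
    print(n, round(low, 2), round(high, 2))
```

For r = 0.30, the interval runs from roughly 0.07 to 0.50 at N = 70, versus roughly 0.19 to 0.40 at N = 300.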

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Zdaniuk, Bozena
Sent: 09 July 2008 13:03
To: [hidden email]
Subject: insufficient N for factor analysis

Bob Schacht-3

Re: insufficient N for factor analysis

At 08:48 AM 7/9/2008, Hector Maletta wrote:

I have been following this discussion with much interest, as I have a
similar problem at hand.
For years, we have been conducting a consumer satisfaction survey that
consists of one page, about 10 questions, plus a single open-ended
question. Although the questions were intended to probe consumer
satisfaction in a number of different areas, basically the level of
correlation is so high that it seems that we're really only tracking one
factor: overall satisfaction.

So we conducted literature reviews, and went back to the drawing boards,
formulating more than 100 questions in 6 broad areas of consumer
satisfaction. Our intention was to pilot test these questions with
participants, examine the results, throw out the redundant questions
(discerned through factor analysis), and emerge with, say, 20 questions
known to reflect different dimensions of consumer satisfaction. However,
our sample size thus far is in the pitiful range: perhaps 35 respondents.
Needless to say, we have a long way to go. With our response rates, and
consumer base, we would be lucky to get more than 100 respondents in a year.

In order to improve the subjects to variables ratio (STV), we need either
to greatly increase the sample size (which is difficult for us to do), or
reduce the number of variables, or both. Our questions are short, simple
statements requesting responses on a 5-point Likert scale. Some of the
questions are worded in almost identical language, and some of these are
almost certainly redundant. Given our relatively small sample size thus
far, what is the best way to proceed to remove redundant questions while
retaining maximum diversity of responses?

From one perspective, it would appear that rank correlations might be the
preferred measure of association, but I wonder if Likert scales are,
analytically speaking, equivalent to rank order variables? What other
measures would be most appropriate? I hesitate to downgrade the measure of
association to categorical, because that throws out the information on
directionality and degree. Likewise, I hesitate to overgrade the measure of
association to ratio, because clearly the intervals are arbitrary and not
additive.

Intuitively, I am seeking to extract, out of these 100 questions, 4-5
groups of 2-3 questions each, such that within-group correlations are high,
but correlations with the other groups are low. The within-group redundancy
reinforces degree of satisfaction with that particular factor, and the low
between-group correlation assures that different aspects of satisfaction
are represented.
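The within-group versus between-group goal can be scored for any proposed grouping of questions. This sketch averages absolute correlations within groups and between groups. Using absolute values and plain means is an assumption, and `within_between` is a hypothetical name.

```python
import numpy as np

def within_between(data, groups):
    """Mean absolute correlation within groups and between groups.

    data: (n_cases, n_items) array; groups: list of lists of column indices.
    """
    r = np.abs(np.corrcoef(data, rowvar=False))
    group_of = {col: g for g, cols in enumerate(groups) for col in cols}
    items = sorted(group_of)
    within, between = [], []
    for pos, a in enumerate(items):
        for b in items[pos + 1:]:
            if group_of[a] == group_of[b]:
                within.append(r[a, b])
            else:
                between.append(r[a, b])
    return float(np.mean(within)), float(np.mean(between))
```

A grouping is closer to the stated goal when the first number is high and the second is low.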

Suggestions, please?

Bob Schacht

Robert M. Schacht, Ph.D. <[hidden email]>
Pacific Basin Rehabilitation Research & Training Center
1268 Young Street, Suite #204
Research Center, University of Hawaii
Honolulu, HI 96814

Hector Maletta

Re: insufficient N for factor analysis

In this kind of case, my main suggestion is to forget about factor analysis and
simply add up the number of "correct" answers. If all questions are
highly correlated and clearly measure various aspects of overall
satisfaction, subtle differences in weighting (provided by factor analysis)
would not matter much, and would probably vary from one sample to the next.
So go ahead with an unweighted (i.e. equal-weight) scale and relax. You can
check whether this simple additive score still correlates well with
individual questions, and with other (external) indicators associated with
satisfaction (such as returning for more), but assuming all goes well, the
simple scale is easier to compute, easier to explain, and free of the many
statistical pitfalls of factor analysis and regression. It only lacks the
false pretense of scientific rigor that comes from mere difficulty or
sophistication. Some people live off being difficult, and get famous
just because of that (e.g. some postmodern "philosophes"), but you had better
not care much about that.
Hector
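A minimal sketch of the equal-weight scale follows, assuming items are already scored in the same direction and there are no missing values. The corrected item-total correlation (each item against the sum of the remaining items) is one common way to run the check suggested above, i.e. whether the simple score still tracks the individual questions. Function names are illustrative.

```python
import numpy as np

def summated_scale(items):
    """Equal-weight scale: the row sum of item scores."""
    return np.asarray(items, dtype=float).sum(axis=1)

def corrected_item_total(items):
    """Correlation of each item with the sum of the other items."""
    items = np.asarray(items, dtype=float)
    total = items.sum(axis=1)
    return np.array([np.corrcoef(items[:, j], total - items[:, j])[0, 1]
                     for j in range(items.shape[1])])
```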

-----Original Message-----
From: Bob Schacht [mailto:[hidden email]]
Sent: 09 July 2008 18:11
To: Hector Maletta; [hidden email]
Subject: Re: insufficient N for factor analysis

Edgar F. Johns

Re: insufficient N for factor analysis

In reply to this post by SR Millis-3

I'm not sure the effort is worth it, but....

You can try to use Dwyer's extension analysis. You start by creating a set
of homogeneous item packages or parcels: combine sets of 2-4 items into new
scales by reviewing the item correlations (combine those items with the
highest inter-item correlations). Then factor analyze the item parcels; you
will have reduced the number of variables in the factor analysis to about
10-15 (instead of 40), so convergence and iterations should behave better.
Rotate and then use the Dwyer extension procedure described in Gorsuch
(1983) Factor Analysis (2nd Ed.) on pages 236-238. Essentially, the factor
solution of the parcels is projected onto the original set of items. You'll
get your factor structure and pattern matrix (if you rotate obliquely) of
the 40 items.

If you need some background on item parceling, you can find out more about
it by searching "item parcels." I know their use is controversial. You can
also check up on Andrew Comrey's work in developing his personality
inventory and Ray Cattell's work.

Edgar
---
Discover Technologies
2906 River Meadow Circle.
Canton, MI 48188
(734) 564-4964
(734) 468-0800 fax
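A rough sketch of the parceling step, plus a simplified stand-in for the extension step, is shown below. It is not Gorsuch's Dwyer procedure. Instead of the method in the pages cited above, it correlates each original item with principal-component scores computed from the parcels, which gives unrotated item-component correlations. The greedy pairing rule and all function names are assumptions for illustration, and items left over when the count does not divide evenly are simply not parceled.

```python
import numpy as np

def greedy_parcels(data, size=2):
    """Group items into parcels of `size` (>= 2), most correlated items first.

    Hypothetical rule: seed each parcel with the unused pair that has the
    highest correlation, then add the unused item with the highest mean
    correlation to the parcel until it holds `size` items.
    """
    r = np.corrcoef(data, rowvar=False)
    np.fill_diagonal(r, -np.inf)          # never pair an item with itself
    unused = set(range(r.shape[0]))
    parcels = []
    while len(unused) >= size:
        idx = sorted(unused)
        sub = r[np.ix_(idx, idx)]
        i, j = np.unravel_index(np.argmax(sub), sub.shape)
        parcel = [idx[i], idx[j]]
        while len(parcel) < size:
            rest = [k for k in idx if k not in parcel]
            parcel.append(max(rest, key=lambda k: r[k, parcel].mean()))
        parcels.append(parcel)
        unused -= set(parcel)
    return parcels                         # leftover items are not parceled

def parcel_scores(data, parcels):
    """Each parcel's score is the mean of its items."""
    return np.column_stack([data[:, p].mean(axis=1) for p in parcels])

def extend_components(data, parcel_data, n_components):
    """Correlate each original item with the parcels' principal-component scores."""
    z = (parcel_data - parcel_data.mean(axis=0)) / parcel_data.std(axis=0, ddof=1)
    vals, vecs = np.linalg.eigh(np.corrcoef(z, rowvar=False))
    top = np.argsort(vals)[::-1][:n_components]
    scores = z @ vecs[:, top]
    n_items = data.shape[1]
    full = np.corrcoef(np.column_stack([data, scores]), rowvar=False)
    return full[:n_items, n_items:]        # items x components
```

Rotation, which the procedure described above calls for, would be applied to these loadings as a separate step and is left out here.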

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of SR
Millis
Sent: Wednesday, July 09, 2008 12:23 PM
To: [hidden email]
Subject: Re: insufficient N for factor analysis

Linda Zientek

Re: insufficient N for factor analysis

In reply to this post by Bob Schacht-3

In addition to the recommended ratios of 10 to 20 people per variable, the following has also been suggested:

"Some Monte Carlo simulation research (Guadagnoli & Velicer, 1988) suggests ... replicable factors tend to be estimated if:
1. factors are each defined by four or more measured variables with structure coefficients each greater than .6 [in absolute value], regardless of sample size; or
2. factors are each defined with 10 or more structure coefficients each around .4 [in absolute value], if sample size is greater than 150; or
3. sample size is at least 300." (Thompson, 2004, p. 24)

Linda

Thompson, B. (2004). Exploratory and confirmatory factor analysis: Understanding concepts and applications. Washington, DC: American Psychological Association.
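The three conditions in the quotation can be written as a checklist. In this sketch, "around .4" is read as an absolute structure coefficient of at least 0.40, and "greater than .6" as strictly above 0.60. Both readings are assumptions, and the function name is hypothetical.

```python
import numpy as np

def gv_replicable(loadings, n_cases):
    """Return, per factor, whether any of the three quoted conditions holds.

    loadings: (n_vars, n_factors) array of structure coefficients.
    """
    a = np.abs(np.asarray(loadings, dtype=float))
    result = []
    for col in a.T:
        cond1 = np.sum(col > 0.6) >= 4                        # 4+ coefficients above .6
        cond2 = np.sum(col >= 0.4) >= 10 and n_cases > 150    # 10+ around .4, N > 150
        cond3 = n_cases >= 300                                # N of at least 300
        result.append(bool(cond1 or cond2 or cond3))
    return result
```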

--- On Wed, 7/9/08, Bob Schacht <[hidden email]> wrote:

Juanito Talili

Re: insufficient N for factor analysis

In reply to this post by SR Millis-3

For me, 100 is not poor if I have only 10 variables, and 500 is not very good if I have more than 100 variables. I think we should consider the number of variables, not the sample size alone.

J Talili

--- On Wed, 7/9/08, SR Millis <[hidden email]> wrote:

Bob Schacht-3

Re: insufficient N for factor analysis

In reply to this post by Hector Maletta

At 08:48 AM 7/9/2008, Hector Maletta wrote:

Hector (and anyone else),
I have been pondering your advice to Bozena, since my situation
is similar, only much worse.
In my previous note, I wrote that I have over 100 variables (potential
questions for a survey), and so far only about 30 pilot test responses.

One thought that occurs to me is that our 100 variables actually fall into
half a dozen groups. Each group of questions was designed to elicit a
particular dimension of consumer satisfaction. Rather than attempting to
run a factor analysis on all 100+ variables at once, with so few cases,
would it make more sense to
* run the factor analysis on one group of questions at a time
* reduce the group to one or two questions with the highest loadings on
the principal component
* repeat the above procedure for each group of questions
* Finally, conduct a factor analysis on the reduced set of variables to
test the hypothesis that consumer satisfaction as reflected in this set of
questions really is multidimensional.
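The stepwise plan above could be sketched as follows, assuming `groups` lists the column indices of each question group and there is no missing data. For each group, it takes loadings on the first principal component of that group's correlation matrix and keeps the `keep` items with the largest absolute loadings. The final factor analysis on the reduced set is left out, and the names are illustrative.

```python
import numpy as np

def first_pc_loadings(block):
    """Loadings of each column on the first principal component of its correlations."""
    vals, vecs = np.linalg.eigh(np.corrcoef(block, rowvar=False))
    loadings = vecs[:, -1] * np.sqrt(vals[-1])   # eigh sorts eigenvalues ascending
    return loadings if loadings.sum() >= 0 else -loadings  # sign is arbitrary

def reduce_groups(data, groups, keep=2):
    """Keep the `keep` highest-loading items from each group of column indices."""
    selected = []
    for cols in groups:
        loadings = first_pc_loadings(data[:, cols])
        best = np.argsort(-np.abs(loadings))[:keep]
        selected.extend(cols[i] for i in sorted(best))
    return selected
```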
The guiding theory here is that consumer satisfaction has multiple
components. Each group of questions is designed to elicit degree of
satisfaction with a particular dimension of consumer experience suggested
in the literature. There is a great deal of overlap in the language of the
questions, as we seek to identify the language that has resonance with our
consumers. Our goal is to develop a consumer satisfaction instrument for
our agency that is genuinely multidimensional, allowing the agency to get a
better idea of where improvements are most needed. Our current instrument
is short, and seems to address different issues, but the answers we get are
so highly correlated that we really only seem to be measuring global
satisfaction, which is really not a very useful result.

Thanks in advance,

Bob Schacht

Robert M. Schacht, Ph.D. <[hidden email]>
Pacific Basin Rehabilitation Research & Training Center
1268 Young Street, Suite #204
Research Center, University of Hawaii
Honolulu, HI 96814

news

Re: insufficient N for factor analysis

In reply to this post by news

I would like to come back to my question: how do you reduce the complexity
of a large set of variables if you have few cases?

This happens in comparative political science all the time when you have
countries as cases and a large set of variables that describe them.

I now have a set of some 20 countries in Europe. If you study the EU
member states at an aggregate level today, you have 27 countries; there
are no more member states. I have even fewer cases due to unequal
coverage of the countries in my sources (the OECD data do not survey the
same countries as the EU, the European Social Survey, etc.).
At the same time I have a large set of variables describing the
economic, social, and cultural structure of the same 20 countries. So
how do I find a pattern in the variables if the condition of 10 cases
per variable for a sound factor analysis is not met?

A second question:
Factor analysis does not print the KMO or AIC info, even if I request all
statistics on the PRINT subcommand. Is this due to the low number of cases? How
can I force SPSS to print the KMO or the AIC info?

TIA

Frank Thomas
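The two sample-size rules of thumb that come up in this thread (10 cases per variable, as Frank mentions, and the Comrey & Lee (1992) labels quoted in Scott Millis's message) are simple arithmetic. A minimal Python sketch for illustration only; the function names are hypothetical, and how to label an N that falls between Comrey & Lee's anchor points is my assumption, not theirs.

```python
# Rough sample-size check for a factor analysis, using the two rules of
# thumb mentioned in this thread. Heuristics only, not hard limits.

# Comrey & Lee (1992) guide as quoted in this thread: (minimum N, label).
COMREY_LEE = [
    (1000, "excellent"),
    (500, "very good"),
    (300, "good"),
    (200, "fair"),
    (100, "poor"),
    (50, "very poor"),
]


def comrey_lee_label(n_cases):
    """Label for the largest Comrey & Lee anchor that n_cases reaches.

    Assumption (not from Comrey & Lee): an N between two anchors gets the
    lower label, and an N under 50 is reported as "below very poor".
    """
    for threshold, label in COMREY_LEE:
        if n_cases >= threshold:
            return label
    return "below very poor"


def cases_per_variable(n_cases, n_vars):
    """Cases-to-variables ratio, to compare with e.g. a 10:1 rule."""
    return n_cases / n_vars


if __name__ == "__main__":
    # Bozena's data: 70 cases, 40 variables.
    print(cases_per_variable(70, 40), comrey_lee_label(70))
    # Frank's data: about 20 countries; 30 variables is a made-up count.
    print(cases_per_variable(20, 30), comrey_lee_label(20))
```

For Bozena's 70 cases and 40 variables this gives a ratio of 1.75 cases per variable and the label "very poor".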

Frank Thomas wrote:

> And how do you find a structure when you have only 50 cases because the
> institution analysed consists of 50 services? Besides common-sense
> guessing, which form of common pattern detection can be used in this case?
>
> Regards
> Frank Thomas
>
> SR Millis wrote:
>> Comrey & Lee (1992, A first course in factor analysis) give as a guide
>> sample sizes of:
>>
>> 50 as very poor
>> 100 as poor
>> 200 as fair
>> 300 as good
>> 500 as very good
>> 1000 as excellent for factor analysis.
>>
>> Tabachnick & Fidell (Using multivariate statistics, 4th ed) recommend
>> at least 300 cases.
>>
>>
>> Scott R Millis, PhD, MEd, ABPP (CN,CL,RP), CStat
>> Professor & Director of Research
>> Dept of Physical Medicine & Rehabilitation
>> Wayne State University School of Medicine
>> 261 Mack Blvd
>> Detroit, MI 48201
>> Email: [hidden email]
>> Tel: 313-993-8085
>> Fax: 313-966-7682
>>
>>
>> --- On Wed, 7/9/08, Zdaniuk, Bozena <[hidden email]> wrote:
>>
>>
>>> From: Zdaniuk, Bozena <[hidden email]>
>>> Subject: insufficient N for factor analysis
>>> To: [hidden email]
>>> Date: Wednesday, July 9, 2008, 12:03 PM
>>> Hello, I was asked to do a factor analysis of 40 variables
>>> but I only have 70 cases. Needless to say, I had to
>>> increase iterations to 100 to get the program to converge
>>> and I still believe that it makes no sense to do a factor
>>> analysis with less than 2 cases per variable. I was then
>>> asked to provide a citation for that. Could someone point
>>> me to a source discussing the minimum case per variable
>>> requirement for factor analysis that I can cite? Thanks a
>>> lot.
>>> Bozena
>>>
>>> Bozena Zdaniuk, Ph.D.
>>> University of Pittsburgh
>>> UCSUR, 6th Fl.
>>> 121 University Place
>>> Pittsburgh, PA 15260
>>> Ph.: 412-624-5736
>>> Fax: 412-624-4810
>>> Email: [hidden email]

Art Kendall

Re: insufficient N for factor analysis

Some kludges.

Create meaningful subsets of the variables.

Sidestep the question about whether the obtained matrices are reasonable representations of the population matrix.
IFF you want to consider the 27 countries the total population about which you wish to make statements, then take a large dose of salt, hold your nose, and pretend that the obtained correlation matrix IS the population matrix. Write out the matrix products (means, SDs, Rs) and read them back in faking the number of cases. Use unit weights to create summative scores of standardized item variables.

Create a few nominal-level variables that record cluster membership, clustering the countries separately on each of the subsets mentioned above. Add an additional cluster identifier for cases that do not have the variables needed to create the cluster. Each membership value in a clustering would stand for a meaningful profile.

Relate the cluster memberships to each other with CROSSTABS, CATPCA and TWOSTEP treating the membership variables as nominal level.

Create choropleth (patch) maps of the memberships. Try different coordinate systems including weighting visual area by population.

Relate the cluster memberships to variables that were not used to create that clustering. E.g., relate industrial clusters to housing variables, etc.
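Art's suggestion to "use unit weights to create summative scores of standardized item variables" amounts to z-scoring each variable in a subset and adding the z-scores with weight 1. A minimal Python/NumPy sketch, assuming complete data (no missing values) and that any negatively keyed variables have already been reflected; the function name and the example data are hypothetical.

```python
import numpy as np


def unit_weighted_score(data):
    """Unit-weighted composite: z-score each column, then sum across columns.

    data: 2-D array, rows = cases (e.g. countries), columns = the variables
    in one meaningful subset. Assumes no missing values and that variables
    that should count negatively were reflected (multiplied by -1) first.
    """
    data = np.asarray(data, dtype=float)
    means = data.mean(axis=0)
    sds = data.std(axis=0, ddof=1)  # sample SD (n - 1 denominator)
    z = (data - means) / sds        # standardized item variables
    return z.sum(axis=1)            # every variable gets weight 1


if __name__ == "__main__":
    # Hypothetical data: 5 countries x 3 variables from one subset.
    subset = np.array([[1.0, 2.0, 3.0],
                       [2.0, 4.0, 1.0],
                       [3.0, 1.0, 2.0],
                       [4.0, 3.0, 5.0],
                       [5.0, 5.0, 4.0]])
    print(unit_weighted_score(subset))
```

Because every variable gets the same weight, the score does not depend on an estimated loading pattern, which is the point when N is too small to estimate one reliably.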

Art Kendall
Social Research Consultants

Peck, Jon

Re: insufficient N for factor analysis

Another approach you might consider is Partial Least Squares. This is useful for both categorical and continuous (scale) dependent variables. This is available in SPSS Statistics v16 or 17 as an add-in via programmability that can be downloaded from Developer Central (www.spss.com/devcentral). Of course, you don't get all the inferential apparatus of traditional regression methods, but it has the advantage of finding best combinations of predictors for particular dependent variables.

HTH,
Jon Peck
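Jon's suggestion refers to the SPSS PLS extension, whose code is not shown here. As a rough illustration of what PLS does, here is a swapped-in textbook NIPALS PLS1 (single dependent variable) in Python/NumPy. It is a sketch, not the SPSS implementation, and the function name is hypothetical. It still runs with more predictors than cases because it never inverts X'X.

```python
import numpy as np


def pls1(X, y, n_components):
    """Single-response PLS regression fitted with the NIPALS algorithm.

    Returns (coef, intercept) so that predictions are X @ coef + intercept.
    X: (n_cases, n_predictors); y: (n_cases,). Works when n_predictors
    exceeds n_cases, as long as n_components <= rank of the centered X.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean      # residual (deflated) copies
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)           # weight vector
        t = Xr @ w                       # component scores
        tt = t @ t
        p = Xr.T @ t / tt                # X loadings
        q = (yr @ t) / tt                # y loading
        Xr = Xr - np.outer(t, p)         # deflate X
        yr = yr - q * t                  # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)   # back to original predictors
    intercept = y_mean - x_mean @ coef
    return coef, intercept


if __name__ == "__main__":
    # Hypothetical data: 15 cases, 40 predictors (more predictors than cases).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(15, 40))
    y = X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=15)
    coef, intercept = pls1(X, y, n_components=2)
    print(coef[:5], intercept)
```

With as many components as predictors (and more cases than predictors) this reproduces ordinary least squares; with fewer components it restricts the fit to a few directions, which is what makes it usable when cases are scarce.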

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of ftr
Sent: Saturday, February 28, 2009 5:51 PM
To: [hidden email]
Subject: Re: [SPSSX-L] insufficient N for factor analysis

SPSS Support

Re: insufficient N for factor analysis

Hi Frank,
In response to your second question, the KMO and AIC (anti-image correlation) are not printed when the correlation matrix is nonpositive definite, which seems likely to apply from your description of the numbers of cases and variables in your study. I've pasted a related resolution from the support web site (http://support.spss.com) below.

David Matheson
SPSS Statistical Support

*********************

Resolution number: 20414 Created on: Aug 21 2001 Last Reviewed on: Feb 28 2009

Problem Subject: FACTOR does not print KMO or Bartlett test for Nonpositive Definite Matrices

Problem Description: I have run the SPSS FACTOR procedure with principal components analysis (PCA) as the extraction method. I requested the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy and the Bartlett test of sphericity, but neither of these measures was printed. The "Communalities", "Total Variance Explained" and "Component Matrix" tables were printed. Why was my request for KMO and Bartlett's sphericity test ignored?

Resolution Subject: KMO, Bartlett's sphericity, and anti-image correlation not printed for nonpositive definite matrices

Resolution Description:
It is likely the case that your correlation matrix is nonpositive definite (NPD), i.e., that some of the eigenvalues of your correlation matrix are not positive numbers. If this is the case, there will be a footnote to the correlation matrix that states "This matrix is not positive definite." Even if you did not request the correlation matrix as part of the FACTOR output, requesting the KMO or Bartlett test will cause the title "Correlation Matrix" to be printed. The footnote will be printed under this title if the correlation matrix was not requested. An NPD matrix will also result in suppression of other output from the 'Descriptives' dialog of the Factor dialog, namely the inverse of the correlation matrix, the anti-image correlation matrix, and the significance values for the correlations. If you had requested a factor extraction method other than PCA or unweighted least squares (ULS), an NPD matrix would have caused the procedure to stop without further analysis.

Matrices can be NPD for several reasons. A correlation matrix will be NPD if there are linear dependencies among the variables, as reflected by one or more eigenvalues of 0. For example, if variable X12 can be reproduced by a weighted sum of variables X5, X7, and X10, then there is a linear dependency among those variables and the correlation matrix that includes them will be NPD. If there are more variables in the analysis than there are cases, then the correlation matrix will have linear dependencies and be NPD. Remember that FACTOR uses listwise deletion of cases with missing data by default. If you had more cases in the file than variables in the analysis but also had many missing values, listwise deletion could leave you with more variables than retained cases. Pairwise deletion of missing data can also lead to NPD matrices, and negative eigenvalues may be present in these situations. See the following chapter for a helpful discussion and illustration of how this can happen.

Wothke, W. (1993). Nonpositive definite matrices in structural modeling. In K. A. Bollen & J. S. Long (Eds.), Testing Structural Equation Models (Chap. 11, pp. 256-293). Newbury Park, CA: Sage.
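The linear-dependency argument in this resolution can be checked numerically. A Python/NumPy sketch on hypothetical random data: with 20 cases and 40 variables, the centered data span at most 19 dimensions, so the 40 x 40 correlation matrix has rank at most 19 and many eigenvalues are zero up to rounding. The second matrix is a hand-made example (not real data) of the kind of internally inconsistent matrix pairwise deletion can produce; it has a negative eigenvalue.

```python
import numpy as np


def smallest_eigenvalue(R):
    """Smallest eigenvalue of a symmetric matrix (eigvalsh returns ascending)."""
    return np.linalg.eigvalsh(R)[0]


# Case 1: hypothetical random data with more variables (40) than cases (20).
rng = np.random.default_rng(1)
data = rng.normal(size=(20, 40))
R = np.corrcoef(data, rowvar=False)   # 40 x 40 correlation matrix
# After centering, 20 cases span at most 19 dimensions, so rank(R) <= 19
# and the remaining eigenvalues are zero apart from rounding error.
print("rank:", np.linalg.matrix_rank(R))
print("smallest eigenvalue:", smallest_eigenvalue(R))

# Case 2: a hand-made matrix (not real data) whose correlations are each
# legal on their own but jointly impossible, the kind of matrix pairwise
# deletion can produce. Its eigenvalues are -0.8, 1.9 and 1.9.
R_bad = np.array([[1.0, 0.9, 0.9],
                  [0.9, 1.0, -0.9],
                  [0.9, -0.9, 1.0]])
print("eigenvalues:", np.linalg.eigvalsh(R_bad))
```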

Elements of the KMO and Bartlett test statistics cannot be calculated if the correlation matrix is NPD. See the formulae for these statistics in the current Statistical Algorithms documentation: click Help->Algorithms in SPSS, scroll down to the link for Factor Algorithms, and then click the link for Optional Statistics. The formulae are also on page 20 of the Factor chapter at
http://support.spss.com/ProductsExt/SPSS/Documentation/Statistics/algorithms/14.0/factor.pdf

The Bartlett formula includes the log of the determinant of the correlation matrix. If there are linear dependencies, then the determinant of the matrix will be 0 and its log will be undefined. The KMO measure formula includes elements of the anti-image covariance matrix, whose calculation involves the inverse of the correlation matrix. If the correlation matrix has linear dependencies, then its inverse cannot be computed.
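Writing the two statistics out shows where an NPD matrix breaks them. A Python/NumPy sketch using the usual textbook forms, Bartlett's chi-square = -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom, and the overall KMO = sum of squared correlations / (sum of squared correlations + sum of squared partial correlations) over off-diagonal pairs. Treat these forms as assumptions and check them against the SPSS Factor Algorithms document; the function names and the 1e-12 tolerance are my own choices.

```python
import numpy as np


def is_positive_definite(R, tol=1e-12):
    """True if the smallest eigenvalue of symmetric R exceeds tol."""
    return np.linalg.eigvalsh(R)[0] > tol


def bartlett_sphericity(R, n_cases):
    """(chi-square, df) for Bartlett's test, or None if R is not PD.

    chi2 = -(n - 1 - (2p + 5) / 6) * ln|R|, df = p(p - 1) / 2.
    """
    if not is_positive_definite(R):
        return None                      # ln|R| undefined or meaningless
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)
    chi2 = -(n_cases - 1 - (2 * p + 5) / 6.0) * logdet
    return chi2, p * (p - 1) // 2


def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure, or None if R is not PD."""
    if not is_positive_definite(R):
        return None                      # inverse of R not available
    Rinv = np.linalg.inv(R)
    d = np.sqrt(np.diag(Rinv))
    partial = -Rinv / np.outer(d, d)     # partial correlations (anti-image
                                         # correlations up to sign)
    off = ~np.eye(R.shape[0], dtype=bool)  # off-diagonal mask
    r2 = (R[off] ** 2).sum()
    a2 = (partial[off] ** 2).sum()
    return r2 / (r2 + a2)


if __name__ == "__main__":
    # Hypothetical matrix: three variables all correlated 0.5, n = 100.
    R = np.full((3, 3), 0.5)
    np.fill_diagonal(R, 1.0)
    print(bartlett_sphericity(R, 100), kmo(R))
    # A singular matrix (two identical variables): both return None.
    R_sing = np.array([[1.0, 1.0], [1.0, 1.0]])
    print(bartlett_sphericity(R_sing, 100), kmo(R_sing))
```

Both functions return None rather than a number whenever R is not positive definite, mirroring the suppression described in the resolution: a zero determinant has no log, and a singular matrix has no inverse.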

Apart from the inability to print the KMO or Bartlett's test, the presence of an NPD correlation matrix may lead you to rethink the choice of variables or attempt to acquire data on a larger sample to achieve more reliable results.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Peck, Jon
Sent: Sunday, March 01, 2009 9:32 AM
To: [hidden email]
Subject: Re: insufficient N for factor analysis
