Hello everyone,

Syntax for an SPSS principal components analysis with Horn's parallel analysis to determine which eigenvalues are significant would be much appreciated. Thank you.

J. Amora

Hi Johnny,

This site may help you: http://flash.lakeheadu.ca/~boconno2/nfactors.html

Kylie.
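
For anyone who wants a self-contained starting point before downloading the programs from that page, here is a minimal MATRIX-language sketch of Horn's parallel analysis, written in the spirit of the programs posted there (use the full, tested versions from the site for real work). The case count, variable count, and number of random data sets below are placeholders to replace with your own values; the decision rule is to retain only components whose observed eigenvalue from FACTOR exceeds the corresponding mean eigenvalue from random data.

* Horn's parallel analysis sketch: generate NDATSETS random normal data sets of the
* same dimensions as the real data, compute each set's correlation-matrix
* eigenvalues, and average them across data sets.
* NCASES, NVARS, and NDATSETS are placeholders; set them to your own values.
SET MXLOOPS=9000.
MATRIX.
COMPUTE ncases   = 70.
COMPUTE nvars    = 40.
COMPUTE ndatsets = 100.
COMPUTE evals = MAKE(nvars, ndatsets, 0).
LOOP #nds = 1 TO ndatsets.
COMPUTE x = SQRT(2 * (LN(UNIFORM(ncases,nvars)) * -1)) &* COS(6.283185 * UNIFORM(ncases,nvars)).
COMPUTE vcv = (SSCP(x) - (T(CSUM(x)) * CSUM(x)) / ncases) / (ncases - 1).
COMPUTE d = INV(MDIAG(SQRT(DIAG(vcv)))).
COMPUTE evals(:,#nds) = EVAL(d * vcv * d).
END LOOP.
COMPUTE means = RSUM(evals) / ndatsets.
PRINT means /TITLE='Mean eigenvalues from random data'.
END MATRIX.

The line inside the loop builds random normal data via a Box-Muller transform of two uniform matrices, turns it into a correlation matrix, and stores that matrix's eigenvalues; comparing your observed eigenvalues to the printed means (or, better, to a high percentile of the random eigenvalues, as the full programs allow) gives the parallel-analysis retention decision.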

Hello, I was asked to do a factor analysis of 40 variables but I only have 70 cases. Needless to say, I had to increase the iterations to 100 to get the program to converge, and I still believe it makes no sense to do a factor analysis with fewer than 2 cases per variable. I was then asked to provide a citation for that. Could someone point me to a source discussing the minimum cases-per-variable requirement for factor analysis that I can cite? Thanks a lot.

Bozena

Bozena Zdaniuk, Ph.D.
University of Pittsburgh
UCSUR, 6th Fl.
121 University Place
Pittsburgh, PA 15260
Ph.: 412-624-5736
Fax: 412-624-4810
Email: [hidden email]

Comrey & Lee (1992, A first course in factor analysis) give as a guide sample sizes of:

50 as very poor
100 as poor
200 as fair
300 as good
500 as very good
1000 as excellent

for factor analysis.

Tabachnick & Fidell (Using multivariate statistics, 4th ed) recommend at least 300 cases.

Scott R Millis, PhD, MEd, ABPP (CN,CL,RP), CStat
Professor & Director of Research
Dept of Physical Medicine & Rehabilitation
Wayne State University School of Medicine
261 Mack Blvd
Detroit, MI 48201
Email: [hidden email]
Tel: 313-993-8085
Fax: 313-966-7682

In reply to this post by Zdaniuk, Bozena-2

Perhaps not the most authoritative citation, but the APA publication edited by Grimm and Yarnold, Reading and Understanding Multivariate Statistics (8th ed., 2003, Washington, DC), page 100, refers to the subjects-to-variables ratio (STV): "the minimum number of observations in one's sample should be at least five times the number of variables."

--
Robert A. Marshall, PhD, PMP
Atlanta, GA 30030

In reply to this post by SR Millis-3

And how do you find a structure when you only have 50 cases, because the institution analysed is made up of 50 services? Beyond common-sense guessing, what form of common-pattern detection can be used in such a case?

Regards
Frank Thomas

In reply to this post by Robert Marshall-7

I would look at the article by MacCallum et al. in Psychological Methods, as well as several in Multivariate Behavioral Research, that show the problems with such rules of thumb for EFA. One needs to take into account scaling issues, over/underdetermination, communalities/saturation, and so on.

Dale Glaser, Ph.D.
Principal--Glaser Consulting
Lecturer/Adjunct Faculty--SDSU/USD/AIU
President, San Diego Chapter of American Statistical Association
3115 4th Avenue
San Diego, CA 92103
phone: 619-220-0602
fax: 619-220-0412
email: [hidden email]
website: www.glaserconsult.com

In reply to this post by Zdaniuk, Bozena-2

I do not remember a specific citation, but the general idea is that factor analysis is a derivation of regression, and regression rests on the normal distribution of estimation errors. This tendency of estimation errors toward a normal distribution is a large-sample result (the central limit theorem, often loosely lumped together with the law of large numbers): it takes hold as N gets larger and larger, or more exactly, as the degrees of freedom get larger. The degrees of freedom equal the number of cases minus the number of variables, N - k - 1, which in your case is quite small: 70 - 40 - 1 = 29. Because the number of cases is small, the margin of error of your estimates will be very wide, and you could not be sure of their probable true value in the universe or population, especially for minor factors after the first or second one, where the coefficients or loadings will be close to zero (and it may therefore be difficult to tell whether they are not zero in the population).

An old rule of thumb says you need at the very least 10 cases per variable, but this is "the very least". With fewer than 30-50 cases, experimental error distributions hardly (or very infrequently) resemble a normal curve.

So my advice is to try a model with fewer variables, possibly one underlying factor if your 40 variables are mostly explained by one overarching factor, or to abandon factor analysis altogether and try some more modest approaches such as a simple summative scale, simple regression, or 2- or 3-way cross tabulations. Next time, go bigger in your sample design. And then again, do you really have a theory so complex that no fewer than 40 variables are needed to express it? Isaac Newton explained the universe with only two or three variables, and did very well indeed, thank you.

Hector

At 08:48 AM 7/9/2008, Hector Maletta wrote: [...]

I have been following this discussion with much interest, as I have a similar problem at hand. For years, we have been conducting a consumer satisfaction survey that consists of one page, about 10 questions, plus a single open-ended question. Although the questions were intended to probe consumer satisfaction in a number of different areas, the level of correlation is so high that it seems we are really only tracking one factor: overall satisfaction.

So we conducted literature reviews and went back to the drawing board, formulating more than 100 questions in 6 broad areas of consumer satisfaction. Our intention was to pilot test these questions with participants, examine the results, throw out the redundant questions (discerned through factor analysis), and emerge with, say, 20 questions known to reflect different dimensions of consumer satisfaction. However, our sample size thus far is in the pitiful range: perhaps 35 respondents. Needless to say, we have a long way to go. With our response rates and consumer base, we would be lucky to get more than 100 respondents in a year. To improve the subjects-to-variables ratio (STV), we need either to greatly increase the sample size (which is difficult for us to do), or to reduce the number of variables, or both.

Our questions are short, simple statements requesting responses on a 5-point Likert scale. Some of the questions are worded in almost identical language, and some of these are almost certainly redundant. Given our relatively small sample size thus far, what is the best way to proceed to remove redundant questions while retaining maximum diversity of responses?

From one perspective, it would appear that rank correlations might be the preferred measure of association, but I wonder whether Likert scales are, analytically speaking, equivalent to rank-order variables. What other measures would be most appropriate? I hesitate to downgrade the measure of association to categorical, because that throws out the information on direction and degree. Likewise, I hesitate to upgrade the measure of association to ratio level, because the intervals are clearly arbitrary and not additive.

Intuitively, I am seeking to extract, out of these 100 questions, 4-5 groups of 2-3 questions each, such that within-group correlations are high but correlations with the other groups are low. The within-group redundancy reinforces the degree of satisfaction with that particular factor, and the low between-group correlation ensures that different aspects of satisfaction are represented.

Suggestions, please?

Bob Schacht

Robert M. Schacht, Ph.D. <[hidden email]>
Pacific Basin Rehabilitation Research & Training Center
1268 Young Street, Suite #204
Research Center, University of Hawaii
Honolulu, HI 96814

In this kind of case, my main suggestion is to forget about factor analysis and simply add up the item scores. If all the questions are highly correlated and clearly measure various aspects of overall satisfaction, the subtle differences in weighting provided by factor analysis would not matter much and would probably vary from one sample to the next. So go ahead with a no-weight (i.e., equal-weight) scale and relax. You can check whether this simple additive score still correlates well with the individual questions, and with other (external) indicators associated with satisfaction (such as returning for more), but assuming all goes well, the simple scale is easier to compute, easier to explain, and lacks the many statistical pitfalls of factor analysis and regression. It only lacks the false pretense of scientificity that comes from mere difficulty or sophistication; some people live off being difficult, and get famous just because of that (e.g., some postmodern "philosophes"), but you had better not care much about that.

Hector
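
In SPSS terms, the equal-weight scale Hector describes is just a sum or mean of the items. A minimal sketch, with hypothetical item names (q1 to q20) and hypothetical reverse-worded items (q7, q12):

* Reverse-score negatively worded items first so a high value always means more satisfied.
RECODE q7 q12 (1=5) (2=4) (3=3) (4=2) (5=1).
* Equal-weight satisfaction score: the mean of the items.
* MEAN.15 returns a value only when at least 15 of the 20 items were answered.
COMPUTE satis_total = MEAN.15(q1 TO q20).
VARIABLE LABELS satis_total 'Overall satisfaction (equal-weight mean of items)'.
EXECUTE.

Using the mean rather than the sum keeps the score on the original 1-5 metric and tolerates a few missing answers per respondent.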

In reply to this post by SR Millis-3

I'm not sure the effort is worth it, but...

You can try to use Dwyer's extension analysis. You start by creating a set of homogeneous item packages, or parcels: combine sets of 2-4 items into new scales by reviewing the item correlations (combine the items with the highest inter-item correlations). Then factor analyze the item parcels; you will have reduced the number of variables in the factor analysis to about 10-15 instead of 40, so convergence and iterations should behave better. Rotate, and then use the Dwyer extension procedure described in Gorsuch (1983), Factor Analysis (2nd ed.), pages 236-238. Essentially, the factor solution of the parcels is projected onto the original set of items, so you still get your factor structure and pattern matrix (if you rotate obliquely) for the 40 items.

If you need some background on item parceling, you can find out more by searching on "item parcels." I know their use is controversial. You can also look at Andrew Comrey's work in developing his personality inventory and at Ray Cattell's work.

Edgar
---
Discover Technologies
2906 River Meadow Circle
Canton, MI 48188
(734) 564-4964
(734) 468-0800 fax
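
A minimal sketch of the parceling step in syntax, with entirely hypothetical item groupings (which items belong together should come from your own inspection of the inter-item correlations, not from this example); the projection of the parcel solution back onto the 40 items is the Gorsuch/Dwyer step and is not shown here:

* Build parcels as means of small sets of highly intercorrelated items (names are placeholders).
COMPUTE parcel1 = MEAN(q1, q5, q9).
COMPUTE parcel2 = MEAN(q2, q6, q14).
COMPUTE parcel3 = MEAN(q3, q11, q17).
COMPUTE parcel4 = MEAN(q4, q8, q20).
* ...and so on, until all 40 items are assigned to roughly 10-15 parcels.
EXECUTE.
* Factor the parcels instead of the 40 raw items.
FACTOR
  /VARIABLES parcel1 parcel2 parcel3 parcel4
  /PRINT INITIAL EXTRACTION ROTATION
  /EXTRACTION PAF
  /ROTATION OBLIMIN.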

In reply to this post by Bob Schacht-3

In addition to the recommended ratios of 10 to 20 people per variable, the following has also been suggested:

"Some Monte Carlo simulation research (Guadagnoli & Velicer, 1988) suggests ... replicable factors tend to be estimated if: 1. factors are each defined by four or more measured variables with structure coefficients each greater than .6 [in absolute value], regardless of sample size; or 2. factors are each defined with 10 or more structure coefficients each around .4 [in absolute value], if sample size is greater than 150; or 3. sample size is at least 300." (Thompson, 2004, p. 24)

Linda

Thompson, B. (2004). Exploratory and confirmatory factor analysis: Understanding concepts and applications. Washington, DC: American Psychological Association.

In reply to this post by SR Millis-3

For me, 100 is not poor if I have only 10 variables, and 500 is not very good if I have more than 100 variables. I think we should consider the number of variables, not just the sample size alone.

J Talili

In reply to this post by Hector Maletta

At 08:48 AM 7/9/2008, Hector Maletta wrote: [...]

Hector (and anyone else),

I have been pondering your advice to Bozena, since my situation is similar, only much worse. In my previous note, I wrote that I have over 100 variables (potential questions for a survey) and, so far, only about 30 pilot-test responses.

One thought that occurs to me is that our 100 variables actually fall into half a dozen groups. Each group of questions was designed to elicit a particular dimension of consumer satisfaction. Rather than attempting to run a factor analysis on all 100+ variables at once, with so few cases, would it make more sense to

* run the factor analysis on one group of questions at a time;
* reduce the group to the one or two questions with the highest loadings on the principal component;
* repeat the above procedure for each group of questions;
* finally, conduct a factor analysis on the reduced set of variables to test the hypothesis that consumer satisfaction, as reflected in this set of questions, really is multidimensional?

The guiding theory here is that consumer satisfaction has multiple components. Each group of questions is designed to elicit the degree of satisfaction with a particular dimension of consumer experience suggested in the literature. There is a great deal of overlap in the language of the questions, as we seek to identify the language that resonates with our consumers. Our goal is to develop a consumer satisfaction instrument for our agency that is genuinely multidimensional, allowing the agency to get a better idea of where improvements are most needed. Our current instrument is short and seems to address different issues, but the answers we get are so highly correlated that we really only seem to be measuring global satisfaction, which is not a very useful result.

Thanks in advance,
Bob Schacht

Robert M. Schacht, Ph.D. <[hidden email]>
Pacific Basin Rehabilitation Research & Training Center
1268 Young Street, Suite #204
Research Center, University of Hawaii
Honolulu, HI 96814
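
If you do go the one-group-at-a-time route, each per-group step is just an ordinary FACTOR run restricted to that group's items. A minimal sketch with hypothetical variable names (and the usual caveat that roughly 30 cases makes even these small runs shaky):

* Step 1: factor one conceptual group of questions at a time (names are placeholders).
FACTOR
  /VARIABLES grpA_q1 TO grpA_q8
  /PRINT INITIAL EXTRACTION
  /CRITERIA FACTORS(1)
  /EXTRACTION PC.
* Keep the one or two items loading highest on that first component, repeat for each
* group, then run a final FACTOR on the reduced item set to check whether the retained
* items really spread across several dimensions rather than one global factor.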

In reply to this post by news

I would like to come back to my question: how do you reduce the complexity of a large set of variables if you have few cases? This happens in comparative political science all the time, where you have countries as cases and a large set of variables describing them. I now have a set of some 20 countries in Europe. If you study the EU member states at an aggregate level today, you have 27 countries; there simply are no more member states than that. I actually have even fewer cases, owing to the unequal coverage of countries in my sources (the OECD data do not cover the same countries as the EU, the European Social Survey, etc.). At the same time, I have a large set of variables describing the economic, social, and cultural structure of the same 20 countries. So how do you find a pattern in the variables when the condition of 10 cases per variable for a sound factor analysis is not met?

A second question: FACTOR does not print the KMO or AIC information, even if I request all statistics on the PRINT subcommand. Is this due to the low number of cases? How can I force SPSS to print the KMO or AIC information?

TIA
Frank Thomas

Some kludges:

Create meaningful subsets of the variables.

Sidestep the question about whether the obtained matrices are reasonable representations of the population matrix. IFF you want to consider the 27 countries the total population about which you wish to make statements, then take a large dose of salt, hold your nose, and pretend that the obtained correlation matrix IS the population matrix. Write out the matrix products (means, SDs, Rs) and read them back in, faking the number of cases.

Use unit weights to create summative scores of standardized item variables.

Create a few nominal-level variables that assign the countries to clusters based on the variable subsets mentioned above, with an additional cluster identifier for cases that lack the variables needed to form that clustering. Each membership value in a clustering would stand for a meaningful profile.

Relate the cluster memberships to each other with CROSSTABS, CATPCA, and TWOSTEP, treating the membership variables as nominal level.

Create choropleth (patch) maps of the memberships. Try different coordinate systems, including weighting visual area by population.

Relate the cluster memberships to variables that were not used to create that clustering, e.g., relate industrial clusters to housing variables, and so on.

Art Kendall
Social Research Consultants
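
A hedged sketch of the "write out the matrix products and read them back in, faking the number of cases" kludge, using MATRIX DATA with a declared N of your choosing. The variable names and correlations below are made up, and any output that depends on N (significance tests, standard errors) should be treated as meaningless under this dodge:

* Enter (or paste) a correlation matrix directly and declare an N by fiat.
MATRIX DATA VARIABLES=gdp unemp educ trust
  /CONTENTS=CORR
  /N=200
  /FORMAT=LOWER NODIAGONAL.
BEGIN DATA
 .42
 .31  .28
 .25  .36  .47
END DATA.
* Factor the matrix dataset just created.
FACTOR
  /MATRIX=IN(COR=*)
  /PRINT EXTRACTION ROTATION
  /EXTRACTION PC
  /ROTATION VARIMAX.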

In reply to this post by news

Another approach you might consider is partial least squares (PLS). This is useful for both categorical and continuous (scale) dependent variables. It is available in SPSS Statistics v16 or 17 as an add-on, via programmability, that can be downloaded from Developer Central (www.spss.com/devcentral). Of course, you don't get all the inferential apparatus of traditional regression methods, but it has the advantage of finding the best combinations of predictors for particular dependent variables.

HTH,
Jon Peck
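
Once the PLS module and the Python programmability plug-in are installed, the call is a single command. The sketch below uses hypothetical variable names, and the subcommand spelling is from memory, so check it against the documentation bundled with the download:

* Partial least squares with a small number of latent factors.
* Variable names are placeholders; verify the exact subcommands in the PLS
* extension's own documentation before running.
PLS overall_sat WITH q1 TO q20
  /CRITERIA LATENTFACTORS=3.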
|
Hi Frank,
In response to your second question, the KMO and AIC (anti-image correlation) are not printed when the correlation matrix is nonpositive definite, which seems likely to apply from your description of the numbers of cases and variables in your study. I've pasted a related resolution from the support web site ( http://support.spss.com ) below. David Matheson SPSS Statistical Support ********************* Resolution number: 20414 Created on: Aug 21 2001 Last Reviewed on: Feb 28 2009 Problem Subject: FACTOR does not print KMO or Bartlett test for Nonpositive Definite Matrices Problem Description: I have run the SPSS FACTOR procedure with principal components analysis (PCA) as the extraction method. I requested the Kaiser-Mayer-Olkin (KMO) measure of sample adequacy and the Bartlett test of sphericity but neither of these measures was printed. The "Communalities", "Total Variance Explained" and "Component Matrix" tables were printed. Why was my request for KMO and Bartlett's sphericity test ignored? Resolution Subject: KMO, Bartlett's sphericity, and anti-image correlation not printed for nonpositive definite matrices Resolution Description: It is likely the case that your correlation matrix is nonpositive definite (NPD), i.e., that some of the eigenvalues of your correlation matrix are not positive numbers. If this is the case, there will be a footnote to the correlation matrix that states "This matrix is not positive definite." Even if you did not request the correlation matrix as part of the FACTOR output, requesting the KMO or Bartlett test will cause the title "Correlation Matrix" to be printed. The footnote will be printed under this title if the correlation matrix was not requested. An NPD matrix will also result in suppression of other output from the 'Descriptives' dialog of the Factor dialog, namely the inverse of the correlation matrix, the anti-image correlation matrix, and the significance values for the correlations. If you had requested a factor extraction method other than PCA or unweighted least squares (ULS), an NPD matrix would have caused the procedure to stop without further analysis. Matrices can be NPD as a result of various other properties. A correlation matrix will be NPD if there are linear dependencies among the variables, as reflected by one or more eigenvalues of 0. For example, if variable X12 can be reproduced by a weighted sum of variables X5, X7, and X10, then there is a linear dependency among those variables and the correlation matrix that includes them will be NPD. If there are more variables in the analysis than there are cases, then the correlation matrix will have linear dependencies and be NPD. Remember that FACTOR uses listwise deletion of cases with missing data by default. If you had more cases in the file than variables in the analysis but also had many missing values, listwise deletion could leave you with more variables than retained cases. Pairwise deletion of missing data can also lead to NPD matrices. Negative eigenvalues may be present in these situations. See the following chapter for a helpful discussion and illustration of! how this can happen. Wothke, W. (1993) Nonpositive definite matrices in structural modeling. In K.A. Bollen & J.S. Long (Eds.), Testing Structural Equation Models. Newbury Park NJ: Sage. (Chap. 11, pp. 256-293). Elements of the KMO and Bartlett test statistic can not be calculated if the correlation matrix is NPD. 
See the formulae for these statistics in the current Statistical Algorithms documentation by clicking Help->Algorithms in SPSS, scrolling down to the link for Factor Algorithms, and then clicking the link for Optional Statistics. The formulae are also on page 20 of the Factor chapter at http://support.spss.com/ProductsExt/SPSS/Documentation/Statistics/algorithms/14.0/factor.pdf

The Bartlett formula includes the log of the determinant of the correlation matrix. If there are linear dependencies, the determinant of the matrix will be 0 and its log will be undefined. The KMO formula includes elements of the anti-image covariance matrix, whose calculation involves the inverse of the correlation matrix; if the correlation matrix has linear dependencies, its inverse cannot be computed.

Apart from the inability to print the KMO or Bartlett's test, the presence of an NPD correlation matrix may lead you to rethink the choice of variables, or to acquire data on a larger sample to achieve more reliable results.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Peck, Jon
Sent: Sunday, March 01, 2009 9:32 AM
To: [hidden email]
Subject: Re: insufficient N for factor analysis

Another approach you might consider is Partial Least Squares (PLS). It works with both categorical and continuous (scale) dependent variables and is available in SPSS Statistics v16 or 17 as a programmability add-in that can be downloaded from Developer Central (www.spss.com/devcentral). Of course, you don't get all the inferential apparatus of traditional regression methods, but PLS has the advantage of finding the best combinations of predictors for particular dependent variables.

HTH,
Jon Peck

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of ftr
Sent: Saturday, February 28, 2009 5:51 PM
To: [hidden email]
Subject: Re: [SPSSX-L] insufficient N for factor analysis

I would like to come back to my question: how do you reduce the complexity of a large set of variables when you have few cases? This happens in comparative political science all the time, where the cases are countries and a large set of variables describes them. I now have a set of some 20 countries in Europe. If you study the EU member states at an aggregate level today, you have 27 countries; there are no more member states to draw on. I have even fewer cases because of unequal coverage of the countries across my sources (the OECD data do not survey the same countries as the EU, the European Social Survey, etc.). At the same time I have a large set of variables describing the economic, social, and cultural structure of the same 20 countries. So how can one find a pattern in the variables if the condition of 10 cases per variable for a sound factor analysis is not met?

A second question: FACTOR does not print the KMO or AIC information, even if I request all statistics on the PRINT subcommand. Is this due to the low number of cases? How can I force SPSS to print the KMO or the AIC information?

TIA
Frank Thomas

Frank Thomas wrote:
> And how do you find a structure when you only have 50 cases because the
> institution analysed consists of 50 services? Besides common-sense
> guessing, which form of common pattern detection can be used in this case?
>
> Regards
> Frank Thomas

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
