|
Hi,
I'm a teaching assistant for a stat course. Today, a student asked me a question in class that I was not able to answer. I would really appreciate it if anyone could help me figure out how the computational formula for the variance is derived from the conceptual formula. Thank you, Figen |
|
Hi Figen,
Please follow the link for variance computation: http://en.wikipedia.org/wiki/Computational_formula_for_the_variance
Regards, Dorraj Oet
Date: Wed, 13 Jan 2010 02:19:18 +0000 From: [hidden email] Subject: Variance formula. To: [hidden email] |
|
In reply to this post by Karadogan, Figen
X = random variable with probability density function f(x)
x_bar = M[X] = expectation = mean
variance = D[X]
then:
x_bar = M[X] = integral(x*f(x)*dx) over [-inf, +inf]
D[X] = integral((x - x_bar)^2*f(x)*dx) over [-inf, +inf].
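For reference, the step from the conceptual (definitional) formula to the computational one can be written out as below. This is a sketch of the standard algebra, not a quotation from any poster: \mu stands for the mean (x_bar above) and E[.] for the expectation M[.], while the observations x_i, their mean \bar{x}, and the sample size n in the second part are notation assumed here for illustration.

```latex
\begin{align*}
D[X] &= E\big[(X-\mu)^2\big] \\
     &= E\big[X^2 - 2\mu X + \mu^2\big] \\
     &= E[X^2] - 2\mu\,E[X] + \mu^2 \\
     &= E[X^2] - 2\mu^2 + \mu^2 \\
     &= E[X^2] - \mu^2 .
\end{align*}

% Same expansion for a sample of n observations x_1, ..., x_n:
\begin{align*}
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - \frac{\big(\sum_{i=1}^{n} x_i\big)^2}{n} .
\end{align*}
```

Dividing the last line by n - 1 gives the usual computational formula for the sample variance; dividing by n gives the population version. The only facts used are linearity of the expectation (or of the sum) and the identity \sum x_i = n\bar{x}.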
|
|
Thank you so much.. :-)
I think I figured it out.
Figen
From: MaxJasper [[hidden email]] Sent: Tuesday, January 12, 2010 9:44 PM To: [hidden email] Subject: RE: Variance formula.
|
|
In reply to this post by Karadogan, Figen
I am trying to locate rules of thumb on sample sizes required to fit 2PL and 3PL IRT models in computer adaptive testing (CAT). Any references citing rules of thumb or comparing thetas between these two models would be greatly appreciated.
V/r, Joy Oliver |
|
What is the factor analysis (PCA) equivalent that can be run on dichotomous variables? I have 50 exhibited behaviours (yes/no) that I want to factor together. I have a sample size of about 500. I would be using SPSS and could use syntax if it is available.
Thanks, Angie ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
Dear Angelina,
Factor analysis is not the same as principal components analysis (PCA). You can read more in:
The Use of Exploratory Factor Analysis and Principal Components Analysis in Communication. Hee Sun Park; Rene Dailey; Daisy Lemus. Human Communication Research; Oct 1, 2002; 28, 4.
Repairing Tom Swift's Electric Factor Analysis Machine. Kristopher J. Preacher and Robert C. MacCallum. Understanding Statistics, 2(1), 13–43.
There are many problems with factoring dichotomous items, mainly the appearance of artificial factors. More on this in:
On artificial results due to using factor analysis for dichotomous variables. Klaus D. Kubinger. Psychology Science; 2003; 45, 1.
One solution is to use tetrachoric correlations instead of phi correlations (which are the default option in SPSS), but you have to check whether the assumption of underlying normally distributed latent variable(s) is plausible.
Kindly,
Andrés
Mg. Andrés Burga León
Coordinador de Análisis e Informática, Unidad de Medición de la Calidad Educativa
Ministerio de Educación del Perú
Calle El Comercio s/n (espalda del Museo de la Nación), Lima 41, Perú
Teléfono 615-5840 |
|
In reply to this post by Angelina S. MacKewn
Any factor analysis can be run on dichotomous variables, because these variables can legitimately be considered as interval measures. As only one interval is involved (from 0 to 1), there is no question of comparing unequal intervals. Their mean is the proportion (p) of the value 1, and the variance is p(1-p).
There is a specific SPSS procedure, CATPCA, for principal component analysis of categorical variables (ordinal or nominal, any number of categories). However, for dichotomous variables CATPCA gives the same solution as classical Principal Components Analysis of interval variables (PCA is one of the variants of factor analysis).
Purists insist that dichotomous variables cannot be used in anything related to regression, because their residuals are not normally distributed. To see this, one has to see that the predicted value for a dichotomous variable is either a value between 0 and 1, or a value outside that interval. In the first case, the actual values will be either 1 or 0, and the residuals would therefore be piled at the ends of the 0,1 interval, and not around the predicted value. In the second case, the residuals will all be at one side of the predicted value. In any case, their distribution would not be normal.
However, dummy variables (i.e. variables with value 0 or 1) are routinely used in regression. Factor analysis is a variant of linear regression (or, more widely, a variant of the Generalized Linear Model) and therefore this habitual use applies also to it.
-----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Angelina S. MacKewn Sent: 13 January 2010 19:41 To: [hidden email] Subject: Factor Analysis on dichotomous variables |
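The statement that a 0/1 variable has mean p and variance p(1-p) can be checked directly. The snippet below is a minimal sketch using only the Python standard library; the ten responses are made-up illustration data, and the variance is the population variance (dividing by n), which is the version that equals p(1-p) exactly.

```python
from statistics import fmean, pvariance


def dichotomous_summary(values):
    """Return (p, variance) for a list of 0/1 codes."""
    p = fmean(values)        # mean of a 0/1 variable = proportion of 1s
    var = pvariance(values)  # population variance (divides by n, not n - 1)
    return p, var


# Hypothetical yes/no item: 3 "yes" answers out of 10 respondents.
item = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
p, var = dichotomous_summary(item)

# The two quantities printed last should agree up to rounding error.
print(p, var, p * (1 - p))
```

With the sample variance (dividing by n - 1) the result is slightly larger, n/(n - 1) times p(1-p), which matters little at sample sizes like the 500 cases discussed in this thread.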
|
Angelina,
The number of factors (or components) worth retaining largely depends on the degree of linear correlation or association between the observed variables, whether dichotomous or otherwise. If all variables are highly correlated with one another, possibly one (or two) factors would explain most of the total or common variance, regardless of the type of variable involved.
Besides, there is not a single unequivocal criterion to ascertain the number of factors worth retaining, and much depends on the purpose of the analysis. Sometimes you are after one factor only (which should explain a large fraction of total variance), sometimes you look for various underlying dimensions, either orthogonal to each other or correlated with one another (this latter case is obtained through oblique rotation). The common criteria of using only factors with eigenvalue above 1, or using the scree curve to identify the cutoff factor, are only rules of thumb that are not always useful. One has, besides, to understand that factors are mathematical constructs, not real objects, and therefore one can heuristically select the most useful variant.
I am of course speaking of exploratory factor analysis. What is called confirmatory factor analysis should more properly be treated as structural equation models with latent variables. However, in my humble opinion, these "confirmatory" analyses cannot "confirm" that the model is right, nor "prove" causal links between variables. Factor analysis simply replaces observed variables with a (possibly smaller) number of underlying scales, all of which are linear functions of the observed variables.
Hector
-----Original Message----- From: Angelina S. MacKewn [mailto:[hidden email]] Sent: 13 January 2010 20:32 To: Hector Maletta Subject: RE: Factor Analysis on dichotomous variables
Hector, I have read the argument that dichotomous variables in a PCA produce too many components. Do you think this is something that one would get nailed on when we go to publish this?
Thanks for an answer I could understand. I am not a statistician, just a researcher trying to write a paper. Cheers, Angie |
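To make the "eigenvalue above 1" rule of thumb concrete, here is a small sketch in Python with NumPy (not SPSS syntax, and not anything posted in the thread). The data are simulated under an assumption chosen purely for illustration: six yes/no items all driven by a single latent trait. The function name `kaiser_count` and all numeric settings are hypothetical.

```python
import numpy as np


def kaiser_count(data):
    """Return the eigenvalues of the correlation matrix (largest first)
    and how many of them exceed 1."""
    corr = np.corrcoef(data, rowvar=False)        # columns are variables
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh sorts ascending
    return eigenvalues, int(np.sum(eigenvalues > 1.0))


rng = np.random.default_rng(0)
n_cases, n_items = 500, 6

# Hypothetical data: one latent trait plus item-specific noise,
# cut at zero to produce 0/1 items.
trait = rng.normal(size=(n_cases, 1))
noise = rng.normal(size=(n_cases, n_items))
items = ((0.8 * trait + 0.6 * noise) > 0).astype(int)

eigenvalues, retained = kaiser_count(items)
print(np.round(eigenvalues, 2), retained)
```

The eigenvalues of a correlation matrix always sum to the number of variables, which is why "greater than 1" is read as "accounts for more variance than a single item". As the message above stresses, the resulting count is a rule of thumb and not a decision.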
|
In reply to this post by Angelina S. MacKewn
Angie,
My third message: Besides the conceptual issues addressed in my previous messages, I should call your attention to the fact that 50 variables with 500 cases is very likely to yield non-significant results, due to the small size of the sample in relation to the number of variables. Some books or teachers speak about an absolute minimum of 10 cases per variable. With 500/50 you are precisely at that supposed minimum, but it is widely seen as too optimistic. 40-60 cases per variable is more like it, although there is no general rule because all depends on the amount of correlation among the observed variables.
Have you considered dividing the 50 items into a number of groups evidently related to different dimensions? PCA performed on each group of more closely related dichotomous variables may be more reliable, both because they are more closely related and because the cases/variables ratio is higher.
Hector |
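The cases-per-variable arithmetic in the message above is easy to tabulate. The sketch below is only an illustration: the split into five groups of ten items is a hypothetical grouping, not one proposed by anyone in the thread.

```python
def cases_per_variable(n_cases, n_variables):
    """Ratio of sample size to the number of variables in one analysis."""
    return n_cases / n_variables


n_cases = 500

# All 50 items analysed together: the "absolute minimum" ratio of 10.
print(cases_per_variable(n_cases, 50))

# Hypothetical split into five groups of 10 related items each.
group_sizes = [10, 10, 10, 10, 10]
ratios = [cases_per_variable(n_cases, k) for k in group_sizes]
print(ratios)
```

Under that assumed split each separate analysis would have 50 cases per variable, inside the 40-60 range mentioned above, at the cost of not estimating relations between items in different groups.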
|
In reply to this post by Hector Maletta
One problem with doing traditional factor analyses on dichotomous variables is that unless the dichotomous variables have means around .5, they can seriously underestimate the true degree of correlation. Some recommend using tetrachoric correlations to get around this but such correlations may lead to matrices with negative eigenvalues. I tend to agree with Dale that it's often better to use a tool specifically designed to handle the problem. That's why I recommend Mplus for such analyses.
Paul
Dr. Paul R. Swank, Professor and Director of Research, Children's Learning Institute, University of Texas Health Science Center-Houston
-----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Hector Maletta Sent: Wednesday, January 13, 2010 5:29 PM To: [hidden email] Subject: Re: Factor Analysis on dichotomous variables |
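The attenuation described above can be seen in a small simulation. This is a sketch under stated assumptions, not an analysis from the thread: two latent variables are drawn from a bivariate normal distribution with correlation 0.6 (an arbitrary value), both are cut at the same threshold to produce 0/1 items, and the ordinary Pearson (phi) correlation of the 0/1 items is computed. All names and settings are hypothetical, and only the Python standard library is used.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation; on 0/1 data this is the phi coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def phi_after_cut(rho, threshold, n=20000, seed=1):
    """Dichotomize two latent normals (correlation rho) at `threshold`
    and return the phi correlation of the resulting 0/1 variables."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        x.append(1 if z1 > threshold else 0)
        y.append(1 if z2 > threshold else 0)
    return pearson(x, y)


rho = 0.6
phi_middle = phi_after_cut(rho, 0.0)   # item means near .5
phi_extreme = phi_after_cut(rho, 1.5)  # item means near .07
print(round(phi_middle, 3), round(phi_extreme, 3))
```

In this setup phi falls below the latent correlation even with a median split, and falls further when the item means move away from .5. The tetrachoric correlation is designed to recover the latent value instead, subject to the normality assumption noted earlier in the thread.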
|
In reply to this post by Angelina S. MacKewn
How did you select the set of behaviors? Do you have a priori groups of behaviors (items) meant to represent particular constructs? Is this a one-time study or are you trying to establish summative scales for future use?
Some considerations. An item can be considered to have 3 parts: common variance (which you hope is related to the construct), item-specific variance, and error variance. For a scale of spelling achievement, common variance would be related to spelling ability, unique variance would be related to the stimulus word. If you are trying to create scales, then you are interested in finding factors that account for the common variance among items. It is routine to use the "reliability" (squared multiple correlation with the other items) on the diagonal as an estimate of the common variance. This is the kind of factor analysis called principal axis factor analysis (PA2). (The kind of factor analysis called principal components has 1.00 on the diagonal of the correlation matrix.) In order to maintain the distinctiveness of the constructs in your explanation, stick with the traditional orthogonal rotation.
You would be starting with a 50-dimensional space. A major consideration is how many factors account for a meaningful amount of the variance you are trying to account for. A rule of thumb is that there is no way one would be interested in a factor that accounts for less variance than a single item. Kaiser started the practice of only extracting factors that have eigenvalues greater than one. This is a programming convenience. In over 35 years of experience, I have never seen it be reasonable to retain this many factors in the final solution. Some ways to ballpark the number of factors to consider retaining are Cattell's scree test and parallel factor analysis. You can find syntax to do parallel factor analysis in the archives of this list.
In the end you would extract the number of factors where the set of cleanly loading items accounts for a substantial percentage of the common variance and has a meaningful interpretation. Usually that is the number of scales you would create. (Rarely, there will be a non-interpretable factor, say as the third factor in a five-factor extraction.)
Although there are more elaborate ways of getting scores built into software, simply reflecting items to represent the underlying construct and summing them often creates a scale that stands up across studies. This is also called unit weighting. Then check the reliability of the scale. Many of your 50 items may not be useful on a scale because they are related to none of the retained factors, represent a construct without enough clean items to make a scale, or are related to more than one factor.
It is often worthwhile to see if newer and more sophisticated approaches yield a substantially different grouping of items. Examples are item clustering, IRT (item response theory), Rasch modeling, and structural equation modeling. If they do, you would need to figure out why. If they do not, your writeup might simply mention that the other methods produce similar results.
Art Kendall
Social Research Consultants |
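Unit weighting and the follow-up reliability check described above can be sketched in a few lines. The example below is illustrative only: the six respondents, the four items, and the choice of which item is reverse-keyed are invented, and reliability is computed as Cronbach's alpha (which for 0/1 items coincides with KR-20), one common choice that the message does not itself specify.

```python
from statistics import pvariance


def reflect(responses, reversed_items=()):
    """Flip 0/1 codes of reverse-keyed items so all items point the same way."""
    return [[1 - v if j in reversed_items else v for j, v in enumerate(row)]
            for row in responses]


def unit_weighted_scores(responses, reversed_items=()):
    """Scale score = simple sum of the (reflected) items, each weighted 1."""
    return [sum(row) for row in reflect(responses, reversed_items)]


def cronbach_alpha(responses, reversed_items=()):
    """Cronbach's alpha from item variances and total-score variance."""
    keyed = reflect(responses, reversed_items)
    k = len(keyed[0])
    item_vars = [pvariance([row[j] for row in keyed]) for j in range(k)]
    total_var = pvariance([sum(row) for row in keyed])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 6 respondents x 4 yes/no items; the item at index 3
# is worded in the opposite direction and must be reflected.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
]
scores = unit_weighted_scores(responses, reversed_items={3})
alpha = cronbach_alpha(responses, reversed_items={3})
print(scores, round(alpha, 3))
```

Population variances are used throughout; because alpha depends only on a ratio of variances, using sample variances consistently would give the same value.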
|
In reply to this post by Karadogan, Figen
The University of North Carolina at Greensboro (www.uncg.edu), School of Health and Human Performance (HHP) (www.uncg.edu/hhp) invites applications for either a tenure track (tenure eligible) or non tenure track Quantitative Methodologist in the Department of Public Health Education. This person will assist in the expansion of our research and graduate programs within the School across four departments and two centers. Increased research productivity including funded research is a central goal of the school; quantitative support for our researchers is critical to meeting that goal. The faculty is currently funded by NIAMS, NIDA, NCI, CDC and numerous private foundations and state agencies.
William N. Dudley, PhD
Associate Dean for Research
The School of Health and Human Performance, Office of Research
The University of North Carolina at Greensboro
126 HHP Building, PO Box 26170, Greensboro, NC 27402-6170
VOICE 336.2562475 FAX 336.334.3238 |
