SPSSX Discussion

Variance formula.

Karadogan, Figen

Variance formula.

Hi,

I'm a teaching assistant for a stat course. Today, a student asked me a question in class which I was not able to answer. I would really appreciate it if anyone could help me figure out how the computational formula for the variance is derived from the conceptual formula.

Thank you,

Figen

Jarrod Teo-2

Re: Variance formula.

Hi Figen,

Please follow the link for variance computation.

http://en.wikipedia.org/wiki/Computational_formula_for_the_variance

Regards
Dorraj Oet

Date: Wed, 13 Jan 2010 02:19:18 +0000
From: [hidden email]
Subject: Variance formula.
To: [hidden email]

MaxJasper

Re: Variance formula.

In reply to this post by Karadogan, Figen

X= random variable with probability density function f(x)

x_bar = M[X] = expectation = mean

variance = D[X]

then:

x_bar = M[X] = integral(x*f(x)*dx) [-inf, +inf]

D[X] = integral((x - x_bar)^2*f(x)*dx) [-inf, +inf].
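
The step from the conceptual to the computational form can be sketched as follows. This assumes that "computational formula" means the usual shortcut based on the sum of squares; the sample notation (x_i, n) is an assumption added for illustration, while M[X], D[X] and x_bar follow the definitions above.

```latex
% Sample version: expand the square and use \sum x_i = n\bar{x}.
\begin{align*}
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - n\bar{x}^2
   = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}
\end{align*}

% Density version, using \int f(x)\,dx = 1 and \int x f(x)\,dx = \bar{x}.
\begin{align*}
D[X] &= \int_{-\infty}^{+\infty} (x - \bar{x})^2 f(x)\,dx \\
     &= \int_{-\infty}^{+\infty} x^2 f(x)\,dx
        - 2\bar{x}\int_{-\infty}^{+\infty} x f(x)\,dx
        + \bar{x}^2 \int_{-\infty}^{+\infty} f(x)\,dx \\
     &= M[X^2] - 2\bar{x}^2 + \bar{x}^2 = M[X^2] - \bar{x}^2
\end{align*}
```

Dividing the sample sum of squares by n (or by n - 1 for the unbiased estimate) then gives the variance in either form.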

Karadogan, Figen

Re: Variance formula.

Thank you so much..:-)
I think I figured it out..

Figen

Joy Oliver

IRT sample sizes

In reply to this post by Karadogan, Figen

I am trying to locate rules of thumb on the sample sizes required to fit 2PL and 3PL IRT models in computerized adaptive testing (CAT).
Any references citing rules of thumb or comparing thetas between these two models would be greatly appreciated.

V/r,
Joy Oliver

Angelina S. MacKewn

Factor Analysis on dichotomous variables

What is the factor analysis (PCA) equivalent that can be run on dichotomous variables? I have 50 exhibited behaviours (yes/no) that I want to factor together. I have a sample size of about 500. I would be using SPSS and could use syntax if it is available.

Thanks,
Angie

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

ANDRES ALBERTO BURGA LEON

Re: Factor Analysis on dichotomous variables

Dear Angelina

Factor analysis is not the same as Principal Components Analysis (PCA). You could read more in:

The Use of Exploratory Factor Analysis and Principal Components Analysis in Communication
Hee Sun Park; Rene Dailey; Daisy Lemus
Human Communication Research; Oct 1, 2002; 28, 4;

Repairing Tom Swift’s Electric Factor Analysis Machine
Kristopher J. Preacher and Robert C. MacCallum
Understanding Statistics, 2(1), 13–43

There are a lot of problems with factoring dichotomous items, mainly the presence of artificial factors. More in:

On artificial results due to using factor analysis for dichotomous variables
Klaus D Kubinger
Psychology Science; 2003; 45, 1;

One solution is to use tetrachoric correlations instead of phi correlations (which are the default option in SPSS), but you have to check whether the assumption of underlying normally distributed latent variable(s) is plausible.
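
To make the difference between the two coefficients concrete, here is a minimal Python sketch. The 2x2 counts are invented for illustration, and the tetrachoric value uses the simple "cosine-pi" approximation rather than the maximum-likelihood estimate that dedicated software would compute, so treat it as a rough guide that works best when both items are split close to 50/50.

```python
import math


def phi_from_table(a, b, c, d):
    """Phi coefficient from a 2x2 table of counts.

    a = yes/yes, b = yes/no, c = no/yes, d = no/no.
    This equals the Pearson correlation of the two 0/1 items.
    """
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation.

    Assumes b and c are both non-zero. This is only an approximation;
    it is not the maximum-likelihood tetrachoric estimate.
    """
    return math.cos(math.pi / (1 + math.sqrt((a * d) / (b * c))))


# Hypothetical counts for two yes/no behaviours observed on 1000 cases.
a, b, c, d = 352, 148, 148, 352

phi = phi_from_table(a, b, c, d)
tet = tetrachoric_cos_pi(a, b, c, d)
print(f"phi = {phi:.3f}, approximate tetrachoric = {tet:.3f}")
```

With these invented counts phi is 0.408 while the approximate tetrachoric value is about 0.60, which shows how much lower phi can sit than the correlation assumed for the underlying continuous variables.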

Kindly

Andrés

Mg. Andrés Burga León
Coordinador de Análisis e Informática
Unidad de Medición de la Calidad Educativa
Ministerio de Educación del Perú
Calle El Comercio s/n (espalda del Museo de la Nación)
Lima 41
Perú
Teléfono 615-5840

Dale Glaser

Re: Factor Analysis on dichotomous variables

In reply to this post by Angelina S. MacKewn

Angelina, I will defer to others who may use SPSS for binary variables, but you may want to consider using software that is equipped to perform EFA on binary variables with the proper correlation matrix (i.e., tetrachoric) and estimator (Mplus uses a robust weighted least squares estimator). However, possibly someone on this listserv has developed a macro by which you can import a tetrachoric matrix.

Dale Glaser, Ph.D.
Principal--Glaser Consulting
Lecturer/Adjunct Faculty--SDSU/USD/AIU
Past-President, San Diego Chapter of
American Statistical Association
3115 4th Avenue
San Diego, CA 92103
phone: 619-220-0602
fax: 619-220-0412
email: [hidden email]
website: www.glaserconsult.com

Hector Maletta

Re: Factor Analysis on dichotomous variables

In reply to this post by Angelina S. MacKewn

Any factor analysis can be run on dichotomous variables, because these
variables can legitimately be considered as interval measures. As only one
interval is involved (from 0 to 1), there is no question of comparing
unequal intervals. Their mean is the proportion (p) of the value 1, and the
variance is p(1-p).
There is a specific SPSS procedure, CATPCA, for principal component analysis
of categorical variables (ordinal or nominal, any number of categories).
However, for dichotomous variables CATPCA gives the same solution as
classical Principal Components Analysis of interval variables (PCA is one of
the variants of factor analysis).
Purists insist that dichotomous variables cannot be used in anything related
to regression, because their residuals are not normally distributed. To see
this, one has to see that the predicted value for a dichotomous variable is
either a value between 0 and 1, or a value outside that interval. In the
first case, the actual values will be either 1 or 0, and the residuals would
therefore be piled at the ends of the 0,1 interval, and not around the
predicted value. In the second case, the residuals will all be at one side
of the predicted value. In any case, their distribution would not be normal.

However, dummy variables (i.e. variables with value 0 or 1) are routinely
used in regression. Factor analysis is a variant of linear regression (or,
more widely, a variant of the Generalized Linear Model) and therefore this
habitual use applies also to it.
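
The statement that a 0/1 variable has mean p and variance p(1-p) can be checked directly. The sketch below uses ten invented yes/no answers and computes the variance three ways: the conceptual formula, the computational formula, and p(1-p). The variances divide by n (population form); the function names are illustrative only.

```python
import math


def conceptual_variance(xs):
    """Mean squared deviation from the mean (divides by n)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def computational_variance(xs):
    """Shortcut form: (sum of squares - (sum)^2 / n) / n."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / n


# Hypothetical data: 3 "yes" (1) and 7 "no" (0) answers.
answers = [1] * 3 + [0] * 7
p = sum(answers) / len(answers)  # proportion of ones, which is also the mean

print(p, conceptual_variance(answers), computational_variance(answers), p * (1 - p))
```

Because x squared equals x when x is 0 or 1, the computational formula reduces to p - p^2, which is p(1-p).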

Hector Maletta

Re: Factor Analysis on dichotomous variables

Angelina
The number of factors (or components) worth retaining largely depends on the
degree of linear correlation or association between the observed variables,
either dichotomous or otherwise. If all variables are highly correlated
among them, possibly one (or two) factors would explain most of the total or
common variance, regardless of the type of variable involved.
Besides, there is not a single unequivocal criterion to ascertain the number
of factors worth retaining, and much depends on the purpose of the analysis.
Sometimes you are after one factor only (which should explain a large
fraction of total variance), sometimes you look for various underlying
dimensions, either orthogonal to each other or correlated among them (this
latter case is obtained through oblique rotation).
The common criteria of retaining only factors with an eigenvalue above 1, or using
the scree curve to identify the cutoff factor, are only rules of thumb that
are not always useful.
One has, besides, to understand that factors are mathematical constructs,
not real objects, and therefore one can heuristically select the most useful
variant. I am of course speaking of exploratory factor analysis. What is
called confirmatory factor analysis should more properly be treated as
structural equation models with latent variables. However, in my humble
opinion, these "confirmatory" analyses cannot "confirm" that the model is
right, nor "prove" causal links between variables. Factor analysis simply
replaces observed variables with a (possibly smaller) number of underlying
scales, all of which are linear functions of the observed variables.

Hector

-----Original Message-----
From: Angelina S. MacKewn [mailto:[hidden email]]
Sent: 13 January 2010 20:32
To: Hector Maletta
Subject: RE: Factor Analysis on dichotomous variables

Hector,

I have read the argument that dichotomous variables in a PCA produce too
many components. Do you think this is something that one would get nailed on
when we go to publish this?

Thanks for an answer I could understand. I am not a statistician, just a
researcher trying to write a paper.

Cheers,
Angie

Hector Maletta

Re: Factor Analysis on dichotomous variables

In reply to this post by Angelina S. MacKewn

Angie,
My third message:
Besides the conceptual issues addressed in my previous messages, I should call
your attention to the fact that 50 variables with 500 cases is very likely
to yield non-significant results, due to the small size of the sample in
relation to the number of variables.
Some books or teachers speak about an absolute minimum of 10 cases per
variable. With 500/50 you are precisely at that supposed minimum, but it is
widely seen as too optimistic. 40-60 cases per variable is more like it,
although there is no general rule because all depends on the amount of
correlation among the observed variables.
Have you considered dividing the 50 items into a number of groups evidently
related to different dimensions? PCA performed on each group of more closely
related dichotomous variables may be more reliable, both because they are
more closely related and because the cases/variables ratio is higher.
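
The arithmetic behind the cases-per-variable argument can be sketched in a few lines of Python. The figures 500 and 50 come from the question; the subgroup of 10 items is a hypothetical split chosen only for illustration.

```python
def cases_per_variable(n_cases, n_variables):
    """Ratio of cases to variables in one analysis."""
    return n_cases / n_variables


def distinct_correlations(n_variables):
    """Number of off-diagonal correlations estimated from the data."""
    return n_variables * (n_variables - 1) // 2


# All 50 items in one analysis, as in the original question.
print(cases_per_variable(500, 50), distinct_correlations(50))

# Hypothetical subgroup of 10 related items analysed on the same 500 cases.
print(cases_per_variable(500, 10), distinct_correlations(10))
```

Analysing all items together gives 10 cases per variable and 1225 correlations to estimate. A subgroup of 10 items gives 50 cases per variable and only 45 correlations, which is the sense in which the grouped analyses may be more reliable.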

Hector

Swank, Paul R

Re: Factor Analysis on dichotomous variables

In reply to this post by Hector Maletta

One problem with doing traditional factor analyses on dichotomous variables is that unless the dichotomous variables have means around .5, they can seriously underestimate the true degree of correlation. Some recommend using tetrachoric correlations to get around this but such correlations may lead to matrices with negative eigenvalues. I tend to agree with Dale that it's often better to use a tool specifically designed to handle the problem. That's why I recommend Mplus for such analyses.
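
The underestimation described above can be illustrated with a small simulation. This is a sketch under stated assumptions: the items are created by cutting two correlated standard normal variables at chosen thresholds, and the latent correlation of 0.6, the cut points, the sample size, and the seed are all invented values.

```python
import math
import random


def phi(xs, ys):
    """Pearson correlation of two 0/1 lists (the phi coefficient)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return cov / math.sqrt(mx * (1 - mx) * my * (1 - my))


def simulate_phi(rho, cut_x, cut_y, n=20000, seed=1):
    """Dichotomize two latent normals with correlation rho and return phi."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        # z2 is built to have correlation rho with z1.
        z2 = rho * z1 + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        xs.append(1 if z1 > cut_x else 0)
        ys.append(1 if z2 > cut_y else 0)
    return phi(xs, ys)


RHO = 0.6  # assumed latent correlation
phi_even = simulate_phi(RHO, 0.0, 0.0)     # both item means near .5
phi_skewed = simulate_phi(RHO, -1.0, 1.0)  # item means near .84 and .16

print(f"latent rho = {RHO}, phi (even splits) = {phi_even:.2f}, "
      f"phi (skewed splits) = {phi_skewed:.2f}")
```

In this setup phi falls below the latent correlation even when both items are split evenly, and it drops much further when the two items have very different means.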

Paul

Dr. Paul R. Swank,
Professor and Director of Research
Children's Learning Institute
University of Texas Health Science Center-Houston

Art Kendall

Re: Factor Analysis on dichotomous variables

In reply to this post by Angelina S. MacKewn

Factor analysis is routinely done on dichotomous items. Think of all the standardized tests that are used, SAT, IQ, LSAT, GRE, etc.

How did you select the set of behaviors?
Do you have a priori groups of behaviors (items) meant to represent particular constructs?
Is this a one time study or are you trying to establish summative scales for future use?

Some considerations.
An item can be considered to have 3 parts: common variance (which you hope is related to the construct), item specific variance, and error variance.
For a scale of spelling achievement, common variance would be related to spelling ability, unique variance would be related to the stimulus word.
If you are trying to create scales, then you are interested in finding factors that account for the common variance among items. It is routine to use the "reliability" (the squared multiple correlation of each item with the other items) on the diagonal as an estimate of the common variance. This is the kind of factor analysis called principal axis factor analysis (PA2). (The kind of factor analysis called principal components has 1.00 on the diagonal of the correlation matrix.)

In order to maintain the distinctiveness of the constructs in your explanation stick with the traditional orthogonal rotation.

You would be starting with a 50 dimensional space. A major consideration is how many factors account for a meaningful amount of the variance you are trying to account for.
A rule of thumb is that there is no way one would be interested in a factor that accounts for less variance than a single item. Kaiser started the practice of only extracting factors that have eigenvalues greater than one. This is a programming convenience. In over 35 years of experience, I have never seen it be reasonable to retain this many factors in the final solution.

Some ways to ballpark the number of factors to consider retaining are:
Cattell's scree test, and parallel factor analysis. You can find syntax to do parallel factor analysis in the archives of this list.

In the end you would extract the number of factors where the set of cleanly loading items accounts for a substantial percentage of the common variance and has a meaningful interpretation. Usually that is the number of scales you would create. (Rarely, there will be a non-interpretable factor say as the third factor in a five factor extraction.)

Although there are more elaborate ways of getting scores built into software, simply reflecting items to represent the underlying construct and summing them often creates a scale that stands up across studies. This is also called unit weighting. Then check the reliability of the scale. Many of your 50 items may not be useful on a scale because they are related to none of the retained factors, represent a construct without enough clean items to make a scale, or are related to more than one factor.
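
Unit weighting and the reliability check can be sketched as follows. The three items and six respondents are invented, the third item is assumed to be negatively keyed, and Cronbach's alpha is used here as one common reliability estimate; the text above does not name a particular coefficient.

```python
from statistics import pvariance


def reflect(item):
    """Reverse-score a 0/1 item so that 1 points toward the construct."""
    return [1 - score for score in item]


def unit_weighted_scores(items):
    """Sum the items for each respondent (every item gets weight 1)."""
    return [sum(answers) for answers in zip(*items)]


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item columns."""
    k = len(items)
    total_variance = pvariance(unit_weighted_scores(items))
    item_variance = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - item_variance / total_variance)


# Hypothetical yes/no answers: one list per item, one entry per respondent.
item1 = [1, 1, 1, 0, 0, 0]
item2 = [1, 1, 0, 1, 0, 0]
item3_raw = [0, 0, 1, 0, 1, 1]  # assumed negatively keyed, so reflect it

scale_items = [item1, item2, reflect(item3_raw)]
scores = unit_weighted_scores(scale_items)
alpha = cronbach_alpha(scale_items)
print(scores, round(alpha, 3))
```

For 0/1 items, alpha computed this way coincides with the KR-20 coefficient. With real data you would run it only on the items that load cleanly on one retained factor.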

It is often worthwhile to see if newer and more sophisticated approaches yield a substantially different grouping of items. Examples are: item clustering, IRT (item response theory), Rasch modeling, and structural equation modeling. If they do, you would need to figure out why. If they do not, your writeup might simply mention that the other methods produce similar results.

Art Kendall
Social Research Consultants

William Dudley WNDUDLEY

PhD Quant Position Opening

In reply to this post by Karadogan, Figen

The University of North Carolina at Greensboro (www.uncg.edu), School of Health and Human Performance (HHP) (www.uncg.edu/hhp) invites applications for either a tenure track (tenure eligible) or non tenure track Quantitative Methodologist in the Department of Public Health Education. This person will assist in the expansion of our research and graduate programs within the School across four departments and two centers. Increased research productivity including funded research is a central goal of the school; quantitative support for our researchers is critical to meeting that goal. The faculty is currently funded by NIAMS, NIDA, NCI, CDC and numerous private foundations and state agencies.

For more details on the listing and application instructions please visit the web site for the posting

http://provost.uncg.edu/Academic/EPA_Personnel/JobLists/DetailPage.asp?s=999897

William N. Dudley, PhD
Associate Dean for Research
The School of Health and Human Performance Office of Research
The University of North Carolina at Greensboro
126 HHP Building, PO Box 26170
Greensboro, NC 27402-6170
VOICE 336.2562475
FAX 336.334.3238
