Hi,

I should like to post once more my question concerning Fisher's exact test for tables bigger than 2x2. I have learnt that in a 2x2 table, for Fisher's exact test, the p-value is calculated directly from the table and there is no test statistic; in SPSS, too, no test statistic is reported in the 2x2 case. But if I have an r x k table with r or k > 2, the Exact Tests option gives me a test statistic without degrees of freedom. My question is: what is its distribution, and/or what is it called?

best regards
Monika
Dear list,
I would like to test for collinearity between three ordinal variables. The variables have different numbers of values, but are coded in a similar way, i.e. category 1 is the lowest category for all three vars.

I calculated Spearman's rho correlations for these variables. The correlation coefficient never exceeds .53, well below the generally used rule of thumb that it should not exceed .85. (Btw, does anybody have a good reference for this rule?)

Can I now safely assume that my variables are not collinear when I use them simultaneously as independent predictors in a logistic regression analysis?

Thank you for your replies!

Albert-Jan
In reply to this post by Monika Heinzel-Gutenbrunner-2
How about the Fisher-Freeman-Halton test of independence for an unordered RxC table?
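For illustration, the Freeman-Halton extension uses the hypergeometric probability of the table itself as its criterion: the exact p-value is the total probability of all tables with the observed margins that are no more probable than the observed one. Below is a minimal Monte Carlo sketch in Python (the 3x3 table is invented; exact implementations, such as the network algorithm behind SPSS Exact Tests, are far more efficient).

import numpy as np
from scipy.special import gammaln

def log_table_prob(T):
    # Log of the (multivariate hypergeometric) probability of an r x c table
    # given its margins: (prod r_i!)(prod c_j!) / (N! prod n_ij!).
    T = np.asarray(T, dtype=float)
    r, c, N = T.sum(axis=1), T.sum(axis=0), T.sum()
    return (gammaln(r + 1).sum() + gammaln(c + 1).sum()
            - gammaln(N + 1) - gammaln(T + 1).sum())

def freeman_halton_mc(T, n_sim=20000, seed=0):
    # Monte Carlo estimate of the Freeman-Halton p-value: the probability,
    # among tables with the observed margins, of a table no more probable
    # than the observed one.
    T = np.asarray(T, dtype=int)
    rng = np.random.default_rng(seed)
    rows = np.repeat(np.arange(T.shape[0]), T.sum(axis=1))
    cols = np.repeat(np.arange(T.shape[1]), T.sum(axis=0))
    lp_obs = log_table_prob(T)
    hits = 0
    for _ in range(n_sim):
        sim = np.zeros_like(T)
        np.add.at(sim, (rows, rng.permutation(cols)), 1)
        if log_table_prob(sim) <= lp_obs + 1e-9:
            hits += 1
    return (hits + 1) / (n_sim + 1)

print(freeman_halton_mc([[5, 2, 1], [1, 4, 6], [2, 3, 7]]))  # invented 3x3 table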
In reply to this post by Albert-Jan Roskam
Albert-jan,
In my school days I spent a lot of time studying econometrics, and multicollinearity is a common topic. Referring back to one of my old textbooks (Basic Econometrics by Damodar Gujarati), some methods for detecting the presence of multicollinearity are:

1. Regression results with a high R^2 but few significant t-ratios.

2. High pairwise correlations among regressors (your Spearman correlation coefficients). The book puts the threshold at 0.8, but gives no source.

3. Auxiliary regressions -- regress each of your independent variables on the other independent variables and look at the resulting R^2 for each. According to Klein's rule of thumb, if the R^2 of any auxiliary regression is greater than the R^2 of the main regression, you should assume there is multicollinearity.

4. Compute the eigenvalues of the (scaled) cross-product matrix of the regressors. The condition number is k = max eigenvalue / min eigenvalue, and the condition index is CI = SQRT(k). By Gujarati's rule of thumb, k between 100 and 1,000 (CI roughly 10 to 30) indicates moderate to strong multicollinearity, and k above 1,000 (CI above about 30) indicates severe multicollinearity.

5. Tolerance and variance inflation factors (VIF). For predictor j, VIF = 1/(1 - R^2_j), where R^2_j is the R^2 from the auxiliary regression of X_j on the other predictors (with only two predictors this reduces to 1/(1 - r^2)). The VIF shows how much the variance of an estimator is inflated by multicollinearity: the stronger the relationship among the predictors, the greater the impact on the variance (and therefore on the standard errors of the coefficients). A common rule of thumb is that a VIF of 10 or more (an R^2_j of 0.9 or more) signals a problem. Tolerance is defined as 1/VIF (= 1 - R^2_j), so a value of 0 means perfect multicollinearity and 1 means no multicollinearity.

What can you do if there is multicollinearity? Well, it depends on your model and data. Given that you're using ordinal data, some of the recommendations won't apply. Is there some relationship between your variables that you can take advantage of? For example, if you know that X1 and X2 are related in some manner based on theory or previous empirical work, you can modify your model accordingly. You can also drop one of the offending variables, but at the risk of specification error. Also, multicollinearity is a feature of samples, so is it possible to get another sample from the same population? Additional or new data may help if it is possible to obtain it. If you want to delve further, you can also try factor analysis, principal components analysis, or ridge regression.

I hope some of this helps.

Bruno Berszoner
Tufts Health Plan, Quality and Health Informatics
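For illustration, a minimal Python sketch of points 3-5 above, with invented data and variable names: it computes each auxiliary R^2, the corresponding tolerance and VIF, and the condition number/index from the predictors' correlation matrix (one common convention).

import numpy as np

def collinearity_diagnostics(X, names):
    # For each predictor: auxiliary R^2 (regressing it on the others),
    # tolerance = 1 - R^2, and VIF = 1/tolerance.  Also returns the
    # condition number (max/min eigenvalue of the correlation matrix)
    # and the condition index (its square root).
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    per_var = {}
    for j in range(p):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        r2 = 1.0 - (y - Z @ beta).var() / y.var()
        tol = 1.0 - r2
        per_var[names[j]] = {"aux_R2": r2, "tolerance": tol, "VIF": 1.0 / tol}
    eig = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))
    k = eig.max() / eig.min()
    return per_var, k, np.sqrt(k)

# Invented example: three ordinal-looking predictors, two of them related.
rng = np.random.default_rng(0)
base = rng.integers(1, 5, size=200)
X = np.column_stack([base + rng.integers(0, 3, 200),
                     base + rng.integers(0, 3, 200),
                     rng.integers(1, 6, 200)])
per_var, cond_number, cond_index = collinearity_diagnostics(X, ["x1", "x2", "x3"])
print(per_var)
print(cond_number, cond_index)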
In reply to this post by Albert-Jan Roskam
I am not aware of Spearman's coefficients being used to diagnose collinearity, but since they are correlation coefficients they could be. A correlation matrix in general has coefficients between -1 and 1; the closer to 1 in absolute value, the more correlated the variables are. There are other rules, such as the variance inflation factors and the variance proportions: the former should not exceed a value of 10, while the latter should not exceed .5, and no more than three variables should have such values (.5) along the same row. Since I am new to SPSS, I am not aware whether these measures of collinearity are available in it.

Fermin Ornelas, Ph.D.
In reply to this post by Albert-Jan Roskam
If you have SPSS Categories, you can use CATREG for regression, using ordinal scaling level for the predictors (and numerical level for a continuous dependent variable). CATREG gives the tolerance for the quantified ordinal variables. The quantified variables can be saved and used with logistic regression.
Anita van der Kooij
Data Theory Group
Leiden University
In reply to this post by Albert-Jan Roskam
Stephen Brand
www.statisticsdoc.com

Albert-jan,

A great deal of good advice has been given on this topic, particularly Anita's suggestion to utilize CATREG. Just to add a couple of small items to the pool, I would suggest the following:

(1) Perfect collinearity exists when one independent variable can be predicted by a linear combination of the other independent variables, so in addition to looking at the bivariate correlations between the predictors, examine the multiple regression of each predictor on the other predictors (e.g., to what extent can X1 be predicted by a weighted combination of X2 and X3).

(2) If you have a large sample, you might want to consider splitting it randomly into halves, and conducting the logistic regression analysis in both halves, or cross-validating the regression weights from one half in the other half. This approach will give some indication of how robust the parameter estimates are.

HTH,

Stephen Brand
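As a small sketch of suggestion (2), with invented data and scikit-learn's LogisticRegression standing in for SPSS LOGISTIC REGRESSION: fit the same model in two random halves and compare the coefficients; large swings between halves suggest unstable estimates.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Invented data: three correlated ordinal-looking predictors, binary outcome.
rng = np.random.default_rng(0)
n = 600
base = rng.integers(1, 5, size=n)
X = np.column_stack([base + rng.integers(0, 3, n),
                     base + rng.integers(0, 3, n),
                     rng.integers(1, 6, n)])
logit = -2 + 0.5 * X[:, 0] + 0.3 * X[:, 2]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Fit the same model in two random halves; large coefficient differences
# between halves suggest unstable (e.g. collinearity-inflated) estimates.
Xa, Xb, ya, yb = train_test_split(X, y, test_size=0.5, random_state=1)
for label, Xh, yh in [("half A", Xa, ya), ("half B", Xb, yb)]:
    m = LogisticRegression(C=1e6, max_iter=1000).fit(Xh, yh)  # ~unpenalized
    print(label, np.round(m.intercept_, 2), np.round(m.coef_[0], 2))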
In reply to this post by Bruno Berszoner
At 08:32 AM 2/1/2007, you wrote:
>Also, multicollinearity is a feature of samples, so is it possible to get
>another sample from the same population? Additional or new data may help
>if it is possible to obtain it.
>Bruno Berszoner

...this one is slightly new to me. I assume that drawing another sample would only prove useful in the event that you could oversample certain characteristics to reduce the degree of multicollinearity, and then weight the final analysis to provide the equivalent of a randomized sampling approach? E.g., if race and income were collinear, you would need to over-sample high-income minority groups to decrease the collinearity between these two variables.

...or is there another way that this might be expected to work?

Jeff
In reply to this post by statisticsdoc
I'd like to register an objection to the idea of "testing for collinearity". One can measure the degree of collinearity in various ways and can look at the effect - joint confidence intervals that show the degree of dependence of the estimates - but there can be no definitive rules about when there is too much short of perfect collinearity. And software will take care of that rule for you in ways varying between helpful and rude. Collinearity is a matter of degree, not a yes or no outcome.
As long as you don't have experimental data designed to be orthogonal, you are going to have collinearity to some degree, and the more there is, the more unstable the estimates will be; but any rule short of perfect collinearity is arbitrary.

One useful reality check, collinearity or not, is this. Consider the accuracy of your variables - say you believe the values are correct to three or four significant figures. Then add to the variables a random perturbation small enough that the perturbed values still round to the actual values at that degree of accuracy. Rerun your estimates and see how much you care about the differences in the results.

My two cents.

Jon Peck
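A rough Python sketch of that reality check, assuming (for the sake of the example) that the values are trusted to about four significant figures; the jitter is kept smaller than half a unit in that place, so the perturbed values still round back to the originals.

import numpy as np

def ols(X, y):
    Z = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return beta

# Invented data with two nearly collinear predictors.
rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.001, size=n)
y = 1 + x1 + x2 + rng.normal(scale=0.5, size=n)
X = np.column_stack([x1, x2])

# Perturb each value by well under half a unit in its 4th significant figure
# and refit; then compare the two sets of estimates.
jitter = X * 1e-4 * rng.uniform(-0.5, 0.5, size=X.shape)
print(np.round(ols(X, y), 3))
print(np.round(ols(X + jitter, y), 3))  # how much do the estimates move?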
In reply to this post by Albert-Jan Roskam
There is another issue besides collinearity. If the final model is a logistic one whose purpose is prediction, then collinearity may not be a big issue as long as development and validation results appear reliable. In regular OLS regression, with severe collinearity, hypothesis testing is not valid.

Fermin Ornelas, Ph.D.
In reply to this post by Albert-Jan Roskam
Jon,
Good point. I think most of those who have posted on this topic would agree that collinearity is a matter of degree, not an either/or condition. Perhaps a better way to phrase the initial question in this thread is "How do I assess the magnitude of collinearity among my predictors?" This is a particularly interesting topic with respect to logistic regression.

Best,

Stephen Brand
www.statisticsdoc.com
In reply to this post by Peck, Jon
Depending on the software you are using, you might get a "rude" message saying that the matrix of x's is singular or perfectly collinear, that the matrix cannot be inverted, or equivalently that the determinant is zero - and no further information. The most common human-error causes of this are entering the same variable twice, entering the complete set of dummy variables that represent a categorical variable, entering subtotals along with grand totals, items along with total scores, etc.

A very quick and dirty way to locate which variables are involved in the problem is to pretend that all of the x variables are items in a scale and run RELIABILITY. This procedure shows you the SMC - squared multiple correlation - of each variable with the other variables. It also shows you the corrected item-total correlation, the correlation of each item with the sum of the other items. Items that have SMCs (R**2s) of 1.00 are perfectly redundant. The column of SMCs shows the fit of all possible regressions of each variable in the set on all the other variables in the set, and tells you the degree to which each variable is collinear (redundant) with the others.

Which variable(s) to drop from the set will depend on the substantive nature of your analysis.

Art Kendall
Social Research Consultants
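A small Python sketch mirroring those two RELIABILITY columns, with invented data; because the third variable is an exact sum of the other two, every SMC comes out as 1.00, and which one to drop is indeed a substantive choice.

import numpy as np

def item_diagnostics(X, names):
    # For each x: corrected item-total correlation (with the sum of the other
    # x's) and SMC (R^2 from regressing it on the other x's), as RELIABILITY
    # reports for scale items.
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    for j, name in enumerate(names):
        others = np.delete(X, j, axis=1)
        r_it = np.corrcoef(X[:, j], others.sum(axis=1))[0, 1]
        Z = np.column_stack([np.ones(n), others])
        beta, *_ = np.linalg.lstsq(Z, X[:, j], rcond=None)
        smc = 1 - (X[:, j] - Z @ beta).var() / X[:, j].var()
        flag = "  <-- redundant" if smc > 1 - 1e-10 else ""
        print(f"{name}: item-total r = {r_it:.2f}, SMC = {smc:.3f}{flag}")

rng = np.random.default_rng(2)
x1, x2 = rng.normal(size=100), rng.normal(size=100)
x3 = x1 + x2  # deliberate structural redundancy
item_diagnostics(np.column_stack([x1, x2, x3]), ["x1", "x2", "x3"])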
In reply to this post by Ornelas, Fermin
All,
I am in need of some help. The basic problem is that I want to do a test of equivalence of means from a paired t-test; I think this falls in the area of bioequivalence. However, Google has not been as helpful as I had hoped. Can someone suggest a basic but useful article, book, or website for my education? I would also like to know how to set this up in SPSS - I understand that there is no procedure for this, but rather which numbers need to be computed and how they should be combined.

Thanks, Gene Maguin
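One standard approach is Schuirmann's two one-sided tests (TOST): choose an equivalence margin delta on substantive grounds, then show that the mean paired difference is significantly greater than -delta and significantly less than +delta (equivalently, that the 1 - 2*alpha confidence interval for the mean difference lies inside (-delta, +delta)). A minimal Python sketch with invented data follows; the same ingredients (mean difference, its standard error, df) are available from the paired-samples t-test output.

import numpy as np
from scipy import stats

def paired_tost(x, y, delta, alpha=0.05):
    # Two one-sided tests for equivalence of paired means within +/- delta.
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    n = len(d)
    m, se = d.mean(), d.std(ddof=1) / np.sqrt(n)
    p_lower = 1 - stats.t.cdf((m + delta) / se, n - 1)  # H0: mean diff <= -delta
    p_upper = stats.t.cdf((m - delta) / se, n - 1)      # H0: mean diff >= +delta
    p_tost = max(p_lower, p_upper)                      # reject both to claim equivalence
    half = stats.t.ppf(1 - alpha, n - 1) * se
    return p_tost, (m - half, m + half)                 # (1 - 2*alpha) CI for mean diff

rng = np.random.default_rng(1)
pre = rng.normal(50, 10, size=40)
post = pre + rng.normal(0.2, 3, size=40)   # invented, nearly equivalent conditions
p, ci = paired_tost(pre, post, delta=2.0)  # delta = 2 is an assumed margin
print(p, ci)  # equivalence at alpha=.05 if p < .05, i.e. the 90% CI lies within (-2, 2)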
In reply to this post by Albert-Jan Roskam
It seems to me that collinearity means dependence; that is, if the data are collinear, there is a dependence. In that sense we should talk about the degree to which the data approach collinearity, rather than the degree of collinearity. Data are rarely collinear unless someone makes a mistake with their data. But sometimes the data approach dependence, or collinearity, and because of the software's inability to manage with the finite number of digits available, this causes a problem. Or perhaps I'm wrong, and it is "collinear" that means dependency and "collinearity" that means the degree to which the data approach being collinear. Again, it is the semantics that get in our way.

Paul R. Swank, Ph.D.
Professor
Director of Research
Children's Learning Institute
University of Texas Health Science Center-Houston
In reply to this post by Albert-Jan Roskam
Collinearity, as other responses have mentioned, is broadly defined as a relationship of one variable with another variable or with a group of variables. Such a relationship is measured by a correlation coefficient whose values run from -1 to 1; the closer to either extreme, the stronger the relationship. As others have already pointed out, more often than not this problem will be present in empirical work. The question is when it becomes a real problem, and that is where the collinearity diagnostics help the researcher determine its severity. A correlation matrix of the predictor variables should give the researcher a good idea; more precisely, the condition index, the variance proportions, and the variance inflation factors will tell the researcher how serious the problem is.

Having said that, in my own experience, if my regression model has say 10 predictors and three of them are collinear (condition index < 30, variance proportions < .5, and VIF < 7), then I can live with this. Remember the alternatives are very limited: one can try to collect additional data, try ridge regression, or try different variables. But if modeling intuition, experience, and sign expectations hold, then one may decide to keep the model as it is. Of course, if development and validation results are not deteriorating either, then one should leave the model alone, especially if prediction is the main task for the model. As one of my professors used to say, collinearity is like bad and good cholesterol.

Fermin Ornelas, Ph.D.
In reply to this post by Swank, Paul R
At 12:06 PM 2/2/2007, Swank, Paul R wrote:
>Data are rarely collinear unless someone makes a mistake with their data.
>But sometimes the data approach dependence or collinearity and, because of
>the software's inability to manage with the finite number of digits
>available, this causes a problem. Or perhaps I'm wrong and it is collinear
>that means dependency.

I'm not with you on this one. Yes, any collinearity, however small, proves at least partial statistical dependence among the variables. (It's shown in elementary books on statistics that the converse is not true.)

*Perfect* collinearity - singularity of the data matrix - does almost always result from mistakes: including variables with a structural linear relationship. A common mistake, which Art Kendall pointed out, is including dummies for all levels of a categorical variable when there's also a constant in the model. If collinearity is so near that the finite precision of computation makes a difference, you can be pretty certain that there's a structural relationship. Modern precision of numbers, and of algorithms, can handle any degree of collinearity likely to occur when there is no structural relationship.

But a degree of collinearity - pairwise, strong correlation between variables; overall, high values of the matrix condition index or similar measures - is often found in real data without making gross mistakes. Common hypotheses behind this are that some variables have a partial causal effect on others, or that unobserved variables have partial causal effects on several of the observed variables.

As others have said, any degree of collinearity reduces the precision with which parameters like regression coefficients can be estimated. That isn't a problem of limited computational precision; it's as real a measure of uncertainty as any other standard error of estimate. How much collinearity it takes to create a problem varies, mainly with how precise the data are otherwise. Correlations above 0.8 have been mentioned in this discussion; I was recently on a study where a correlation of 0.69 essentially prevented estimating the regression coefficient for either variable. (This was a psychological study using questionnaire instruments, with a sample size of about 50. The correlation was between two variables with little *a priori* relationship. The connection is well up among the 'questions for further research.')

If you really want to include a set of variables with high collinearity - and there may well be reasons - transforming the data matrix may help. There are sophisticated techniques like factor analysis, of course. But simpler ones, like replacing two correlated variables by their mean and their difference (when they're similarly scaled), are often illuminating.

-Cheers, and onward,
Richard
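A tiny numeric illustration of that last suggestion, with invented data: the mean and difference of two correlated, similarly scaled variables are nearly uncorrelated, and the regression is the same fit in a different parameterization.

import numpy as np

def ols(X, y):
    Z = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return beta

# Invented, similarly scaled and fairly correlated predictors.
rng = np.random.default_rng(3)
n = 50
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
y = 1 + x1 + x2 + rng.normal(size=n)

m, d = (x1 + x2) / 2, x1 - x2
print(np.corrcoef(x1, x2)[0, 1], np.corrcoef(m, d)[0, 1])  # the second is much nearer 0
print(np.round(ols(np.column_stack([x1, x2]), y), 3))
# Same fitted values, reparameterized:
# coefficient on m equals b1 + b2, coefficient on d equals (b1 - b2)/2.
print(np.round(ols(np.column_stack([m, d]), y), 3))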
In reply to this post by Art Kendall
Art-
I would not call this quick and dirty - more like quick and very neat!

Thanks,

Steve
www.statisticsdoc.com
In reply to this post by Art Kendall
Hi dear list,
Thanks a lot for your replies. They have been of great help and I learnt a lot from them! By the way, I was able to find a reference on this topic:

Farrar, D.E., & Glauber, R.R. (1967). Multicollinearity in regression analysis: The problem revisited. The Review of Economics and Statistics, 49, 92-107.

Perhaps this is the original source of the classic r = .85 rule of thumb.

Thanks again!

Albert-Jan
In reply to this post by statisticsdoc
It has worked for me since SPSS included RELIABILITY in the mid-70s.
Art
Bear in mind that the Tolerance statistic IS just 1 - R sq of each regressor on all the others. And Partial Correlations will also be helpful in going beyond that summary structure.
-Jon Peck