Association between two nominal variables?

Secrist, Kevin
Hello SPSS gurus,

I am a novice user trying to identify the association between a nominal variable and a nominal outcome.
The variable is labeled Peer Pal (yes/no, coded 1 or 0) and the outcome is labeled outpt (for outpatient service received), also yes/no (1 or 0).
The N for Peer Pal 0 = 165 and the N for Peer Pal 1 = 674.

What test(s) can SPSS 12.0 perform to indicate the association between the variable and the outcome? And what would the expected value(s) be if the association exists, is weak, or is strong?


Thanks

Kevin Secrist - Administrative Analyst, Associate
Butte County Behavioral Health

Far better an approximate answer to the right question, which is often vague, than the exact answer to the wrong question, which can always be made precise.
Ann. Math. Stat. 33 (1962) - John W. Tukey

Re: Association between two nominal variables?

Dominic Lusinchi
Kevin,

For a 2X2 table you can use the Phi statistic, available in the Crosstabs
procedure under Statistics. In absolute value, Phi varies from 0 to 1: the
closer to 1, the stronger the association. Of course this assumes that the
chi-square statistic is significant.

You can also compute the crossproduct ratio, better known as the odds ratio:
[(1,1)X(2,2)]/[(1,2)X(2,1)]; where the first number is the row and the
second the column, and these represent the frequencies in each cell.

An odds ratio (OR) of 1 means there is no relationship between the two
variables. An OR>1 means greater likelihood of the outcome given the risk
factor; an OR<1 means you are less likely to have the outcome.
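
As a rough illustration of the cross-product ratio described above, here is a minimal Python sketch. The cell counts are hypothetical (the thread gives only the group sizes 674 and 165, not the individual cells), and the function name odds_ratio is an illustrative choice, not anything from SPSS.

```python
def odds_ratio(n11, n12, n21, n22):
    """Cross-product ratio (n11 * n22) / (n12 * n21) for a 2x2 table.

    The first index is the row and the second the column, as in the post.
    """
    if n12 == 0 or n21 == 0:
        raise ZeroDivisionError("odds ratio is undefined when cell (1,2) or (2,1) is 0")
    return (n11 * n22) / (n12 * n21)


# HYPOTHETICAL counts, not from the original data; only the row totals
# (674 with a Peer Pal, 165 without) match the post.
# Rows: Peer Pal yes / no. Columns: outpt yes / no.
table = [[300, 374],
         [100, 65]]

or_value = odds_ratio(table[0][0], table[0][1], table[1][0], table[1][1])
print(f"OR = {or_value:.3f}")
```

With these made-up counts the OR falls below 1, which would read as lower odds of the outcome in the first row than in the second.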

I say you have to compute this because SPSS does not: that is, not in the
crosstab procedure.

Cheers,

Dominic Lusinchi
Statistician
Far West Research
Statistical Consulting
San Francisco, California
415-664-3032
www.farwestresearch.com

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Secrist, Kevin
Sent: Tuesday, September 19, 2006 2:24 PM
To: [hidden email]
Subject: Association between two nominal variables?


Re: Association between two nominal variables?

Bob Schacht-3
At 12:42 PM 9/19/2006, Dominic Lusinchi wrote:
>Kevin,
>
>For a 2X2 table you can use the Phi statistic, available in the Crosstab
>procedure under Statistics. Phi varies from 0 to 1: closer to 1, stronger
>the association. Of course this assumes that the chi-square statistic is
>significant. . .

Of interest in this choice is whether the double null category is
meaningful-- that is, if Peer Pal is "no" (0) and the outcome labeled outpt
(for outpatient service received) is also no (0). If the double null is
meaningful and not merely a default, then the strength of Phi may depend
largely on how many double nulls there are. Is your sample population
defined in such a way that the double null applies to them differently than
to the 3 billion other people in the world for whom these two variables
have a value of 0?

If the double nulls are a poorly defined category, then you might want to
consider Jaccard's coefficient, which does not use the double nulls
(http://en.wikipedia.org/wiki/Jaccard_index). It is extremely easy to
calculate.
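
To show how little is involved, here is a minimal Python sketch of Jaccard's coefficient for two yes/no variables. The counts are hypothetical (the thread does not give the cell frequencies), and the function and argument names are illustrative only.

```python
def jaccard(both_yes, only_first, only_second):
    """Jaccard coefficient for two binary variables.

    both_yes    : cases where both variables are 1
    only_first  : cases where only the first variable is 1
    only_second : cases where only the second variable is 1
    Cases where both variables are 0 (the double nulls) are not used at all.
    """
    denom = both_yes + only_first + only_second
    if denom == 0:
        raise ValueError("Jaccard is undefined when no case has a 1 on either variable")
    return both_yes / denom


# HYPOTHETICAL counts: 300 with Peer Pal and outpt, 374 with Peer Pal only,
# 100 with outpt only. The number of double nulls does not enter the formula.
j = jaccard(both_yes=300, only_first=374, only_second=100)
print(f"Jaccard = {j:.3f}")
```

Adding or removing double-null cases leaves the result unchanged, which is the property being recommended here.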

Bob




Robert M. Schacht, Ph.D. <[hidden email]>
Pacific Basin Rehabilitation Research & Training Center
1268 Young Street, Suite #204
Research Center, University of Hawaii
Honolulu, HI 96814

Re: Association between two nominal variables?

Elena Verbitskaya
In reply to this post by Dominic Lusinchi
Sorry, but you are wrong. You can compute the OR in SPSS using Crosstabs
(Crosstabs - Statistics - Risk, version 12 and higher). You will get the OR
with confidence intervals, and relative risks too.
Best,

Elena Verbitskaya




Re: Association between two nominal variables?

Dominic Lusinchi
You are absolutely correct, Elena. I stand corrected.

Thank you for pointing that out.

Cheers,
Dominic

-----Original Message-----
From: Elena Verbitskaya [mailto:[hidden email]]
Sent: Wednesday, September 20, 2006 1:44 AM
To: 'Dominic Lusinchi'; [hidden email]
Subject: RE: Association between two nominal variables?


Re: Association between two nominal variables?

Bob Schacht-3
In reply to this post by Secrist, Kevin
At 04:42 AM 9/20/2006, Secrist, Kevin wrote:

>Thank you Bob and everyone who responded to my question.
>
>I utilized phi and Cramer's V, and got a value of -.128 for phi and .128
>for Cramer's V. So, as I understand it, the association between my
>variable and outcome is rather low. Is there an interpretive scale of
>influence like the one for the correlation coefficient?
>.0 to .2 weak or no relationship
>.2 to .4 weak relationship
>.4 to .6 moderate relationship
>.6 to .8 strong relationship
>.8 to 1.0 very strong relationship
>(Salkind, Neil, "Statistics for People Who (Think They) Hate Statistics",
>2000, p. 96)

Kevin,
The Phi coefficient is related to the Chi-Square statistic; Blalock* says
Phi-square = Chi-square/N.
So I guess you could calculate N*Phi(squared) and use the Chi-square
tables. Of course, you could also just calculate the Chi-square directly.
Why are you avoiding the use of a simple Chi-square?

BTW, for a 2x2 table you should probably be using Fisher's exact test
anyway. Why not?
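
The relation Phi-square = Chi-square/N can be sketched in a few lines of Python. This assumes N = 165 + 674 = 839 (that is, no cases dropped for missing data, which the thread does not confirm) and uses the rounded phi of -.128, so the result is approximate. For a 2x2 table the chi-square has 1 degree of freedom, and in that case the upper-tail probability can be written with the complementary error function.

```python
import math


def chi_square_from_phi(phi, n):
    """Chi-square implied by phi for a 2x2 table: chi2 = N * phi**2."""
    return n * phi ** 2


def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    With 1 df, chi2 is the square of a standard normal, so
    P(X > chi2) = erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2.0))


n = 165 + 674          # ASSUMPTION: all 839 cases are in the table
phi = -0.128           # rounded value reported in the quoted message

chi2 = chi_square_from_phi(phi, n)
p = p_value_df1(chi2)
print(f"chi-square = {chi2:.3f}, p = {p:.5f}")
```

This is the uncorrected Pearson chi-square; it does not reproduce a continuity-corrected statistic or Fisher's exact test.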

Bob

*Blalock, Hubert M. (rev. ed., 1972) Social Statistics.
Yes, there are more recent editions, but they weren't in print yet when I
was in grad school <g>






>-----Original Message-----
>From: SPSSX(r) Discussion [mailto:[hidden email]]On Behalf Of
>Bob Schacht
>Sent: Tuesday, September 19, 2006 4:32 PM
>To: [hidden email]
>Subject: Re: Association between two nominal variables?

Re: Association between two nominal variables?

Scott Czepiel
Does anyone else experience pangs of nausea upon seeing this
"interpretation" of the correlation coefficient?  General rules of thumb
like this are highly dangerous:  what constitutes a weak or strong
correlation is entirely dependent on context.  For example, if you found a
0.7 correlation between number of planes that take off and the number that
subsequently land, would you really be happy to conclude that you'd found a
strong relationship?  Likewise, studies of manufacturing tolerance or MTBF
very often deal with situations where a 0.98 is too weak and a product is
scrapped or a factory floor is shut down.  Please be highly skeptical of
such reductionist attempts at making statistics too easy!


>.0 to .2 weak or no relationship
>.2 to .4 weak relationship
>.4 to .6 moderate relationship
>.6 to .8 strong relationship
>.8 to 1.0 very strong relationship
>(Salkind, Neil "Statistics for People who think they hate statistics",
>2000, pg. 96)

Re: Association between two nominal variables?

Sean McKenzie
Two Cents.

My background is mainly as a macroeconomist forecasting time series.  I have
particularistic rules of thumb: in a single-variable equation, where I
see R2>0.5 I say it's good (this means an R of .71+), but for multivariate
equations you tend to be looking for much higher numbers.

If you are casually working in the social sciences, especially outside of
economics, the above or below as rules of thumb for the non expert are OK, I
would say, but yes it is all particularistic.

In dealing with my undergraduate students, interns et al., a classic is that
they want to take the ratio of the variance to the mean, and I point out to
them that for a standard normal variable that statistic cannot be calculated,
because the mean is zero. OTOH, in studies of "manufacturing tolerance or
mtbf", where as Scott says "a 0.98 is too weak", that measure is de rigueur.

Another one is constants of regression.

For those of you who read Barrons or Investor's Business daily, Alpha and
Beta in evaluating stocks is based on a simple regression like this:

Stock Price=Alpha + Beta*S&P500

And so Alpha is interpreted as some autonomous aspect of a stock's return,
and Beta its correlation with the overall market.

Simple Keynesian Consumption Functions (in economics) you will still see
explained in the same way:

Consumption=A+B*Income

Where A is interpreted as some autonomous component of consumption and B is
called the MPC=Marginal Propensity to Consume.

Stuff like this is still in many undergraduate text books, and talked about
by economists et al all the time.

The thing is these interpretations have been around for many years, as the
analysis goes back to the 1920s and 30s, when running an equation per
semester was a big deal.

When I was a first-year graduate student, asking for interpretations of
constants of regression was common, yet our professors would say that we
usually do not try to interpret them, even if we always include them (even
though we still talk about Keynes and autonomous components of consumption;
anyone here an economist of more recent vintage than myself?  Have they
finally stopped saying this stuff?).

The thing is that as you add and subtract variables from your equations,
these constants move from positive to negative, from small to large
magnitudes, etc. Do a classic Keynesian Consumption Function but then add in
interest rates or exchange rates and see what happens.

Too new a user of SPSS to say anything useful, so I thought I'd throw in my
2 cents.



>From: Scott Czepiel <[hidden email]>
>Reply-To: Scott Czepiel <[hidden email]>
>To: [hidden email]
>Subject: Re: Association between two nominal variables?
>Date: Wed, 20 Sep 2006 15:27:07 -0400
>

School Pass rates

russell-19
In reply to this post by Bob Schacht-3
Dear Listers,

I am going to risk being called stupid, but I am nonetheless going to
ask for some help.

Assume X number of primary schools holding primary school leaving
examinations. Those who pass go on to three different types of secondary
school. Entry to one of the three types is based on the final mark: the
highest prestige is accorded secondary school type 1, followed by secondary
school type 2 and then secondary school type 3. Normal pass rates are
calculated per primary school as simply (those who passed/those who
sat)*100. This is fine, but then two schools may achieve the same pass rate
(let's say 75%), but have their learners placed in secondary schools of
opposite quality. In a real sense, the ordinary pass rate does not take the
quality of the passes into account. I would like to do so.

Here is an imaginary data matrix



          Learn   Learn   Unweighted   School   School   School   Weighted
          sat     pass    pass rate    type 1   type 2   type 3   pass rate??

Prim 1    20      15      75.0%        15       0        0
Prim 2    20      15      75.0%        0        0        15
Prim 3    20      10      50.0%        5        3        2
Prim 4    20      8       40.0%        2        2        4
Prim 5    20      20      100.0%       5        8        7
Totals    100     68                   27       13       28

Many thanks,
Russell

TwoStep cluster analysis

cbautista
In reply to this post by Bob Schacht-3
Hi,

I have a dataset of 600 subjects and I'm using the TwoStep cluster
analysis; however, I've found that this exploratory method is meant for large
(or very large) data sets, so my question is: is 600 a large dataset? I
think so. Besides, each categorical variable needs to have a multinomial
distribution; is it possible to test this distribution with SPSS?

thanks,

/Christian

Re: School Pass rates

Dennis Deck
In reply to this post by russell-19
While you could define a weighted rate easily enough, it is not clear:
  a) on what basis you would establish the weights for levels 1, 2, & 3
  b) whether your policy or research question demands a weighted rate
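
To make point (a) concrete, here is one possible weighted rate sketched in Python. The weights 3/2/1 and the scaling (dividing by the best achievable score, so that a school placing every learner in type 1 scores 100) are purely hypothetical choices, not something proposed in the original question. The data are Russell's imaginary matrix.

```python
# HYPOTHETICAL weights for secondary school types 1, 2 and 3.
WEIGHTS = (3, 2, 1)

# (learners who sat, (placed in type 1, type 2, type 3)) per primary school.
schools = {
    "Prim 1": (20, (15, 0, 0)),
    "Prim 2": (20, (0, 0, 15)),
    "Prim 3": (20, (5, 3, 2)),
    "Prim 4": (20, (2, 2, 4)),
    "Prim 5": (20, (5, 8, 7)),
}


def weighted_pass_rate(sat, placed, weights=WEIGHTS):
    """Weighted placements as a percentage of the best possible score.

    The best possible score is every learner who sat being placed in the
    highest-weighted school type.
    """
    score = sum(w * n for w, n in zip(weights, placed))
    return 100.0 * score / (max(weights) * sat)


rates = {name: weighted_pass_rate(sat, placed)
         for name, (sat, placed) in schools.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1f}%")
```

Under these weights Prim 1 and Prim 2, which share a 75% unweighted rate, come apart sharply, but a different set of weights would give a different ranking, which is exactly the difficulty raised in (a).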

Depending on what your policy or reporting concerns are, I would argue
that it might make more sense to define two or three rates such as:
   Type 1 pass rate
   Type 1 and 2 pass rate
   Type 1, 2 and 3 pass rate (this is the current unweighted pass rate)

For some purposes, you might instead define:
   Type 1 rate
   Type 2 rate
   Type 3 rate
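
Both sets of rates listed above can be computed directly from the counts. The following Python sketch uses Russell's imaginary data matrix; the function names are illustrative only.

```python
# (learners who sat, (placed in type 1, type 2, type 3)) per primary school.
schools = {
    "Prim 1": (20, (15, 0, 0)),
    "Prim 2": (20, (0, 0, 15)),
    "Prim 3": (20, (5, 3, 2)),
    "Prim 4": (20, (2, 2, 4)),
    "Prim 5": (20, (5, 8, 7)),
}


def type_rates(sat, placed):
    """Separate rates: percentage of sitters placed in each school type."""
    return [100.0 * n / sat for n in placed]


def cumulative_rates(sat, placed):
    """Cumulative rates: type 1; types 1 and 2; types 1, 2 and 3.

    The last value equals the ordinary unweighted pass rate.
    """
    rates, running = [], 0
    for n in placed:
        running += n
        rates.append(100.0 * running / sat)
    return rates


for name, (sat, placed) in schools.items():
    cum = cumulative_rates(sat, placed)
    print(name, "cumulative:", ", ".join(f"{r:.1f}%" for r in cum))
```

No weights are needed for either version, so the reader can see where two schools with the same overall pass rate differ.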

Dennis Deck, PhD
RMC Research Corporation
[hidden email]

-----Original Message-----
From: russell [mailto:[hidden email]]
Sent: Friday, September 22, 2006 5:59 AM
Subject: School Pass rates
