Hello SPSS gurus,

I am a novice user trying to identify the association between two nominal variables, each with two categories. The variable is labeled Peer Pal (yes/no, coded 1 or 0) and the outcome is labeled outpt (for outpatient service received; yes/no, coded 1 or 0). The N for Peer Pal 0 = 165 and the N for Peer Pal 1 = 674.

What test(s) can SPSS 12.0 perform to indicate the association between the variable and the outcome? And what values would be expected if the association exists, is weak, or is strong?

Thanks,
Kevin Secrist - Administrative Analyst, Associate
Butte County Behavioral Health

CONFIDENTIALITY NOTICE: This e-mail transmission, and any documents or messages attached to it, may contain confidential information that is legally privileged. If you are not the intended recipient, or a person responsible for delivering this e-mail to the intended recipient, then you are (1) notified that any disclosure, copying, distribution, saving, reading or use of this information is strictly prohibited, (2) requested to discard and delete this e-mail and any attachments, and (3) requested to immediately notify us by e-mail that you mistakenly received this message [hidden email] fax (530) 895-6548, or telephone (530) 879-3305. Thank you.

Far better an approximate answer to the right question, which is often vague, than the exact answer to the wrong question, which can always be made precise. Ann. Math. Stat. 33 (1962) - John W. Tukey
Kevin,

For a 2x2 table you can use the phi statistic, available in the Crosstabs procedure under Statistics. In absolute value, phi varies from 0 to 1: the closer to 1, the stronger the association. Of course, this assumes that the chi-square statistic is significant.

You can also compute the cross-product ratio, better known as the odds ratio: [(1,1)x(2,2)]/[(1,2)x(2,1)], where the first number is the row and the second the column, and each pair stands for the frequency in that cell.

An odds ratio (OR) of 1 means there is no relationship between the two variables. An OR > 1 means greater likelihood of the outcome given the risk factor; an OR < 1 means you are less likely to have the outcome.

I say you have to compute this because SPSS does not: that is, not in the Crosstabs procedure.

Cheers,
Dominic Lusinchi
Statistician
Far West Research
Statistical Consulting
San Francisco, California
415-664-3032
www.farwestresearch.com

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Secrist, Kevin
Sent: Tuesday, September 19, 2006 2:24 PM
To: [hidden email]
Subject: Association between two nominal variables?
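Dominic's two hand calculations can be sketched in a few lines of Python. This is only an illustration: the cell counts are invented (the thread gives just the row totals, 165 and 674), and the function names `odds_ratio` and `phi` are hypothetical helpers, not part of SPSS.

```python
import math

def odds_ratio(a, b, c, d):
    """Cross-product ratio for the 2x2 table [[a, b], [c, d]]:
    (1,1)x(2,2) divided by (1,2)x(2,1)."""
    return (a * d) / (b * c)

def phi(a, b, c, d):
    """Phi coefficient for the 2x2 table [[a, b], [c, d]]."""
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

# HYPOTHETICAL counts: rows = Peer Pal (0, 1), columns = outpt (0, 1).
# Only the row totals (165 and 674) come from the thread.
table = (65, 100, 174, 500)

print(f"odds ratio = {odds_ratio(*table):.3f}")
print(f"phi        = {phi(*table):.3f}")
```

Computed this way, phi carries a sign that depends only on how the rows and columns are ordered; its magnitude is what speaks to the strength of the association.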
At 12:42 PM 9/19/2006, Dominic Lusinchi wrote:
>Kevin,
>
>For a 2X2 table you can use the Phi statistic, available in the Crosstab
>procedure under Statistics. Phi varies from 0 to 1: closer to 1, stronger
>the association. Of course this assumes that the chi-square statistic is
>significant. . .

Of interest in this choice is whether the double null category is meaningful, that is, the cell where Peer Pal is "no" (0) and the outcome labeled outpt (for outpatient service received) is also "no" (0). If the double null is meaningful and not merely a default, then the strength of Phi may depend largely on how many double nulls there are. Is your sample population defined in such a way that the double null applies to them differently than to the 3 billion other people in the world for whom these two variables have a value of 0?

If the double nulls are a poorly defined category, then you might want to consider Jaccard's coefficient, which does not use the double nulls (http://en.wikipedia.org/wiki/Jaccard_index). It is extremely easy to calculate.

Bob

Robert M. Schacht, Ph.D. <[hidden email]>
Pacific Basin Rehabilitation Research & Training Center
1268 Young Street, Suite #204
Research Center, University of Hawaii
Honolulu, HI 96814
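As a rough illustration of how little is involved, here is a minimal Python sketch of Jaccard's coefficient for two 0/1 variables. The function name `jaccard` and the counts are assumptions made up for the example; nothing here comes from Kevin's data.

```python
def jaccard(n11, n10, n01):
    """Jaccard coefficient for two binary variables.

    n11      : cases coded 1 on both variables
    n10, n01 : cases coded 1 on exactly one of the two variables
    The double-null count (0 on both) is deliberately not an argument.
    """
    union = n11 + n10 + n01
    if union == 0:
        raise ValueError("no case is coded 1 on either variable")
    return n11 / union

# HYPOTHETICAL counts, chosen only to show the call.
print(f"Jaccard = {jaccard(n11=500, n10=174, n01=100):.3f}")
```

Because the double nulls never enter the formula, adding or removing cases that are 0 on both variables leaves the coefficient unchanged.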
In reply to this post by Dominic Lusinchi
Sorry, but you are wrong. You can compute the OR in SPSS using Crosstabs (Crosstabs - Statistics - RISK, version 12 and higher). You will get the OR with confidence intervals, and relative risks too.

Best,
Elena Verbitskaya
You are absolutely correct, Elena. I stand corrected.
Thank you for pointing that out.

Cheers,
Dominic

-----Original Message-----
From: Elena Verbitskaya [mailto:[hidden email]]
Sent: Wednesday, September 20, 2006 1:44 AM
To: 'Dominic Lusinchi'; [hidden email]
Subject: RE: Association between two nominal variables?
In reply to this post by Secrist, Kevin
At 04:42 AM 9/20/2006, Secrist, Kevin wrote:
>Thank you Bob and everyone who responded to my question.
>
>I utilized phi and Cramer's V and got a value of -.128 for phi and .128
>for Cramer's V. So, as I understand it, the association between my
>variable and outcome is rather low. Is there an interpretive scale of
>influence like the one for the correlation coefficient?
>.0 to .2 weak or no relationship
>.2 to .4 weak relationship
>.4 to .6 moderate relationship
>.6 to .8 strong relationship
>.8 to 1.0 very strong relationship
>(Salkind, Neil "Statistics for People who think they hate statistics", 2000,
>pg. 96)

Kevin,

The Phi coefficient is related to the Chi-square statistic; Blalock* says Phi-square = Chi-square/N. So I guess you could calculate N*Phi(squared) and use the Chi-square tables. Of course, you could also just calculate the Chi-square directly. Why are you avoiding the use of a simple Chi-square?

BTW, for a 2x2 table you should probably be using Fisher's exact test anyway. Why not?

Bob

*Blalock, Hubert M. (rev. ed., 1972) Social Statistics. Yes, there are more recent editions, but they weren't in print yet when I was in grad school <g>
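Bob's suggestion of computing N*Phi(squared) can be tried on the figures already posted. The sketch below assumes that all 165 + 674 = 839 cases had valid values on both variables (the thread does not say) and uses the rounded phi Kevin reported, so the result is approximate. For 1 degree of freedom the upper-tail chi-square probability equals erfc(sqrt(x/2)), which avoids the need for printed tables.

```python
import math

n = 165 + 674   # group sizes from Kevin's first message (assumes no missing data)
phi = -0.128    # value Kevin reported, already rounded

# Blalock's relation: phi squared = chi-square / N
chi_square = n * phi ** 2

# Upper-tail probability of a chi-square variate with 1 degree of freedom
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"chi-square = {chi_square:.2f}, p = {p_value:.5f}")
```

Under those assumptions the chi-square comes out near 13.7 on 1 degree of freedom, so a phi this small can still be statistically significant when N is in the hundreds; significance and strength are separate questions.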
Does anyone else experience pangs of nausea upon seeing this "interpretation" of the correlation coefficient? General rules of thumb like this are highly dangerous: what constitutes a weak or strong correlation is entirely dependent on context. For example, if you found a 0.7 correlation between the number of planes that take off and the number that subsequently land, would you really be happy to conclude that you'd found a strong relationship? Likewise, studies of manufacturing tolerance or MTBF very often deal with situations where a 0.98 is too weak and a product is scrapped or a factory floor is shut down. Please be highly skeptical of such reductionistic attempts at making statistics too easy!

>.0 to .2 weak or no relationship
>.2 to .4 weak relationship
>.4 to .6 moderate relationship
>.6 to .8 strong relationship
>.8 to 1.0 very strong relationship
>(Salkind, Neil "Statistics for People who think they hate statistics", 2000,
>pg. 96)
Two cents.

My background is mainly as a macroeconomist forecasting time series. I have particularistic rules of thumb: in a single-variable equation, if I see R2 > 0.5 I say it's good (this means an R of .71+), but for multivariate equations you tend to be looking for much higher numbers. If you are casually working in the social sciences, especially outside of economics, rules of thumb like the one under discussion are OK for the non-expert, I would say, but yes, it is all particularistic.

In dealing with my undergraduate students, interns et al., a classic is that they want to take the ratio of the variance to the mean, and I point out to them that for a standard normal variable that statistic cannot be calculated. OTOH, in other situations (the "manufacturing tolerance or mtbf" cases "where a 0.98 is too weak") that measure is de rigueur.

Another one is constants of regression. For those of you who read Barron's or Investor's Business Daily, Alpha and Beta in evaluating stocks are based on a simple regression like this:

Stock Price = Alpha + Beta*S&P500

And so Alpha is interpreted as some autonomous aspect of a stock's return, and Beta as its correlation with the overall market. Simple Keynesian consumption functions (in economics) you will still see explained in the same way:

Consumption = A + B*Income

where A is interpreted as some autonomous component of consumption and B is called the MPC (marginal propensity to consume).

Stuff like this is still in many undergraduate textbooks, and talked about by economists et al. all the time. The thing is, these interpretations have been around for many years, as the analysis goes back to the 1920s and 30s, when running an equation per semester was a big deal. When I was a first-year graduate student, asking for interpretations of constants of regression was common, yet our professors would say that we usually do not try to interpret them, even if we always include them. (Even so, we still talk about Keynes and autonomous components of consumption. Anyone here an economist of more recent vintage than myself? Have they finally stopped saying this stuff?)

The thing is that as you add and subtract variables from your equations these constants move from positive to negative, from small to large magnitudes, etc. Do a classic Keynesian consumption function, but then add in interest rates or exchange rates and see what happens.

Too new a user of SPSS to say anything useful, so I thought I'd throw in my 2 cents.

>From: Scott Czepiel <[hidden email]>
>Reply-To: Scott Czepiel <[hidden email]>
>To: [hidden email]
>Subject: Re: Association between two nominal variables?
>Date: Wed, 20 Sep 2006 15:27:07 -0400
In reply to this post by Bob Schacht-3
Dear Listers,

I am going to risk being called stupid, but I am nonetheless going to ask for some help.

Assume X number of primary schools having primary school leaving examinations. Those who pass go on to three different types of secondary schools. Entry to one of the three secondary schools is based on the final mark: the highest prestige is accorded secondary school type 1, followed by secondary school type 2 and then secondary school type 3.

Normal pass rates are calculated per primary school as simply (those who passed/those who sat)*100. This is fine, but then two schools may achieve the same pass rate (let's say 75%), but have their learners placed in secondary schools of opposite quality. In a real sense, the ordinary pass rate does not take the quality of the pass rate into account. I would like to do so.

Here is an imaginary data matrix:

          Learners  Learners  Unweighted  School  School  School  Weighted
          sat       passed    pass rate   type 1  type 2  type 3  pass rate??
Prim 1    20        15         75.0%      15      0       0
Prim 2    20        15         75.0%      0       0       15
Prim 3    20        10         50.0%      5       3       2
Prim 4    20        8          40.0%      2       2       4
Prim 5    20        20        100.0%      5       8       7
Totals    100       68                    27      13      28

Many thanks,
Russell
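One possible way to fill in the empty "weighted pass rate" column is sketched below in Python. The counts are Russell's imaginary data; the weights (3, 2, 1 for school types 1, 2, 3) and the function name `weighted_pass_rate` are pure assumptions for illustration, and a different weighting scheme would give different figures.

```python
# (learners sat, placed in type 1, type 2, type 3), from Russell's table
schools = {
    "Prim 1": (20, 15, 0, 0),
    "Prim 2": (20, 0, 0, 15),
    "Prim 3": (20, 5, 3, 2),
    "Prim 4": (20, 2, 2, 4),
    "Prim 5": (20, 5, 8, 7),
}

# HYPOTHETICAL weights for school types 1, 2 and 3; not from the thread.
WEIGHTS = (3, 2, 1)

def weighted_pass_rate(sat, placements, weights=WEIGHTS):
    """Percent of the maximum attainable score, where the maximum means
    every learner who sat was placed in the highest-weighted school type."""
    score = sum(w * n for w, n in zip(weights, placements))
    return 100 * score / (max(weights) * sat)

for name, (sat, *placements) in schools.items():
    print(f"{name}: {weighted_pass_rate(sat, placements):.1f}%")
```

With these particular weights, Prim 1 keeps its 75% while Prim 2 drops to 25%, which is the kind of separation Russell is after; the ranking of the other schools depends on the weights chosen.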
In reply to this post by Bob Schacht-3
Hi,
I have a dataset of 600 subjects and I'm using the TwoStep cluster analysis; however, I've found that this exploratory method works with large (or very large) data sets, so my question is: is 600 a large dataset? I think so.

Besides, each categorical variable needs to have a multinomial distribution. Is it possible to test this distribution with SPSS?

Thanks,
/Christian
In reply to this post by russell-19
While you could define a weighted rate easily enough, it is not clear:
a) on what basis you would establish the weights for levels 1, 2, & 3
b) whether your policy or research question demands a weighted rate

Depending on what your policy or reporting concerns are, I would argue that it might make more sense to define two or three rates such as:
Type 1 pass rate
Type 1 and 2 pass rate
Type 1, 2 and 3 pass rate (this is the current unweighted pass rate)

For some purposes, you might instead define:
Type 1 rate
Type 2 rate
Type 3 rate

Dennis Deck, PhD
RMC Research Corporation
[hidden email]

-----Original Message-----
From: russell [mailto:[hidden email]]
Sent: Friday, September 22, 2006 5:59 AM
Subject: School Pass rates
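Dennis's first set of rates (type 1; types 1 and 2; types 1, 2 and 3) needs no weights at all. A minimal Python sketch using Russell's imaginary data follows; the function name `cumulative_rates` is a hypothetical helper, and each rate is taken as a percentage of learners who sat, which is an assumption about the intended denominator.

```python
# (learners sat, placed in type 1, type 2, type 3), from Russell's table
schools = {
    "Prim 1": (20, 15, 0, 0),
    "Prim 2": (20, 0, 0, 15),
    "Prim 3": (20, 5, 3, 2),
    "Prim 4": (20, 2, 2, 4),
    "Prim 5": (20, 5, 8, 7),
}

def cumulative_rates(sat, placements):
    """Return the percent of learners who sat that were placed in
    type 1, in types 1-2, and in types 1-3 (the unweighted pass rate)."""
    rates = []
    running = 0
    for n in placements:
        running += n
        rates.append(100 * running / sat)
    return rates

print(f"{'School':<8}{'Type 1':>8}{'Type 1-2':>10}{'Type 1-3':>10}")
for name, (sat, *placements) in schools.items():
    t1, t12, t123 = cumulative_rates(sat, placements)
    print(f"{name:<8}{t1:>7.1f}%{t12:>9.1f}%{t123:>9.1f}%")
```

The last column reproduces the ordinary pass rate, while the first two separate Prim 1 (75% in type 1) from Prim 2 (0% in types 1 and 2) without anyone having to defend a choice of weights.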