Z test for Proportions

Ramzan Afzal-2
Dear Members,
 
I hope you won't mind my returning to an old question. My data are in the following format:

Group A: x1 n1, which means p1=x1/n1
Group B: x2 n2, which means p2=x2/n2
Group C: x3 n3, which means p3=x3/n3
.
.
.
Group Z: x20 n20, which means p20=x20/n20

I want to compare the proportions for all possible pairs of groups, to check for any significant differences. The code you provided compares proportions by column or row, which is not my objective. I tried using CTABLES but was not able to get the desired result.
 
I would be grateful to you for support and help.
 
Regards
 
Ramzan Afzal

Re: Z test for Proportions

Samir Paul
Hi, CTABLES is the better option, keeping the group variable in the columns. Try generating your desired combinations as new variables, kept in separate rows, before running CTABLES (otherwise you can run the CTABLES syntax multiple times). This should work.

On Sun, Jun 28, 2009 at 6:40 PM, Ramzan Afzal <[hidden email]> wrote:
> I want to compare all possible combinations of proportions across the
> groups to check for any significant difference in proportions. [...]

--
Samir Paul
382 Main Street, TORONTO ON M4C4X8
CANADA
Phone: ( 001 )  416 686 9958

Re: Z test for Proportions

Marta Garcia-Granero
In reply to this post by Ramzan Afzal-2
Hi Ramzan

I don't fully grasp what you are trying to achieve. Do you want to
compare the first group against all the others, then the second against
all the rest, and so on? What is the format of the dataset (please post
an example)? Do you have aggregated or raw data?

Marta


--
For miscellaneous SPSS related statistical stuff, visit:
http://gjyp.nl/marta/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Z test for Proportions

Marta Garcia-Granero
Hi Ramzan

Please remember to address your messages to the list, not to me
privately, since the response might benefit other people too, and
others might also contribute better solutions.

Ramzan Afzal wrote:

>
> Yes - I want to compare first against the others and then second and
> so on. I have the aggregated data xs and ns for all groups and I have
> 16 groups.
>
> The format of the data is as mentioned below:
> Group         x       n
> A     20      100
> B     14      200
> C     30      145
> D     54      90
>
>
Here's the solution for the sample dataset you provided.

HTH,
Marta GG

* SYNTAX *.

* Sample dataset *.
DATA LIST LIST/Group(A1) x n (2 F8).
BEGIN DATA
A     20     100
B     14     200
C     30     145
D     54      90
END DATA.

MATRIX.
PRINT /TITLE=' "ONE GROUP VS ALL THE OTHERS" COMPARISONS (ASYMPTOTIC)'.
GET Group /VAR=Group.
GET data /VAR= x n.
COMPUTE Crosstab={data(:,1),data(:,2)-data(:,1)}.
PRINT CrossTab
 /RNAMES=group
 /CLABEL='Yes','No'
 /TITLE='Input data: Contingency table'.
RELEASE data.
COMPUTE n=MSUM(crosstab).
COMPUTE K=NROW(CrossTab).
COMPUTE Report=MAKE(K,2,0).
* All comparisons *.
LOOP I=1 TO K.
. COMPUTE a=CrossTab(I,1).
. COMPUTE b=CrossTab(I,2).
. COMPUTE c=CSUM(CrossTab(:,1)) - a.
. COMPUTE d=CSUM(CrossTab(:,2)) - b.
. COMPUTE Gnames={Group(I),'Others'}.
PRINT {a,b,a+b;c,d,c+d;a+c,b+d,n}
 /FORMAT='F8'
 /RNAMES= Gnames
 /TITLE='Comparison:'.
* Chi-square (uncorrected) for 2x2 contingency table *.
. COMPUTE ChiSq=n*((a*d-b*c)**2)/((a+b)*(c+d)*(a+c)*(b+d)).
. COMPUTE ChiSig=1-CHICDF(ChiSq,1).
. COMPUTE Report(I,1)=ChiSq.
. COMPUTE Report(I,2)=ChiSig.
END LOOP.
PRINT Report
 /FORMAT='F8.4'
 /RNAMES=Group
 /CLABELS='Chi-Sq','Sig.'
 /TITLE='Results for every group compared vs the rest'.
END MATRIX.
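
For readers who want to cross-check the arithmetic outside SPSS, the following is a minimal Python sketch of the same "one group vs all the others" comparison: an uncorrected chi-square on the 2x2 table formed by each group against the rest pooled together. It is not part of the original syntax; the function name `one_vs_rest_chisq` is hypothetical, and the p-value relies on the identity that the upper tail of a chi-square with 1 df equals erfc(sqrt(q/2)).

```python
from math import erfc, sqrt

# Sample data from the post: group -> (favorable cases x, total cases n)
groups = {"A": (20, 100), "B": (14, 200), "C": (30, 145), "D": (54, 90)}


def one_vs_rest_chisq(groups):
    """Uncorrected 2x2 chi-square of each group against all others pooled.

    Hypothetical helper for illustration; mirrors the formula used in the
    MATRIX loop (n*(ad-bc)^2 / product of the four marginal totals).
    """
    total_x = sum(x for x, _ in groups.values())
    total_n = sum(n for _, n in groups.values())
    results = {}
    for name, (x, n) in groups.items():
        a, b = x, n - x                              # this group: yes, no
        c, d = total_x - a, (total_n - total_x) - b  # all others: yes, no
        chisq = total_n * (a * d - b * c) ** 2 / (
            (a + b) * (c + d) * (a + c) * (b + d)
        )
        # Upper tail of a chi-square with 1 df: P(X > q) = erfc(sqrt(q / 2))
        results[name] = (chisq, erfc(sqrt(chisq / 2)))
    return results


for name, (chisq, p) in one_vs_rest_chisq(groups).items():
    print(f"{name} vs others: chi-sq = {chisq:.4f}, p = {p:.4f}")
```

Only the standard library is used, so the sketch should run on any recent Python 3 without extra packages.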



Re: Z test for Proportions

Ramzan Afzal-2
Hi Marta,
 
Thanks again for the solution. It works, but it compares Group A with all the others combined (B+C+D). My problem is that I want the comparison between pairs of groups: A & B, then A & C, A & D, B & C, B & D and finally C & D.

CTABLES does this (proportion test), but it sums x and n as a total and calculates percentages based on that total. For my case that is logically wrong, as I am dividing x by n for each group and then comparing the difference between proportions for all possible combinations of groups.

Any help will be much appreciated, whether by amending the code or by using CTABLES.
 


 
2009/6/29 Marta García-Granero <[hidden email]>:
> Here's the solution for the sample dataset you provided. [...]


Re: Z test for Proportions

Marta Garcia-Granero
Hi Ramzan

Now I see. I didn't understand you the first time because you said
CTABLES doesn't do what you want, but it does indeed. Check the output,
because I think you must be misinterpreting it, or perhaps specifying
the comparisons incorrectly. CTABLES performs pairwise comparisons
(that's the name, BTW, for what you want) of group percentages, with
a Bonferroni correction to control the type I error rate. You can also
check the Marascuilo procedure code I sent to the list several weeks ago,
because it also performs all pairwise comparisons, but with a
Scheffé-like correction instead of Bonferroni.

This is the syntax to run CTABLES for your data:

* Sample dataset *.
DATA LIST LIST/Group(A1) x n (2 F8).
BEGIN DATA
A     20     100
B     14     200
C     30     145
D     54     90
END DATA.

* Data must be restructured first *.
COMPUTE y = n - x .
EXECUTE .
DELETE VARIABLES n.
VARSTOCASES
 /MAKE frequency FROM x y
 /INDEX = Outcome(2)
 /KEEP =  Group
 /NULL = KEEP.
VALUE LABEL outcome 1'Yes' 2'No'.
WEIGHT BY frequency.

* Pairwise comparisons *.
CTABLES
  /VLABELS VARIABLES=Group Outcome DISPLAY=DEFAULT
  /TABLE Outcome [COUNT F40.0, COLPCT.COUNT PCT40.1] BY Group
  /SLABELS POSITION=ROW
  /CATEGORIES VARIABLES=Group Outcome ORDER=A KEY=VALUE EMPTY=INCLUDE
  /COMPARETEST TYPE=PROP ALPHA=0.05 ADJUST=BONFERRONI ORIGIN=COLUMN
  INCLUDEMRSETS=YES CATEGORIES=ALLVISIBLE.

The results are OK (to my knowledge), unless I am missing anything you
have not explained yet.
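
As a rough cross-check of what "pairwise comparisons with a Bonferroni correction" means, here is a minimal Python sketch of the textbook pooled two-proportion z test applied to every pair of groups. It is an illustration under stated assumptions, not a description of the CTABLES internals, whose exact computation may differ in detail; the function name `pairwise_z_tests` is hypothetical.

```python
from itertools import combinations
from math import erfc, sqrt

# Sample data from the post: group -> (favorable cases x, total cases n)
groups = {"A": (20, 100), "B": (14, 200), "C": (30, 145), "D": (54, 90)}


def pairwise_z_tests(groups, alpha=0.05):
    """Pooled two-proportion z test for every pair, Bonferroni-adjusted.

    Returns a list of (group1, group2, z, p_raw, p_adjusted, significant).
    """
    pairs = list(combinations(groups, 2))
    m = len(pairs)  # number of comparisons: k*(k-1)/2
    results = []
    for g1, g2 in pairs:
        x1, n1 = groups[g1]
        x2, n2 = groups[g2]
        p1, p2 = x1 / n1, x2 / n2
        pooled = (x1 + x2) / (n1 + n2)
        se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
        z = (p1 - p2) / se
        p_raw = erfc(abs(z) / sqrt(2))   # two-sided normal p-value
        p_adj = min(1.0, p_raw * m)      # Bonferroni adjustment
        results.append((g1, g2, z, p_raw, p_adj, p_adj < alpha))
    return results


for g1, g2, z, p_raw, p_adj, sig in pairwise_z_tests(groups):
    flag = "*" if sig else "ns"
    print(f"{g1} vs {g2}: z = {z:6.3f}, adjusted p = {p_adj:.4f} ({flag})")
```

With four groups there are six comparisons, so each raw p-value is multiplied by six before being compared with alpha.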

Marta


Re: Z test for Proportions

Ramzan Afzal-2
Hi Marta,
 
This code is the best and works perfectly; the results are also OK. The only thing I am not getting is the comparison of the required proportions. Let me explain:

Group A: x1=20 and n1=80. I want p1=20/80, not 20/100 and 80/100.
Group B: x2=40 and n2=50. I want p2=40/50, not 40/90 and 50/90.

Hence I want to compare the difference between p1 and p2, i.e. between 20/80 and 40/50 only. At the moment I am getting results for 20/100 and 80/100, which I do not need. I need to compare 20/80 and 40/50.

Otherwise your code is excellent, and many thanks for your patience, time and technical support.
 
Regards
 
Ramzan

 



Re: Z test for Proportions

Marta Garcia-Granero
Hi

Ramzan, those are NOT proportions, but odds. If x=20 and n=100
(according to the sample dataset you sent before), then the proportion is
20/100=20%, and that's what CTABLES is comparing. The ratio
20/(100-20)=20/80=0.25 is the odds, not a proportion. Therefore, you
DON'T want a test to compare proportions, but odds. You should have
clarified that when you asked the first time.

This is the sample dataset you provided:

Group   x       n
A       20      100
B       14      200
C       30      145
D       54      90


Proportions are 20%, 7% .... Odds are 0.25, 0.075 ...

Logistic regression will provide only a partial response to your
question, but perhaps the matrix code could be adapted. Please, clarify
what you want, I'm not going to start writing code that doesn't answer
the correct question (I'm rather busy at the moment).

Marta


Re: Z test for Proportions

Ramzan Afzal-2


Hi

Thanks again for your valuable time and effort. I formulated the problem just to give you an idea of what I wanted to do: basically p1=x1/n1, which is a proportion, i.e. the number of favorable cases divided by the total number of cases. Anyway, if these are odds, then I want to compare odds. I would be grateful to you for a modified code.
 
Thanks again for your highly skilled technical support.
 
Regards
 
Ramzan




Re: Z test for Proportions

Marta Garcia-Granero
Ramzan

I think we should focus a bit on WHY you want to compare odds instead
of proportions. I'm afraid that too many translations to and from
English (I believe it is the native tongue of neither of us) may be
causing some misunderstanding between us. Right now I have to leave;
I'll be back in a few hours, and then we'll discuss your data layout
and what you are trying to achieve, before I just send you code that
might be the correct answer to a very wrong question.

Be back in a few hours
Marta GG


Re: Z test for Proportions

Marta Garcia-Granero
I'm back

Ramzan Afzal wrote:

> As I said, I wanted to compare proportions between different
> groups. So, let's say I have 4 groups, i.e. A, B, C and D. Each group
> has (x, n and p):
>
> x is the number of favorable cases
> n is the total number of cases in a particular group
> p is the proportion
>
> So we have four proportions from four groups: p1, p2, p3 and p4. We want
> to check the difference between the proportions for each pair of groups;
> all possible combinations of groups will be AB, AC, AD, BC, BD and CD. It
> is a simple application of the z test for the difference between two
> proportions to all possible combinations of groups. My only problem is
> that I want to compute all possible differences between the groups
> in a single run.

MGG answers: That's EXACTLY what CTABLES is doing with your data. Where
on earth is the problem? You had the correct answer from the beginning.
>
> Now, again illustrating by the example already provided:
MGG answers: This is NOT the example you provided the first time (just to
clarify everything). See below for the example you actually gave me, the
one I used to write the original code. Besides, I find it strange that
x+n gives a very round number (200 in the first two groups, 100 in the
last); this is a bit confusing, and suggests that n is not the total
number of cases, but only the cases that are not "favorable cases". I
will consider that just a coincidence and work on your example assuming
that n is really the total number of cases, since the proportions listed
agree with that. Are those real data, or were they just prepared as an
example?
>
> Group   x    n     p
> A       20   180   11.11%
> B       15   185   8.11%
> C       45   145   31.03%
> D       24   76    31.58%
>
>
> Hence we are comparing (11.11 and 8.11), then (11.11 and 31.03), and so on...
Now, with the clear example and definition of your goal, I can affirm
that CTABLES does exactly what you wanted: it computes in one go all the
Z tests and presents in a table (a bit confusing to interpret, I admit)
the significant results (after adjusting the p-values with Bonferroni
method). Period. End of the discussion. USE CTABLES, THE OUTPUT IS OK.


> You said that these are odds so I said ok though I still believe that
> these are proportions for groups where  p=x/n for each group.  This is
> all I can explain

MGG answers: I said they are odds because you offered an example where
x=20 and n=100, and then, afterwards, you said you did not want to
compute 20/100, but 20/80 (perhaps you modified your example without
telling me, and I got confused?). Therefore, I interpreted that you
wanted to work with a particular type of ratio, where the numerator is
NOT included in the denominator: the odds. Going back to your FIRST
example (not the one you have offered above):

>
>
>            This is the sample dataset you provided:
>
>            Group   x       n
>
>            A       20      100
>            B       14      200
>            C       30      145
>            D       54      90
>
I will add more columns to it to explain my point.

X: Favorable cases
Y: Not favorable cases
N: Total cases (X+Y)
P: Proportion (X/[X+Y]=X/N)
O: Odds (X/Y)

Group  X    Y    N    P     O
A      20   80   100  0.20  0.25
B      14   186  200  0.07  0.075
C      30   115  145  0.21  0.26
D      54   36   90   0.60  1.50

As you can see, there is quite a difference between a proportion and
odds. With your second dataset and your more detailed explanation, it is
now clear you want to compute and compare proportions ----> USE CTABLES
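
The distinction between the two ratios can be checked with a few lines of Python. This is only an illustrative sketch using the first sample dataset; the helper name `proportion_and_odds` is hypothetical.

```python
# First sample dataset from the thread: group -> (favorable cases x, total n)
first_dataset = {"A": (20, 100), "B": (14, 200), "C": (30, 145), "D": (54, 90)}


def proportion_and_odds(x, n):
    """Return (proportion, odds) for x favorable cases out of n total.

    proportion = x / n        (numerator is included in the denominator)
    odds       = x / (n - x)  (numerator is NOT included in the denominator)
    """
    y = n - x  # non-favorable cases
    return x / n, x / y


for group, (x, n) in first_dataset.items():
    p, o = proportion_and_odds(x, n)
    print(f"{group}: proportion = {p:.3f}, odds = {o:.3f}")
```

The two quantities are close only when the proportion is small, which is why groups A to C look similar on both scales while group D does not.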

Using your second dataset as a correct sample of your data, this is the
CTABLES and Marascuilo procedure syntax. You can check and see that both
methods compare the proportions you wanted to:

11.11%  8.11%   31.03%  31.58%


The only difference is that CTABLES uses the Bonferroni adjustment for
p-values, while the Marascuilo procedure uses a more conservative
(Scheffé-like) one.

* Your SECOND dataset (copied exactly from your mail) *.
DATA LIST LIST/Group(A1) x n (2 F8).
BEGIN DATA
A    20    180
B    15    185
C    45    145
D    24    76
END DATA.

* Data STILL need restructuring *.
COMPUTE y = n - x .
EXECUTE .
DELETE VARIABLES n.
VARSTOCASES
 /MAKE frequency FROM x y
 /INDEX = Outcome(2)
 /KEEP =  Group
 /NULL = KEEP.
VALUE LABEL outcome 1'Favorable cases' 2'Non favorable'.
WEIGHT BY frequency.

************ CTABLES ************ .

* Replace "Outcome" & "Group" by your variable names if different *.
CTABLES
  /VLABELS VARIABLES=Group Outcome DISPLAY=DEFAULT
  /TABLE Outcome [COUNT F40.0, COLPCT.COUNT PCT40.1] BY Group
  /SLABELS POSITION=ROW
  /CATEGORIES VARIABLES=Group Outcome ORDER=A KEY=VALUE EMPTY=INCLUDE
  /COMPARETEST TYPE=PROP ALPHA=0.05 ADJUST=BONFERRONI ORIGIN=COLUMN
  INCLUDEMRSETS=YES CATEGORIES=ALLVISIBLE.

************  MARASCUILO PROCEDURE ************ .
* Don't change anything here *.
DATASET NAME Data.
DATASET DECLARE Results1 WINDOW=HIDDEN.
DATASET DECLARE Results2 WINDOW=HIDDEN.
DATASET DECLARE Contingency.
OMS /SELECT TABLES
 /IF COMMANDS = ["Crosstabs"]
     SUBTYPES = ["Crosstabulation"]
 /DESTINATION FORMAT = SAV
  OUTFILE = Contingency.
OMS /SELECT TABLES
 /IF COMMANDS = ["Crosstabs"]
     SUBTYPES = ["Case Processing Summary"]
 /DESTINATION VIEWER = NO.

* Replace "Outcome" & "Group" by your variable names if different *.
CROSSTABS
  /TABLES=Outcome  BY group
  /FORMAT= AVALUE TABLES
  /STATISTIC=CHISQ
  /CELLS= COUNT COLUMN
  /COUNT ROUND CELL .

* Don't change anything from here (fully automatic) *.
OMSEND.
DATASET ACTIVATE Contingency.
COMPUTE Id=$casenum/2.
EXE.
SELECT IF (Id NE TRUNC(Id)).
EXE.
DELETE VARIABLES Command_ TO Var3 total Id.
PRESERVE.
SET MXLOOPS=200 .
MATRIX.
PRINT /TITLE='MARASCUILO PROCEDURE FOR MULTIPLE PROPORTIONS'.
GET Data /VAR=ALL /NAMES=vnames.
PRINT Data
 /CNAMES=vnames
 /RLABELS='Row 1','Row 2','Total'
 /TITLE='Input data'.
COMPUTE K=NCOL(Data).
COMPUTE P=Data(1,:)&/Data(3,:).
COMPUTE PLabels={'P(1)','P(2)','P(3)','P(4)','P(5)','P(6)','P(7)','P(8)','P(9)','P(10)',
 'P(11)','P(12)','P(13)','P(14)','P(15)','P(16)','P(17)','P(18)','P(19)','P(20)'}.
PRINT {100*P}
 /FORMAT='F8.3'
 /CNAMES=PLabels
 /TITLE='Proportions (%) to be compared'.
COMPUTE N=Data(3,:).
* Critical Chi-square values for up to k=20 *.
COMPUTE Chi2={ 3.8415, 5.9915, 7.8147, 9.4877,11.0705,12.5916,14.0671,15.5073,16.9190,18.3070,
 19.6751,21.0261,22.3620,23.6848,24.9958,26.2962,27.5871,28.8693,30.1435,31.4104}.
COMPUTE Chi2Val=Chi2(K-1).
COMPUTE NComp=K*(K-1)/2.
COMPUTE Rij=    MAKE(NComp,1,0).
COMPUTE ABSDiff=MAKE(NComp,1,0).
COMPUTE Labels= MAKE(Ncomp,3," ").
COMPUTE Sig=    MAKE(NComp,1,'(ns)').
COMPUTE Index=1.
LOOP i=1 TO K-1.
. LOOP j=i+1 TO K.
.  COMPUTE ABSDiff(Index)=ABS(P(i)-P(j)).
.  COMPUTE Rij(Index)=SQRT(Chi2Val)*SQRT(P(i)*(1-P(i))/N(i)+P(j)*(1-P(j))/N(j)).
.  COMPUTE Labels(Index,:)={PLabels(i),"-",PLabels(j)}.
.  DO IF (ABSDiff(Index) GT Rij(Index)).
.   COMPUTE Sig(Index)='(*)'.
.  END IF.
.  COMPUTE Index=Index+1.
. END LOOP.
END LOOP.
PRINT {ABSDiff,Rij}
 /FORMAT='F8.3'
 /CLABELS='|Pi-Pj|','Cr. Range'
 /TITLE='Absolute differences and their ranges'.
PRINT {Labels,Sig}
 /FORMAT='A4'
 /TITLE='Contrasts and their significance'.
SAVE {ABSDiff,Rij}/OUTFILE=Results1 /VARIABLES=Diffs,Rij.
SAVE {Labels,Sig} /OUTFILE=Results2 /VARIABLES=Labels1 TO Labels3,Sig
/STRINGS=Labels1 TO Labels3,Sig.
END MATRIX.
RESTORE.

* Cuter (pivot tables) report *.
DATASET ACTIVATE Results2.
DATASET CLOSE Contingency.
STRING Comp(A11).
COMPUTE Comp=CONCAT(RTRIM(Labels1),RTRIM(Labels2),RTRIM(Labels3)).
MATCH FILES /FILE=*
 /FILE='Results1'.
DATASET CLOSE Results1.
VAR LABEL Comp '|P(i)-P(j)|' /Diffs 'Absolute Difference'
 /Rij 'Critical Range' /Sig 'Significance'.
FORMAT Diffs Rij (F8.3).
OMS /SELECT TABLES
 /IF COMMANDS = ["Summarize"]
     SUBTYPES = ["Case Processing Summary"]
 /DESTINATION VIEWER = NO.
SUMMARIZE
  /TABLES=Comp Diffs Rij Sig
  /FORMAT=LIST NOCASENUM NOTOTAL
  /TITLE='Multiple comparisons: Marascuilo procedure'
  /MISSING=VARIABLE
  /CELLS=NONE.
OMSEND.
DATASET ACTIVATE Data.
DATASET CLOSE Results2.
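
As an independent cross-check of the MATRIX code above, here is a minimal Python sketch of the same comparison rule: each |P(i)-P(j)| is compared with SQRT(Chi2Val)*SQRT(P(i)*(1-P(i))/N(i)+P(j)*(1-P(j))/N(j)), where Chi2Val is the 5% critical chi-square value for K-1 degrees of freedom. The function name `marascuilo` and the hard-coded counts (the four-group example from this thread: x = 20, 15, 45, 24 and n = 180, 185, 145, 76) are illustrative assumptions, not part of the SPSS syntax.

```python
from itertools import combinations
from math import sqrt

# 5% critical chi-square values for df = 1..20 (same table as in the MATRIX code).
CHI2_CRIT = [3.8415, 5.9915, 7.8147, 9.4877, 11.0705, 12.5916, 14.0671,
             15.5073, 16.9190, 18.3070, 19.6751, 21.0261, 22.3620, 23.6848,
             24.9958, 26.2962, 27.5871, 28.8693, 30.1435, 31.4104]


def marascuilo(counts):
    """counts maps a group label to (x, n).

    Returns one tuple per pair: (pair label, |Pi-Pj|, critical range, significant?).
    """
    k = len(counts)
    chi2_val = CHI2_CRIT[k - 2]  # df = k - 1; the list is 0-based
    props = {g: x / n for g, (x, n) in counts.items()}
    results = []
    for a, b in combinations(counts, 2):
        pa, pb = props[a], props[b]
        na, nb = counts[a][1], counts[b][1]
        abs_diff = abs(pa - pb)
        crit_range = sqrt(chi2_val) * sqrt(pa * (1 - pa) / na + pb * (1 - pb) / nb)
        results.append((a + "-" + b, abs_diff, crit_range, abs_diff > crit_range))
    return results


# Hypothetical input: the four-group example discussed in this thread.
groups = {"A": (20, 180), "B": (15, 185), "C": (45, 145), "D": (24, 76)}
results = marascuilo(groups)
for pair, diff, crit, sig in results:
    print(f"{pair}  {diff:.3f}  {crit:.3f}  {'(*)' if sig else '(ns)'}")
```

If the SPSS output and this sketch disagree on which contrasts are flagged, the usual suspects are the restructuring step (x and y swapped) or n being entered as the non-favorable count rather than the total.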



MGG

--
For miscellaneous SPSS related statistical stuff, visit:
http://gjyp.nl/marta/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Z test for Proportions

Ramzan Afzal-2
Hi Marta,
 
Many thanks for the solution; it worked perfectly. I obtained the required results in a single run of the syntax you created. The syntax is really nice and powerful. Thanks for your support, concern and patience as well.
 
Best wishes & regards
Ramzan

2009/6/29 Marta García-Granero <[hidden email]>
I'm back


Ramzan Afzal wrote:
As I said, I want to compare proportions between different
groups. So, let's say I have 4 groups, i.e. A, B, C and D. Each group
has (x, n and p):

x is the number of favorable cases
n is the total number of cases in a particular group
p is the proportion

So we have four proportions from four groups: p1, p2, p3 and p4. We want
to test the difference between the proportions for each pair of groups;
all possible combinations of groups are AB, AC, AD, BC, BD and CD. It
is a simple application of the z test for the difference between two
proportions to all possible combinations of groups. My only problem is
that I want to compute all possible differences between the groups
in a single run.

MGG answers: That's EXACTLY what CTABLES is doing with your data. Where
on earth is the problem? You had the correct answer from the beginning.
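
To make the size of the task concrete, here is a small Python sketch that enumerates every pair of groups. The helper name `all_pairs` and the group labels are assumptions chosen for illustration; the count it prints is simply K*(K-1)/2.

```python
from itertools import combinations


def all_pairs(groups):
    """Return every unordered pair of group labels, in order of appearance."""
    return [a + b for a, b in combinations(groups, 2)]


pairs = all_pairs(["A", "B", "C", "D"])
print(pairs)  # ['AB', 'AC', 'AD', 'BC', 'BD', 'CD']

# With 20 groups the same rule gives 20*19/2 comparisons.
n_pairs_20 = len(all_pairs([f"G{i}" for i in range(1, 21)]))
print(n_pairs_20)  # 190
```

With 190 comparisons, some adjustment for multiple testing is needed, which is why the thread keeps returning to Bonferroni and Marascuilo.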


Now, again illustrating by the example already provided:
MGG answers: This is NOT the example you provided the 1st time (just to
clarify everything). See below and you will see the example you actually
gave me, the one I used to write the original code. Besides, I find it
strange that if you add x+n you get a very round number (200 in groups
A and B, 100 in group D). This is a bit confusing, and suggests that n
is not the total number of cases, but only the cases that are not
"favorable cases". I will consider that just a coincidence and work on
your example assuming that n is really the total number of cases, since
the proportions listed agree with that. Are those real data or were they
just prepared as an example?

Group   x     n     p
A       20    180   11.11%
B       15    185   8.11%
C       45    145   31.03%
D       24    76    31.58%


Hence we are comparing (11.11 and 8.11), then (11.11 and 31.03), and so on...
Now, with the clear example and definition of your goal, I can affirm
that CTABLES does exactly what you wanted: it computes in one go all the
Z tests and presents in a table (a bit confusing to interpret, I admit)
the significant results (after adjusting the p-values with Bonferroni
method). Period. End of the discussion. USE CTABLES, THE OUTPUT IS OK.



You said that these are odds so I said ok though I still believe that
these are proportions for groups where  p=x/n for each group.  This is
all I can explain

MGG answers: I said they are odds because you offered an example where
x=1 and n=100, and then, afterwards, you said you did not want to
compute 20/100, but 20/80 (perhaps you modified your example without
telling me, and I got confused?). Therefore, I interpreted that you want
to work with a particular type of ratio, where the numerator is NOT
included in the denominator: the odds. Going to your FIRST example (not
the one you have offered above):


          This is the sample dataset you provided:

          Group   x       n

          A       20      100
          B       14      200
          C       30      145
          D       54      90

I will add more columns to it to explain my point.

X: Favorable cases
Y: Not favorable cases
N: Total cases (X+Y)
P: Proportion (X/[X+Y]=X/N)
O: Odds (X/Y)

Group   X    Y    N    P     O
A      20   80  100  0.20  0.25
B      14  186  200  0.07  0.075
C      30  115  145  0.21  0.26
D      54   36   90  0.60  1.50

As you can see, there is quite a difference between a proportion and
the odds. With your second dataset and your more detailed explanation, it is
now clear you want to compute and compare proportions ----> USE CTABLES
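
The distinction can be checked with a few lines of Python. This is only a sketch: the function name `proportion_and_odds` is made up for illustration, and the counts are the x and n values of the first sample dataset (n taken as the total number of cases).

```python
def proportion_and_odds(x, n):
    """Return (proportion, odds) for x favorable cases out of n total cases.

    proportion = x / n          (numerator is part of the denominator)
    odds       = x / (n - x)    (numerator is NOT part of the denominator)
    Assumes 0 <= x < n, otherwise the odds are undefined.
    """
    y = n - x  # non-favorable cases
    return x / n, x / y


# First sample dataset from the thread: group -> (x, n).
first_dataset = {"A": (20, 100), "B": (14, 200), "C": (30, 145), "D": (54, 90)}
summary = {g: proportion_and_odds(x, n) for g, (x, n) in first_dataset.items()}
for g, (p, o) in summary.items():
    print(f"{g}  P = {p:.3f}  O = {o:.3f}")
```

For small proportions the two numbers are close (group B), while for large ones they diverge sharply (group D), so it matters which one is being compared.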

Using your second dataset as a correct sample of your data, this is the
CTABLES and Marascuilo procedure syntax. You can check and see that both
methods compare the proportions you wanted to:

11.11%  8.11%   31.03%  31.58%


The only difference is that CTABLES uses the Bonferroni adjustment for
p-values, and the Marascuilo procedure uses a more conservative one
(Scheffé-like).
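
For readers who want to see the Bonferroni side spelled out, here is a minimal Python sketch of pairwise two-proportion z tests with Bonferroni-adjusted p-values. It uses the pooled-variance form of the z test, which is one common textbook version; it is an assumption that this matches what CTABLES computes internally, so treat it as a plausibility check rather than a reimplementation. The function name and the hard-coded counts (the second dataset) are illustrative.

```python
from itertools import combinations
from math import erfc, sqrt


def pairwise_z_bonferroni(counts, alpha=0.05):
    """counts maps a group label to (x, n).

    Returns {pair: (z, adjusted p-value, significant?)} using a pooled
    two-proportion z test and a Bonferroni adjustment over all pairs.
    """
    pairs = list(combinations(counts, 2))
    m = len(pairs)  # number of comparisons
    out = {}
    for a, b in pairs:
        (xa, na), (xb, nb) = counts[a], counts[b]
        pooled = (xa + xb) / (na + nb)
        se = sqrt(pooled * (1 - pooled) * (1 / na + 1 / nb))
        z = (xa / na - xb / nb) / se
        p_two_sided = erfc(abs(z) / sqrt(2))  # two-sided normal tail area
        p_adj = min(1.0, m * p_two_sided)     # Bonferroni adjustment
        out[a + b] = (z, p_adj, p_adj < alpha)
    return out


# Second dataset from the thread: group -> (x, n).
groups = {"A": (20, 180), "B": (15, 185), "C": (45, 145), "D": (24, 76)}
tests = pairwise_z_bonferroni(groups)
for pair, (z, p_adj, sig) in tests.items():
    print(f"{pair}  z = {z:6.2f}  adj. p = {p_adj:.4f}  {'(*)' if sig else '(ns)'}")
```

With four groups there are six comparisons, so the Bonferroni two-sided critical z is roughly 2.64, while the Marascuilo multiplier is SQRT(7.8147), about 2.80. That gap is what "more conservative" means here.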

* Your SECOND dataset (copied exactly from your mail) *.

DATA LIST LIST/Group(A1) x n (2 F8).
BEGIN DATA
A    20    180
B    15    185
C    45    145
D    24    76
END DATA.

* Data STILL need restructuring *.

COMPUTE y = n - x .
EXECUTE .
DELETE VARIABLES n.
VARSTOCASES
/MAKE frequency FROM x y
/INDEX = Outcome(2)
/KEEP =  Group
/NULL = KEEP.
VALUE LABEL outcome 1'Favorable cases' 2'Non favorable'.
WEIGHT BY frequency.

************ CTABLES ************ .

* Replace "Outcome" & "Group" by your variable names if different *.

CTABLES
 /VLABELS VARIABLES=Group Outcome DISPLAY=DEFAULT
 /TABLE Outcome [COUNT F40.0, COLPCT.COUNT PCT40.1] BY Group
 /SLABELS POSITION=ROW
 /CATEGORIES VARIABLES=Group Outcome ORDER=A KEY=VALUE EMPTY=INCLUDE
 /COMPARETEST TYPE=PROP ALPHA=0.05 ADJUST=BONFERRONI ORIGIN=COLUMN
 INCLUDEMRSETS=YES CATEGORIES=ALLVISIBLE.

************  MARASCUILO PROCEDURE ************ .
* Don't change anything here *.
DATASET NAME Data.
DATASET DECLARE Results1 WINDOW=HIDDEN.
DATASET DECLARE Results2 WINDOW=HIDDEN.
DATASET DECLARE Contingency.
OMS /SELECT TABLES
/IF COMMANDS = ["Crosstabs"]
   SUBTYPES = ["Crosstabulation"]
/DESTINATION FORMAT = SAV
 OUTFILE = Contingency.
OMS /SELECT TABLES
/IF COMMANDS = ["Crosstabs"]
   SUBTYPES = ["Case Processing Summary"]
/DESTINATION VIEWER = NO.

* Replace "Outcome" & "Group" by your variable names if different *.
CROSSTABS
 /TABLES=Outcome  BY group
 /FORMAT= AVALUE TABLES
 /STATISTIC=CHISQ
 /CELLS= COUNT COLUMN
 /COUNT ROUND CELL .

* Don't change anything from here (fully automatic) *.
OMSEND.
DATASET ACTIVATE Contingency.
COMPUTE Id=$casenum/2.
EXE.
SELECT IF (Id NE TRUNC(Id)).
EXE.
DELETE VARIABLES Command_ TO Var3 total Id.
PRESERVE.
SET MXLOOPS=200 .
MATRIX.
PRINT /TITLE='MARASCUILO PROCEDURE FOR MULTIPLE PROPORTIONS'.
GET Data /VAR=ALL /NAMES=vnames.
PRINT Data
/CNAMES=vnames
/RLABELS='Row 1','Row 2','Total'
/TITLE='Input data'.
COMPUTE K=NCOL(Data).
COMPUTE P=Data(1,:)&/Data(3,:).
COMPUTE PLabels={'P(1)','P(2)','P(3)','P(4)','P(5)','P(6)','P(7)','P(8)','P(9)','P(10)',
 'P(11)','P(12)','P(13)','P(14)','P(15)','P(16)','P(17)','P(18)','P(19)','P(20)'}.
PRINT {100*P}
/FORMAT='F8.3'
/CNAMES=PLabels
/TITLE='Proportions (%) to be compared'.
COMPUTE N=Data(3,:).
* Critical Chi-square values for up to k=20 *.
COMPUTE Chi2={ 3.8415, 5.9915, 7.8147, 9.4877,11.0705,12.5916,14.0671,15.5073,16.9190,18.3070,
 19.6751,21.0261,22.3620,23.6848,24.9958,26.2962,27.5871,28.8693,30.1435,31.4104}.
COMPUTE Chi2Val=Chi2(K-1).
COMPUTE NComp=K*(K-1)/2.
COMPUTE Rij=    MAKE(NComp,1,0).
COMPUTE ABSDiff=MAKE(NComp,1,0).
COMPUTE Labels= MAKE(Ncomp,3," ").
COMPUTE Sig=    MAKE(NComp,1,'(ns)').
COMPUTE Index=1.
LOOP i=1 TO K-1.
. LOOP j=i+1 TO K.
.  COMPUTE ABSDiff(Index)=ABS(P(i)-P(j)).
.  COMPUTE Rij(Index)=SQRT(Chi2Val)*SQRT(P(i)*(1-P(i))/N(i)+P(j)*(1-P(j))/N(j)).
.  COMPUTE Labels(Index,:)={PLabels(i),"-",PLabels(j)}.
.  DO IF (ABSDiff(Index) GT Rij(Index)).
.   COMPUTE Sig(Index)='(*)'.
.  END IF.
.  COMPUTE Index=Index+1.
. END LOOP.
END LOOP.
PRINT {ABSDiff,Rij}
/FORMAT='F8.3'
/CLABELS='|Pi-Pj|','Cr. Range'
/TITLE='Absolute differences and their ranges'.
PRINT {Labels,Sig}
/FORMAT='A5'
/TITLE='Contrasts and their significance'.
SAVE {ABSDiff,Rij}/OUTFILE=Results1 /VARIABLES=Diffs,Rij.
SAVE {Labels,Sig} /OUTFILE=Results2 /VARIABLES=Labels1 TO Labels3,Sig
/STRINGS=Labels1 TO Labels3,Sig.
END MATRIX.
RESTORE.

* Cuter (pivot tables) report *.
DATASET ACTIVATE Results2.
DATASET CLOSE Contingency.
STRING Comp(A11).
COMPUTE Comp=CONCAT(RTRIM(Labels1),RTRIM(Labels2),RTRIM(Labels3)).
MATCH FILES /FILE=*
/FILE='Results1'.
DATASET CLOSE Results1.
VAR LABEL Comp '|P(i)-P(j)|' /Diffs 'Absolute Difference'
 /Rij 'Critical Range' /Sig 'Significance'.
FORMAT Diffs Rij (F8.3).
OMS /SELECT TABLES
/IF COMMANDS = ["Summarize"]
   SUBTYPES = ["Case Processing Summary"]
/DESTINATION VIEWER = NO.
SUMMARIZE
 /TABLES=Comp Diffs Rij Sig
 /FORMAT=LIST NOCASENUM NOTOTAL
 /TITLE='Multiple comparisons: Marascuilo procedure'
 /MISSING=VARIABLE
 /CELLS=NONE.
OMSEND.
DATASET ACTIVATE Data.
DATASET CLOSE Results2.



MGG

