|
Hi
Is there any simple way to enter a clump of p-values and have SPSS give Benjamini-Hochberg results, i.e. p-values adjusted to take account of the false discovery rate/number of "true" rejections of the null? I am looking at situations where the number of tests may be in the 1000s. This can be genome data, or it might be simulations where p-value distributions are compared.
best
Diana
_______________
Professor Diana Kornbrot
University of Hertfordshire
College Lane, Hatfield, Hertfordshire AL10 9AB, UK
+44 (0) 170 728 4626
+44 (0) 208 444 2081
+44 (0) 7403 18 16 12
skype: kornbrotme
_______________________________ |
|
Hi Diana,
Assuming you want the Benjamini-Hochberg procedure (http://en.wikipedia.org/wiki/False_discovery_rate#Benjamini.E2.80.93Hochberg_procedure), you could just throw the p-values into the data editor, sort them, compute (k/m)*alpha for your chosen false discovery rate for each p-value, and then find the largest p-value that satisfies the inequality in the procedure. P-values less than or equal to that value are then considered significant. Alex |
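A minimal syntax sketch of those steps, assuming the p-values are already in a variable named p and the chosen false discovery rate is .05 (both the variable name and the rate are illustrative):

* BH step-up sketch: p holds the p-values; .05 is the chosen FDR (assumptions).
SORT CASES BY p (A).
COMPUTE pRank = $CASENUM.
* m = total number of tests, written to every case.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES /m = N.
* BH criterion (k/m)*alpha for each ranked p-value.
COMPUTE crit = (pRank/m) * .05.
COMPUTE meets = (p LE crit).
* Everything up to and including the largest p that meets its criterion is significant,
* so work from the largest p downwards and carry the flag along with LAG.
SORT CASES BY pRank (D).
COMPUTE BHsig = meets.
COMPUTE BHsig = MAX(BHsig, LAG(BHsig)).
SORT CASES BY pRank (A).
EXECUTE.

The descending sort plus LAG is the same trick the macros later in this thread use to propagate significance down from the largest qualifying p-value.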
|
In reply to this post by Kornbrot, Diana
This seems like it would be a simple problem. One issue might be getting the p-values into a data file, but let's say that is done. As I recall the BH procedure, the p's are sorted in ascending order. It would be a good idea to number the records, which can be done using $casenum. I'm probably not remembering the formula correctly, but let's say it is .05/irec, where irec is the record number. A p passes if it is less than .05/irec; otherwise, it and all following p's fail. The computation is this: Compute pass=0. If (p le .05/irec) pass=1. Gene Maguin |
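For comparison, a sketch of Gene's pass flag with the usual BH cutoff, (irec/m)*alpha rather than .05/irec; irec (the ascending rank) and ntests (m, the total number of tests) are assumed to exist on the file already:

* Gene's pass flag with the BH cutoff (rank/m)*alpha substituted for .05/irec.
* irec = ascending rank of the p-value; ntests = total number of tests (both assumed to exist).
COMPUTE pass = 0.
IF (p LE (irec/ntests)*.05) pass = 1.
* The "it and all following p's fail" step still requires finding the largest passing p
* and propagating it, as in the macros posted later in the thread.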
|
Administrator
|
In reply to this post by Kornbrot, Diana
http://imaging.mrc-cbu.cam.ac.uk/statswiki/FAQ/FDR
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
|
In reply to this post by Alex Reutter
Does this syntax do it? TENTATIVE: needs to be checked by other list members.

* make up some data.
new file.
input program.
string SomeTestName (a25).
numeric ObtainedProb (f5.3).
loop id = 1 to 10000.
compute SomeTestName = "some descriptive words".
compute ObtainedProb = (rnd(1000*rv.uniform(0,1))/1000).
end case.
end loop.
end file.
end input program.
*.
* the part below here would be applied to your data.
*.
numeric LessEqual.05 (f1).
compute LessEqual.05 = ObtainedProb le .05.
frequencies variable=ObtainedProb LessEqual.05
 /formats = notable
 /histogram.
aggregate outfile=* mode=addvariables
 /TestsDone = N
 /FoundRaw.05 = sum(LessEqual.05).
numeric AdjustedProb (f5.3).
compute AdjustedProb = (TestsDone/FoundRaw.05) * ObtainedProb.
compute NowSignificant = AdjustedProb le .05.
sort cases by AdjustedProb.
temporary.
select if NowSignificant.
list /variables = ID SomeTestName AdjustedProb.
Art Kendall
Social Research Consultants |
|
Administrator
|
In reply to this post by Bruce Weaver
I just noticed that the syntax on that MRC page uses the "GHB Horrible Hack" method to define a macro variable for the number of cases. Using AGGREGATE (as in the syntax Art K posted) is a far better way to go! I.e., this...
* Calculate the number of p values.
RANK PVALS /n into N.
* N contains the number of cases in the file.
* make a submacro to be invoked from the syntax.
DO IF $CASENUM=1.
WRITE OUTFILE 'C:\temp.sps' /"DEFINE !nbcases()"n"!ENDDEFINE.".
END IF.
EXE.
INCLUDE FILE='C:\temp.sps'.

...could be replaced with a simple AGGREGATE that writes the number of cases to a new variable.
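For example, something along these lines (nbcases is an arbitrary variable name):

* Write the number of cases (i.e. the number of p-values) to every case,
* replacing the WRITE/INCLUDE macro trick.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES /nbcases = N.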
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
|
Administrator
|
Bruce,
Not only that, but the whole thing could/should be done in MATRIX in the first place. It uses MATRIX to do the roundup, so why bother with all that early calculation, which is trivial to do in MATRIX? OTOH: WTF? "COMPUTE ccmpmx = pvals LE (CCOMP>0)*CMAX(((CMAX(ccomp) &* (diff LE 0)) EQ ccomp) &* pvals) ." Given a few minutes of eye bleeding I could sort that out, but I'm not going to bother. It could probably be cast in standard syntax, but let sleeping dogs lie, as it were. I write that sort of s#$% on occasion, but usually comment the hell out of it if I intend it for public consumption (OPU). ------
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
|
In reply to this post by Kornbrot, Diana
http://www-01.ibm.com/support/docview.wss?uid=swg21476447
On Sat, Mar 29, 2014 at 10:24 AM, Kornbrot, Diana <[hidden email]> wrote:
|
|
Administrator
|
In reply to this post by David Marso
Agreed! I retract that link, and suggest that Diana use the one Ryan posted instead.
http://www-01.ibm.com/support/docview.wss?uid=swg21476447
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
|
Administrator
|
In reply to this post by Art Kendall
Art, I'm responding to your request for checking of your results. See below.
* Define a macro using the basic code from the IBM website posted by Ryan:
* http://www-01.ibm.com/support/docview.wss?uid=swg21476447.
* The first macro argument is the name of the variable holding the p-values.
* The second argument is the desired False Discovery Rate (FDR)--if it is
* omitted, the default value of .05 is used.
DEFINE !FDR ( p = !CHAREND('/') / FDR = !DEFAULT(.05) !CMDEND ).
sort cases by !p (a).
compute @i=$casenum.
sort cases by @i (d).
compute @m=max(@i,lag(@m)).
compute FDRcv=!FDR*@i/@m.
compute FDRsig=(!p le FDRcv).
compute FDRsig=max(FDRsig,lag(FDRsig)).
execute.
formats @i @m FDRsig(f5.0) FDRcv(f8.6).
value labels FDRsig 1 'Significant' 0 'Not Significant'.
sort cases by !p(a).
!ENDDEFINE.

* Use the data from the MRC website example at:
* http://imaging.mrc-cbu.cam.ac.uk/statswiki/FAQ/FDR.
DATA LIST FREE / ObtainedProb (F5.3) .
BEGIN DATA
0.021 0.001 0.017 0.041 0.005 0.036 0.042 0.023 0.07 0.1
END DATA .

* Use Art K's method. Note that I have made some minor modifications.
numeric LessEqual.05 (f1).
compute LessEqual.05 = ObtainedProb le .05.
frequencies variable=ObtainedProb LessEqual.05
 /formats = notable
 /histogram.
aggregate outfile=* mode=addvariables
 /TestsDone = N
 /FoundRaw.05 = sum(LessEqual.05).
Numeric AdjustedProb(f5.3).
compute AdjustedProb = (TestsDone/FoundRaw.05) * ObtainedProb.
compute NowSignificant = AdjustedProb le .05.
sort cases by AdjustedProb.
FORMATS NowSignificant(f1) .

* Now call the macro defined earlier and compare results.
!FDR p=ObtainedProb.
LIST ObtainedProb TestsDone @i @m NowSignificant FDRsig.

OUTPUT:

ObtainedProb TestsDone @i @m NowSignificant FDRsig
        .001        10  1 10              1      1
        .005        10  2 10              1      1
        .017        10  3 10              1      1
        .021        10  4 10              1      1
        .023        10  5 10              1      1
        .036        10  6 10              1      0  <-- NOTE
        .041        10  7 10              0      0
        .042        10  8 10              0      0
        .070        10  9 10              0      0
        .100        10 10 10              0      0

Number of cases read: 10   Number of cases listed: 10

As you can see, the method from the IBM website does not flag .036 as significant, whereas your syntax does. I haven't taken the time to look into why the two disagree.

Oh yes, and the macro results above agree with those shown on the MRC website:

P Value   FDR Criterion   FDR Significance
.001      .050            1
.005      .050            1
.017      .050            1
.021      .050            1
.023      .050            1
.036      .050            0
.041      .050            0
.042      .050            0
.070      .050            0
.100      .050            0

HTH.
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
|
Administrator
|
I'd add a third and fourth argument for the output variables (first version below). Alternatively, use the variable name and FDR in the names of the 2 new variables (second version).

DEFINE !FDR ( p = !CHAREND('/') !DEFAULT(p)
 / FDR = !CHAREND('/') !DEFAULT(.05)
 / FDRsig = !CHAREND('/') !DEFAULT(FDRsig)
 / FDRcv = !CMDEND !DEFAULT(FDRcv) ).
SORT CASES BY !p (A).
COMPUTE @i=$casenum.
SORT CASES BY @i (D).
COMPUTE @m=MAX(@i,lag(@m)).
COMPUTE !FDRcv=!FDR*@i/@m.
COMPUTE !FDRsig=(!p LE !FDRcv).
COMPUTE !FDRsig=MAX(!FDRsig , LAG(!FDRsig)).
FORMATS !FDRsig(f5.0) !FDRcv(f8.6).
VALUE LABELS !FDRsig 1 'Significant' 0 'Not Significant'.
SORT CASES BY !p(A).
DELETE VARIABLES @i @m.
!ENDDEFINE.

DEFINE !FDR ( p = !CHAREND('/') !DEFAULT(p)
 / FDR = !CHAREND('/') !DEFAULT(.05) ).
SORT CASES BY !p (A).
COMPUTE @i=$casenum.
SORT CASES BY @i (D).
COMPUTE @m = MAX(@i,lag(@m)).
!LET !fdrcv = !CONCAT('FDRcv_',!FDR,'_',!p)
!LET !fdrsig = !CONCAT('FDRsig_',!FDR,'_',!p)
COMPUTE !fdrcv = !FDR * @i/@m.
COMPUTE !FDRsig=(!p LE !FDRcv).
COMPUTE !FDRsig=MAX(!FDRsig , LAG(!FDRsig)).
FORMATS !FDRsig(f5.0) !FDRcv(f8.6).
VALUE LABELS !FDRsig 1 'Significant' 0 'Not Significant'.
SORT CASES BY !p(A).
DELETE VARIABLES @i @m.
!ENDDEFINE.
--
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
|
In reply to this post by Alex Reutter
Thanks for help from several SPSS experts.
There is no direct option for Benjamini-Hochberg in any SPSS procedure. Nor is there any stand-alone procedure. Several people provided helpful scripts. Many thanks. However, this is inevitably a cumbersome approach. Consequently, I have produced a simple EXCEL spreadsheet that takes a column of p-values and returns the number that 'truly' reject the null at any chosen false discovery rate, q. See Benjamini, Y., & Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society, Series B (Methodological), 57(1), 289-300. http://www.jstor.org/stable/2346101

Apparently SPSS is considering adding BH to the options available in some procedures [e.g. GLM]. I would also favour a stand-alone procedure. This is because the set of p-values may well come from running a single procedure with a 'by' variable, as in my case. SPSS was far from eager to run logistic with 2 factors, 1 of which had 3219 levels [after 3 hours it was still on iteration 2, so I gave up].

Comments welcome
best Diana
___________ Professor Diana Kornbrot Work University of Hertfordshire College Lane, Hatfield, Hertfordshire AL10 9AB, UK +44 (0) 170 728 4626 skype: kornbrotme Home 19 Elmhurst Avenue London N2 0LT, UK +44 (0) 208 444 2081 |
|
Many of the regression models in SPSS have options to save the parameter estimates, standard errors & p-values to a separate dataset (see the OUTFILE subcommand for whatever procedure). So basically given the listed macros all you need to do is save the outfile of the estimates, potentially eliminate superfluous rows in that parameter estimate dataset, and then subject the p-values of interest to the FDR procedure.
If the regression procedure does not have an OUTFILE subcommand, you could use OMS to do the same thing with a little more work. |
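A minimal OMS sketch of that idea for a binary logistic model; the dataset name parms, the variables outcome and predictor, and the table subtype are illustrative assumptions and should be checked against the OMS Identifiers for your own output:

* Capture the parameter-estimates table from LOGISTIC REGRESSION into a dataset,
* then point one of the FDR macros at its significance column.
* outcome and predictor are placeholder variable names.
DATASET DECLARE parms.
OMS /SELECT TABLES
    /IF COMMANDS=['Logistic Regression'] SUBTYPES=['Variables in the Equation']
    /DESTINATION FORMAT=SAV OUTFILE='parms' VIEWER=NO.
LOGISTIC REGRESSION VARIABLES outcome WITH predictor.
OMSEND.
DATASET ACTIVATE parms.

Superfluous rows (e.g. the constant) can then be dropped with SELECT IF before running the FDR syntax on the significance column, whatever name OMS gives it in the captured file.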
|
Administrator
|
In reply to this post by Kornbrot, Diana
"Many thanks. However, this is inevitably a cumbersome approach"
".....Consequently, I have produced a simple EXCEL spreadsheet that takes a column of p-value and returns the number that are 'trully' reject the null at any chosen false discovery rate, q." Excel??? Come on now Diane, now are just being Silly. April Fools day already happened 2 days ago. You missed it. Please don't be foolish. Have you not acquainted yourself with the use of SPSS Macros after all this time of using the software? Cumbersome? I personally find Excel quite cumbersome and rather useless for doing any sort of statistics. And this ?categorical? variable with 3219 levels. We would love to hear more about this. What on earth are you thinking? Why? What is this variable? Why does it have 3219 levels. Are you certain you are not doing something really stupid here? ---
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
|
In reply to this post by Alex Reutter
Dear friends

Very sorry about this. I have absolutely no intention of running a model with a factor with 3219 levels. The comment was an ASIDE, as I was trying the quickest way of getting separate p-values for 3219 tests. Please do NOT give me any further advice about this non-problem.

What I actually need is p-values corresponding to LL tests of binary logistic with just one 2-valued predictor. This is easily available with SPLIT FILE via crosstabs, logistic regression, or generalised linear models (logistic). The relevant values have to be extracted using EXCEL from SPSS tabular output, which is a pain but feasible. EXPORT in these procedures has good features, but does not seem to include p-values. This is perfectly sensible, as most people are not running these procedures with a SPLIT. EXCEL has its uses, however much David hates it. It is good for data manipulation and graphs, where tailoring for publication is much easier than with SPSS. Now it is also easier for the Benjamini-Hochberg procedure! http://wp.me/PYt7T-6M

The problem I am investigating is to determine the most sensitive measure for varieties of Likert scale. Traditionally this is investigated by simulation. Target differences between two groups, e.g. group 1 p=.6, group 2 p=.7, are simulated, say, 10,000 times; then the number of significant p-values is counted. This can be compared with the number when both groups have p=.65. My aim is to repeat this with REAL data. I know there is an effect of unit and that it is, of course, a random effect. Each unit is self-contained and wants to know if there is any improvement. Head office is uninterested in the average effect. It wants to know, for each unit, was there a change and if so in what direction. This needs the BH adjustment. It turns out that proportion(favourable) is not a very sensitive measure. No doubt YOU have all worked out the best measure. Data on request. Fear it will be news to the satisfaction/Likert community who routinely use proportion(agree, strongly agree) as their preferred measure.

best Diana
___________ Professor Diana Kornbrot Work University of Hertfordshire College Lane, Hatfield, Hertfordshire AL10 9AB, UK +44 (0) 170 728 4626 skype: kornbrotme Home 19 Elmhurst Avenue London N2 0LT, UK +44 (0) 208 444 2081 |
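A sketch of the simulation approach described above, with illustrative numbers: 1000 replicates (scale up to 10,000 as needed) of two groups of 100 binary responses each, with assumed true proportions .6 and .7. SPLIT FILE by replicate then gives a likelihood-ratio (and Pearson) chi-square test per replicate via CROSSTABS, and the resulting p-values can be captured with OMS and fed to the FDR syntax, as in David's example further down the thread.

* Simulate: for each replicate, 100 cases in group 0 (true proportion .6)
* and 100 cases in group 1 (true proportion .7); all values illustrative.
NEW FILE.
INPUT PROGRAM.
LOOP rep = 1 TO 1000.
LOOP #i = 1 TO 200.
COMPUTE group = (#i GT 100).
DO IF group = 0.
COMPUTE favourable = RV.BERNOULLI(.6).
ELSE.
COMPUTE favourable = RV.BERNOULLI(.7).
END IF.
END CASE.
END LOOP.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.
* One chi-square test (including the likelihood ratio) per replicate.
SPLIT FILE BY rep.
CROSSTABS group BY favourable /STATISTICS=CHISQ.
SPLIT FILE OFF.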
|
Diana, When you have the opportunity, I'd like the data, deidentified of course. Thanks. Brian |
|
Administrator
|
In reply to this post by Kornbrot, Diana
"The relevant values have to be extracted using EXCEL from SPSS tabular output, which is a pain but feasible. EXPORT in these procedures has good features, but does not seem to include p-values. " ???????? Utterly Painless with OMS! /*<Data simulation>*/. MATRIX. SAVE TRUNC(UNIFORM(100000,2)*2) / OUTFILE * /VARIABLES X1 X2. END MATRIX. COMPUTE Group=TRUNC(UNIFORM(1000)). SORT CASES BY Group. /*</Data simulation> */. * OMS. SPLIT FILE BY Group. DATASET DECLARE Chi. OMS /SELECT TABLES /IF COMMANDS=['Crosstabs'] /DESTINATION VIEWER=NO. OMS /SELECT TABLES /IF COMMANDS=['Crosstabs'] SUBTYPES=['Chi Square Tests'] /DESTINATION FORMAT=SAV NUMBERED=TableNumber_ OUTFILE='Chi' VIEWER=NO. CROSSTABS X1 BY X2 / STATISTICS CHI. OMSEND. DATASET ACTIVATE Chi . DELETE VARIABLES TableNumber_ Command_ Subtype_ Label_ . SELECT IF (Var2 EQ "Likelihood Ratio").
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
|
Administrator
|
For a more complete example, I combine that with the two macros David posted earlier in the thread. (I've renamed them by appending v1 and v2 so that both can be used.)
DEFINE !FDRv1 ( p = !CHAREND('/') !DEFAULT(p)
 / FDR = !CHAREND('/') !DEFAULT(.05)
 / FDRsig = !CHAREND('/') !DEFAULT(FDRsig)
 / FDRcv = !CMDEND !DEFAULT(FDRcv) ).
SORT CASES BY !p (A).
COMPUTE @i=$casenum.
SORT CASES BY @i (D).
COMPUTE @m=MAX(@i,lag(@m)).
COMPUTE !FDRcv=!FDR*@i/@m.
COMPUTE !FDRsig=(!p LE !FDRcv).
COMPUTE !FDRsig=MAX(!FDRsig , LAG(!FDRsig)).
FORMATS !FDRsig(f5.0) !FDRcv(f8.6).
VALUE LABELS !FDRsig 1 'Significant' 0 'Not Significant'.
SORT CASES BY !p(A).
DELETE VARIABLES @i @m.
!ENDDEFINE.

DEFINE !FDRv2 ( p = !CHAREND('/') !DEFAULT(p)
 / FDR = !CHAREND('/') !DEFAULT(.05) ).
SORT CASES BY !p (A).
COMPUTE @i=$casenum.
SORT CASES BY @i (D).
COMPUTE @m = MAX(@i,lag(@m)).
!LET !fdrcv = !CONCAT('FDRcv_',!FDR,'_',!p)
!LET !fdrsig = !CONCAT('FDRsig_',!FDR,'_',!p)
COMPUTE !fdrcv = !FDR * @i/@m.
COMPUTE !FDRsig=(!p LE !FDRcv).
COMPUTE !FDRsig=MAX(!FDRsig , LAG(!FDRsig)).
FORMATS !FDRsig(f5.0) !FDRcv(f8.6).
VALUE LABELS !FDRsig 1 'Significant' 0 'Not Significant'.
SORT CASES BY !p(A).
DELETE VARIABLES @i @m.
!ENDDEFINE.

NEW FILE.
DATASET CLOSE all.

/*<Data simulation>*/.
MATRIX.
SAVE TRUNC(UNIFORM(100000,2)*2) / OUTFILE * /VARIABLES X1 X2.
END MATRIX.
COMPUTE Group=TRUNC(UNIFORM(1000)).
SORT CASES BY Group.
/*</Data simulation>*/.

* OMS.
SPLIT FILE BY Group.
DATASET DECLARE Chi.
OMS /SELECT TABLES /IF COMMANDS=['Crosstabs'] /DESTINATION VIEWER=NO.
OMS /SELECT TABLES /IF COMMANDS=['Crosstabs'] SUBTYPES=['Chi Square Tests']
    /DESTINATION FORMAT=SAV NUMBERED=TableNumber_ OUTFILE='Chi' VIEWER=NO.
CROSSTABS X1 BY X2 / STATISTICS CHI.
OMSEND.
DATASET ACTIVATE Chi .
DELETE VARIABLES TableNumber_ Command_ Subtype_ Label_ .
SELECT IF (Var2 EQ "Likelihood Ratio").
*EXECUTE.
*SET MPRINT on.
!FDRv1 p = Asymp.Sig.2sided .
!FDRv2 p = Asymp.Sig.2sided .
*SET MPRINT off.
CROSSTABS FDRsig by FDRsig_.05_Asymp.Sig.2sided.
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
|
In reply to this post by Kornbrot, Diana
I have a more general question about the FDR. How does one report this? Each p-value gets a "new" p-value using the FDR. I suspect there is something I am not understanding about this because when FDR is reported in the literature, it is something like this:
"Associations were considered important if the false discovery rate (FDR) q-value was <0.11" How does one come up with just one number for this if you have say, 10 p-values? |