|
Hi
Is there any simple way to enter a clump of p-values and have SPSS give Benjamini-Hochberg results, i.e. p-values adjusted to take account of the false discovery rate/number of "true" rejections of the null? I am looking at situations where the number of tests may be in the 1000s. This can be genome data, or it might be simulations where p-value distributions are compared.
best
Diana
_______________
Professor Diana Kornbrot
University of Hertfordshire
College Lane, Hatfield, Hertfordshire AL10 9AB, UK
+44 (0) 170 728 4626
+44 (0) 208 444 2081
+44 (0) 7403 18 16 12
skype: kornbrotme
_______________________________ |
|
Hi Diana,
Assuming you want the Benjamini-Hochberg procedure (http://en.wikipedia.org/wiki/False_discovery_rate#Benjamini.E2.80.93Hochberg_procedure), you could just throw the p-values into the data editor, sort them, compute (k/m)*alpha for your chosen false discovery rate for each p-value, and then find the largest p-value that satisfies the inequality in the procedure. P-values less than or equal to that value are then considered significant. Alex |
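A minimal syntax sketch of those steps, assuming the p-values are already in a variable named p and the chosen false discovery rate is .05 (both the variable name and the rate are illustrative):

* BH step-up sketch: p holds the p-values; .05 is the chosen FDR (assumptions).
SORT CASES BY p (A).
COMPUTE pRank = $CASENUM.
* m = total number of tests, written to every case.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES /m = N.
* BH criterion (k/m)*alpha for each ranked p-value.
COMPUTE crit = (pRank/m) * .05.
COMPUTE meets = (p LE crit).
* Everything up to and including the largest p that meets its criterion is significant,
* so work from the largest p downwards and carry the flag along with LAG.
SORT CASES BY pRank (D).
COMPUTE BHsig = meets.
COMPUTE BHsig = MAX(BHsig, LAG(BHsig)).
SORT CASES BY pRank (A).
EXECUTE.

The descending sort plus LAG is the same trick the macros later in this thread use to propagate significance down from the largest qualifying p-value.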
|
In reply to this post by Kornbrot, Diana
This seems like it would be a simple problem. One issue might be getting the p-values into a data file, but let's say that is done. As I recall the BH procedure, the p's are sorted in ascending order. It would be a good idea to number the records, which can be done using $casenum. I'm probably not remembering the formula correctly, but let's say it is .05/irec, where irec is the record number. A p passes if it is less than .05/irec; otherwise, it and all following p's fail. The computation is this: Compute pass=0. If (p le .05/irec) pass=1. Gene Maguin |
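For comparison, a sketch of Gene's pass flag with the usual BH cutoff, (irec/m)*alpha rather than .05/irec; irec (the ascending rank) and ntests (m, the total number of tests) are assumed to exist on the file already:

* Gene's pass flag with the BH cutoff (rank/m)*alpha substituted for .05/irec.
* irec = ascending rank of the p-value; ntests = total number of tests (both assumed to exist).
COMPUTE pass = 0.
IF (p LE (irec/ntests)*.05) pass = 1.
* The "it and all following p's fail" step still requires finding the largest passing p
* and propagating it, as in the macros posted later in the thread.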
|
Administrator
|
In reply to this post by Kornbrot, Diana
http://imaging.mrc-cbu.cam.ac.uk/statswiki/FAQ/FDR
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
|
In reply to this post by Alex Reutter
Does this syntax do it? TENTATIVE: needs to be checked by other list members.

* make up some data.
new file.
input program.
string SomeTestName (a25).
numeric ObtainedProb (f5.3).
loop id = 1 to 10000.
compute SomeTestName = "some descriptive words".
compute ObtainedProb = (rnd(1000*rv.uniform(0,1))/1000).
end case.
end loop.
end file.
end input program.
*.
* the part below here would be applied to your data.
*.
numeric LessEqual.05 (f1).
compute LessEqual.05 = ObtainedProb le .05.
frequencies variable=ObtainedProb LessEqual.05
 /formats = notable
 /histogram.
aggregate outfile=* mode=addvariables
 /TestsDone = N
 /FoundRaw.05 = sum(LessEqual.05).
numeric AdjustedProb (f5.3).
compute AdjustedProb = (TestsDone/FoundRaw.05) * ObtainedProb.
compute NowSignificant = AdjustedProb le .05.
sort cases by AdjustedProb.
temporary.
select if NowSignificant.
list /variables = ID SomeTestName AdjustedProb.
Art Kendall
Social Research Consultants |
|
Administrator
|
In reply to this post by Bruce Weaver
I just noticed that the syntax on that MRC page uses the "GHB Horrible Hack" method to define a macro variable for the number of cases. Using AGGREGATE (as in the syntax Art K posted) is a far better way to go! I.e., this...
* Calculate the number of p values.
RANK PVALS /n into N.
* N contains the number of cases in the file.
* make a submacro to be invoked from the syntax.
DO IF $CASENUM=1.
WRITE OUTFILE 'C:\temp.sps' /"DEFINE !nbcases()"n"!ENDDEFINE.".
END IF.
EXE.
INCLUDE FILE='C:\temp.sps'.

...could be replaced with a simple AGGREGATE that writes the number of cases to a new variable.
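For example, something along these lines (nbcases is an arbitrary variable name):

* Write the number of cases (i.e. the number of p-values) to every case,
* replacing the WRITE/INCLUDE macro trick.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES /nbcases = N.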
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
|
Administrator
|
Bruce,
Not only that, but the whole thing could/should be done in MATRIX in the first place. It uses MATRIX to do the roundup, so why bother with all that early calculation, which is trivial to do in MATRIX? OTOH: WTF? "COMPUTE ccmpmx = pvals LE (CCOMP>0)*CMAX(((CMAX(ccomp) &* (diff LE 0)) EQ ccomp) &* pvals) ." Given a few minutes of eye bleeding I could sort that out, but I'm not going to bother. It could probably be cast in standard syntax, but let sleeping dogs lie, as it were. I write that sort of s#$% on occasion, but usually comment the hell out of it if I intend it for public consumption (OPU). ------
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
|
In reply to this post by Kornbrot, Diana
http://www-01.ibm.com/support/docview.wss?uid=swg21476447
On Sat, Mar 29, 2014 at 10:24 AM, Kornbrot, Diana <[hidden email]> wrote:
|
|
Administrator
|
In reply to this post by David Marso
Agreed! I retract that link, and suggest that Diana use the one Ryan posted instead.
http://www-01.ibm.com/support/docview.wss?uid=swg21476447
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
|
Administrator
|
In reply to this post by Art Kendall
Art, I'm responding to your request for checking of your results. See below.
* Define a macro using the basic code from the IBM website posted by Ryan:
* http://www-01.ibm.com/support/docview.wss?uid=swg21476447.
* The first macro argument is the name of the variable holding the p-values.
* The second argument is the desired False Discovery Rate (FDR)--if it is
* omitted, the default value of .05 is used.
DEFINE !FDR ( p = !CHAREND('/') / FDR = !DEFAULT(.05) !CMDEND ).
sort cases by !p (a).
compute @i=$casenum.
sort cases by @i (d).
compute @m=max(@i,lag(@m)).
compute FDRcv=!FDR*@i/@m.
compute FDRsig=(!p le FDRcv).
compute FDRsig=max(FDRsig,lag(FDRsig)).
execute.
formats @i @m FDRsig(f5.0) FDRcv(f8.6).
value labels FDRsig 1 'Significant' 0 'Not Significant'.
sort cases by !p(a).
!ENDDEFINE.

* Use the data from the MRC website example at:
* http://imaging.mrc-cbu.cam.ac.uk/statswiki/FAQ/FDR.
DATA LIST FREE / ObtainedProb (F5.3) .
BEGIN DATA
0.021 0.001 0.017 0.041 0.005 0.036 0.042 0.023 0.07 0.1
END DATA .

* Use Art K's method. Note that I have made some minor modifications.
numeric LessEqual.05 (f1).
compute LessEqual.05 = ObtainedProb le .05.
frequencies variable=ObtainedProb LessEqual.05
 /formats = notable
 /histogram.
aggregate outfile=* mode=addvariables
 /TestsDone = N
 /FoundRaw.05 = sum(LessEqual.05).
Numeric AdjustedProb(f5.3).
compute AdjustedProb = (TestsDone/FoundRaw.05) * ObtainedProb.
compute NowSignificant = AdjustedProb le .05.
sort cases by AdjustedProb.
FORMATS NowSignificant(f1) .

* Now call the macro defined earlier and compare results.
!FDR p=ObtainedProb.
LIST ObtainedProb TestsDone @i @m NowSignificant FDRsig.

OUTPUT:

ObtainedProb TestsDone @i @m NowSignificant FDRsig
        .001        10  1 10              1      1
        .005        10  2 10              1      1
        .017        10  3 10              1      1
        .021        10  4 10              1      1
        .023        10  5 10              1      1
        .036        10  6 10              1      0  <-- NOTE
        .041        10  7 10              0      0
        .042        10  8 10              0      0
        .070        10  9 10              0      0
        .100        10 10 10              0      0

Number of cases read: 10   Number of cases listed: 10

As you can see, the method from the IBM website does not flag .036 as significant, whereas your syntax does. I haven't taken the time to look into why the two disagree.

Oh yes, and the macro results above agree with those shown on the MRC website:

P Value   FDR Criterion   FDR Significance
.001      .050            1
.005      .050            1
.017      .050            1
.021      .050            1
.023      .050            1
.036      .050            0
.041      .050            0
.042      .050            0
.070      .050            0
.100      .050            0

HTH.
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
|
Administrator
|
I'd add a third and fourth argument for the output variables (first version below). Alternatively, use the variable name and FDR in the names of the 2 new variables (second version).

DEFINE !FDR ( p = !CHAREND('/') !DEFAULT(p)
 / FDR = !CHAREND('/') !DEFAULT(.05)
 / FDRsig = !CHAREND('/') !DEFAULT(FDRsig)
 / FDRcv = !CMDEND !DEFAULT(FDRcv) ).
SORT CASES BY !p (A).
COMPUTE @i=$casenum.
SORT CASES BY @i (D).
COMPUTE @m=MAX(@i,lag(@m)).
COMPUTE !FDRcv=!FDR*@i/@m.
COMPUTE !FDRsig=(!p LE !FDRcv).
COMPUTE !FDRsig=MAX(!FDRsig , LAG(!FDRsig)).
FORMATS !FDRsig(f5.0) !FDRcv(f8.6).
VALUE LABELS !FDRsig 1 'Significant' 0 'Not Significant'.
SORT CASES BY !p(A).
DELETE VARIABLES @i @m.
!ENDDEFINE.

DEFINE !FDR ( p = !CHAREND('/') !DEFAULT(p)
 / FDR = !CHAREND('/') !DEFAULT(.05) ).
SORT CASES BY !p (A).
COMPUTE @i=$casenum.
SORT CASES BY @i (D).
COMPUTE @m = MAX(@i,lag(@m)).
!LET !fdrcv = !CONCAT('FDRcv_',!FDR,'_',!p)
!LET !fdrsig = !CONCAT('FDRsig_',!FDR,'_',!p)
COMPUTE !fdrcv = !FDR * @i/@m.
COMPUTE !FDRsig=(!p LE !FDRcv).
COMPUTE !FDRsig=MAX(!FDRsig , LAG(!FDRsig)).
FORMATS !FDRsig(f5.0) !FDRcv(f8.6).
VALUE LABELS !FDRsig 1 'Significant' 0 'Not Significant'.
SORT CASES BY !p(A).
DELETE VARIABLES @i @m.
!ENDDEFINE.
--
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
|
In reply to this post by Alex Reutter
Thanks for help from several SPSS experts.
There is no direct option for Benjamini-Hochberg in any SPSS procedure. Nor is there any stand-alone procedure. Several people provided helpful scripts. Many thanks. However, this is inevitably a cumbersome approach. Consequently, I have produced a simple EXCEL spreadsheet that takes a column of p-values and returns the number that 'truly' reject the null at any chosen false discovery rate, q. See Benjamini, Y., & Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society, Series B (Methodological), 57(1), 289-300. http://www.jstor.org/stable/2346101

Apparently SPSS is considering adding BH to the options available in some procedures [e.g. GLM]. I would also favour a stand-alone procedure. This is because the set of p-values may well come from running a single procedure with a 'by' variable, as in my case. SPSS was far from eager to run logistic with 2 factors, 1 of which had 3219 levels [after 3 hours it was still on iteration 2, so I gave up].

Comments welcome
best Diana
___________ Professor Diana Kornbrot Work University of Hertfordshire College Lane, Hatfield, Hertfordshire AL10 9AB, UK +44 (0) 170 728 4626 skype: kornbrotme Home 19 Elmhurst Avenue London N2 0LT, UK +44 (0) 208 444 2081 |
|
Many of the regression models in SPSS have options to save the parameter estimates, standard errors & p-values to a separate dataset (see the OUTFILE subcommand for whatever procedure). So basically given the listed macros all you need to do is save the outfile of the estimates, potentially eliminate superfluous rows in that parameter estimate dataset, and then subject the p-values of interest to the FDR procedure.
If the regression procedure does not have an OUTFILE subcommand, you could use OMS to do the same thing with a little more work. |
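A minimal OMS sketch of that idea for a binary logistic model; the dataset name parms, the variables outcome and predictor, and the table subtype are illustrative assumptions and should be checked against the OMS Identifiers for your own output:

* Capture the parameter-estimates table from LOGISTIC REGRESSION into a dataset,
* then point one of the FDR macros at its significance column.
* outcome and predictor are placeholder variable names.
DATASET DECLARE parms.
OMS /SELECT TABLES
    /IF COMMANDS=['Logistic Regression'] SUBTYPES=['Variables in the Equation']
    /DESTINATION FORMAT=SAV OUTFILE='parms' VIEWER=NO.
LOGISTIC REGRESSION VARIABLES outcome WITH predictor.
OMSEND.
DATASET ACTIVATE parms.

Superfluous rows (e.g. the constant) can then be dropped with SELECT IF before running the FDR syntax on the significance column, whatever name OMS gives it in the captured file.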
|
Administrator
|
In reply to this post by Kornbrot, Diana
"Many thanks. However, this is inevitably a cumbersome approach"
".....Consequently, I have produced a simple EXCEL spreadsheet that takes a column of p-value and returns the number that are 'trully' reject the null at any chosen false discovery rate, q." Excel??? Come on now Diane, now are just being Silly. April Fools day already happened 2 days ago. You missed it. Please don't be foolish. Have you not acquainted yourself with the use of SPSS Macros after all this time of using the software? Cumbersome? I personally find Excel quite cumbersome and rather useless for doing any sort of statistics. And this ?categorical? variable with 3219 levels. We would love to hear more about this. What on earth are you thinking? Why? What is this variable? Why does it have 3219 levels. Are you certain you are not doing something really stupid here? ---
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
|
In reply to this post by Alex Reutter
Dear friends

Very sorry about this. I have absolutely no intention of running a model with a factor with 3219 levels. The comment was an ASIDE, as I was trying the quickest way of getting separate p-values for 3219 tests. Please do NOT give me any further advice about this non-problem.

What I actually need is p-values corresponding to LL tests of binary logistic with just one 2-valued predictor. This is easily available with SPLIT FILE via crosstabs, logistic regression, or generalised linear models (logistic). The relevant values have to be extracted using EXCEL from SPSS tabular output, which is a pain but feasible. EXPORT in these procedures has good features, but does not seem to include p-values. This is perfectly sensible, as most people are not running these procedures with a SPLIT. EXCEL has its uses, however much David hates it. It is good for data manipulation and graphs, where tailoring for publication is much easier than with SPSS. Now it is also easier for the Benjamini-Hochberg procedure! http://wp.me/PYt7T-6M

The problem I am investigating is to determine the most sensitive measure for varieties of Likert scale. Traditionally this is investigated by simulation. Target differences between two groups, e.g. group 1 p=.6, group 2 p=.7, are simulated, say, 10,000 times; then the number of significant p-values is counted. This can be compared with the number when both groups have p=.65. My aim is to repeat this with REAL data. I know there is an effect of unit and that it is, of course, a random effect. Each unit is self-contained and wants to know if there is any improvement. Head office is uninterested in the average effect. It wants to know, for each unit, was there a change and if so in what direction. This needs the BH adjustment. It turns out that proportion(favourable) is not a very sensitive measure. No doubt YOU have all worked out the best measure. Data on request. Fear it will be news to the satisfaction/Likert community who routinely use proportion(agree, strongly agree) as their preferred measure.

best Diana
___________ Professor Diana Kornbrot Work University of Hertfordshire College Lane, Hatfield, Hertfordshire AL10 9AB, UK +44 (0) 170 728 4626 skype: kornbrotme Home 19 Elmhurst Avenue London N2 0LT, UK +44 (0) 208 444 2081 |
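A sketch of the simulation approach described above, with illustrative numbers: 1000 replicates (scale up to 10,000 as needed) of two groups of 100 binary responses each, with assumed true proportions .6 and .7. SPLIT FILE by replicate then gives a likelihood-ratio (and Pearson) chi-square test per replicate via CROSSTABS, and the resulting p-values can be captured with OMS and fed to the FDR syntax, as in David's example further down the thread.

* Simulate: for each replicate, 100 cases in group 0 (true proportion .6)
* and 100 cases in group 1 (true proportion .7); all values illustrative.
NEW FILE.
INPUT PROGRAM.
LOOP rep = 1 TO 1000.
LOOP #i = 1 TO 200.
COMPUTE group = (#i GT 100).
DO IF group = 0.
COMPUTE favourable = RV.BERNOULLI(.6).
ELSE.
COMPUTE favourable = RV.BERNOULLI(.7).
END IF.
END CASE.
END LOOP.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.
* One chi-square test (including the likelihood ratio) per replicate.
SPLIT FILE BY rep.
CROSSTABS group BY favourable /STATISTICS=CHISQ.
SPLIT FILE OFF.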
|
Diana, When you have the opportunity, I'd like the data, deidentified of course. Thanks. Brian |
|
Administrator
|
In reply to this post by Kornbrot, Diana
"The relevant values have to be extracted using EXCEL from SPSS tabular output, which is a pain but feasible. EXPORT in these procedures has good features, but does not seem to include p-values. " ???????? Utterly Painless with OMS! /*<Data simulation>*/. MATRIX. SAVE TRUNC(UNIFORM(100000,2)*2) / OUTFILE * /VARIABLES X1 X2. END MATRIX. COMPUTE Group=TRUNC(UNIFORM(1000)). SORT CASES BY Group. /*</Data simulation> */. * OMS. SPLIT FILE BY Group. DATASET DECLARE Chi. OMS /SELECT TABLES /IF COMMANDS=['Crosstabs'] /DESTINATION VIEWER=NO. OMS /SELECT TABLES /IF COMMANDS=['Crosstabs'] SUBTYPES=['Chi Square Tests'] /DESTINATION FORMAT=SAV NUMBERED=TableNumber_ OUTFILE='Chi' VIEWER=NO. CROSSTABS X1 BY X2 / STATISTICS CHI. OMSEND. DATASET ACTIVATE Chi . DELETE VARIABLES TableNumber_ Command_ Subtype_ Label_ . SELECT IF (Var2 EQ "Likelihood Ratio").
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
|
Administrator
|
For a more complete example, I combine that with the two macros David posted earlier in the thread. (I've renamed them by appending v1 and v2 so that both can be used.)
DEFINE !FDRv1 ( p = !CHAREND('/') !DEFAULT(p)
 / FDR = !CHAREND('/') !DEFAULT(.05)
 / FDRsig = !CHAREND('/') !DEFAULT(FDRsig)
 / FDRcv = !CMDEND !DEFAULT(FDRcv) ).
SORT CASES BY !p (A).
COMPUTE @i=$casenum.
SORT CASES BY @i (D).
COMPUTE @m=MAX(@i,lag(@m)).
COMPUTE !FDRcv=!FDR*@i/@m.
COMPUTE !FDRsig=(!p LE !FDRcv).
COMPUTE !FDRsig=MAX(!FDRsig , LAG(!FDRsig)).
FORMATS !FDRsig(f5.0) !FDRcv(f8.6).
VALUE LABELS !FDRsig 1 'Significant' 0 'Not Significant'.
SORT CASES BY !p(A).
DELETE VARIABLES @i @m.
!ENDDEFINE.

DEFINE !FDRv2 ( p = !CHAREND('/') !DEFAULT(p)
 / FDR = !CHAREND('/') !DEFAULT(.05) ).
SORT CASES BY !p (A).
COMPUTE @i=$casenum.
SORT CASES BY @i (D).
COMPUTE @m = MAX(@i,lag(@m)).
!LET !fdrcv = !CONCAT('FDRcv_',!FDR,'_',!p)
!LET !fdrsig = !CONCAT('FDRsig_',!FDR,'_',!p)
COMPUTE !fdrcv = !FDR * @i/@m.
COMPUTE !FDRsig=(!p LE !FDRcv).
COMPUTE !FDRsig=MAX(!FDRsig , LAG(!FDRsig)).
FORMATS !FDRsig(f5.0) !FDRcv(f8.6).
VALUE LABELS !FDRsig 1 'Significant' 0 'Not Significant'.
SORT CASES BY !p(A).
DELETE VARIABLES @i @m.
!ENDDEFINE.

NEW FILE.
DATASET CLOSE all.

/*<Data simulation>*/.
MATRIX.
SAVE TRUNC(UNIFORM(100000,2)*2) / OUTFILE * /VARIABLES X1 X2.
END MATRIX.
COMPUTE Group=TRUNC(UNIFORM(1000)).
SORT CASES BY Group.
/*</Data simulation>*/.

* OMS.
SPLIT FILE BY Group.
DATASET DECLARE Chi.
OMS /SELECT TABLES /IF COMMANDS=['Crosstabs'] /DESTINATION VIEWER=NO.
OMS /SELECT TABLES /IF COMMANDS=['Crosstabs'] SUBTYPES=['Chi Square Tests']
    /DESTINATION FORMAT=SAV NUMBERED=TableNumber_ OUTFILE='Chi' VIEWER=NO.
CROSSTABS X1 BY X2 / STATISTICS CHI.
OMSEND.
DATASET ACTIVATE Chi .
DELETE VARIABLES TableNumber_ Command_ Subtype_ Label_ .
SELECT IF (Var2 EQ "Likelihood Ratio").
*EXECUTE.
*SET MPRINT on.
!FDRv1 p = Asymp.Sig.2sided .
!FDRv2 p = Asymp.Sig.2sided .
*SET MPRINT off.
CROSSTABS FDRsig by FDRsig_.05_Asymp.Sig.2sided.
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
|
In reply to this post by Kornbrot, Diana
I have a more general question about the FDR. How does one report this? Each p-value gets a "new" p-value using the FDR. I suspect there is something I am not understanding about this because when FDR is reported in the literature, it is something like this:
"Associations were considered important if the false discovery rate (FDR) q-value was <0.11" How does one come up with just one number for this if you have say, 10 p-values? |