
 fdr


Beyond Fixed False Discovery Rates: Post-Hoc Conformal Selection with E-Variables

Zhu, Meiyi, Simeone, Osvaldo

arXiv.org Machine Learning

Conformal selection (CS) uses calibration data to identify test inputs whose unobserved outcomes are likely to satisfy a pre-specified minimal quality requirement, while controlling the false discovery rate (FDR). Existing methods fix the target FDR level before observing data, which prevents the user from adapting the balance between the number of selected test inputs and the FDR to downstream needs and constraints based on the available data. For example, in genomics or neuroimaging, researchers often inspect the distribution of test statistics and decide how aggressively to pursue candidates based on observed evidence strength and available follow-up resources. To address this limitation, we introduce post-hoc CS (PH-CS), which generates a path of candidate selection sets, each paired with a data-driven false discovery proportion (FDP) estimate. PH-CS lets the user select any operating point on this path by maximizing a user-specified utility, arbitrarily balancing selection size and FDR. Building on conformal e-variables and the e-Benjamini-Hochberg (e-BH) procedure, PH-CS is proved to provide a finite-sample post-hoc reliability guarantee whereby the ratio between estimated FDP level and true FDP is, on average, upper bounded by $1$, so that the average estimated FDP is, to first order, a valid upper bound on the true FDR. PH-CS is extended to control quality defined in terms of a general risk. Experiments on synthetic and real-world datasets demonstrate that, unlike CS, PH-CS can consistently satisfy user-imposed utility constraints while producing reliable FDP estimates and maintaining competitive FDR control.
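To make the e-BH building block mentioned in the abstract concrete, here is a minimal Python sketch. It is not the authors' PH-CS implementation: it assumes the e-values are already given (the conformal construction is not shown), and the function names `ebh_path` and `ebh_select` are hypothetical. For each k, the sketch pairs the top-k set with the smallest level at which standard e-BH would select at least k hypotheses, which is one plausible reading of a "path" of selection sets with attached levels.

```python
import numpy as np

def ebh_path(e_values):
    """Rank hypotheses by decreasing e-value and attach a level to each top-k set.

    levels[k-1] = m / (k * e_(k)), where e_(k) is the k-th largest e-value.
    This is the smallest alpha for which e-BH's condition holds at rank k.
    """
    e = np.asarray(e_values, dtype=float)
    m = e.size
    order = np.argsort(-e, kind="stable")      # indices, largest e-value first
    sorted_e = e[order]
    k = np.arange(1, m + 1)
    with np.errstate(divide="ignore"):         # an e-value of 0 gives level inf
        levels = m / (k * sorted_e)
    return order, levels

def ebh_select(e_values, alpha):
    """Standard e-BH at a fixed level alpha: select the k* largest e-values,
    where k* is the largest k with k * e_(k) / m >= 1 / alpha."""
    order, levels = ebh_path(e_values)
    passing = np.nonzero(levels <= alpha)[0]
    if passing.size == 0:
        return np.array([], dtype=int)
    k_star = passing[-1] + 1
    return np.sort(order[:k_star])

if __name__ == "__main__":
    e_vals = [20, 1, 10, 0.5]                  # illustrative numbers only
    print(ebh_path(e_vals))
    print(ebh_select(e_vals, alpha=0.25))
```

A user-specified utility could then be maximized over the pairs (top-k set, level) returned by `ebh_path`; how PH-CS defines its FDP estimates and utility is specified in the paper, not here.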





DeepPINK: reproducible feature selection in deep neural networks

Neural Information Processing Systems

Analogously, if a financial transaction is flagged as fraudulent, then the security teams want to know which activities or behaviors led to the flagging.


fair_active_learning_neurips22 (2)

Romain Camilleri

Neural Information Processing Systems

Algorithm 1 Best Safe Arm Identification (BESIDE) 1: input: tolerance , confidence 2: ⌈log(20 )⌉, b i,0safe(z) 0, b 0(z) 0 for all z ∈ Z 3: for ℓ = 1, 2, ..., do 4: ℓ 20 2 ℓ
Figure 7: Half-circle dataset. Figure 8: Precision. Figure 9: Recall.




DeepDRK: Deep Dependency Regularized Knockoff for Feature Selection

Neural Information Processing Systems

Since its introduction in parametric design, knockoff techniques have evolved to handle arbitrary data distributions using deep learning-based generative models.


AMDP: An Adaptive Detection Procedure for False Discovery Rate Control in High-Dimensional Mediation Analysis

Neural Information Processing Systems

High-dimensional mediation analysis is often associated with a multiple testing problem for detecting significant mediators. Assessing the uncertainty of this detection process via the false discovery rate (FDR) has garnered great interest. Controlling the FDR in multiple testing involves two essential steps: ranking and selection. Existing approaches either construct p-values without calibration or disregard the joint information across tests, leading to conservative FDR control or non-optimal ranking rules for multiple hypotheses. In this paper, we develop an adaptive mediation detection procedure (referred to as AMDP) to identify relevant mediators while asymptotically controlling the FDR in high-dimensional mediation analysis. AMDP produces the optimal rule for ranking hypotheses and proposes a data-driven strategy to determine the threshold for mediator selection. This novel method captures information from the proportions of composite null hypotheses and the distribution of p-values, which turns the high dimensionality into an advantage instead of a limitation. Numerical studies on synthetic and real data sets illustrate the performance of AMDP compared with existing approaches.
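The two steps named in the abstract, ranking and selection, can be illustrated with the classical Benjamini-Hochberg (BH) procedure. The sketch below is only that baseline, not AMDP: it ranks hypotheses by raw p-value and uses the fixed BH step-up threshold, whereas AMDP is described as using an optimal ranking rule and a data-driven threshold. The function name `benjamini_hochberg` and the example p-values are assumptions for illustration.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha):
    """Classical BH step-up procedure.

    Ranking: sort hypotheses by increasing p-value.
    Selection: find the largest rank k with p_(k) <= alpha * k / m
    and select the k hypotheses with the smallest p-values.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p, kind="stable")           # ranking step
    thresholds = alpha * np.arange(1, m + 1) / m   # BH critical values
    passing = np.nonzero(p[order] <= thresholds)[0]
    if passing.size == 0:
        return np.array([], dtype=int)             # nothing selected
    k_star = passing[-1] + 1                       # selection step
    return np.sort(order[:k_star])

if __name__ == "__main__":
    p_vals = [0.001, 0.8, 0.01, 0.04]              # illustrative numbers only
    print(benjamini_hochberg(p_vals, alpha=0.1))
```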