 FDA


Strategic Hypothesis Testing

Neural Information Processing Systems

We examine hypothesis testing within a principal-agent framework, where a strategic agent, holding private beliefs about the effectiveness of a product, submits data to a principal who decides on approval. The principal employs a hypothesis testing rule, aiming to pick a p-value threshold that balances false positives and false negatives while anticipating the agent's incentive to maximize expected profitability. Building on prior work, we develop a game-theoretic model that captures how the agent's participation and reporting behavior respond to the principal's statistical decision rule. Despite the complexity of the interaction, we show that the principal's errors exhibit clear monotonic behavior when segmented by an efficiently computable critical p-value threshold, leading to an interpretable characterization of their optimal p-value threshold.
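To make the agent's incentive concrete, here is a minimal sketch, not the paper's model. It assumes a one-sided z-test with unit variance, a sample size `n`, and an agent who submits data only when the expected reward from approval covers the trial cost. The names `approval_probability`, `agent_participates`, `reward`, and `cost` are hypothetical and do not come from the abstract.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the z-test


def approval_probability(effect, n, alpha):
    """Chance that a one-sided z-test at level alpha approves the product.

    Assumes unit-variance observations, so the test statistic is
    distributed N(effect * sqrt(n), 1).
    """
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha)
    return 1.0 - _STD_NORMAL.cdf(z_crit - effect * n ** 0.5)


def agent_participates(effect, n, alpha, reward, cost):
    """The agent runs the trial only if the expected reward covers its cost."""
    return reward * approval_probability(effect, n, alpha) >= cost


# An ineffective product (effect = 0) is approved with probability alpha,
# so a looser threshold can draw in agents who believe their product is useless.
strict = agent_participates(0.0, n=100, alpha=0.05, reward=10.0, cost=1.0)
loose = agent_participates(0.0, n=100, alpha=0.20, reward=10.0, cost=1.0)
```

Under these assumptions, an agent with an ineffective product stays out at `alpha=0.05` and enters at `alpha=0.20`, which is the kind of participation effect the principal has to anticipate when choosing the threshold.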


Harnessing Feature Resonance under Arbitrary Target Alignment for Out-of-Distribution Node Detection

Neural Information Processing Systems

Out-of-distribution (OOD) node detection in graphs is a critical yet challenging task. Most existing approaches rely heavily on fine-grained labeled data to obtain a pretrained supervised classifier, inherently assuming the existence of a well-defined pretext classification task. However, when such a task is ill-defined or absent, their applicability becomes severely limited. To overcome this limitation, there is an urgent need to propose a more scalable OOD detection method that is independent of both pretext tasks and label supervision. We harness a new phenomenon called Feature Resonance, focusing on the feature space rather than the label space. We observe that, ideally, during the optimization of known ID samples, unknown ID samples undergo more significant representation changes than OOD samples, even when the model is trained to align arbitrary targets. The rationale behind it is that even without gold labels, the local manifold may still exhibit smooth resonance. Based on this, we further develop a novel graph OOD framework, dubbed Resonance-based Separation and Learning (RSL), which comprises two core modules: (i) a more practical micro-level proxy of feature resonance that measures the movement of feature vectors in one training step.
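The idea of scoring samples by how far their representation moves in one training step can be illustrated with a toy linear model. This is only a sketch under strong assumptions (a linear map `W`, a squared-error loss toward an arbitrary target, plain Python lists); it is not the RSL framework, and `one_step` and `movement_scores` are hypothetical names.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def one_step(W, known_x, target, lr):
    """One gradient step on 0.5 * sum ||W x - target||^2 over the known samples."""
    new_W = [row[:] for row in W]
    for x in known_x:
        residual = [o - t for o, t in zip(matvec(W, x), target)]
        for i, r in enumerate(residual):
            for j, v in enumerate(x):
                new_W[i][j] -= lr * r * v
    return new_W


def movement_scores(W, known_x, target, queries, lr=0.1):
    """Distance each query's representation moves after one training step."""
    new_W = one_step(W, known_x, target, lr)
    return [math.dist(matvec(W, q), matvec(new_W, q)) for q in queries]


W0 = [[0.0, 0.0], [0.0, 0.0]]
scores = movement_scores(
    W0,
    known_x=[[1.0, 0.0]],            # known in-distribution sample
    target=[1.0, 1.0],               # arbitrary alignment target
    queries=[[1.0, 0.0], [0.0, 1.0]],
)
```

In this toy setting the query that shares a direction with the known sample moves, while the orthogonal query does not move at all, so a larger score suggests a sample closer to the known in-distribution data.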



Statistically Valid Post-Deployment Monitoring Should Be Standard for AI-Based Digital Health

Neural Information Processing Systems

This position paper argues that post-deployment monitoring in clinical AI is underdeveloped and proposes statistically valid and label-efficient testing frameworks as a principled foundation for ensuring reliability and safety in real-world deployment. A recent review found that only 9% of FDA-registered AI-based healthcare tools include a post-deployment surveillance plan [1]. Existing monitoring approaches are often manual, sporadic, and reactive, making them ill-suited for the dynamic environments in which clinical models operate. We contend that post-deployment monitoring should be grounded in label-efficient and statistically valid testing frameworks, offering a principled alternative to current practices. We use the term "statistically valid" to refer to methods that provide explicit guarantees on error rates (e.g., Type I/II error), enable formal inference under pre-defined assumptions, and support reproducibility--features that align with regulatory requirements. Specifically, we propose that the detection of changes in the data and model performance degradation should be framed as distinct statistical hypothesis testing problems. Grounding monitoring in statistical rigor ensures a reproducible and scientifically sound basis for maintaining the reliability of clinical AI systems. Importantly, it also opens new research directions for the technical community--spanning theory, methods, and tools for statistically principled detection, attribution, and mitigation of post-deployment model failures in real-world settings.
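As one illustration of framing data-change detection as a hypothesis test, the sketch below runs a two-sample permutation test on a single feature. The choice of test, the mean-difference statistic, and the name `permutation_p_value` are assumptions made for this example; the abstract does not prescribe a specific procedure.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def permutation_p_value(reference, current, n_perm=999, seed=0):
    """Two-sided permutation test for a shift in the mean of one feature.

    Null hypothesis: reference and current values are exchangeable,
    i.e. no change in the data distribution.
    """
    rng = random.Random(seed)
    observed = abs(_mean(current) - _mean(reference))
    pooled = list(reference) + list(current)
    n_ref = len(reference)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = abs(_mean(pooled[n_ref:]) - _mean(pooled[:n_ref]))
        if stat >= observed - 1e-12:  # tolerance for float round-off
            hits += 1
    # The +1 terms keep the p-value valid (never exactly zero).
    return (hits + 1) / (n_perm + 1)


reference = [i / 20 for i in range(20)]   # values seen at validation time
shifted = [x + 5.0 for x in reference]    # post-deployment values with a shift
p_shift = permutation_p_value(reference, shifted)
p_same = permutation_p_value(reference, reference)
```

Rejecting when the p-value falls below a pre-specified level bounds the Type I error at that level under the exchangeability assumption, which is the kind of explicit guarantee the abstract calls "statistically valid".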


Automatic Auxiliary Task Selection and Adaptive Weighting Boost Molecular Property Prediction

Neural Information Processing Systems

Recent studies in Machine Learning (ML) for biological research focus on investigating molecular properties to accelerate drug discovery. However, limited labeled molecular data often hampers the performance of ML models. A common strategy to mitigate data scarcity is leveraging auxiliary learning tasks to provide additional supervision, but selecting effective auxiliary tasks requires substantial domain expertise and manual effort, and their inclusion does not always guarantee performance gains. To overcome these challenges, we introduce Automatic Auxiliary Task Selection (AUTAUT), a fully automated framework that seamlessly retrieves auxiliary tasks using large language models and adaptively integrates them through a novel gradient alignment weighting mechanism. By automatically emphasizing auxiliary tasks aligned with the primary objective, AUTAUT significantly enhances predictive accuracy while reducing negative impacts from irrelevant tasks. Extensive evaluations demonstrate that AUTAUT outperforms 10 auxiliary task-based approaches and 18 advanced molecular property prediction models.
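The abstract does not give the weighting formula, so the following is only a sketch of one common way to weight auxiliary tasks by gradient alignment: each auxiliary gradient is scaled by its cosine similarity with the primary gradient, clipped at zero. The functions `cosine`, `alignment_weights`, and `combined_gradient` are hypothetical and should not be read as AUTAUT's actual mechanism.

```python
import math


def cosine(u, v):
    """Cosine similarity between two gradient vectors (0.0 if either is zero)."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def alignment_weights(primary_grad, aux_grads):
    """Weight each auxiliary task by its alignment with the primary task.

    Conflicting tasks (negative cosine) get weight 0 so they cannot
    pull the update away from the primary objective.
    """
    return [max(0.0, cosine(primary_grad, g)) for g in aux_grads]


def combined_gradient(primary_grad, aux_grads):
    """Primary gradient plus the alignment-weighted auxiliary gradients."""
    total = list(primary_grad)
    for weight, grad in zip(alignment_weights(primary_grad, aux_grads), aux_grads):
        for i, value in enumerate(grad):
            total[i] += weight * value
    return total


primary = [1.0, 0.0]
aux = [[2.0, 0.0], [0.0, 3.0], [-1.0, 0.0]]  # aligned, orthogonal, conflicting
weights = alignment_weights(primary, aux)
update = combined_gradient(primary, aux)
```

Here only the aligned auxiliary task contributes to the update; the orthogonal and conflicting tasks are ignored.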


AI Is Taking Over Hospitals

The Atlantic - Technology

This is health care's Uber moment. Every knowledge-based profession may one day reach the point when AI outperforms the human experts. In medicine, that day appeared to come in April. A group of primarily Harvard and Stanford researchers announced the results of a study that pitted ChatGPT against hundreds of physicians in a diagnostic obstacle course involving written medical mysteries and information from real-world patients. The bot had won, and the humans weren't entirely happy about it.


KeeA*: Epistemic Exploratory A* Search via Knowledge Calibration

Neural Information Processing Systems

In recent years, neural network-guided heuristic search algorithms, such as Monte Carlo tree search and A* search, have achieved significant advancements across diverse practical applications. Due to the challenges stemming from high state-space complexity, sparse training datasets, and incomplete environmental modeling, heuristic estimations manifest uncontrolled inherent biases towards the actual expected evaluations, thereby compromising the decision-making quality of search algorithms. Sampling exploration enhanced A* (SeeA*) was proposed to improve the efficiency of A* search by constructing a dynamic candidate subset through random sampling, from which the expanded node was selected.
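The sampling-based selection described above can be sketched as a best-first search that, at each step, draws a random subset of the open nodes and expands the one with the lowest f = g + h in that subset. Uniform sampling, the subset size `k`, and the name `sampled_a_star` are assumptions for this illustration; the published algorithm may sample differently.

```python
import random


def sampled_a_star(start, goal, neighbors, heuristic, k, seed=0):
    """Best-first search that picks the next node from a random candidate subset.

    neighbors(node) yields (next_node, step_cost) pairs. Nodes must be
    sortable so that sampling is reproducible for a fixed seed.
    Returns a path from start to goal, or None if the goal is unreachable.
    With k at least as large as the open set this behaves like plain A*;
    with smaller k the returned path is not guaranteed to be optimal.
    """
    rng = random.Random(seed)
    g = {start: 0.0}
    parent = {start: None}
    open_set = {start}
    while open_set:
        pool = sorted(open_set)
        candidates = rng.sample(pool, min(k, len(pool)))
        node = min(candidates, key=lambda s: g[s] + heuristic(s))
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        open_set.remove(node)
        for nxt, cost in neighbors(node):
            new_g = g[node] + cost
            if new_g < g.get(nxt, float("inf")):
                g[nxt] = new_g
                parent[nxt] = node
                open_set.add(nxt)  # (re)open nodes whose cost improved
    return None


GRAPH = {
    "A": [("B", 1.0), ("C", 4.0)],
    "B": [("C", 1.0), ("D", 5.0)],
    "C": [("D", 1.0)],
    "D": [],
}
full_path = sampled_a_star("A", "D", GRAPH.__getitem__, lambda s: 0.0, k=10)
sampled_path = sampled_a_star("A", "D", GRAPH.__getitem__, lambda s: 0.0, k=1)
```

With `k=10` the candidate subset is the whole open set and the search returns the shortest path; with `k=1` it still reaches the goal but may take a costlier route, which shows the exploration versus optimality trade-off that a sampled candidate set introduces.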


This Young Advocate Is Fighting to Make Every School Allergy-Safe

TIME - Tech

In 2019, 6-year-old Zacky Muñoz was eating his usual lunch of pasta, salad, and breadsticks at his school cafeteria in Pasadena, California. "I suddenly felt a weird feeling--it was like a fight-or-flight response, an alarm inside my body telling me I was in danger," he says.


CGBENCH: Benchmarking Language Model Scientific Reasoning for Clinical Genetics Research

Neural Information Processing Systems

Variant and gene interpretation are fundamental to personalized medicine and translational biomedicine. However, traditional approaches are manual and labor-intensive. Generative language models (LMs) can facilitate this process, accelerating the translation of fundamental research into clinically-actionable insights. While existing benchmarks have attempted to quantify the capabilities of LMs for interpreting scientific data, these studies focus on narrow tasks that do not translate to real-world research. To meet these challenges, we introduce CGBENCH, a robust benchmark that tests reasoning capabilities of LMs on scientific publications.


CIDD: Collaborative Intelligence for Structure-Based Drug Design Empowered by LLMs

Neural Information Processing Systems

Structure-guided molecular generation is pivotal in early-stage drug discovery, enabling the design of compounds tailored to specific protein targets. However, despite recent advances in 3D generative modeling, particularly in improving docking scores, these methods often produce uncommon and intrinsically unreasonable molecular structures that deviate from drug-like chemical space. To quantify this issue, we propose a novel metric, the Molecule Reasonable Ratio (MRR), which measures structural rationality and reveals a critical gap between existing models and real-world approved drugs. To address this, we introduce the Collaborative Intelligence Drug Design (CIDD) framework, the first approach to unify the 3D interaction modeling capabilities of generative models with the general knowledge and reasoning power of large language models (LLMs). By leveraging LLM-based Chain-of-Thought reasoning, CIDD generates molecules that are not only compatible with protein pockets but also exhibit favorable drug-likeness, structural rationality, and synthetic accessibility. On the CrossDocked2020 benchmark, CIDD consistently improves drug-likeness metrics, including QED, SA, and MRR, across different base generative models, while maintaining competitive binding affinity. Notably, it raises the combined success rate (balancing drug-likeness and binding) from 15.72% to 34.59%, more than doubling previous results. These findings demonstrate the value of integrating knowledge reasoning with geometric generation to advance AI-driven drug design.