Goto

Collaborating Authors

 confidence model



Confidence-Based Response Abstinence: Improving LLM Trustworthiness via Activation-Based Uncertainty Estimation

arXiv.org Artificial Intelligence

We propose a method for confidence estimation in retrieval-augmented generation (RAG) systems that aligns closely with the correctness of large language model (LLM) outputs. Confidence estimation is especially critical in high-stakes domains such as finance and healthcare, where the cost of an incorrect answer outweighs that of not answering the question. Our approach extends prior uncertainty quantification methods by leveraging raw feed-forward network (FFN) activations as auto-regressive signals, avoiding the information loss inherent in token logits and probabilities after projection and softmax normalization. We model confidence prediction as a sequence classification task, and regularize training with a Huber loss term to improve robustness against noisy supervision. Applied in a real-world financial industry customer-support setting with complex knowledge bases, our method outperforms strong baselines and maintains high accuracy under strict latency constraints. Experiments on Llama 3.1 8B model show that using activations from only the 16th layer preserves accuracy while reducing response latency. Our results demonstrate that activation-based confidence modeling offers a scalable, architecture-aware path toward trustworthy RAG deployment.



Group Ligands Docking to Protein Pockets

arXiv.org Artificial Intelligence

Molecular docking is a key task in computational biology that has attracted increasing interest from the machine learning community. While existing methods have achieved success, they generally treat each protein-ligand pair in isolation. Inspired by the biochemical observation that ligands binding to the same target protein tend to adopt similar poses, we propose \textsc{GroupBind}, a novel molecular docking framework that simultaneously considers multiple ligands docking to a protein. This is achieved by introducing an interaction layer for the group of ligands and a triangle attention module for embedding protein-ligand and group-ligand pairs. By integrating our approach with diffusion-based docking model, we set a new S performance on the PDBBind blind docking benchmark, demonstrating the effectiveness of our proposed molecular docking paradigm.


Large Language Model Confidence Estimation via Black-Box Access

arXiv.org Artificial Intelligence

Given the proliferation of deep learning over the last decade or so [5], uncertainty or confidence estimation of these models has been an active research area [4]. Predicting accurate confidences in the generations produced by a large language model (LLM) are crucial for eliciting trust in the model and is also helpful for benchmarking and ranking competing models [37]. Moreover, LLM hallucination detection and mitigation, which is one of the most pressing problems in artificial intelligence research today [33], can also benefit significantly from accurate confidence estimation as it would serve as a strong indicator of the faithfulness of a LLM response. This applies to even settings where strategies such as retrieval augmented generation (RAG) are used [3] to mitigate hallucinations. Methods for confidence estimation in LLMs assuming just black-box or query access have been explored only recently [14, 19] and this area of research is still largely in its infancy. However, effective solutions here could have significant impact given their low requirement (i.e.


FABind+: Enhancing Molecular Docking through Improved Pocket Prediction and Pose Generation

arXiv.org Artificial Intelligence

Molecular docking is a pivotal process in drug discovery. While traditional techniques rely on extensive sampling and simulation governed by physical principles, these methods are often slow and costly. The advent of deep learning-based approaches has shown significant promise, offering increases in both accuracy and efficiency. Building upon the foundational work of FABind, a model designed with a focus on speed and accuracy, we present FABind+, an enhanced iteration that largely boosts the performance of its predecessor. We identify pocket prediction as a critical bottleneck in molecular docking and propose a novel methodology that significantly refines pocket prediction, thereby streamlining the docking process. Furthermore, we introduce modifications to the docking module to enhance its pose generation capabilities. In an effort to bridge the gap with conventional sampling/generative methods, we incorporate a simple yet effective sampling technique coupled with a confidence model, requiring only minor adjustments to the regression framework of FABind. Experimental results and analysis reveal that FABind+ remarkably outperforms the original FABind, achieves competitive state-of-the-art performance, and delivers insightful modeling strategies. This demonstrates FABind+ represents a substantial step forward in molecular docking and drug discovery. Our code is in https://github.com/QizhiPei/FABind.


Variational Inference of Parameters in Opinion Dynamics Models

arXiv.org Machine Learning

Despite the frequent use of agent-based models (ABMs) for studying social phenomena, parameter estimation remains a challenge, often relying on costly simulation-based heuristics. This work uses variational inference to estimate the parameters of an opinion dynamics ABM, by transforming the estimation problem into an optimization task that can be solved directly. Our proposal relies on probabilistic generative ABMs (PGABMs): we start by synthesizing a probabilistic generative model from the ABM rules. Then, we transform the inference process into an optimization problem suitable for automatic differentiation. In particular, we use the Gumbel-Softmax reparameterization for categorical agent attributes and stochastic variational inference for parameter estimation. Furthermore, we explore the trade-offs of using variational distributions with different complexity: normal distributions and normalizing flows. We validate our method on a bounded confidence model with agent roles (leaders and followers). Our approach estimates both macroscopic (bounded confidence intervals and backfire thresholds) and microscopic ($200$ categorical, agent-level roles) more accurately than simulation-based and MCMC methods. Consequently, our technique enables experts to tune and validate their ABMs against real-world observations, thus providing insights into human behavior in social systems via data-driven analysis.


Deep Confident Steps to New Pockets: Strategies for Docking Generalization

arXiv.org Artificial Intelligence

Accurate blind docking has the potential to lead to new biological breakthroughs, but for this promise to be realized, docking methods must generalize well across the proteome. Existing benchmarks, however, fail to rigorously assess generalizability. We carefully analyze the scaling laws of ML-based docking and show that, by scaling data and model size, as well as integrating synthetic data strategies, we are able to significantly increase the generalization capacity and set new state-of-the-art performance across benchmarks. Understanding how small molecules and proteins interact, a task known as molecular docking, is at the heart of drug discovery. The conventional use of docking in the industry has led the field to focus on finding binding conformations when restricting the search to predefined pockets and evaluating these on a relatively limited set of protein families of commercial interest. For example, it would help us understand the mechanism of action of new drugs to accelerate their development [Schottlender et al., 2022], predict adverse side-effects of drugs before clinical trials [Luo et al., 2018], and discover the function of the vast number of enzymes and membrane proteins whose biology we do not yet know [Yi et al., 2015]. All these tasks critically require the docking methods to generalize beyond the relatively small class of well-studied proteins for which we have many available structures. Existing docking benchmarks are largely built on collections of similar binding modes and fail to rigorously assess the ability of docking methods to generalize across the proteome. Gathering diverse data for protein-ligand interactions is challenging because binding pockets tend to be evolutionarily well-conserved due to their critical biological functions. Therefore, a large proportion of known interactions fall into a relatively small set of common binding modes. The results show that increasing both data and model can give significant generalization improvements.


Accurate transition state generation with an object-aware equivariant elementary reaction diffusion model

arXiv.org Artificial Intelligence

Transition state (TS) search is key in chemistry for elucidating reaction mechanisms and exploring reaction networks. The search for accurate 3D TS structures, however, requires numerous computationally intensive quantum chemistry calculations due to the complexity of potential energy surfaces. Here, we developed an object-aware SE(3) equivariant diffusion model that satisfies all physical symmetries and constraints for generating sets of structures - reactant, TS, and product - in an elementary reaction. Provided reactant and product, this model generates a TS structure in seconds instead of hours, which is typically required when performing quantum chemistry-based optimizations. The generated TS structures achieve a median of 0.08 Å root mean square deviation compared to the true TS. With a confidence scoring model for uncertainty quantification, we approach an accuracy required for reaction barrier estimation (2.6 kcal/mol) by only performing quantum chemistry-based optimizations on 14% of the most challenging reactions. We envision the proposed approach useful in constructing large reaction networks with unknown mechanisms.


Personalized Predictive ASR for Latency Reduction in Voice Assistants

arXiv.org Artificial Intelligence

Streaming Automatic Speech Recognition (ASR) in voice assistants can utilize prefetching to partially hide the latency of response generation. Prefetching involves passing a preliminary ASR hypothesis to downstream systems in order to prefetch and cache a response. If the final ASR hypothesis after endpoint detection matches the preliminary one, the cached response can be delivered to the user, thus saving latency. In this paper, we extend this idea by introducing predictive automatic speech recognition, where we predict the full utterance from a partially observed utterance, and prefetch the response based on the predicted utterance. We introduce two personalization approaches and investigate the tradeoff between potential latency gains from successful predictions and the cost increase from failed predictions. We evaluate our methods on an internal voice assistant dataset as well as the public SLURP dataset.