Uncertainty
Protein Secondary Structure Prediction Using 3D Graphs and Relation-Aware Message Passing Transformers
Varshney, Disha, Garg, Samarth, Tyagi, Sarthak, Varshney, Deeksha, Deep, Nayan, Ekbal, Asif
In this study, we tackle the challenging task of predicting secondary structures from protein primary sequences, a pivotal initial stride towards predicting tertiary structures, while yielding crucial insights into protein activity, relationships, and functions. Existing methods often utilize extensive sets of unlabeled amino acid sequences. However, these approaches neither explicitly capture nor harness the accessible protein 3D structural data, which is recognized as a decisive factor in dictating protein functions. To address this, we utilize protein residue graphs and introduce various forms of sequential or structural connections to capture enhanced spatial information. We adeptly combine Graph Neural Networks (GNNs) and Language Models (LMs), specifically utilizing a pre-trained transformer-based protein language model to encode amino acid sequences and employing message-passing mechanisms like GCN and R-GCN to capture geometric characteristics of protein structures. Employing convolution within a specific node's nearby region, including relations, we stack multiple con-volutional layers to efficiently learn combined insights from the protein's spatial graph, revealing intricate interconnections and dependencies in its structural To assess our model's performance, we employed the training dataset provided by NetSurfP-2.0, which outlines secondary structure in 3-and 8-states. Extensive experiments show that our proposed model, SSRGNet surpasses the baseline on f1-scores. Introduction Proteins serve as essential components within cells and are involved in various applications, spanning from therapeutics to materials. They are composed of a sequence of amino acids that fold into distinct shapes. With the development of affordable sequencing technologies [1, 2], a substantial number of novel protein sequences have been identified in recent times. However, annotating the functional properties of a newly discovered protein sequence is still a laborious and expensive process. Thus, there is a need for reliable and efficient computational methods to accurately predict and assign functions to proteins, thereby bridging the gap between sequence information and functional knowledge. The analysis of protein structure, particularly the tertiary structure, is highly significant for practical applications related to proteins, such as understanding their functions and designing drugs [3].
A Quantum Tensor Network-Based Viewpoint for Modeling and Analysis of Time Series Data
Vipulananthan, Pragatheeswaran, Premaratne, Kamal, Sarkar, Dilip, Murthi, Manohar N.
Accurate uncertainty quantification is a critical challenge in machine learning. While neural networks are highly versatile and capable of learning complex patterns, they often lack interpretability due to their ``black box'' nature. On the other hand, probabilistic ``white box'' models, though interpretable, often suffer from a significant performance gap when compared to neural networks. To address this, we propose a novel quantum physics-based ``white box'' method that offers both accurate uncertainty quantification and enhanced interpretability. By mapping the kernel mean embedding (KME) of a time series data vector to a reproducing kernel Hilbert space (RKHS), we construct a tensor network-inspired 1D spin chain Hamiltonian, with the KME as one of its eigen-functions or eigen-modes. We then solve the associated Schr{ö}dinger equation and apply perturbation theory to quantify uncertainty, thereby improving the interpretability of tasks performed with the quantum tensor network-based model. We demonstrate the effectiveness of this methodology, compared to state-of-the-art ``white box" models, in change point detection and time series clustering, providing insights into the uncertainties associated with decision-making throughout the process.
Practical Causal Evaluation Metrics for Biological Networks
Sato, Noriaki, Scutari, Marco, Kawano, Shuichi, Yamaguchi, Rui, Imoto, Seiya
Estimating causal networks from biological data is a critical step in systems biology. When evaluating the inferred network, assessing the networks based on their intervention effects is particularly important for downstream probabilistic reasoning and the identification of potential drug targets. In the context of gene regulatory network inference, biological databases are often used as reference sources. These databases typically describe relationships in a qualitative rather than quantitative manner. However, few evaluation metrics have been developed that take this qualitative nature into account. To address this, we developed a metric, the sign-augmented Structural Intervention Distance (sSID), and a weighted sSID that incorporates the net effects of the intervention. Through simulations and analyses of real transcriptomic datasets, we found that our proposed metrics could identify a different algorithm as optimal compared to conventional metrics, and the network selected by sSID had a superior performance in the classification task of clinical covariates using transcriptomic data. This suggests that sSID can distinguish networks that are structurally correct but functionally incorrect, highlighting its potential as a more biologically meaningful and practical evaluation metric.
ActiveGrasp: Information-Guided Active Grasping with Calibrated Energy-based Model
Lei, Boshu, Jiang, Wen, Daniilidis, Kostas
Grasping in a densely cluttered environment is a challenging task for robots. Previous methods tried to solve this problem by actively gathering multiple views before grasp pose generation. However, they either overlooked the importance of the grasp distribution for information gain estimation or relied on the projection of the grasp distribution, which ignores the structure of grasp poses on the SE(3) manifold. To tackle these challenges, we propose a calibrated energy-based model for grasp pose generation and an active view selection method that estimates information gain from grasp distribution. Our energy-based model captures the multi-modality nature of grasp distribution on the SE(3) manifold. The energy level is calibrated to the success rate of grasps so that the predicted distribution aligns with the real distribution. The next best view is selected by estimating the information gain for grasp from the calibrated distribution conditioned on the reconstructed environment, which could efficiently drive the robot to explore affordable parts of the target object. Experiments on simulated environments and real robot setups demonstrate that our model could successfully grasp objects in a cluttered environment with limited view budgets compared to previous state-of-the-art models. Our simulated environment can serve as a reproducible platform for future research on active grasping. The source code of our paper will be made public when the paper is released to the public.
A Multicollinearity-Aware Signal-Processing Framework for Cross-$β$ Identification via X-ray Scattering of Alzheimer's Tissue
Bashit, Abdullah Al, Nepal, Prakash, Makowski, Lee
X-ray scattering measurements of in situ human brain tissue encode structural signatures of pathological cross-$β$ inclusions, yet systematic exploitation of these data for automated detection remains challenging due to substrate contamination, strong inter-feature correlations, and limited sample sizes. This work develops a three-stage classification framework for identifying cross-$β$ structural inclusions-a hallmark of Alzheimer's disease-in X-ray scattering profiles of post-mortem human brain. Stage 1 employs a Bayes-optimal classifier to separate mica substrate from tissue regions on the basis of their distinct scattering signatures. Stage 2 introduces a multicollinearityaware, class-conditional correlation pruning scheme with formal guarantees on the induced Bayes risk and approximation error, thereby reducing redundancy while retaining class-discriminative information. Stage 3 trains a compact neural network on the pruned feature set to detect the presence or absence of cross-$β$ fibrillar ordering. The top-performing model, optimized with a composite loss combining Focal and Dice objectives, attains a test F1-score of 84.30% using 11 of 211 candidate features and 174 trainable parameters. The overall framework yields an interpretable, theory-grounded strategy for data-limited classification problems involving correlated, high-dimensional experimental measurements, exemplified here by X-ray scattering profiles of neurodegenerative tissue.
Learning to Trust: Bayesian Adaptation to Varying Suggester Reliability in Sequential Decision Making
Asmar, Dylan M., Kochenderfer, Mykel J.
Autonomous agents operating in sequential decision-making tasks under uncertainty can benefit from external action suggestions, which provide valuable guidance but inherently vary in reliability. Existing methods for incorporating such advice typically assume static and known suggester quality parameters, limiting practical deployment. We introduce a framework that dynamically learns and adapts to varying suggester reliability in partially observable environments. First, we integrate suggester quality directly into the agent's belief representation, enabling agents to infer and adjust their reliance on suggestions through Bayesian inference over suggester types. Second, we introduce an explicit ``ask'' action allowing agents to strategically request suggestions at critical moments, balancing informational gains against acquisition costs. Experimental evaluation demonstrates robust performance across varying suggester qualities, adaptation to changing reliability, and strategic management of suggestion requests. This work provides a foundation for adaptive human-agent collaboration by addressing suggestion uncertainty in uncertain environments.
More Than Irrational: Modeling Belief-Biased Agents
Zhu, Yifan, Katt, Sammie, Kaski, Samuel
Despite the explosive growth of AI and the technologies built upon it, predicting and inferring the sub-optimal behavior of users or human collaborators remains a critical challenge. In many cases, such behaviors are not a result of irrationality, but rather a rational decision made given inherent cognitive bounds and biased beliefs about the world. In this paper, we formally introduce a class of computational-rational (CR) user models for cognitively-bounded agents acting optimally under biased beliefs. The key novelty lies in explicitly modeling how a bounded memory process leads to a dynamically inconsistent and biased belief state and, consequently, sub-optimal sequential decision-making. We address the challenge of identifying the latent user-specific bound and inferring biased belief states from passive observations on the fly. We argue that for our formalized CR model family with an explicit and parameterized cognitive process, this challenge is tractable. To support our claim, we propose an efficient online inference method based on nested particle filtering that simultaneously tracks the user's latent belief state and estimates the unknown cognitive bound from a stream of observed actions. We validate our approach in a representative navigation task using memory decay as an example of a cognitive bound. With simulations, we show that (1) our CR model generates intuitively plausible behaviors corresponding to different levels of memory capacity, and (2) our inference method accurately and efficiently recovers the ground-truth cognitive bounds from limited observations ($\le 100$ steps). We further demonstrate how this approach provides a principled foundation for developing adaptive AI assistants, enabling adaptive assistance that accounts for the user's memory limitations.
Scaling Law Analysis in Federated Learning: How to Select the Optimal Model Size?
Chen, Xuanyu, Yang, Nan, Wang, Shuai, Yuan, Dong
The recent success of large language models (LLMs) has sparked a growing interest in training large-scale models. As the model size continues to scale, concerns are growing about the depletion of high-quality, well-curated training data. This has led practitioners to explore training approaches like Federated Learning (FL), which can leverage the abundant data on edge devices while maintaining privacy. However, the decentralization of training datasets in FL introduces challenges to scaling large models, a topic that remains under-explored. This paper fills this gap and provides qualitative insights on generalizing the previous model scaling experience to federated learning scenarios. Specifically, we derive a P AC-Bayes (Probably Approximately Correct Bayesian) upper bound for the generalization error of models trained with stochastic algorithms in federated settings and quantify the impact of distributed training data on the optimal model size by finding the analytic solution of model size that minimizes this bound. Our theoretical results demonstrate that the optimal model size has a negative power law relationship with the number of clients if the total training compute is unchanged. Besides, we also find that switching to FL with the same training compute will inevitably reduce the upper bound of generalization performance that the model can achieve through training, and that estimating the optimal model size in federated scenarios should depend on the average training compute across clients. Furthermore, we also empirically validate the correctness of our results with extensive training runs on different models, network settings, and datasets.
Diffusion Models: A Mathematical Introduction
Maleki, Sepehr, Pourmoazemi, Negar
We present a concise, self-contained derivation of diffusion-based generative models. Starting from basic properties of Gaussian distributions (densities, quadratic expectations, re-parameterisation, products, and KL divergences), we construct denoising diffusion probabilistic models from first principles. This includes the forward noising process, its closed-form marginals, the exact discrete reverse posterior, and the related variational bound. This bound simplifies to the standard noise-prediction goal used in practice. We then discuss likelihood estimation and accelerated sampling, covering DDIM, adversarially learned reverse dynamics (DDGAN), and multi-scale variants such as nested and latent diffusion, with Stable Diffusion as a canonical example. A continuous-time formulation follows, in which we derive the probability-flow ODE from the diffusion SDE via the continuity and Fokker-Planck equations, introduce flow matching, and show how rectified flows recover DDIM up to a time re-parameterisation. Finally, we treat guided diffusion, interpreting classifier guidance as a posterior score correction and classifier-free guidance as a principled interpolation between conditional and unconditional scores. Throughout, the focus is on transparent algebra, explicit intermediate steps, and consistent notation, so that readers can both follow the theory and implement the corresponding algorithms in practice.
Constrained and Robust Policy Synthesis with Satisfiability-Modulo-Probabilistic-Model-Checking
Heck, Linus, Macák, Filip, Češka, Milan, Junges, Sebastian
The ability to compute reward-optimal policies for given and known finite Markov decision processes (MDPs) underpins a variety of applications across planning, controller synthesis, and verification. However, we often want policies (1) to be robust, i.e., they perform well on perturbations of the MDP and (2) to satisfy additional structural constraints regarding, e.g., their representation or implementation cost. Computing such robust and constrained policies is indeed computationally more challenging. This paper contributes the first approach to effectively compute robust policies subject to arbitrary structural constraints using a flexible and efficient framework. We achieve flexibility by allowing to express our constraints in a first-order theory over a set of MDPs, while the root for our efficiency lies in the tight integration of satisfi-ability solvers to handle the combinatorial nature of the problem and probabilistic model checking algorithms to handle the analysis of MDPs. Experiments on a few hundred benchmarks demonstrate the feasibility for constrained and robust policy synthesis and the competitiveness with state-of-the-art methods for various fragments of the problem.