Bayesian Learning
Posterior Contraction for Sparse Neural Networks in Besov Spaces with Intrinsic Dimensionality
Lee, Kyeongwon, Lin, Lizhen, Park, Jaewoo, Jeong, Seonghyun
This work establishes that sparse Bayesian neural networks achieve optimal posterior contraction rates over anisotropic Besov spaces and their hierarchical compositions. These structures reflect the intrinsic dimensionality of the underlying function, thereby mitigating the curse of dimensionality. Our analysis shows that Bayesian neural networks equipped with either sparse or continuous shrinkage priors attain the optimal rates which are dependent on the intrinsic dimension of the true structures. Moreover, we show that these priors enable rate adaptation, allowing the posterior to contract at the optimal rate even when the smoothness level of the true function is unknown. The proposed framework accommodates a broad class of functions, including additive and multiplicative Besov functions as special cases. These results advance the theoretical foundations of Bayesian neural networks and provide rigorous justification for their practical effectiveness in high-dimensional, structured estimation problems.
Riemannian generative decoder
Bjerregaard, Andreas, Hauberg, Sรธren, Krogh, Anders
Riemannian representation learning typically relies on approximating densities on chosen manifolds. This involves optimizing difficult objectives, potentially harming models. To completely circumvent this issue, we introduce the Riemannian generative decoder which finds manifold-valued maximum likelihood latents with a Riemannian optimizer while training a decoder network. By discarding the encoder, we vastly simplify the manifold constraint compared to current approaches which often only handle few specific manifolds. We validate our approach on three case studies -- a synthetic branching diffusion process, human migrations inferred from mitochondrial DNA, and cells undergoing a cell division cycle -- each showing that learned representations respect the prescribed geometry and capture intrinsic non-Euclidean structure. Our method requires only a decoder, is compatible with existing architectures, and yields interpretable latent spaces aligned with data geometry.
On the efficacy of old features for the detection of new bots
De Nicola, Rocco, Petrocchi, Marinella, Pratelli, Manuel
For more than a decade now, academicians and online platform administrators have been studying solutions to the problem of bot detection. Bots are computer algorithms whose use is far from being benign: malicious bots are purposely created to distribute spam, sponsor public characters and, ultimately, induce a bias within the public opinion. To fight the bot invasion on our online ecosystem, several approaches have been implemented, mostly based on (supervised and unsupervised) classifiers, which adopt the most varied account features, from the simplest to the most expensive ones to be extracted from the raw data obtainable through the Twitter public APIs. In this exploratory study, using Twitter as a benchmark, we compare the performances of four state-of-art feature sets in detecting novel bots: one of the output scores of the popular bot detector Botometer, which considers more than 1,000 features of an account to take a decision; two feature sets based on the account profile and timeline; and the information about the Twitter client from which the user tweets. The results of our analysis, conducted on six recently released datasets of Twitter accounts, hint at the possible use of general-purpose classifiers and cheap-to-compute account features for the detection of evolved bots.
Tagged for Direction: Pinning Down Causal Edge Directions with Precision
Busch, Florian Peter, Willig, Moritz, Guldan, Florian, Kersting, Kristian, Dhami, Devendra Singh
Not every causal relation between variables is equal, and this can be leveraged for the task of causal discovery. Recent research shows that pairs of variables with particular type assignments induce a preference on the causal direction of other pairs of variables with the same type. Although useful, this assignment of a specific type to a variable can be tricky in practice. We propose a tag-based causal discovery approach where multiple tags are assigned to each variable in a causal graph. Existing causal discovery approaches are first applied to direct some edges, which are then used to determine edge relations between tags. Then, these edge relations are used to direct the undirected edges. Doing so improves upon purely type-based relations, where the assumption of type consistency lacks robustness and flexibility due to being restricted to single types for each variable. Our experimental evaluations show that this boosts causal discovery and that these high-level tag relations fit common knowledge.
Low-Cost Infrastructure-Free 3D Relative Localization with Sub-Meter Accuracy in Near Field
Gao, Qiangsheng, Cheng, Ka Ho, Qiu, Li, Gong, Zijun
--Relative localization in the near-field scenario is critically important for unmanned vehicle (UxV) applications. Although related works addressing 2D relative localization problem have been widely studied for unmanned ground vehicles (UGVs), the problem in 3D scenarios for unmanned aerial vehicles (UA Vs) involves more uncertainties and remains to be investigated. Inspired by the phenomenon that animals can achieve swarm behaviors solely based on individual perception of relative information, this study proposes an infrastructure-free 3D relative localization framework that relies exclusively on onboard ultra-wideband (UWB) sensors. Leveraging 2D relative positioning research, we conducted feasibility analysis, system modeling, simulations, performance evaluation, and field tests using UWB sensors. The key contributions of this work include: derivation of the Cram er-Rao lower bound (CRLB) and geometric dilution of precision (GDOP) for near-field scenarios; development of two localization algorithms - one based on Euclidean distance matrix (EDM) and another employing maximum likelihood estimation (MLE); comprehensive performance comparison and computational complexity analysis against state-of-the-art methods; simulation studies and field experiments; a novel sensor deployment strategy inspired by animal behavior, enabling single-sensor implementation within the proposed framework for UxV applications. The theoretical, simulation, and experimental results demonstrate strong generalizability to other 3D near-field localization tasks, with significant potential for a cost-effective cross-platform UxV collaborative system. I. INTRODUCTION Precise localization is essential in diverse domains, including multi-agent robotic systems, the Internet of Things, intelligent vehicular networks, and logistics [1]-[3].
Bayesian Evolutionary Swarm Architecture: A Formal Epistemic System Grounded in Truth-Based Competition
We introduce a mathematically rigorous framework for an artificial intelligence system composed of probabilistic agents evolving through structured competition and belief revision. The architecture, grounded in Bayesian inference, measure theory, and population dynamics, defines agent fitness as a function of alignment with a fixed external oracle representing ground truth. Agents compete in a discrete-time environment, adjusting posterior beliefs through observed outcomes, with higher-rated agents reproducing and lower-rated agents undergoing extinction. Ratings are updated via pairwise truth-aligned utility comparisons, and belief updates preserve measurable consistency and stochastic convergence. We introduce hash-based cryptographic identity commitments to ensure traceability, alongside causal inference operators using do-calculus. Formal theorems on convergence, robustness, and evolutionary stability are provided. The system establishes truth as an evolutionary attractor, demonstrating that verifiable knowledge arises from adversarial epistemic pressure within a computable, self-regulating swarm.
Flatness After All?
Shoham, Neta, Mor-Yosef, Liron, Avron, Haim
Recent literature has examined the relationship between the curvature of the loss function at minima and generalization, mainly in the context of overparameterized networks. A key observation is that "flat" minima tend to generalize better than "sharp" minima. While this idea is supported by empirical evidence, it has also been shown that deep networks can generalize even with arbitrary sharpness, as measured by either the trace or the spectral norm of the Hessian. In this paper, we argue that generalization could be assessed by measuring flatness using a soft rank measure of the Hessian. We show that when the common neural network model (neural network with exponential family negative log likelihood loss) is calibrated, and its prediction error and its confidence in the prediction are not correlated with the first and the second derivatives of the network's output, our measure accurately captures the asymptotic expected generalization gap. For non-calibrated models, we connect our flatness measure to the well-known Takeuchi Information Criterion and show that it still provides reliable estimates of generalization gaps for models that are not overly confident. Experimental results indicate that our approach offers a robust estimate of the generalization gap compared to baselines.
Predicting Stock Market Crash with Bayesian Generalised Pareto Regression
This paper develops a Bayesian Generalised Pareto Regression (GPR) model to forecast extreme losses in Indian equity markets, with a focus on the Nifty 50 index. Extreme negative returns, though rare, can cause significant financial disruption, and accurate modelling of such events is essential for effective risk management. Traditional Generalised Pareto Distribution (GPD) models often ignore market conditions; in contrast, our framework links the scale parameter to covariates using a log-linear function, allowing tail risk to respond dynamically to market volatility. We examine four prior choices for Bayesian regularisation of regression coefficients: Cauchy, Lasso (Laplace), Ridge (Gaussian), and Zellner's g-prior. Simulation results suggest that the Cauchy prior delivers the best trade-off between predictive accuracy and model simplicity, achieving the lowest RMSE, AIC, and BIC values. Empirically, we apply the model to large negative returns (exceeding 5%) in the Nifty 50 index. Volatility measures from the Nifty 50, S&P 500, and gold are used as covariates to capture both domestic and global risk drivers. Our findings show that tail risk increases significantly with higher market volatility. In particular, both S&P 500 and gold volatilities contribute meaningfully to crash prediction, highlighting global spillover and flight-to-safety effects. The proposed GPR model offers a robust and interpretable approach for tail risk forecasting in emerging markets. It improves upon traditional EVT-based models by incorporating real-time financial indicators, making it useful for practitioners, policymakers, and financial regulators concerned with systemic risk and stress testing.
Quantifying Uncertainty in the Presence of Distribution Shifts
Slavutsky, Yuli, Blei, David M.
Neural networks make accurate predictions but often fail to provide reliable uncertainty estimates, especially under covariate distribution shifts between training and testing. To address this problem, we propose a Bayesian framework for uncertainty estimation that explicitly accounts for covariate shifts. While conventional approaches rely on fixed priors, the key idea of our method is an adaptive prior, conditioned on both training and new covariates. This prior naturally increases uncertainty for inputs that lie far from the training distribution in regions where predictive performance is likely to degrade. To efficiently approximate the resulting posterior predictive distribution, we employ amortized variational inference. Finally, we construct synthetic environments by drawing small bootstrap samples from the training data, simulating a range of plausible covariate shift using only the original dataset. We evaluate our method on both synthetic and real-world data. It yields substantially improved uncertainty estimates under distribution shifts.
SING: SDE Inference via Natural Gradients
Hu, Amber, Smith, Henry, Linderman, Scott
Latent stochastic differential equation (SDE) models are important tools for the unsupervised discovery of dynamical systems from data, with applications ranging from engineering to neuroscience. In these complex domains, exact posterior inference of the latent state path is typically intractable, motivating the use of approximate methods such as variational inference (VI). However, existing VI methods for inference in latent SDEs often suffer from slow convergence and numerical instability. Here, we propose SDE Inference via Natural Gradients (SING), a method that leverages natural gradient VI to efficiently exploit the underlying geometry of the model and variational posterior. SING enables fast and reliable inference in latent SDE models by approximating intractable integrals and parallelizing computations in time. We provide theoretical guarantees that SING will approximately optimize the intractable, continuous-time objective of interest. Moreover, we demonstrate that better state inference enables more accurate estimation of nonlinear drift functions using, for example, Gaussian process SDE models. SING outperforms prior methods in state inference and drift estimation on a variety of datasets, including a challenging application to modeling neural dynamics in freely behaving animals. Altogether, our results illustrate the potential of SING as a tool for accurate inference in complex dynamical systems, especially those characterized by limited prior knowledge and non-conjugate structure.