Multimodal Scientific Learning Beyond Diffusions and Flows
Guilhoto, Leonardo Ferreira, Kaushal, Akshat, Perdikaris, Paris
Scientific machine learning (SciML) increasingly requires models that capture multimodal conditional uncertainty arising from ill-posed inverse problems, multistability, and chaotic dynamics. While recent work has favored highly expressive implicit generative models such as diffusion and flow-based methods, these approaches are often data-hungry, computationally costly, and misaligned with the structured solution spaces frequently found in scientific problems. We demonstrate that Mixture Density Networks (MDNs) provide a principled yet largely overlooked alternative for multimodal uncertainty quantification in SciML. As explicit parametric density estimators, MDNs impose an inductive bias tailored to low-dimensional, multimodal physics, enabling direct global allocation of probability mass across distinct solution branches. This structure delivers strong data efficiency, allowing reliable recovery of separated modes in regimes where scientific data is scarce. We formalize these insights through a unified probabilistic framework contrasting explicit and implicit distribution networks, and show empirically that MDNs achieve superior generalization, interpretability, and sample efficiency across a range of inverse, multistable, and chaotic scientific regression tasks.
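The core mechanism this abstract refers to, an explicit mixture density trained by minimizing its negative log-likelihood, can be sketched in a few lines. The parameter values below are illustrative stand-ins for what an MDN's network head would output for a given input; they are not from the paper:

```python
import math

def mixture_nll(y, weights, means, sigmas):
    """Negative log-likelihood of a scalar y under a 1-D Gaussian mixture.

    In an MDN, (weights, means, sigmas) would be produced by a neural
    network conditioned on the input x; here they are fixed by hand.
    """
    density = sum(
        w * math.exp(-0.5 * ((y - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
        for w, m, s in zip(weights, means, sigmas)
    )
    return -math.log(density)

# Two well-separated modes, as in a multistable system: the mixture
# allocates probability mass globally across both solution branches.
nll_near_mode = mixture_nll(1.0, [0.5, 0.5], [1.0, -1.0], [0.1, 0.1])
nll_between   = mixture_nll(0.0, [0.5, 0.5], [1.0, -1.0], [0.1, 0.1])
```

A point on either branch is assigned far higher likelihood than a point between them, which is exactly the behavior an averaging (unimodal) regressor cannot express.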
- Energy (0.46)
- Government > Regional Government (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Vision (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)
Bayesian Neural Networks vs. Mixture Density Networks: Theoretical and Empirical Insights for Uncertainty-Aware Nonlinear Modeling
Ghosh, Riddhi Pratim, Barnett, Ian
Modeling complex, non-linear, and uncertain relationships between input and output variables remains a central challenge in modern statistical learning and artificial intelligence. Traditional neural networks, trained via point estimation, have demonstrated remarkable success in a variety of domains but inherently provide deterministic predictions - that is, single-valued outputs without accompanying measures of uncertainty. This limitation becomes critical in domains characterized by limited, noisy, or ambiguous data, such as medicine, climate science, or finance, where quantifying uncertainty is as important as producing accurate predictions (Gal & Ghahramani, 2016; Kendall & Gal, 2017; Abdar et al., 2021). Bayesian Neural Networks (BNNs) provide a probabilistic extension of standard neural networks by treating weights and biases as random variables endowed with prior distributions (MacKay, 1992; Neal, 2012). Through Bayes' theorem, BNNs infer a posterior distribution over weights, allowing predictions to reflect epistemic uncertainty - the uncertainty arising from limited data and model knowledge. However, the exact posterior is analytically intractable for deep models, motivating approximate inference methods such as variational inference (Graves, 2011; Blundell et al., 2015) and Monte Carlo dropout (Gal & Ghahramani, 2016). Despite their appeal, these approaches may yield biased or overconfident posteriors due to restrictive variational families (Hernández-Lobato & Adams, 2015a; Osband et al., 2023), often resulting in over-smoothed predictive distributions. An alternative paradigm for probabilistic modeling is the Mixture Density Network (MDN), introduced by Bridle (1990) and developed further by Jacobs et al. (1991).
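The BNN predictive machinery described above, sampling weights from an approximate posterior and averaging forward passes, can be illustrated with a toy one-parameter linear model. The Gaussian "posterior" and every number below are assumptions made for illustration, not the paper's setup:

```python
import random
import statistics

def predictive_moments(x, weight_mean, weight_std, n_samples=1000, seed=0):
    """Toy BNN predictive distribution for the model y = w * x.

    Drawing w from an (assumed Gaussian) posterior and averaging many
    forward passes mirrors how MC dropout or variational inference
    approximates the predictive mean and the epistemic spread.
    """
    rng = random.Random(seed)
    preds = [rng.gauss(weight_mean, weight_std) * x for _ in range(n_samples)]
    return statistics.mean(preds), statistics.pstdev(preds)

mean, std = predictive_moments(2.0, weight_mean=1.5, weight_std=0.2)
# Epistemic spread grows with |x|: analytically, std = |x| * weight_std.
```

Note the limitation the abstract raises: if the assumed posterior family is too restrictive, this procedure produces a single over-smoothed predictive lump rather than the multimodal distributions an MDN can represent.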
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
MDNS: Masked Diffusion Neural Sampler via Stochastic Optimal Control
Zhu, Yuchen, Guo, Wei, Choi, Jaemoo, Liu, Guan-Horng, Chen, Yongxin, Tao, Molei
We study the problem of learning a neural sampler to generate samples from discrete state spaces where the target probability mass function $\pi \propto \mathrm{e}^{-U}$ is known up to a normalizing constant, which is an important task in fields such as statistical physics, machine learning, combinatorial optimization, etc. To better address this challenging task when the state space has a large cardinality and the distribution is multi-modal, we propose $\textbf{M}$asked $\textbf{D}$iffusion $\textbf{N}$eural $\textbf{S}$ampler ($\textbf{MDNS}$), a novel framework for training discrete neural samplers by aligning two path measures through a family of learning objectives, theoretically grounded in the stochastic optimal control of the continuous-time Markov chains. We validate the efficiency and scalability of MDNS through extensive experiments on various distributions with distinct statistical properties, where MDNS learns to accurately sample from the target distributions despite the extremely high problem dimensions and outperforms other learning-based baselines by a large margin. A comprehensive study of ablations and extensions is also provided to demonstrate the efficacy and potential of the proposed framework.
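For the setting MDNS addresses, a pmf on a discrete state space known only up to its normalizing constant, a classical point of comparison is a single-site Metropolis sampler, which needs only energy differences $U(x') - U(x)$. The sketch below is that baseline, not the MDNS algorithm; the energy function is a toy assumption:

```python
import math
import random

def metropolis_binary(U, n_bits, n_steps, seed=0):
    """Single-site Metropolis sampler for pi(x) ∝ exp(-U(x)) on {0,1}^n.

    Because acceptance depends only on the energy difference, the
    normalizing constant of pi is never needed.
    """
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n_bits)]
    energy = U(x)
    for _ in range(n_steps):
        i = rng.randrange(n_bits)
        x[i] ^= 1                      # propose flipping one bit
        new_energy = U(x)
        if rng.random() < math.exp(min(0.0, energy - new_energy)):
            energy = new_energy        # accept the flip
        else:
            x[i] ^= 1                  # reject: undo the flip
    return x

# Toy energy strongly favoring the all-ones string: U(x) = 3 * (#zeros).
sample = metropolis_binary(lambda x: 3 * x.count(0), n_bits=16, n_steps=2000)
```

Such local-flip chains mix poorly on high-dimensional multi-modal targets, which is precisely the regime motivating learned samplers like MDNS.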
A Comprehensive Framework for Uncertainty Quantification of Voxel-wise Supervised Models in IVIM MRI
Casali, Nicola, Brusaferri, Alessandro, Baselli, Giuseppe, Fumagalli, Stefano, Micotti, Edoardo, Forloni, Gianluigi, Hussein, Riaz, Rizzo, Giovanna, Mastropietro, Alfonso
Accurate estimation of intravoxel incoherent motion (IVIM) parameters from diffusion-weighted MRI remains challenging due to the ill-posed nature of the inverse problem and high sensitivity to noise, particularly in the perfusion compartment. In this work, we propose a probabilistic deep learning framework based on Deep Ensembles (DE) of Mixture Density Networks (MDNs), enabling estimation of total predictive uncertainty and its decomposition into aleatoric (AU) and epistemic (EU) components. The method was benchmarked against non-probabilistic neural networks, a Bayesian fitting approach, and a probabilistic network with a single-Gaussian parametrization. Supervised training was performed on synthetic data, and evaluation was conducted on both simulated and in vivo datasets. The reliability of the quantified uncertainties was assessed using calibration curves, output distribution sharpness, and the Continuous Ranked Probability Score (CRPS). MDNs produced better-calibrated and sharper predictive distributions for the diffusion coefficient D and fraction f parameters, although slight overconfidence was observed in the pseudo-diffusion coefficient D*. The Robust Coefficient of Variation (RCV) indicated smoother in vivo estimates for D* with MDNs compared to the Gaussian model. Despite the training data covering the expected physiological range, elevated EU in vivo suggests a mismatch with real acquisition conditions, highlighting the importance of incorporating EU, which the DE makes possible. Overall, we present a comprehensive framework for IVIM fitting with uncertainty quantification, which enables the identification and interpretation of unreliable estimates. The proposed approach can also be adopted for fitting other physical models through appropriate architectural and simulation adjustments.
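The AU/EU decomposition this abstract describes is commonly computed from a deep ensemble via the law of total variance; the sketch below assumes that convention (the paper's exact estimator may differ), with made-up member outputs:

```python
import statistics

def decompose_uncertainty(member_means, member_vars):
    """Split an ensemble's predictive variance into aleatoric and
    epistemic parts via the law of total variance.

    Each ensemble member is assumed to output a predictive mean and
    variance (e.g. the moments of an MDN's mixture at a given voxel).
    """
    aleatoric = statistics.mean(member_vars)        # average within-member noise
    epistemic = statistics.pvariance(member_means)  # disagreement across members
    return aleatoric, epistemic, aleatoric + epistemic

# Members agree on the noise level but disagree on the mean, so the
# epistemic term dominates -- the signature of out-of-distribution input.
au, eu, total = decompose_uncertainty([0.9, 1.1, 1.5, 0.5], [0.04] * 4)
```

This is why elevated EU in vivo is diagnostic: ensemble members trained on the same synthetic range disagree on inputs unlike their training data, even when each member's own noise estimate (AU) stays small.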
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (0.69)
- Health & Medicine > Therapeutic Area > Neurology (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)
Interaction Techniques that Encourage Longer Prompts Can Improve Psychological Ownership when Writing with AI
Writing longer prompts for an AI assistant to generate a short story increases psychological ownership, a user's feeling that the writing belongs to them. To encourage users to write longer prompts, we evaluated two interaction techniques that modify the prompt entry interface of chat-based generative AI assistants: pressing and holding the prompt submission button, and continuously moving a slider up and down when submitting a short prompt. A within-subjects experiment investigated the effects of such techniques on prompt length and psychological ownership, and results showed that these techniques increased prompt length and led to higher psychological ownership than baseline techniques. A second experiment further augmented these techniques by showing AI-generated suggestions for how the prompts could be expanded. This further increased prompt length, but did not lead to improvements in psychological ownership. Our results show that simple interface modifications like these can elicit more writing from users and improve psychological ownership.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.48)
Expectations, Explanations, and Embodiment: Attempts at Robot Failure Recovery
Yadollahi, Elmira, Dogan, Fethiye Irmak, Zhang, Yujing, Nogueira, Beatriz, Guerreiro, Tiago, Tzedek, Shelly Levy, Leite, Iolanda
Expectations critically shape how people form judgments about robots, influencing whether they view failures as minor technical glitches or deal-breaking flaws. This work explores how high and low expectations, induced through brief video priming, affect user perceptions of robot failures and the utility of explanations in HRI. We conducted two online studies (N = 600 total participants), each replicated with two robots of different embodiments, Furhat and Pepper. In our first study, grounded in expectation theory, participants were divided into two groups, one primed with positive and the other with negative expectations regarding the robot's performance, establishing distinct expectation frameworks. This validation study aimed to verify whether the videos could reliably establish low- and high-expectation profiles. In the second study, participants were primed using the validated videos and then viewed a new scenario in which the robot failed at a task. Half viewed a version where the robot explained its failure, while the other half received no explanation. We found that explanations significantly improved user perceptions of Furhat, especially when participants were primed to have lower expectations. Explanations boosted satisfaction and enhanced the robot's perceived expressiveness, indicating the value of effectively communicating failures. By contrast, Pepper's explanations produced minimal impact on user attitudes, suggesting that a robot's embodiment and style of interaction could determine whether explanations can successfully offset negative impressions. Together, these findings underscore the need to consider users' expectations when tailoring explanation strategies in HRI. When expectations are initially low, a cogent explanation can make the difference between dismissing a failure and appreciating the robot's transparency and effort to communicate. Keywords: Expectations, Explanations, Explainability, Human-Robot Interaction, Priming
(Authors contributed equally.)
1. Introduction
When robots operate in human environments, user expectations play a crucial role in shaping human-robot interaction (HRI) (Lohse, 2009; Horstmann and Krämer, 2020; Dogan et al., 2025). However, there is often a mismatch between these expectations and the actual capabilities of social robots (Rosén et al., 2022), which can lead to disappointment and, consequently, diminish the quality of interactions (Olson et al., 1996; Kruglanski and Sleeth-Keppler, 2007). For instance, a user might expect robots to function as proactive and autonomous assistants, yet when robots make mistakes due to their limited abilities, this mismatch can undermine the robot's perceived trustworthiness and competence (Salem et al., 2015; Cha et al., 2015).
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Hyperparameter Optimisation with Practical Interpretability and Explanation Methods in Probabilistic Curriculum Learning
Salt, Llewyn, Gallagher, Marcus
Hyperparameter optimisation (HPO) is crucial for achieving strong performance in reinforcement learning (RL), as RL algorithms are inherently sensitive to hyperparameter settings. Probabilistic Curriculum Learning (PCL) is a curriculum learning strategy designed to improve RL performance by structuring the agent's learning process, yet effective hyperparameter tuning remains challenging and computationally demanding. In this paper, we provide an empirical analysis of hyperparameter interactions and their effects on the performance of a PCL algorithm within standard RL tasks, including point-maze navigation and DC motor control. Using the AlgOS framework integrated with Optuna's Tree-Structured Parzen Estimator (TPE), we present strategies to refine hyperparameter search spaces, enhancing optimisation efficiency. Additionally, we introduce a novel SHAP-based interpretability approach tailored specifically for analysing hyperparameter impacts, offering clear insights into how individual hyperparameters and their interactions influence RL performance. Our work contributes practical guidelines and interpretability tools that significantly improve the effectiveness and computational feasibility of hyperparameter optimisation in reinforcement learning.
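A minimal, library-free illustration of the kind of hyperparameter-importance analysis described above: evaluate the objective on a grid, average out one hyperparameter, and score the other by the variance of its marginal (main) effect, in the spirit of functional ANOVA. This is a toy stand-in, not the paper's SHAP method or Optuna's TPE, and the objective and hyperparameter roles are invented:

```python
import statistics

def main_effect_importance(objective, grid_a, grid_b):
    """Score two hyperparameters by the variance of their marginal effects.

    Averaging over the other hyperparameter isolates each one's main
    effect; a larger variance of that marginal means the hyperparameter
    moves the objective more.
    """
    scores = {(a, b): objective(a, b) for a in grid_a for b in grid_b}
    marg_a = [statistics.mean(scores[(a, b)] for b in grid_b) for a in grid_a]
    marg_b = [statistics.mean(scores[(a, b)] for a in grid_a) for b in grid_b]
    return statistics.pvariance(marg_a), statistics.pvariance(marg_b)

# Toy objective that depends strongly on hyperparameter a, weakly on b:
imp_a, imp_b = main_effect_importance(lambda a, b: 10 * a + 0.1 * b,
                                      [0.0, 0.5, 1.0], [0.0, 0.5, 1.0])
```

SHAP-based analyses generalize this idea by also attributing interaction effects, which a pure main-effect score like this one ignores.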
Export Reviews, Discussions, Author Feedback and Meta-Reviews
The basic idea of this paper is to replace the MCGSM (mixture of conditional Gaussian scale mixtures) of [38] with a version where a continuous-valued hidden state vector h is maintained in an LSTM. This is used as a model of natural images and assessed by a density estimation task (secs 3.2 and 3.3, Tables 1-3), and for texture synthesis and inpainting (sec 3.4). The model for p(x_ij | h_ij) (l 150) is in fact not specified at all. Given that h is a continuous-valued vector (as per eq 6), we need to see some functional form. RNADE [41] is designed for fixed-length vectors.
Reviews: Fast ε-free Inference of Simulation Models with Bayesian Conditional Density Estimation
The most original part of the paper is Proposition 1, which is quite interesting. However, I have some doubts regarding the assumptions leading to formula (2). As explained in the appendix, this formula holds if q_theta is expressive enough that the KL divergence is zero. Now, in a realistic example with finite sample size, q_theta can't be very complex, otherwise it would overfit. Hence, (2) holds only approximately.
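The reviewer's point, that the formula requires the KL divergence between q_theta and the target to vanish exactly, can be made concrete with the closed-form KL between univariate Gaussians (a standard identity, used here purely for illustration, not taken from the paper under review):

```python
import math

def kl_gauss(mu_q, sd_q, mu_p, sd_p):
    """Closed-form KL(q || p) for univariate Gaussians.

    It is zero only when q matches p exactly; any mismatch, however
    small, leaves a positive residual, so an identity that assumes
    KL = 0 holds only approximately for a finite-capacity q_theta.
    """
    return (math.log(sd_p / sd_q)
            + (sd_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sd_p ** 2)
            - 0.5)

exact  = kl_gauss(0.0, 1.0, 0.0, 1.0)  # perfectly matched: KL is 0
offset = kl_gauss(0.1, 1.0, 0.0, 1.0)  # slightly mismatched mean: KL > 0
```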