AITopics

Robotic manipulation of deformable and fragile objects presents significant challenges, as excessive stress can lead to irreversible damage to the object. While existing solutions rely on accurate object models or specialized sensors and grippers, this adds complexity and often lacks generalization. To address this problem, we present a vision-based reinforcement learning approach that incorporates a stress-penalized reward to discourage damage to the object explicitly. In addition, to bootstrap learning, we incorporate offline demonstrations as well as a designed curriculum progressing from rigid proxies to deformables. We evaluate the proposed method in both simulated and real-world scenarios, showing that the policy learned in simulation can be transferred to the real world in a zero-shot manner, performing tasks such as picking up and pushing tofu. Our results show that the learned policies exhibit a damage-aware, gentle manipulation behavior, demonstrating their effectiveness by decreasing the stress applied to fragile objects by 36.5% while achieving the task goals, compared to vanilla RL policies.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2510.25405

Country: Europe (0.93)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Zhang, Beiwen, Liang, Yongheng, Wu, Hejun

Multi-party Agent Relation Sampling for Multi-party Ad Hoc Teamwork

Multi-agent reinforcement learning (MARl) has achieved strong results in cooperative tasks but typically assumes fixed, fully controlled teams. Ad hoc teamwork (AHT) relaxes this by allowing collaboration with unknown partners, yet existing variants still presume shared conventions. We introduce Multil-party Ad Hoc Teamwork (MAHT), where controlled agents must coordinate with multiple mutually unfamiliar groups of uncontrolled teammates. To address this, we propose MARs, which builds a sparse skeleton graph and applies relational modeling to capture cross-group dvnamics. Experiments on MPE and starCralt ll show that MARs outperforms MARL and AHT baselines while converging faster.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2510.2534

Country:

North America > United States (0.94)
Europe > Middle East > Cyprus (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Scaling Up Bayesian DAG Sampling

Nikzad, Daniele, Zhilkin, Alexander, Harviainen, Juha, Kuipers, Jack, Moffa, Giusi, Koivisto, Mikko

Bayesian inference of Bayesian network structures is often performed by sampling directed acyclic graphs along an appropriately constructed Markov chain. We present two techniques to improve sampling. First, we give an efficient implementation of basic moves, which add, delete, or reverse a single arc. Second, we expedite summing over parent sets, an expensive task required for more sophisticated moves: we devise a preprocessing method to prune possible parent sets so as to approximately preserve the sums. Our empirical study shows that our techniques can yield substantial efficiency gains compared to previous methods.

artificial intelligence, bayesian inference, machine learning, (18 more...)

2510.25254

Country: Europe > Finland (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Li, Zhufeng, Kilbertus, Niki

Graph Distance Based on Cause-Effect Estimands with Latents

Causal discovery aims to recover graphs that represent causal relations among given variables from observations, and new methods are constantly being proposed. Increasingly, the community raises questions about how much progress is made, because properly evaluating discovered graphs remains notoriously difficult, particularly under latent confounding. We propose a graph distance measure for acyclic directed mixed graphs (AD-MGs) based on the downstream task of cause-effect estimation under unobserved confounding. Our approach uses identification via fixing and a symbolic verifier to quantify how graph differences distort cause-effect esti-mands for different treatment-outcome pairs. We analyze the behavior of the measure under different graph perturbations and compare it against existing distance metrics.

artificial intelligence, bayesian inference, machine learning, (16 more...)

2510.25037

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report (0.40)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Software (0.67)

Ghosh, Riddhi Pratim, Barnett, Ian

Bayesian Neural Networks vs. Mixture Density Networks: Theoretical and Empirical Insights for Uncertainty-Aware Nonlinear Modeling

Modeling complex, non-linear, and uncertain relationships between input and output variables remains a central challenge in modern statistical learning and artificial intelligence. Traditional neural networks, trained via point estimation, have demonstrated remarkable success in a variety of domains but inherently provide deterministic predictions - that is, single-valued outputs without accompanying measures of uncertainty. This limitation becomes critical in domains characterized by limited, noisy, or ambiguous data, such as medicine, climate science, or finance, where quantifying uncertainty is as important as producing accurate predictions (Gal & Ghahramani, 2016; Kendall & Gal, 2017; Abdar et al., 2021). Bayesian Neural Networks (BNNs) provide a probabilistic extension of standard neural networks by treating weights and biases as random variables endowed with prior distributions (MacKay, 1992; Neal, 2012). Through Bayes' theorem, BNNs infer a posterior distribution over weights, allowing predictions to reflect epistemic uncertainty - the uncertainty arising from limited data and model knowledge. However, the exact posterior is analytically intractable for deep models, motivating approximate inference methods such as variational inference (Graves, 2011; Blundell et al., 2015) and Monte Carlo dropout (Gal & Ghahramani, 2016). Despite their appeal, these approaches may yield biased or overconfident posteriors due to restrictive variational families (Hern andez-Lobato & Adams, 2015a; Osband et al., 2023), often resulting in over-smoothed predictive distributions. An alternative paradigm for probabilistic modeling is the Mixture Density Network (MDN), introduced by Bridle (1990) and developed further by Jacobs et al. (1991).

artificial intelligence, bayesian inference, machine learning, (15 more...)

2510.25001

Genre: Research Report (0.82)

Industry: Health & Medicine > Diagnostic Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

From Narrative to Action: A Hierarchical LLM-Agent Framework for Human Mobility Generation

Li, Qiumeng, Ji, Chunhou, Liu, Xinyue

Understanding and replicating human mobility requires not only spatial-temporal accuracy but also an awareness of the cognitive hierarchy underlying real-world travel decisions. Traditional agent-based or deep learning models can reproduce statistical patterns of movement but fail to capture the semantic coherence and causal logic of human behavior. Large language models (LLMs) show potential, but struggle to balance creative reasoning with strict structural compliance. This study proposes a Hierarchical LLM-Agent Framework, termed Narrative-to-Action, that integrates high-level narrative reasoning, mid-level reflective planning, and low-level behavioral execution within a unified cognitive hierarchy. At the macro level, one agent is employed as a "creative writer" to produce diary-style narratives rich in motivation and context, then uses another agent as a "structural parser" to convert narratives into machine-readable plans. A dynamic execution module further grounds agents in geographic environments and enables adaptive behavioral adjustments guided by a novel occupation-aware metric, Mobility Entropy by Occupation (MEO), which captures heterogeneous schedule flexibility across different occupational personalities. At the micro level, the agent executes concrete actions-selecting locations, transportation modes, and time intervals-through interaction with an environmental simulation. By embedding this multi-layer cognitive process, the framework produces not only synthetic trajectories that align closely with real-world patterns but also interpretable representations of human decision logic. This research advances synthetic mobility generation from a data-driven paradigm to a cognition-driven simulation, providing a scalable pathway for understanding, predicting, and synthesizing complex urban mobility behaviors through hierarchical LLM agents.

large language model, machine learning, natural language, (18 more...)

2510.24802

Country: Asia > China > Guangdong Province (0.14)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (1.00)
Transportation > Infrastructure & Services (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Decoding non-invasive brain activity with novel deep-learning approaches

Csaky, Richard

This thesis delves into the world of non-invasive electrophysiological brain signals like electroencephalography (EEG) and magnetoencephalography (MEG), focusing on modelling and decoding such data. The research aims to investigate what happens in the brain when we perceive visual stimuli or engage in covert speech (inner speech) and enhance the decoding performance of such stimuli. The thesis is divided into two main sections, methodological and experimental work. A central concern in both sections is the large variability present in electrophysiological recordings, whether it be within-subject or between-subject variability, and to a certain extent between-dataset variability. In the methodological sections, we explore the potential of deep learning for brain decoding. We present advancements in decoding visual stimuli using linear models at the individual subject level. We then explore how deep learning techniques can be employed for group decoding, introducing new methods to deal with between-subject variability. Finally, we also explores novel forecasting models of MEG data based on convolutional and Transformer-based architectures. In particular, Transformer-based models demonstrate superior capabilities in generating signals that closely match real brain data, thereby enhancing the accuracy and reliability of modelling the brain's electrophysiology. In the experimental section, we present a unique dataset containing high-trial inner speech EEG, MEG, and preliminary optically pumped magnetometer (OPM) data. Our aim is to investigate different types of inner speech and push decoding performance by collecting a high number of trials and sessions from a few participants. However, the decoding results are found to be mostly negative, underscoring the difficulty of decoding inner speech.

artificial intelligence, high between-subject variability, machine learning, (18 more...)

2510.24733

Country:

North America > United States > Minnesota (0.27)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.17)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)
Research Report > Promising Solution (0.67)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)

Liu, Zhaocheng, Yu, Zhiwen, Liu, Xiaoqing

Quantifying Multimodal Imbalance: A GMM-Guided Adaptive Loss for Audio-Visual Learning

The heterogeneity of multimodal data leads to inconsistencies and imbalance, allowing a dominant modality to steer gradient updates. Existing solutions mainly focus on optimization- or data-based strategies but rarely exploit the information inherent in multimodal imbalance or conduct its quantitative analysis. To address this gap, we propose a novel quantitative analysis framework for Multimodal Imbalance and design a sample-level adaptive loss function. We define the Modality Gap as the Softmax score difference between modalities for the correct class and model its distribution using a bimodal Gaussian Mixture Model(GMM), representing balanced and imbalanced samples. Using Bayes' theorem, we estimate each sample's posterior probability of belonging to these two groups. Based on this, our adaptive loss (1) minimizes the overall Modality Gap, (2) aligns imbalanced samples with balanced ones, and (3) adaptively penalizes each according to its imbalance degree. A two-stage training strategy-warm-up and adaptive phases,yields state-of-the-art performance on CREMA-D (80.65%), AVE (70.40%), and KineticSound (72.42%). Fine-tuning with high-quality samples identified by the GMM further improves results, highlighting their value for effective multimodal fusion.

artificial intelligence, machine learning, modality, (18 more...)

2510.21797

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Köbschall, Kirsten, Buschjäger, Sebastian, Fischer, Raphael, Hartung, Lisa, Kramer, Stefan

Lift What You Can: Green Online Learning with Heterogeneous Ensembles

Ensemble methods for stream mining necessitate managing multiple models and updating them as data distributions evolve. Considering the calls for more sustainability, established methods are however not sufficiently considerate of ensemble members' computational expenses and instead overly focus on predictive capabilities. To address these challenges and enable green online learning, we propose heterogeneous online ensembles (HEROS). For every training step, HEROS chooses a subset of models from a pool of models initialized with diverse hyperparameter choices under resource constraints to train. We introduce a Markov decision process to theoretically capture the trade-offs between predictive performance and sustainability constraints. Based on this framework, we present different policies for choosing which models to train on incoming data. Most notably, we propose the novel $ζ$-policy, which focuses on training near-optimal models at reduced costs. Using a stochastic model, we theoretically prove that our $ζ$-policy achieves near optimal performance while using fewer resources compared to the best performing policy. In our experiments across 11 benchmark datasets, we find empiric evidence that our $ζ$-policy is a strong contribution to the state-of-the-art, demonstrating highly accurate performance, in some cases even outperforming competitors, and simultaneously being much more resource-friendly.

artificial intelligence, data mining, machine learning, (15 more...)

2509.18962

Country:

North America > United States (0.68)
Europe > Germany (0.68)
South America > Brazil (0.46)

Genre: Research Report > New Finding (0.48)

Industry:

Education > Educational Setting > Online (1.00)
Government (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.61)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.54)
(2 more...)

Agrawal, Priyank, Agrawal, Shipra, Azati, Azmat

Q-learning with Posterior Sampling

Bayesian posterior sampling techniques have demonstrated superior empirical performance in many exploration-exploitation settings. However, their theoretical analysis remains a challenge, especially in complex settings like reinforcement learning. In this paper, we introduce Q-Learning with Posterior Sampling (PSQL), a simple Q-learning-based algorithm that uses Gaussian posteriors on Q-values for exploration, akin to the popular Thompson Sampling algorithm in the multi-armed bandit setting. We show that in the tabular episodic MDP setting, PSQL achieves a regret bound of $\tilde O(H^2\sqrt{SAT})$, closely matching the known lower bound of $Ω(H\sqrt{SAT})$. Here, S, A denote the number of states and actions in the underlying Markov Decision Process (MDP), and $T=KH$ with $K$ being the number of episodes and $H$ being the planning horizon. Our work provides several new technical insights into the core challenges in combining posterior sampling with dynamic programming and TD-learning-based RL algorithms, along with novel ideas for resolving those difficulties. We hope this will form a starting point for analyzing this efficient and important algorithmic technique in even more complex RL settings.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2506.00917

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.54)