AITopics

2506.22004

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
Europe > Netherlands > South Holland > Delft (0.04)
North America > United States > California > Los Angeles County (0.04)

Genre: Research Report (1.00)

Industry:

Energy (0.47)
Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Zuffer, Amara, Burke, Michael, Harandi, Mehrtash

Advancements and Challenges in Continual Reinforcement Learning: A Comprehensive Review

arXiv.org Artificial Intelligence, Jun-30-2025

The diversity of tasks and the dynamic nature of reinforcement learning (RL) require RL agents to learn sequentially and continuously, a learning paradigm known as continual reinforcement learning. This survey reviews how continual learning transforms RL agents into dynamic continual learners that acquire and retain useful, reusable knowledge. The paper covers the fundamental aspects of continual reinforcement learning: key concepts, significant challenges, and novel methodologies. Special emphasis is placed on recent advancements in continual reinforcement learning within robotics, along with a succinct overview of the evaluation environments used in prominent research, which makes the field more accessible to newcomers. The review concludes with a discussion of limitations and promising future directions for researchers and practitioners.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2506.21899

Country:

Oceania > Australia > Victoria (0.04)
North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
Oceania > New Zealand (0.04)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.67)
Research Report > New Finding (0.67)
Instructional Material > Course Syllabus & Notes (0.45)

Industry:

Leisure & Entertainment > Games (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Education > Educational Setting (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Chen, Shuze, Peng, Tianyi

Multi-agent Markov Entanglement

arXiv.org Machine Learning, Jun-30-2025

Value decomposition has long been a fundamental technique in multi-agent dynamic programming and reinforcement learning (RL). Specifically, the value function of a global state $(s_1,s_2,\ldots,s_N)$ is often approximated as the sum of local functions: $V(s_1,s_2,\ldots,s_N)\approx\sum_{i=1}^N V_i(s_i)$. This approach traces back to the index policy in restless multi-armed bandit problems and has found various applications in modern RL systems. However, the theoretical justification for why this decomposition works so effectively remains underexplored. In this paper, we uncover the underlying mathematical structure that enables value decomposition. We demonstrate that a multi-agent Markov decision process (MDP) permits value decomposition if and only if its transition matrix is not "entangled", a concept analogous to entanglement in quantum physics. Drawing inspiration from how physicists measure quantum entanglement, we introduce a measure of "Markov entanglement" for multi-agent MDPs and show that this measure can be used to bound the decomposition error in general multi-agent MDPs. Using the concept of Markov entanglement, we prove that a widely used class of index policies is weakly entangled and enjoys a sublinear $\mathcal O(\sqrt{N})$ scale of decomposition error for $N$-agent systems. Finally, we show how Markov entanglement can be efficiently estimated in practice, providing practitioners with an empirical proxy for the quality of value decomposition.
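To make the decomposition $V(s_1,\ldots,s_N)\approx\sum_i V_i(s_i)$ concrete, here is a minimal sketch of the simplest case one would expect to be exactly decomposable: two agents that evolve independently, so the joint transition matrix is a product of the local ones and the joint reward is a sum of local rewards. The chains `P1`, `P2`, the rewards, and the discount factor are hypothetical values chosen for illustration, and treating a product-form transition matrix as the "unentangled" case is an assumption here; the paper's formal definition may be more general.

```python
from itertools import product

GAMMA = 0.9  # assumed discount factor (illustrative)

def evaluate(P, r, gamma=GAMMA, iters=2000):
    """Iterative policy evaluation: repeat V <- r + gamma * P V."""
    n = len(r)
    V = [0.0] * n
    for _ in range(iters):
        V = [r[s] + gamma * sum(P[s][t] * V[t] for t in range(n))
             for s in range(n)]
    return V

# Hypothetical local chains and rewards for two agents (not from the paper).
P1 = [[0.9, 0.1], [0.4, 0.6]]
r1 = [1.0, 0.0]
P2 = [[0.5, 0.5], [0.2, 0.8]]
r2 = [0.0, 2.0]

# Joint system: agents move independently, rewards add up.
states = list(product(range(2), range(2)))
P_joint = [[P1[s1][t1] * P2[s2][t2] for (t1, t2) in states]
           for (s1, s2) in states]
r_joint = [r1[s1] + r2[s2] for (s1, s2) in states]

V1 = evaluate(P1, r1)
V2 = evaluate(P2, r2)
V_joint = evaluate(P_joint, r_joint)

# Largest gap between the joint value and the sum of local values.
max_gap = max(abs(V_joint[i] - (V1[s1] + V2[s2]))
              for i, (s1, s2) in enumerate(states))
print(f"max decomposition error: {max_gap:.2e}")
```

In this independent setting the gap is zero up to floating-point rounding. Coupling the agents, for example by making one agent's transition depend on the other's state, would generally make the gap nonzero, which is the regime the entanglement measure is meant to quantify.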

artificial intelligence, data mining, machine learning, (13 more...)

2506.02385

Country:

North America > United States > New York > New York County > New York City (0.04)
Oceania > New Zealand (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Africa > Kenya (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

arXiv.org Artificial Intelligence, Jun-30-2025

ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows

Sun, Qiushi, Liu, Zhoumianze, Ma, Chang, Ding, Zichen, Xu, Fangzhi, Yin, Zhangyue, Zhao, Haiteng, Wu, Zhenyu, Cheng, Kanzhi, Liu, Zhaoyang, Wang, Jianing, Li, Qintong, Tang, Xiangru, Xie, Tianbao, Feng, Xiachong, Li, Xiang, Kao, Ben, Wang, Wenhai, Qi, Biqing, Kong, Lingpeng, Wu, Zhiyong

Large Language Models (LLMs) have extended their impact beyond Natural Language Processing, substantially fostering the development of interdisciplinary research. Recently, various LLM-based agents have been developed to assist scientific discovery across multiple aspects and domains. Among these, computer-using agents, capable of interacting with operating systems as humans do, are paving the way to automated scientific problem-solving and to handling routine tasks in researchers' workflows. Recognizing the transformative potential of these agents, we introduce ScienceBoard, which encompasses two complementary contributions: (i) a realistic, multi-domain environment featuring dynamic and visually rich scientific workflows with integrated professional software, where agents can autonomously interact via different interfaces to accelerate complex research tasks and experiments; and (ii) a challenging benchmark of 169 high-quality, rigorously validated real-world tasks curated by humans, spanning scientific-discovery workflows in domains such as biochemistry, astronomy, and geoinformatics. Extensive evaluations of agents with state-of-the-art backbones (e.g., GPT-4o, Claude 3.7, UI-TARS) show that, despite some promising results, they still fall short of reliably assisting scientists in complex workflows, achieving only a 15% overall success rate. In-depth analysis further provides insights for addressing current agent limitations and for more effective design principles, paving the way to build more capable agents for scientific discovery. Our code, environment, and benchmark are at https://qiushisun.github.io/ScienceBoard-Home/.

large language model, machine learning, natural language, (21 more...)

2505.19897

Country: Asia > China (0.46)

Genre:

Workflow (1.00)
Research Report > New Finding (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Beyond Reactive Safety: Risk-Aware LLM Alignment via Long-Horizon Simulation

Sun, Chenkai, Zhang, Denghui, Zhai, ChengXiang, Ji, Heng

Given the growing influence of language model-based agents on high-stakes societal decisions, from public policy to healthcare, ensuring their beneficial impact requires understanding the far-reaching implications of their suggestions. We propose a proof-of-concept framework that projects how model-generated advice could propagate through societal systems on a macroscopic scale over time, enabling more robust alignment. To assess the long-term safety awareness of language models, we also introduce a dataset of 100 indirect harm scenarios, testing models' ability to foresee adverse, non-obvious outcomes from seemingly harmless user prompts. Our approach achieves not only over 20% improvement on the new dataset but also an average win rate exceeding 70% against strong baselines on existing safety benchmarks (AdvBench, SafeRLHF, WildGuardMix), suggesting a promising direction for safer agents.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

2506.20949

Country: North America > United States > Illinois > Champaign County > Urbana (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Zurek, Matthew, Chen, Yudong

Faster Fixed-Point Methods for Multichain MDPs

arXiv.org Machine Learning, Jun-27-2025

We study value-iteration (VI) algorithms for solving general (a.k.a. multichain) Markov decision processes (MDPs) under the average-reward criterion, a fundamental but theoretically challenging setting. Beyond the difficulties inherent to all average-reward problems posed by the lack of contractivity and non-uniqueness of solutions to the Bellman operator, in the multichain setting an optimal policy must solve the navigation subproblem of steering towards the best connected component, in addition to optimizing long-run performance within each component. We develop algorithms which better solve this navigational subproblem in order to achieve faster convergence for multichain MDPs, obtaining improved rates of convergence and sharper measures of complexity relative to prior work. Many key components of our results are of potential independent interest, including novel connections between average-reward and discounted problems, optimal fixed-point methods for discounted VI which extend to general Banach spaces, new sublinear convergence rates for the discounted value error, and refined suboptimality decompositions for multichain MDPs. Overall our results yield faster convergence rates for discounted and average-reward problems and expand the theoretical foundations of VI approaches.
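The "navigation subproblem" can be illustrated with a tiny multichain MDP. The sketch below runs plain undiscounted value iteration, which is a textbook baseline and not the paper's algorithm, on a hypothetical three-state MDP: state 0 is transient and must choose between two absorbing components with different long-run rewards. The state names, rewards, and iteration count are assumptions made for illustration.

```python
# MDP[state][action] = (reward, {next_state: probability})
# Hypothetical example: state 0 is transient; states 1 and 2 are
# separate absorbing components with average rewards 1 and 2.
MDP = {
    0: {"to_low": (0.0, {1: 1.0}), "to_high": (0.0, {2: 1.0})},
    1: {"stay": (1.0, {1: 1.0})},
    2: {"stay": (2.0, {2: 1.0})},
}

def bellman(v):
    """One undiscounted Bellman backup: (Tv)(s) = max_a [r + sum_t p(t) v(t)]."""
    return {
        s: max(r + sum(p * v[t] for t, p in nxt.items())
               for r, nxt in actions.values())
        for s, actions in MDP.items()
    }

def estimate_gain(k=1000):
    """Estimate the optimal average reward (gain) per state as v_k / k."""
    v = {s: 0.0 for s in MDP}
    for _ in range(k):
        v = bellman(v)
    return {s: v[s] / k for s in MDP}

gain = estimate_gain()
print(gain)
```

The estimated gain differs across states (1 in the low component, 2 in the high one), and the estimate for state 0 approaches 2 only because the backup steers it toward the better component. The estimate at state 0 converges at a slow $O(1/k)$ rate here, which gives a feel for why faster fixed-point methods are of interest in this setting.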

artificial intelligence, machine learning, reinforcement learning, (20 more...)

2506.20910

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Titsias, Michalis K., Alexopoulos, Angelos, Liu, Siran, Dellaportas, Petros

Gaussian Invariant Markov Chain Monte Carlo

arXiv.org Machine Learning, Jun-27-2025

We develop sampling methods that consist of Gaussian invariant versions of random walk Metropolis (RWM), the Metropolis adjusted Langevin algorithm (MALA), and second-order Hessian or Manifold MALA. Unlike standard RWM and MALA, we show that Gaussian invariant sampling can lead to ergodic estimators with improved statistical efficiency. This is due to a remarkable property of Gaussian invariance that allows us to obtain exact analytical solutions to the Poisson equation for Gaussian targets. These solutions can be used to construct efficient and easy-to-use control variates for variance reduction of estimators under any intractable target. We demonstrate the new samplers and estimators in several examples, including high-dimensional targets in latent Gaussian models, where we compare against several advanced methods and obtain state-of-the-art results. We also provide theoretical results regarding geometric ergodicity, and an optimal scaling analysis that shows the dependence of the optimal acceptance rate on the Gaussianity of the target.
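For readers unfamiliar with the baseline being modified, here is a minimal sketch of standard random walk Metropolis on a one-dimensional standard Gaussian target. It is the textbook RWM, not the Gaussian invariant variant or the control variates proposed in the paper; the step size, seed, and sample count are assumptions chosen for illustration.

```python
import math
import random

def log_target(x):
    """Unnormalized log-density of a standard Gaussian target."""
    return -0.5 * x * x

def rwm(n_samples, step=2.4, x0=0.0, seed=0):
    """Standard random walk Metropolis with a Gaussian proposal."""
    rng = random.Random(seed)
    x, lp = x0, log_target(x0)
    samples, accepted = [], 0
    for _ in range(n_samples):
        proposal = x + step * rng.gauss(0.0, 1.0)
        lp_prop = log_target(proposal)
        # Accept with probability min(1, pi(proposal) / pi(x)).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = proposal, lp_prop
            accepted += 1
        samples.append(x)
    return samples, accepted / n_samples

samples, rate = rwm(50_000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"acceptance rate {rate:.2f}, mean {mean:.3f}, variance {var:.3f}")
```

The ergodic averages `mean` and `var` are the kind of estimators whose variance the paper aims to reduce; their accuracy depends on the acceptance rate and the autocorrelation of the chain.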

artificial intelligence, estimator, machine learning, (14 more...)

2506.21511

Country:

Europe > Greece (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.41)

Raja, Sanjeev, Šípka, Martin, Psenka, Michael, Kreiman, Tobias, Pavelka, Michal, Krishnapriyan, Aditi S.

Action-Minimization Meets Generative Modeling: Efficient Transition Path Sampling with the Onsager-Machlup Functional

Transition path sampling (TPS), which involves finding probable paths connecting two points on an energy landscape, remains a challenge due to the complexity of real-world atomistic systems. Current machine learning approaches use expensive, task-specific, and data-free training procedures, limiting their ability to benefit from high-quality datasets and large-scale pre-trained models. In this work, we address TPS by interpreting candidate paths as trajectories sampled from stochastic dynamics induced by the learned score function of pre-trained generative models, specifically denoising diffusion and flow matching. Under these dynamics, finding high-likelihood transition paths becomes equivalent to minimizing the Onsager-Machlup (OM) action functional. This enables us to repurpose pre-trained generative models for TPS in a zero-shot manner, in contrast with bespoke, task-specific approaches in previous work. We demonstrate our approach on varied molecular systems, obtaining diverse, physically realistic transition pathways and generalizing beyond the pre-trained model's original training dataset. Our method can be easily incorporated into new generative models, making it practically relevant as models continue to scale and improve with increased data availability. Code is available at github.com/ASK-Berkeley/OM-TPS.

artificial intelligence, machine learning, natural language, (14 more...)

2504.18506

Country: North America > United States (0.46)

Genre: Research Report (0.63)

Industry:

Energy (0.67)
Health & Medicine > Pharmaceuticals & Biotechnology (0.67)
Education (0.46)
Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Zhu, Hao, Yan, Shengchao, Hoffmann, Jasper, Boedecker, Joschka

Multi-convex Programming for Discrete Latent Factor Models Prototyping

Discrete latent factor models (DLFMs) are widely used in domains such as machine learning, economics, neuroscience, and psychology. Currently, fitting a DLFM to a dataset relies on a solver customized for the individual model, which takes considerable effort to implement and is limited to the specific DLFM instance it targets. In this paper, we propose a generic framework based on CVXPY, which allows users to specify and solve the fitting problem for a wide range of DLFMs, including both regression and classification models, within a very short script. Our framework is flexible and inherently supports the integration of regularization terms and constraints on the DLFM parameters and latent factors, so that users can easily prototype the DLFM structure according to their dataset and application scenario. We introduce our open-source Python implementation and illustrate the framework in several examples.

artificial intelligence, machine learning, optimization problem, (17 more...)

2504.01431

Country: Europe (0.28)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Lu, Ziyang, Gursoy, M. Cenk, Mohan, Chilukuri K., Varshney, Pramod K.

Learning-Based Resource Management in Integrated Sensing and Communication Systems

In this paper, we tackle the task of adaptive time allocation in integrated sensing and communication systems equipped with radar and communication units. The dual-functional radar-communication system's task involves allocating dwell times for tracking multiple targets and utilizing the remaining time for data transmission towards estimated target locations. We introduce a novel constrained deep reinforcement learning (CDRL) approach, designed to optimize resource allocation between tracking and communication under time budget constraints, thereby enhancing target communication quality. Our numerical results demonstrate the efficiency of our proposed CDRL framework, confirming its ability to maximize communication quality in highly dynamic environments while adhering to time constraints.

A. Background. 1) Cognitive Radar: Radar technology, integral to various applications in environmental sensing, space exploration, navigation, and traffic control, has become increasingly important with the emergence of autonomous vehicles and drones.

communication, machine learning, reinforcement learning, (19 more...)

2506.20849

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)