AITopics

2501.17991

Genre: Research Report > New Finding (1.00)

Industry: Energy > Oil & Gas (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Ishfaq, Haque, Wang, Guangyuan, Islam, Sami Nur, Precup, Doina

Langevin Soft Actor-Critic: Efficient Exploration through Uncertainty-Driven Critic Learning

arXiv.org Artificial Intelligence Jan-29-2025

Existing actor-critic algorithms, which are popular for continuous control reinforcement learning (RL) tasks, suffer from poor sample efficiency due to the lack of a principled exploration mechanism. Motivated by the success of Thompson sampling for efficient exploration in RL, we propose a novel model-free RL algorithm, Langevin Soft Actor-Critic (LSAC), which prioritizes enhancing critic learning through uncertainty estimation over policy optimization. LSAC employs three key innovations: approximate Thompson sampling through distributional Langevin Monte Carlo (LMC) based $Q$ updates, parallel tempering for exploring multiple modes of the posterior of the $Q$ function, and diffusion-synthesized state-action samples regularized with $Q$ action gradients. Our extensive experiments demonstrate that LSAC outperforms or matches the performance of mainstream model-free RL algorithms for continuous control tasks. Notably, LSAC marks the first successful application of LMC-based Thompson sampling in continuous control tasks with continuous action spaces.
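The Langevin Monte Carlo idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes plain unadjusted Langevin dynamics on a toy quadratic loss, not the distributional critic update, parallel tempering, or diffusion-based sampling used in LSAC. The names `lmc_step` and `sample_lmc` and all hyperparameter values are hypothetical.

```python
import numpy as np

def lmc_step(theta, grad_fn, step_size, inv_temperature, rng):
    """One unadjusted Langevin update: a gradient step plus scaled Gaussian noise."""
    noise = rng.standard_normal(theta.shape)
    drift = step_size * grad_fn(theta)
    diffusion = np.sqrt(2.0 * step_size / inv_temperature) * noise
    return theta - drift + diffusion

def sample_lmc(grad_fn, theta0, n_steps=20000, step_size=0.1,
               inv_temperature=1.0, seed=0):
    """Run the chain and return every iterate (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    samples = np.empty((n_steps,) + theta.shape)
    for t in range(n_steps):
        theta = lmc_step(theta, grad_fn, step_size, inv_temperature, rng)
        samples[t] = theta
    return samples

# Toy loss L(theta) = 0.5 * ||theta - mu||^2, whose gradient is theta - mu.
# The chain then targets (approximately) a Gaussian centred at mu.
mu = np.array([1.0, -2.0])
samples = sample_lmc(lambda th: th - mu, theta0=np.zeros(2))
posterior_mean_estimate = samples[1000:].mean(axis=0)  # discard burn-in
```

In a Thompson-sampling style scheme, a single iterate of such a chain would stand in for one posterior sample of the critic parameters; how LSAC does this in detail is not specified in the abstract.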

artificial intelligence, machine learning, reinforcement learning, (12 more...)

2501.17827

Country:

North America > Canada > Quebec > Montreal (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Japan (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Fuentes, Erick, Strader, Jared, Fahnestock, Ethan, Roy, Nicholas

Belief Roadmaps with Uncertain Landmark Evanescence

arXiv.org Artificial Intelligence Jan-29-2025

We would like a robot to navigate to a goal location while minimizing state uncertainty. To aid the robot in this endeavor, maps provide a prior belief over the location of objects and regions of interest. To localize itself within the map, a robot identifies mapped landmarks using its sensors. However, as the time between map creation and robot deployment increases, portions of the map can become stale, and landmarks, once believed to be permanent, may disappear. We refer to the propensity of a landmark to disappear as landmark evanescence. Reasoning about landmark evanescence during path planning, and the associated impact on localization accuracy, requires analyzing the presence or absence of each landmark, leading to an exponential number of possible outcomes of a given motion plan. To address this complexity, we develop BRULE, an extension of the Belief Roadmap. During planning, we replace the belief over future robot poses with a Gaussian mixture which is able to capture the effects of landmark evanescence. Furthermore, we show that belief updates can be made efficient, and that maintaining a random subset of mixture components is sufficient to find high quality solutions. We demonstrate performance in simulated and real-world experiments. Software is available at https://bit.ly/BRULE.
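The exponential growth in outcomes described in this abstract can be seen in a small one-dimensional sketch. It assumes a scalar Kalman-style variance update and a known persistence probability per landmark. It is not the BRULE implementation, and the random sub-sampling of components mentioned in the abstract is not shown. `Component`, `predict`, `observe_uncertain_landmark`, and all numeric values are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Component:
    weight: float    # probability of one landmark presence/absence history
    variance: float  # 1-D position variance under that history

def predict(belief, motion_noise):
    """Motion step: every component's variance grows by the motion noise."""
    return [Component(c.weight, c.variance + motion_noise) for c in belief]

def observe_uncertain_landmark(belief, persistence_prob, meas_noise):
    """Split each component into 'landmark present' and 'landmark gone' branches."""
    out = []
    for c in belief:
        # Scalar Kalman variance update, applied only if the landmark is still there.
        updated = c.variance * meas_noise / (c.variance + meas_noise)
        out.append(Component(c.weight * persistence_prob, updated))
        out.append(Component(c.weight * (1.0 - persistence_prob), c.variance))
    return out

def expected_component_variance(belief):
    """Weight-averaged variance across the mixture components."""
    return sum(c.weight * c.variance for c in belief)

belief = [Component(1.0, 1.0)]
for persistence_prob in (0.9, 0.5, 0.2):  # three landmarks along the path
    belief = predict(belief, motion_noise=0.1)
    belief = observe_uncertain_landmark(belief, persistence_prob, meas_noise=0.5)
```

After three uncertain landmarks the mixture already holds 2^3 = 8 components, which is why pruning or sampling components matters for longer plans.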

artificial intelligence, belief revision, machine learning, (16 more...)

2501.17982

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Belief Revision (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Boucher, Rémy Hosseinkhan, Semeraro, Onofrio, Mathelin, Lionel

Increasing Information for Model Predictive Control with Semi-Markov Decision Processes

arXiv.org Artificial Intelligence Jan-28-2025

Recent work in Learning-Based Model Predictive Control of dynamical systems shows impressive sample complexity by using criteria from Information Theory to accelerate the learning procedure. However, the sequential exploration opportunities are limited by the local state of the system, which restricts the amount of information carried by the observations from the current exploration trajectory. This article resolves this limitation by introducing temporal abstraction through the framework of Semi-Markov Decision Processes. The framework increases the total information of the gathered data for a fixed sampling budget, thus reducing the sample complexity.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

2501.17256

Country:

North America > United States > Massachusetts (0.28)
North America > United States > California (0.28)

Genre: Research Report (0.82)

Industry: Energy > Oil & Gas > Upstream (0.71)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)

Boucher, Rémy Hosseinkhan, Semeraro, Onofrio, Mathelin, Lionel

Evidence on the Regularisation Properties of Maximum-Entropy Reinforcement Learning

arXiv.org Artificial Intelligence Jan-28-2025

The generalisation and robustness properties of policies learnt through Maximum-Entropy Reinforcement Learning are investigated on chaotic dynamical systems with Gaussian noise on the observable. First, the robustness of entropy-regularised policies to noise contamination of the agent's observations is observed. Second, notions of statistical learning theory, such as complexity measures on the learnt model, are borrowed to explain and predict the phenomenon. Results show the existence of a relationship between entropy-regularised policy optimisation and robustness to noise, which can be described by the chosen complexity measures.
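For readers unfamiliar with entropy regularisation, the following sketch shows the standard single-state, discrete-action case, where the policy maximising expected value plus `alpha` times entropy is a softmax over `Q / alpha`. This is a textbook illustration under those assumptions, not the experimental setup of the paper. The function names and the values of `q` and `alpha` are hypothetical.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats, ignoring zero-probability actions."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log(p[nz])))

def soft_objective(p, q_values, alpha):
    """Expected action value plus alpha times policy entropy."""
    return float(np.dot(p, q_values) + alpha * entropy(p))

def max_entropy_policy(q_values, alpha):
    """Softmax over Q / alpha, computed with a max shift for numerical stability."""
    logits = np.asarray(q_values, dtype=float) / alpha
    logits = logits - logits.max()
    weights = np.exp(logits)
    return weights / weights.sum()

q = np.array([1.0, 0.5, -1.0])            # illustrative action values
alpha = 0.5                               # illustrative temperature
pi_soft = max_entropy_policy(q, alpha)
pi_greedy = np.eye(len(q))[np.argmax(q)]  # deterministic argmax policy
```

Comparing `soft_objective(pi_soft, q, alpha)` with `soft_objective(pi_greedy, q, alpha)` shows that the stochastic policy scores higher on the regularised objective, at the cost of a lower unregularised expected value.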

artificial intelligence, machine learning, proceedings, (14 more...)

2501.17115

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.65)

arXiv.org Artificial Intelligence Jan-28-2025

Dream to Drive with Predictive Individual World Model

Gao, Yinfeng, Zhang, Qichao, Ding, Da-wei, Zhao, Dongbin

Producing reactive driving behaviors in complex urban environments remains challenging because road users' intentions are unknown. Model-based reinforcement learning (MBRL) offers great potential to learn a reactive policy by constructing a world model that can provide informative states and imagination training. However, a critical limitation of related research lies in scene-level reconstruction representation learning, which may overlook key interactive vehicles and struggles to model the interactive features among vehicles and their long-term intentions. Therefore, this paper presents a novel MBRL method with a predictive individual world model (PIWM) for autonomous driving. PIWM describes the driving environment from an individual-level perspective and captures vehicles' interactive relations and their intentions via a trajectory prediction task. Meanwhile, a behavior policy is learned jointly with PIWM. It is trained in PIWM's imagination and effectively navigates urban driving scenes by leveraging intention-aware latent states. The proposed method is trained and evaluated on simulation environments built upon challenging real-world interactive scenarios. Compared with popular model-free and state-of-the-art model-based reinforcement learning methods, experimental results show that the proposed method achieves the best performance in terms of safety and efficiency.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

doi: 10.1109/TIV.2024.3408830.

2501.16733

Country:

Asia > China > Beijing > Beijing (0.05)
Asia > China > Liaoning Province > Shenyang (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)
(6 more...)

Genre:

Research Report > New Finding (0.48)
Research Report > Promising Solution (0.34)

Industry:

Information Technology (1.00)
Transportation > Ground > Road (0.90)
Automobiles & Trucks (0.90)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(3 more...)

Neural Information Processing Systems Jan-27-2025, 16:55:42 GMT

Reviews: Pseudo-Extended Markov chain Monte Carlo

Update: I have read the author response and am satisfied with the commitment to elaborate on $\beta$ and $\pi$ and to simplify the Stan PE code with a "pseudo-extended" function. This paper presents a new MCMC sampling method called pseudo-extended MCMC that uses an instrumental distribution to project the data into a higher-dimensional space where the modes are connected, making it easier for the sampler to mix. A default instrumental distribution based on tempering is provided. The method is compared to existing baselines, showing efficacy on three benchmark datasets. The paper is well-placed within the existing literature.

experiment, instrumental distribution, pseudo-extended markov chain monte carlo, (2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)

Neural Information Processing Systems Jan-27-2025, 16:55:30 GMT

Reviews: Pseudo-Extended Markov chain Monte Carlo

Reviewers reached consensus that the paper makes a valuable contribution for MCMC. There are specific suggestions for improving the experiments that we ask the authors to seriously consider.

pseudo-extended markov chain monte carlo

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)

Neural Information Processing Systems Jan-27-2025, 09:29:53 GMT

Exact Privacy Guarantees for Markov Chain Implementations of the Exponential Mechanism with Artificial Atoms

Implementations of the exponential mechanism in differential privacy often require sampling from intractable distributions. When approximate procedures like Markov chain Monte Carlo (MCMC) are used, the end result incurs costs to both privacy and accuracy. Existing work has examined these effects asymptotically, but implementable finite sample results are needed in practice so that users can specify privacy budgets in advance and implement samplers with exact privacy guarantees. In this paper, we use tools from ergodic theory and perfect simulation to design exact finite runtime sampling algorithms for the exponential mechanism by introducing an intermediate modified target distribution using artificial atoms. We propose an additional modification of this sampling algorithm that maintains its $\epsilon$-DP guarantee and has improved runtime at the cost of some utility.
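As background, the exponential mechanism itself is easy to sample exactly when the candidate set is small and finite. The sketch below assumes that simple case, with selection probability proportional to `exp(epsilon * utility / (2 * sensitivity))`. It does not implement the artificial-atom or perfect-simulation construction from the paper, which targets intractable distributions. The function names, candidates, and utility values are hypothetical.

```python
import numpy as np

def exponential_mechanism_probs(utilities, epsilon, sensitivity):
    """Selection probabilities proportional to exp(epsilon * u / (2 * sensitivity))."""
    scores = epsilon * np.asarray(utilities, dtype=float) / (2.0 * sensitivity)
    scores = scores - scores.max()  # shift for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()

def sample_exponential_mechanism(candidates, utilities, epsilon, sensitivity, rng):
    """Draw one candidate exactly from the finite exponential-mechanism distribution."""
    probs = exponential_mechanism_probs(utilities, epsilon, sensitivity)
    index = rng.choice(len(candidates), p=probs)
    return candidates[index]

candidates = ["a", "b", "c"]   # illustrative outputs
utilities = [3.0, 2.0, 0.0]    # illustrative utility scores
probs = exponential_mechanism_probs(utilities, epsilon=1.0, sensitivity=1.0)
choice = sample_exponential_mechanism(
    candidates, utilities, epsilon=1.0, sensitivity=1.0,
    rng=np.random.default_rng(0),
)
```

When the candidate set is too large to enumerate, the normalising sum above is unavailable, which is the situation where approximate samplers such as MCMC are typically used.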

exact privacy guarantee, exponential mechanism, markov chain implementation, (4 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.65)

Neural Information Processing Systems Jan-27-2025, 05:13:44 GMT

Review for NeurIPS paper: Calibration of Shared Equilibria in General Sum Partially Observable Markov Games

Summary and Contributions: The paper presents the concept of shared equilibrium in certain kinds of multi-agent stochastic games with a restricted form of partial observability. The formalism includes the notion of supertypes (different distributions of agents) and types (where each agent is given a true type each episode). The agent's type influences the rewards available, as do the joint state of the system and the joint action over all agents. One key constraint is that all agents of the same type follow the same policy from an egocentric perspective (where they themselves are the focal agent and all other agents are interchangeable). The authors define a policy gradient approach for individual agents and also present a higher-order learning rule that shifts the distribution over supertypes at a slower timescale.

agent, observable markov game, supertype, (6 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)