Learning to Drive Safely with Hybrid Options

De Cooman, Bram, Suykens, Johan

arXiv.org Artificial Intelligence

The options framework is naturally suited for hierarchical control applications in general, and autonomous driving tasks in particular. In this work, we therefore apply and tailor it to autonomous driving tasks on highways. More specifically, we define dedicated options for longitudinal and lateral manoeuvres with embedded safety and comfort constraints. This way, prior domain knowledge can be incorporated into the learning process, and the learned driving behaviour can be constrained more easily. We propose several setups for hierarchical control with options and derive practical algorithms following state-of-the-art reinforcement learning techniques. By selecting actions for longitudinal and lateral control separately, the introduced policies over combined and hybrid options obtain the same expressiveness and flexibility that human drivers have, while being easier to interpret than classical policies over continuous actions. Of all the investigated approaches, these flexible policies over hybrid options perform best under varying traffic conditions, outperforming the baseline policies over actions.
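As a rough illustration of the factored action selection described above, the sketch below (a minimal Python toy; the class, option names, and safety check are hypothetical, not taken from the paper) picks a longitudinal and a lateral option independently and embeds a simple safety constraint in the lateral choice:

```python
import random

# Hypothetical manoeuvre sets; the paper's concrete option definitions may differ.
LONGITUDINAL = ["cruise", "follow", "brake"]            # speed control options
LATERAL = ["keep_lane", "change_left", "change_right"]  # lane control options

def safe_lateral_options(state):
    """Embed a safety constraint: only offer lane changes into free lanes."""
    opts = ["keep_lane"]
    if state.get("left_lane_free"):
        opts.append("change_left")
    if state.get("right_lane_free"):
        opts.append("change_right")
    return opts

class HybridOptionPolicy:
    """Toy policy over hybrid options: one head per control dimension.

    A trained policy would replace the uniform sampling with, e.g.,
    softmax outputs of two heads sharing a state encoder."""

    def act(self, state):
        lon = random.choice(LONGITUDINAL)                  # longitudinal sub-option
        lat = random.choice(safe_lateral_options(state))   # constrained lateral sub-option
        return (lon, lat)                                  # hybrid option = pair

policy = HybridOptionPolicy()
print(policy.act({"left_lane_free": True, "right_lane_free": False}))
```

Factoring the choice this way keeps each head's option set small and human-readable, which is what makes such policies easier to interpret than a single policy over continuous actions.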




Population-aware Online Mirror Descent for Mean-Field Games with Common Noise by Deep Reinforcement Learning

Wu, Zida, Lauriere, Mathieu, Geist, Matthieu, Pietquin, Olivier, Mehta, Ankur

arXiv.org Artificial Intelligence

Mean Field Games (MFGs) offer a powerful framework for studying large-scale multi-agent systems. Yet, learning Nash equilibria in MFGs remains a challenging problem, particularly when the initial distribution is unknown or when the population is subject to common noise. In this paper, inspired by Munchausen RL and Online Mirror Descent, we introduce an efficient deep reinforcement learning (DRL) algorithm designed to achieve population-dependent Nash equilibria without relying on averaging or historical sampling. The resulting policy is adaptable to various initial distributions and sources of common noise. Through numerical experiments on seven canonical examples, we demonstrate that our algorithm exhibits superior convergence properties compared to state-of-the-art algorithms, particularly a DRL version of Fictitious Play for population-dependent policies. The performance in the presence of common noise underscores the robustness and adaptability of our approach.
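To make the Munchausen ingredient concrete, here is a minimal tabular sketch of a Munchausen-style soft Q-update (all hyperparameters and the toy transition are assumed for illustration; the paper's algorithm is deep, population-conditioned, and combined with Online Mirror Descent, which this sketch omits):

```python
import numpy as np

# Tabular Munchausen-style soft Q-update; tau, alpha, gamma and the
# transition below are assumed values for illustration only.
n_states, n_actions = 5, 3
tau, alpha, gamma = 0.1, 0.9, 0.99
q = np.zeros((n_states, n_actions))

def softmax_policy(q_row):
    z = q_row / tau
    z = z - z.max()                    # numerical stability
    p = np.exp(z)
    return p / p.sum()

def munchausen_target(r, s, a, s_next):
    pi_s = softmax_policy(q[s])
    pi_next = softmax_policy(q[s_next])
    # Munchausen bonus: scaled log-probability of the action taken.
    bonus = alpha * tau * np.log(pi_s[a] + 1e-8)
    # Soft value of the next state: E_pi[q - tau * log pi].
    soft_v = np.dot(pi_next, q[s_next] - tau * np.log(pi_next + 1e-8))
    return r + bonus + gamma * soft_v

# One illustrative update on a fabricated transition (s, a, r, s').
s, a, r, s_next = 0, 1, 1.0, 2
q[s, a] += 0.5 * (munchausen_target(r, s, a, s_next) - q[s, a])
print(q[s, a])
```

The log-policy bonus implicitly regularizes successive policies toward one another, which is what lets the method track an equilibrium without averaging over historical policies.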


Tailored Conversations beyond LLMs: A RL-Based Dialogue Manager

Galland, Lucie, Pelachaud, Catherine, Pecune, Florian

arXiv.org Artificial Intelligence

In this work, we propose a novel framework that integrates large language models (LLMs) with an RL-based dialogue manager for open-ended dialogue with a specific goal. By leveraging hierarchical reinforcement learning to model the structured phases of dialogue and employing meta-learning to adapt across diverse user profiles, our approach improves adaptability and efficiency, enabling the system to learn from limited data, transition fluidly between dialogue phases, and personalize responses to heterogeneous patient needs. We apply our framework to Motivational Interviews, aiming to foster behavior change, and demonstrate that the proposed dialogue manager outperforms a state-of-the-art LLM baseline in terms of reward, showing a potential benefit of conditioning LLMs to create open-ended dialogue systems with specific goals.
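A minimal sketch of the manager/LLM split might look as follows (phase names, actions, and the prompt format are hypothetical, not from the paper; a trained, meta-adapted policy would replace the random choices):

```python
import random

# Hypothetical sketch of an RL dialogue manager steering an LLM.
PHASES = ["engage", "explore", "evoke", "plan"]   # loosely MI-inspired phases

class DialogueManager:
    """High-level policy: picks a dialogue phase, then an action within it.

    A trained manager would replace the random choice with learned
    (meta-adapted) policies conditioned on the user profile and state."""

    def select_phase(self, state):
        return random.choice(PHASES)

    def select_action(self, phase, state):
        actions = {"engage": "reflect", "explore": "open_question",
                   "evoke": "affirm", "plan": "summarize"}
        return actions[phase]

    def build_prompt(self, phase, action, user_utterance):
        # The LLM generates the surface text; the manager fixes the intent.
        return (f"You are a motivational interviewer in the '{phase}' phase. "
                f"Respond to the user with a '{action}'.\nUser: {user_utterance}")

dm = DialogueManager()
phase = dm.select_phase(state={})
action = dm.select_action(phase, state={})
print(dm.build_prompt(phase, action, "I want to exercise more but never do."))
```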


Hierarchical Multi-agent Reinforcement Learning for Cyber Network Defense

Singh, Aditya Vikram, Rathbun, Ethan, Graham, Emma, Oakley, Lisa, Boboila, Simona, Oprea, Alina, Chin, Peter

arXiv.org Artificial Intelligence

Recent advances in multi-agent reinforcement learning (MARL) have created opportunities to solve complex real-world tasks. Cybersecurity is a notable application area, where defending networks against sophisticated adversaries remains a challenging task typically performed by teams of security operators. In this work, we explore novel MARL strategies for building autonomous cyber network defenses that address challenges such as large policy spaces, partial observability, and stealthy, deceptive adversarial strategies. To facilitate efficient and generalized learning, we propose a hierarchical Proximal Policy Optimization (PPO) architecture that decomposes the cyber defense task into specific sub-tasks like network investigation and host recovery. Our approach involves training sub-policies for each sub-task using PPO enhanced with domain expertise. These sub-policies are then leveraged by a master defense policy that coordinates their selection to solve complex network defense tasks. Furthermore, the sub-policies can be fine-tuned and transferred with minimal cost to defend against shifts in adversarial behavior or changes in network settings. We conduct extensive experiments using CybORG Cage 4, the state-of-the-art MARL environment for cyber defense. Comparisons with multiple baselines across different adversaries show that our hierarchical learning approach achieves top performance in terms of convergence speed, episodic return, and several interpretable metrics relevant to cybersecurity, including the fraction of clean machines on the network, precision, and false positives on recoveries.
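The master/sub-policy decomposition can be sketched as follows (the interfaces and the selection heuristic are assumed for illustration; in the paper both the sub-policies and the master are trained with PPO):

```python
import random

# Hypothetical sketch of a master policy coordinating frozen sub-policies;
# sub-task names follow the abstract, the interfaces are assumed.
class SubPolicy:
    def __init__(self, name, actions):
        self.name, self.actions = name, actions

    def act(self, obs):
        # Placeholder for a PPO-trained sub-policy's action selection.
        return random.choice(self.actions)

SUB_POLICIES = {
    "investigate": SubPolicy("investigate", ["scan_host", "analyse_traffic"]),
    "recover": SubPolicy("recover", ["restore_host", "remove_malware"]),
}

class MasterPolicy:
    """Selects which sub-policy to deploy at each step.

    Here a trivial heuristic stands in for the learned PPO selector."""

    def select(self, obs):
        return "recover" if obs.get("host_compromised") else "investigate"

master = MasterPolicy()
obs = {"host_compromised": True}
sub = SUB_POLICIES[master.select(obs)]
print(sub.name, "->", sub.act(obs))
```

Because the sub-policies are separate modules behind a fixed interface, each can be fine-tuned or swapped out when adversarial behavior or network settings change, without retraining the whole hierarchy.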


I Know How: Combining Prior Policies to Solve New Tasks

Li, Malio, Piccoli, Elia, Lomonaco, Vincenzo, Bacciu, Davide

arXiv.org Artificial Intelligence

Multi-Task Reinforcement Learning aims at developing agents that are able to continually evolve and adapt to new scenarios. However, this goal is challenging to achieve due to catastrophic forgetting and the high demand for computational resources. Learning from scratch for each new task is not a viable or sustainable option, so agents should be able to collect and exploit prior knowledge while facing new problems. While several methodologies have attempted to address the problem from different perspectives, they lack a common structure. In this work, we propose a new framework, I Know How (IKH), which provides a common formalization. Our methodology focuses on modularity and compositionality of knowledge in order to enhance the agent's ability to learn and adapt efficiently to dynamic environments. To support our framework definition, we present a simple application of it in a simulated driving environment and compare its performance with that of state-of-the-art approaches.
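As a toy illustration of the modular, compositional structure (all names and the fixed mixing weights are hypothetical; IKH's actual formalization may differ), a composer over a library of prior policies could look like:

```python
import random

# Hypothetical sketch of composing prior policies for a new task.
class PriorPolicy:
    def __init__(self, name):
        self.name = name

    def act(self, obs):
        return f"{self.name}_action"   # stand-in for a learned policy's output

class Composer:
    """Keeps a library of prior policies and mixes them for a new task.

    The mixing weights would normally be learned on the new task; here
    they are fixed to illustrate the modular structure only."""

    def __init__(self, library, weights):
        self.library, self.weights = library, weights

    def act(self, obs):
        policy = random.choices(self.library, weights=self.weights, k=1)[0]
        return policy.act(obs)

library = [PriorPolicy("lane_keep"), PriorPolicy("overtake")]
composer = Composer(library, weights=[0.7, 0.3])
print(composer.act(obs={}))
```

Adapting to a new task then amounts to learning the weights over existing modules rather than learning a policy from scratch, which is the sustainability argument made above.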


Population-aware Online Mirror Descent for Mean-Field Games by Deep Reinforcement Learning

Wu, Zida, Lauriere, Mathieu, Chua, Samuel Jia Cong, Geist, Matthieu, Pietquin, Olivier, Mehta, Ankur

arXiv.org Artificial Intelligence

Mean Field Games (MFGs) can handle large-scale multi-agent systems, but learning Nash equilibria in MFGs remains a challenging task. In this paper, inspired by Munchausen RL and Online Mirror Descent, we propose a deep reinforcement learning (DRL) algorithm that achieves a population-dependent Nash equilibrium without the need for averaging or sampling from history. Through the design of an additional inner-loop replay buffer, the agents can effectively learn to achieve a Nash equilibrium from any distribution, mitigating catastrophic forgetting. The resulting policy can be applied to various initial distributions. Numerical experiments on four canonical examples demonstrate that our algorithm has better convergence properties than state-of-the-art (SOTA) algorithms, in particular a DRL version of Fictitious Play for population-dependent policies.
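A minimal sketch of the inner-loop resampling idea (the buffer size, distribution shapes, and the dummy update are assumed for illustration): replaying stored initial distributions exposes the learner to many populations during training, which is what mitigates forgetting.

```python
import random
import numpy as np

def random_distribution(n_states, rng):
    """Draw a random probability vector over states."""
    p = rng.random(n_states)
    return p / p.sum()

rng = np.random.default_rng(0)
n_states = 4
# Buffer of candidate initial distributions, replayed during training.
dist_buffer = [random_distribution(n_states, rng) for _ in range(8)]

def inner_loop_update(params, mu0):
    """Placeholder for one learning step started from initial dist mu0."""
    return params + 0.01 * mu0.mean()   # dummy update, illustration only

params = 0.0
for step in range(20):
    mu0 = random.choice(dist_buffer)    # resample a stored initial distribution
    params = inner_loop_update(params, mu0)
print(params)
```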