AITopics

2103.10897

Country:

North America > United States > California > San Diego County > San Diego (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(3 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)

Curi, Sebastian, Bogunovic, Ilija, Krause, Andreas

Combining Pessimism with Optimism for Robust and Efficient Model-Based Deep Reinforcement Learning

arXiv.org Artificial IntelligenceMar-18-2021

In real-world tasks, reinforcement learning (RL) agents frequently encounter situations that are not present during training time. To ensure reliable performance, the RL agents need to exhibit robustness against worst-case situations. The robust RL framework addresses this challenge via a worst-case optimization between an agent and an adversary. Previous robust RL algorithms are either sample inefficient, lack robustness guarantees, or do not scale to large problems. We propose the Robust Hallucinated Upper-Confidence RL (RH-UCRL) algorithm to provably solve this problem while attaining near-optimal sample complexity guarantees. RH-UCRL is a model-based reinforcement learning (MBRL) algorithm that effectively distinguishes between epistemic and aleatoric uncertainty and efficiently explores both the agent and adversary decision spaces during policy learning. We scale RH-UCRL to complex tasks via neural networks ensemble models as well as neural network policies. Experimentally, we demonstrate that RH-UCRL outperforms other robust deep RL algorithms in a variety of adversarial environments.

algorithm, combining pessimism, rh-ucrl, (15 more...)

2103.10369

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Martiradonna, Sergio, Abrardo, Andrea, Moretti, Marco, Piro, Giuseppe, Boggia, Gennaro

Deep Reinforcement Learning-Aided RAN Slicing Enforcement for B5G Latency Sensitive Services

arXiv.org Artificial IntelligenceMar-18-2021

The combination of cloud computing capabilities at the network edge and artificial intelligence promise to turn future mobile networks into service- and radio-aware entities, able to address the requirements of upcoming latency-sensitive applications. In this context, a challenging research goal is to exploit edge intelligence to dynamically and optimally manage the Radio Access Network Slicing (that is a less mature and more complex technology than fifth-generation Network Slicing) and Radio Resource Management, which is a very complex task due to the mostly unpredictably nature of the wireless channel. This paper presents a novel architecture that leverages Deep Reinforcement Learning at the edge of the network in order to address Radio Access Network Slicing and Radio Resource Management optimization supporting latency-sensitive applications. The effectiveness of our proposal against baseline methodologies is investigated through computer simulation, by considering an autonomous-driving use-case.

agent, bandwidth, requirement, (15 more...)

2103.10277

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Italy > Tuscany > Pisa Province > Pisa (0.04)
Europe > Italy > Apulia > Bari (0.04)

Genre: Research Report (0.50)

Industry:

Telecommunications (1.00)
Information Technology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Baram, Nir, Tennenholtz, Guy, Mannor, Shie

Maximum Entropy Reinforcement Learning with Mixture Policies

arXiv.org Artificial IntelligenceMar-18-2021

Mixture models are an expressive hypothesis class that can approximate a rich set of policies. However, using mixture policies in the Maximum Entropy (MaxEnt) framework is not straightforward. The entropy of a mixture model is not equal to the sum of its components, nor does it have a closed-form expression in most cases. Using such policies in MaxEnt algorithms, therefore, requires constructing a tractable approximation of the mixture entropy. In this paper, we derive a simple, low-variance mixture-entropy estimator. We show that it is closely related to the sum of marginal entropies. Equipped with our entropy estimator, we derive an algorithmic variant of Soft Actor-Critic (SAC) to the mixture policy case and evaluate it on a series of continuous control tasks.

entropy, reinforcement, reinforcement learning, (13 more...)

2103.10176

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.63)

#artificialintelligenceMar-17-2021, 12:53:50 GMT

Quantum Computing and Reinforcement Learning Are Joining Forces to Make Faster AI

Deep reinforcement learning is having a superstar moment. Trouncing physicians at medical diagnoses and crushing humanity's best gamers at Go and Atari. While far from achieving the flexible, quick thinking that comes naturally to humans, this powerful machine learning idea seems unstoppable as a harbinger of better thinking machines. Except there's a massive roadblock: they take forever to run. Because the concept behind these algorithms is based on trial and error, a reinforcement learning AI "agent" only learns after being rewarded for its correct decisions.

ai agent, computing, reinforcement, (9 more...)

#artificialintelligence

Country:

Europe > Austria > Vienna (0.05)
Europe > Austria > Tyrol > Innsbruck (0.05)

Industry: Health & Medicine (0.55)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

A Practical Guide to Multi-Objective Reinforcement Learning and Planning

Hayes, Conor F., Rădulescu, Roxana, Bargiacchi, Eugenio, Källström, Johan, Macfarlane, Matthew, Reymond, Mathieu, Verstraeten, Timothy, Zintgraf, Luisa M., Dazeley, Richard, Heintz, Fredrik, Howley, Enda, Irissappane, Athirai A., Mannion, Patrick, Nowé, Ann, Ramos, Gabriel, Restelli, Marcello, Vamplew, Peter, Roijers, Diederik M.

Real-world decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying problem and hence produce suboptimal results. This paper serves as a guide to the application of multi-objective methods to difficult problems, and is aimed at researchers who are already familiar with single-objective reinforcement learning and planning methods who wish to adopt a multi-objective perspective on their research, as well as practitioners who encounter multi-objective decision problems in practice. It identifies the factors that may influence the nature of the desired solution, and illustrates by example how these influence the design of multi-objective decision-making systems for complex problems.

ground transportation, objective, renewable energy, (24 more...)

2103.09568

Country:

North America > United States (0.45)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.27)
Europe > Belgium (0.14)
(5 more...)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.67)

Industry:

Health & Medicine (1.00)
Law (0.67)
Government (0.67)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Ammanabrolu, Prithviraj, Riedl, Mark O.

Situated Language Learning via Interactive Narratives

This paper provides a roadmap that explores the question of how to imbue learning agents with the ability to understand and generate contextually relevant natural language in service of achieving a goal. We hypothesize that two key components in creating such agents are interactivity and environment grounding, shown to be vital parts of language learning in humans, and posit that interactive narratives should be the environments of choice for such training these agents. These games are simulations in which an agent interacts with the world through natural language -- "perceiving", "acting upon", and "talking to" the world using textual descriptions, commands, and dialogue -- and as such exist at the intersection of natural language processing, storytelling, and sequential decision making. We discuss the unique challenges a text games' puzzle-like structure combined with natural language state-and-action spaces provides: knowledge representation, commonsense reasoning, and exploration. Beyond the challenges described so far, progress in the realm of interactive narratives can be applied in adjacent problem domains. These applications provide interesting challenges of their own as well as extensions to those discussed so far. We describe three of them in detail: (1) evaluating AI system's commonsense understanding by automatically creating interactive narratives; (2) adapting abstract text-based policies to include other modalities such as vision; and (3) enabling multi-agent and human-AI collaboration in shared, situated worlds.

agent, bottleneck, interactive narrative, (12 more...)

2103.09977

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(2 more...)

Genre: Research Report (0.41)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.92)

Chen, Lin, Scherrer, Bruno, Bartlett, Peter L.

Infinite-Horizon Offline Reinforcement Learning with Linear Function Approximation: Curse of Dimensionality and Algorithm

In this paper, we investigate the sample complexity of policy evaluation in infinite-horizon offline reinforcement learning (also known as the off-policy evaluation problem) with linear function approximation. We identify a hard regime $d\gamma^{2}>1$, where $d$ is the dimension of the feature vector and $\gamma$ is the discount rate. In this regime, for any $q\in[\gamma^{2},1]$, we can construct a hard instance such that the smallest eigenvalue of its feature covariance matrix is $q/d$ and it requires $\Omega\left(\frac{d}{\gamma^{2}\left(q-\gamma^{2}\right)\varepsilon^{2}}\exp\left(\Theta\left(d\gamma^{2}\right)\right)\right)$ samples to approximate the value function up to an additive error $\varepsilon$. Note that the lower bound of the sample complexity is exponential in $d$. If $q=\gamma^{2}$, even infinite data cannot suffice. Under the low distribution shift assumption, we show that there is an algorithm that needs at most $O\left(\max\left\{ \frac{\left\Vert \theta^{\pi}\right\Vert _{2}^{4}}{\varepsilon^{4}}\log\frac{d}{\delta},\frac{1}{\varepsilon^{2}}\left(d+\log\frac{1}{\delta}\right)\right\} \right)$ samples ($\theta^{\pi}$ is the parameter of the policy in linear function approximation) and guarantees approximation to the value function up to an additive error of $\varepsilon$ with probability at least $1-\delta$.

function approximation, linear function approximation, reinforcement, (12 more...)

2103.09847

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France > Grand Est > Meurthe-et-Moselle > Nancy (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.83)

Pacchiano, Aldo, Lee, Jonathan, Bartlett, Peter, Nachum, Ofir

Near Optimal Policy Optimization via REPS

Since its introduction a decade ago, \emph{relative entropy policy search} (REPS) has demonstrated successful policy learning on a number of simulated and real-world robotic domains, not to mention providing algorithmic components used by many recently proposed reinforcement learning (RL) algorithms. While REPS is commonly known in the community, there exist no guarantees on its performance when using stochastic and gradient-based solvers. In this paper we aim to fill this gap by providing guarantees and convergence rates for the sub-optimality of a policy learned using first-order optimization methods applied to the REPS objective. We first consider the setting in which we are given access to exact gradients and demonstrate how near-optimality of the objective translates to near-optimality of the policy. We then consider the practical setting of stochastic gradients, and introduce a technique that uses \emph{generative} access to the underlying Markov decision process to compute parameter updates that maintain favorable convergence to the optimal regularized policy.

objective, optimal policy optimization, probability, (11 more...)

2103.09756

Country:

North America > United States (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

arXiv.org Artificial IntelligenceMar-16-2021

Learning Symbolic Rules for Interpretable Deep Reinforcement Learning

Ma, Zhihao, Zhuang, Yuzheng, Weng, Paul, Zhuo, Hankz Hankui, Li, Dong, Liu, Wulong, Hao, Jianye

Recent progress in deep reinforcement learning (DRL) can be largely attributed to the use of neural networks. However, this black-box approach fails to explain the learned policy in a human understandable way. To address this challenge and improve the transparency, we propose a Neural Symbolic Reinforcement Learning framework by introducing symbolic logic into DRL. This framework features a fertilization of reasoning and learning modules, enabling end-to-end learning with prior symbolic knowledge. Moreover, interpretability is achieved by extracting the logical rules learned by the reasoning module in a symbolic rule space. The experimental results show that our framework has better interpretability, along with competing performance in comparison to state-of-the-art approaches.

agent, interpretability, predicate, (15 more...)

2103.08228

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Leisure & Entertainment > Games (0.46)
Transportation (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)