AITopics

This paper focuses on a class of reinforcement learning problems where significant events are rare and limited to a single positive reward per episode. A typical example is that of an agent who has to choose a partner to cooperate with, while a large number of partners are simply not interested in cooperating, regardless of what the agent has to offer. We address this problem in a continuous state and action space with two different kinds of search methods: a gradient policy search method and a direct policy search method using an evolution strategy. We show that when significant events are rare, gradient information is also scarce, making it difficult for policy gradient search methods to find an optimal policy, with or without a deep neural architecture. On the other hand, we show that direct policy search methods are invariant to the rarity of significant events, which is yet another confirmation of the unique role evolutionary algorithms have to play as a reinforcement learning method.
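As a rough illustration of why gradient information becomes scarce when rewards are rare, the sketch below estimates a REINFORCE-style gradient for a one-dimensional Gaussian policy. It is not the paper's setup: the function name `reinforce_gradient`, the toy reward (1 when the action lands near a target, 0 otherwise), and all parameter values are assumptions made for this example.

```python
import random

def reinforce_gradient(mu, sigma, target, tolerance, episodes, rng):
    """Monte Carlo REINFORCE estimate of d E[R] / d mu for a 1-D Gaussian policy.

    Each episode draws one action a ~ N(mu, sigma) and pays reward 1 only if
    the action lands within `tolerance` of `target` (a toy "significant event").
    """
    total = 0.0
    for _ in range(episodes):
        action = rng.gauss(mu, sigma)
        reward = 1.0 if abs(action - target) < tolerance else 0.0
        # Score function of a Gaussian mean: d log pi(a) / d mu = (a - mu) / sigma^2
        total += reward * (action - mu) / sigma ** 2
    return total / episodes

rng = random.Random(0)
# Frequent event: rewarded actions lie in (0, 1), close to the policy mean.
common = reinforce_gradient(0.0, 1.0, target=0.5, tolerance=0.5, episodes=1000, rng=rng)
# Rare event: rewarded actions lie ten standard deviations away from the mean.
rare = reinforce_gradient(0.0, 1.0, target=10.0, tolerance=0.1, episodes=1000, rng=rng)
print("gradient estimate, frequent event:", common)
print("gradient estimate, rare event:", rare)
```

When no sampled episode is rewarded, every term in the sum is zero, so the estimate carries no direction in which to improve the policy. That is the failure mode the abstract attributes to policy gradient methods.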

agent, algorithm, focal agent, (15 more...)

2103.06846

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > France > Île-de-France > Paris > Paris (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
(2 more...)

Todi, Kashyap, Bailly, Gilles, Leiva, Luis A., Oulasvirta, Antti

Adapting User Interfaces with Model-based Reinforcement Learning

Adapting an interface requires taking into account both the positive and negative effects that changes may have on the user. A carelessly picked adaptation may impose high costs on the user -- for example, due to surprise or relearning effort -- or prematurely "trap" the process in a suboptimal design. However, effects on users are hard to predict as they depend on factors that are latent and evolve over the course of interaction. We propose a novel approach for adaptive user interfaces that yields a conservative adaptation policy: It finds beneficial changes when there are such and avoids changes when there are none. Our model-based reinforcement learning method plans sequences of adaptations and consults predictive HCI models to estimate their effects. We present empirical and simulation results from the case of adaptive menus, showing that the method outperforms both a non-adaptive and a frequency-based policy.

adaptation, application, proceedings, (13 more...)

doi: 10.1145/3411764.3445497

2103.06807

Country:

Europe > Austria > Vienna (0.14)
North America > United States > New York > New York County > New York City (0.05)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture > Yokohama (0.05)
(16 more...)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Cardozo, Nicolás, Dusparic, Ivana

Auto-COP: Adaptation Generation in Context-Oriented Programming using Reinforcement Learning Options

Self-adaptive software systems continuously adapt in response to internal and external changes in their execution environment, captured as contexts. The Context-Oriented Programming (COP) paradigm posits a technique for the development of self-adaptive systems, capturing their main characteristics with specialized programming language constructs. COP adaptations are specified as independent modules that are composed into and out of the base system as contexts are activated and deactivated in response to sensed circumstances from the surrounding environment. However, the definition of adaptations, their contexts, and associated specialized behavior needs to be specified at design time. In complex cyber-physical systems (CPS) this is intractable due to new unpredicted operating conditions. We propose Auto-COP, a new technique to enable generation of adaptations at run time. Auto-COP uses reinforcement learning (RL) options to build action sequences, based on the previous instances of the system execution. Options are explored in interaction with the environment, and the most suitable options for each context are used to generate adaptations exploiting COP. To validate Auto-COP, we present two case studies exhibiting different system characteristics and application domains: a driving assistant and a robot delivery system. We present examples of Auto-COP code generated at run time, to illustrate the types of circumstances (contexts) requiring adaptation, and the corresponding generated adaptations for each context. We confirm that the generated adaptations exhibit correct system behavior measured by domain-specific performance metrics, while reducing the number of required execution/actuation steps by a factor of two. This shows that the adaptations are regularly selected by the running system, as adaptive behavior is more appropriate than the execution of primitive actions.

adaptation, auto-cop, primitive action, (17 more...)

2103.06757

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.14)
Asia > Singapore (0.04)
(7 more...)

Genre: Workflow (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Weissenbacher, Matthias, Kawahara, Yoshinobu

A Quadratic Actor Network for Model-Free Reinforcement Learning

In this work we discuss the incorporation of quadratic neurons into policy networks in the context of model-free actor-critic reinforcement learning. Quadratic neurons admit an explicit quadratic function approximation, in contrast to conventional approaches where the non-linearity is induced by the activation functions. We perform empirical experiments on several MuJoCo continuous control tasks and find that when quadratic neurons are added to MLP policy networks, those outperform the baseline MLP whilst admitting a smaller number of parameters. The top returned reward is on average increased by $5.8\%$ while being about $21\%$ more sample efficient. Moreover, it can maintain its advantage against added action and observation noise.
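To make the idea of an explicit quadratic function approximation concrete, here is a minimal sketch of one common form of quadratic neuron, y = x^T W x + w . x + b. The abstract does not state the exact parametrization used in the paper, so this form, the function name `quadratic_neuron`, and the sample weights are assumptions for illustration only.

```python
def quadratic_neuron(x, W, w, b):
    """Evaluate y = x^T W x + w . x + b for a single neuron.

    The second-order term is part of the neuron itself, so no activation
    function is needed to obtain a non-linear response.
    """
    n = len(x)
    quadratic = sum(x[i] * W[i][j] * x[j] for i in range(n) for j in range(n))
    linear = sum(wi * xi for wi, xi in zip(w, x))
    return quadratic + linear + b

x = [1.0, 2.0]
W = [[0.0, 1.0],
     [0.0, 0.0]]   # only the cross term x[0] * x[1] is active
w = [1.0, -1.0]
b = 0.5
print(quadratic_neuron(x, W, w, b))  # 2.0 + (-1.0) + 0.5 = 1.5
```

A linear neuron followed by an activation can only approximate the product x[0] * x[1], whereas this neuron represents it exactly with a single weight.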

algorithm, quadratic neuron, sac algorithm, (12 more...)

2103.06617

Country:

Oceania > Australia > Queensland > Brisbane (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Colorado > Denver County > Denver (0.04)
(3 more...)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Nikou, Alexandros, Mujumdar, Anusha, Orlic, Marin, Feljan, Aneta Vulgarakis

Symbolic Reinforcement Learning for Safe RAN Control

[...] Symbolic Reinforcement Learning (SRL) architecture for safe control in Radio Access Network (RAN) applications. In our automated tool, a user can select high-level safety specifications expressed in Linear Temporal Logic (LTL) to shield an RL agent running in a given cellular network with the aim of optimizing network performance, as measured through certain Key Performance Indicators (KPIs). In the proposed architecture, network safety shielding is ensured through model-checking techniques over combined discrete system models (automata) that are abstracted through reinforcement learning. We demonstrate the user interface (UI) helping the user set intent specifications to the architecture and inspect the difference in allowed and blocked actions.

In order to express desired specifications to the network into consideration, LTL is used (see [2, 10, 12, 13]), due to the fact that it provides a powerful mathematical formalism for such purpose. Our proposed demonstration exhibits the following attributes: (1) a general automatic framework from LTL specification user input to the derivation of the policy that fulfills it, at the same time blocking the control actions that violate the specification; (2) novel system dynamics abstraction to companion Markov Decision Processes (MDPs), which is computationally efficient; (3) UI development that allows the user to graphically access all [...]
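The shielding idea mentioned in this abstract, blocking control actions that would violate a safety specification, can be sketched generically as follows. This is not the paper's model-checking pipeline: the one-step model `predict`, the invariant `is_safe` (a stand-in for an LTL "always" property), and the antenna-tilt example are all hypothetical.

```python
def shield(state, actions, predict, is_safe):
    """Split candidate actions into allowed and blocked using a one-step model."""
    allowed, blocked = [], []
    for action in actions:
        next_state = predict(state, action)
        (allowed if is_safe(next_state) else blocked).append(action)
    return allowed, blocked

def shielded_choice(state, actions, predict, is_safe, value):
    """Pick the highest-valued action among those the shield allows."""
    allowed, _ = shield(state, actions, predict, is_safe)
    if not allowed:
        return None  # no safe action is available in this state
    return max(allowed, key=lambda a: value(state, a))

# Hypothetical example: the state is an antenna tilt in degrees, and the
# safety invariant requires the tilt to stay within [0, 10] at all times.
predict = lambda tilt, delta: tilt + delta
is_safe = lambda tilt: 0 <= tilt <= 10
value = lambda tilt, delta: delta  # the unshielded agent prefers larger tilt

allowed, blocked = shield(9, [-2, 0, 2], predict, is_safe)
print("allowed:", allowed, "blocked:", blocked)
print("chosen:", shielded_choice(9, [-2, 0, 2], predict, is_safe, value))
```

Returning the allowed and blocked sets separately mirrors the kind of inspection the abstract describes for the user interface, where the difference between allowed and blocked actions is shown to the user.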

rl agent, specification, symbolic reinforcement learning, (11 more...)

2103.06602

Country: North America > United States > California > Los Angeles County > Los Angeles (0.15)

Genre: Research Report (0.50)

Industry: Telecommunications (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Anwar, Aqeel, Raychowdhury, Arijit

Multi-Task Federated Reinforcement Learning with Adversaries

Reinforcement learning algorithms, like any other machine learning algorithms, face a serious threat from adversaries. The adversaries can manipulate the learning algorithm, resulting in non-optimal policies. In this paper, we analyze Multi-task Federated Reinforcement Learning algorithms, where multiple collaborative agents in various environments are trying to maximize the sum of discounted return, in the presence of adversarial agents. We argue that the common attack methods are not guaranteed to carry out a successful attack on Multi-task Federated Reinforcement Learning and propose an adaptive attack method with better attack performance. Furthermore, we modify the conventional federated reinforcement learning algorithm to address the issue of adversaries so that it works equally well with and without adversaries. Experimentation on different small to mid-size reinforcement learning problems shows that the proposed attack method outperforms other general attack methods and that the proposed modification to the federated reinforcement learning algorithm achieves near-optimal policies in the presence of adversarial agents.

adversary, agent, attack method, (14 more...)

2103.06473

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > Texas (0.04)
Asia > Pakistan > Punjab > Lahore Division > Lahore (0.04)
Asia > India > Karnataka > Bengaluru (0.04)

Genre: Research Report (0.40)

Industry:

Information Technology > Security & Privacy (1.00)
Education (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.49)

Hu, Hao, Ye, Jianing, Ren, Zhizhou, Zhu, Guangxiang, Zhang, Chongjie

Generalizable Episodic Memory for Deep Reinforcement Learning

Episodic memory-based methods can rapidly latch onto past successful strategies by a non-parametric memory and improve sample efficiency of traditional reinforcement learning. However, little effort is put into the continuous domain, where a state is never visited twice and previous episodic methods fail to efficiently aggregate experience across trajectories. To address this problem, we propose Generalizable Episodic Memory (GEM), which effectively organizes the state-action values of episodic memory in a generalizable manner and supports implicit planning on memorized trajectories. GEM utilizes a double estimator to reduce the overestimation bias induced by value propagation in the planning process. Empirical evaluation shows that our method significantly outperforms existing trajectory-based methods on various MuJoCo continuous control tasks. To further show the general applicability, we evaluate our method on Atari games with discrete action space, which also shows significant improvement over baseline algorithms.
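The overestimation bias mentioned in this abstract, and the way a double estimator counters it, can be shown with a small simulation. This is a generic double-estimator sketch in the spirit of double Q-learning, not GEM's actual update rule; the function name `estimator_bias` and the noise model are assumptions for illustration.

```python
import random

def estimator_bias(num_actions=10, trials=5000, seed=0):
    """Compare a single max estimator with a double estimator.

    Every true action value is 0, so any nonzero average is pure bias.
    """
    rng = random.Random(seed)
    single_total = 0.0
    double_total = 0.0
    for _ in range(trials):
        # Two independent noisy estimates of the same (all-zero) action values.
        est_a = [rng.gauss(0.0, 1.0) for _ in range(num_actions)]
        est_b = [rng.gauss(0.0, 1.0) for _ in range(num_actions)]
        # Single estimator: select and evaluate with the same noisy values.
        single_total += max(est_a)
        # Double estimator: select with A, evaluate with the independent B.
        best = max(range(num_actions), key=lambda i: est_a[i])
        double_total += est_b[best]
    return single_total / trials, double_total / trials

single_bias, double_bias = estimator_bias()
print("single estimator bias:", single_bias)
print("double estimator bias:", double_bias)
```

Taking the maximum over noisy values picks up positive noise, so the single estimator is biased upward. Evaluating the selected action with an independent estimate removes that bias in this toy setting.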

algorithm, generalizable episodic memory, trajectory, (9 more...)

2103.06469

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Consumer Health (1.00)
Leisure & Entertainment > Games > Computer Games (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Yedidsion, Harel, Suriadinata, Jennifer, Xu, Zifan, Debruyn, Stefan, Stone, Peter

A Scavenger Hunt for Service Robots

Creating robots that can perform general-purpose service tasks in a human-populated environment has been a longstanding grand challenge for AI and Robotics research. One particularly valuable skill that is relevant to a wide variety of tasks is the ability to locate and retrieve objects upon request. This paper models this skill as a Scavenger Hunt (SH) game, which we formulate as a variation of the NP-hard stochastic traveling purchaser problem. In this problem, the goal is to find a set of objects as quickly as possible, given probability distributions of where they may be found. We investigate the performance of several solution algorithms for the SH problem, both in simulation and on a real mobile robot. We use Reinforcement Learning (RL) to train an agent to plan a minimal cost path, and show that the RL agent can outperform a range of heuristic algorithms, achieving near-optimal performance. In order to stimulate research on this problem, we introduce a publicly available software stack and associated website that enable users to upload scavenger hunts which robots can download, perform, and learn from to continually improve their performance on future hunts.

algorithm, node, robot, (16 more...)

2103.05225

Country: North America > United States > Texas > Travis County > Austin (0.05)

Genre: Research Report > Experimental Study (0.47)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

#artificialintelligence Mar-10-2021, 23:00:10 GMT

Getting started with Reinforcement Learning

Artificial Intelligence (AI) has undergone impressive advancements. Thanks to Machine Learning, we have been able to achieve good competency at the Narrow AI level. Reinforcement Learning is now considered to be the most promising technique for moving to the next level in the AI paradigm (Figure 1). One of the reasons why Reinforcement Learning has gained so much interest is its interdisciplinarity. The core concepts of this area in fact follow basic principles from game theory, evolution, and neuroscience.

algorithm, learning, reinforcement learning, (2 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Journal of Artificial Intelligence Research Mar-10-2021

Induction and Exploitation of Subgoal Automata for Reinforcement Learning

Furelos-Blanco, Daniel (Imperial College London) | Law, Mark (Imperial College London) | Jonsson, Anders (Universitat Pompeu Fabra) | Broda, Krysia | Russo, Alessandra

In this paper we present ISA, an approach for learning and exploiting subgoals in episodic reinforcement learning (RL) tasks. ISA interleaves reinforcement learning with the induction of a subgoal automaton, an automaton whose edges are labeled by the task’s subgoals expressed as propositional logic formulas over a set of high-level events. A subgoal automaton also consists of two special states: a state indicating the successful completion of the task, and a state indicating that the task has finished without succeeding. A state-of-the-art inductive logic programming system is used to learn a subgoal automaton that covers the traces of high-level events observed by the RL agent. When the currently exploited automaton does not correctly recognize a trace, the automaton learner induces a new automaton that covers that trace. The interleaving process guarantees the induction of automata with the minimum number of states, and applies a symmetry breaking mechanism to shrink the search space whilst remaining complete. We evaluate ISA in several gridworld and continuous state space problems using different RL algorithms that leverage the automaton structures. We provide an in-depth empirical analysis of the automaton learning performance in terms of the traces, the symmetry breaking and specific restrictions imposed on the final learnable automaton. For each class of RL problem, we show that the learned automata can be successfully exploited to learn policies that reach the goal, achieving an average reward comparable to the case where automata are not learned but handcrafted and given beforehand.
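The subgoal automaton described in this abstract can be pictured as a small state machine whose edges fire on sets of high-level events. The sketch below is a hand-written toy, not an automaton induced by ISA: the task (get coffee, then reach the office, never touch a decoration), the state names, and the helper `run_automaton` are hypothetical. Edge conditions are plain Python predicates standing in for propositional formulas, and edges are checked in order, which is a simplification.

```python
def run_automaton(transitions, start, trace):
    """Return the state reached after reading a trace of event sets."""
    state = start
    for events in trace:
        # Take the first edge whose condition holds; otherwise stay in place.
        for condition, target in transitions.get(state, []):
            if condition(events):
                state = target
                break
    return state

# States: u0 (initial), u1 (coffee collected), u_acc (task completed),
# u_rej (task finished without succeeding). Terminal states have no edges.
transitions = {
    "u0": [(lambda e: "decoration" in e, "u_rej"),
           (lambda e: "coffee" in e, "u1")],
    "u1": [(lambda e: "decoration" in e, "u_rej"),
           (lambda e: "office" in e, "u_acc")],
}

success = [set(), {"coffee"}, set(), {"office"}]
failure = [{"coffee"}, {"decoration"}, {"office"}]
print(run_automaton(transitions, "u0", success))  # u_acc
print(run_automaton(transitions, "u0", failure))  # u_rej
```

An RL agent can use the current automaton state as extra context, for example by learning a separate policy or value function per state, which is one way to exploit the structure the abstract refers to.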

automata, automaton, induction and exploitation, (14 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.1.12372

AI Access Foundation

12372

Country:

North America > United States (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)

Industry: Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)