AITopics

Design of new drug compounds with target properties is a key area of research in generative modeling. We present a small drug molecule design pipeline based on graph-generative models and a comparison study of two state-of-the-art graph generative models for designing COVID-19 targeted drug candidates: 1) a variational autoencoder-based approach (VAE) that uses prior knowledge of molecules that have been shown to be effective for earlier coronavirus treatments and 2) a deep Q-learning method (DQN) that generates optimized molecules without any proximity constraints. We evaluate the novelty of the automated molecule generation approaches by validating the candidate molecules with drug-protein binding affinity models. The VAE method produced two novel molecules with similar structures to the antiretroviral protease inhibitor Indinavir that show potential binding affinity for the SARS-CoV-2 protein target 3-chymotrypsin-like protease (3CL-protease).

jt-vae, molecule, pic 50, (15 more...)

2102.04977

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Aumjaud, Pierre, McAuliffe, David, Lera, Francisco Javier Rodríguez, Cardiff, Philip

rl_reach: Reproducible Reinforcement Learning Experiments for Robotic Reaching Tasks

Training reinforcement learning agents at solving a given task is highly dependent on identifying optimal sets of hyperparameters and selecting suitable environment input / output configurations. This tedious process could be eased with a straightforward toolbox allowing its user to quickly compare different training parameter sets. We present rl_reach, a self-contained, open-source and easy-to-use software package designed to run reproducible reinforcement learning experiments for customisable robotic reaching tasks. MIT License Code versioning system used git Software code language used Python 3 Compilation requirements & dependencies Docker OR Python 3, Conda, CUDA (optional) Link to developer documentation/manual https://rl-reach.readthedocs.io/en/latest/index.html Support email for questions pierre.aumjaud@ucd.ie Industrial processes have seen their productivity and efficiency increase considerably in recent decades thanks to the automation of repetitive tasks, notably with the advances in robotics. This productivity can be further improved by enabling robotic agents to solve tasks independently, without being explicitly programmed by humans.

experiment, learning, training environment, (14 more...)

2102.04916

Country:

Europe > Ireland > Leinster > County Dublin > Dublin (0.05)
Europe > Spain > Castile and León > León Province > León (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Reverb: A Framework For Experience Replay

Cassirer, Albin, Barth-Maron, Gabriel, Brevdo, Eugene, Ramos, Sabela, Boyd, Toby, Sottiaux, Thibault, Kroiss, Manuel

A central component of training in Reinforcement Learning (RL) is Experience: the data used for training. The mechanisms used to generate and consume this data have an important effect on the performance of RL algorithms. In this paper, we introduce Reverb: an efficient, extensible, and easy to use system designed specifically for experience replay in RL. Reverb is designed to work efficiently in distributed configurations with up to thousands of concurrent clients. The flexible API provides users with the tools to easily and accurately configure the replay buffer. It includes strategies for selecting and removing elements from the buffer, as well as options for controlling the ratio between sampled and inserted elements. This paper presents the core design of Reverb, gives examples of how it can be applied, and provides empirical results of Reverb's performance characteristics.

experience replay, reverb, server, (13 more...)

2102.04736

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > California (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Sengupta, Ayan, Mohammad, Yasser, Nakadai, Shinji

An Autonomous Negotiating Agent Framework with Reinforcement Learning Based Strategies and Adaptive Strategy Switching Mechanism

Despite abundant negotiation strategies in literature, the complexity of automated negotiation forbids a single strategy from being dominant against all others in different negotiation scenarios. To overcome this, one approach is to use mixture of experts, but at the same time, one problem of this method is the selection of experts, as this approach is limited by the competency of the experts selected. Another problem with most negotiation strategies is their incapability of adapting to dynamic variation of the opponent's behaviour within a single negotiation session resulting in poor performance. This work focuses on both, solving the problem of expert selection and adapting to the opponent's behaviour with our Autonomous Negotiating Agent Framework. This framework allows real-time classification of opponent's behaviour and provides a mechanism to select, switch or combine strategies within a single negotiation session. Additionally, our framework has a reviewer component which enables self-enhancement capability by deciding to include new strategies or replace old ones with better strategies periodically. We demonstrate an instance of our framework by implementing maximum entropy reinforcement learning based strategies with a deep learning based opponent classifier. Finally, we evaluate the performance of our agent against state-of-the-art negotiators under varied negotiation scenarios.

agent, classifier, negotiator, (14 more...)

2102.03588

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > Germany > Berlin (0.05)
Oceania > Australia > Victoria > Melbourne (0.04)
(13 more...)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Huang, Zhenhan, Tanaka, Fumihide

A Modularized and Scalable Multi-Agent Reinforcement Learning-based System for Financial Portfolio Management

Financial Portfolio Management is one of the most applicable problems in Reinforcement Learning (RL) by its sequential decision-making nature. Existing RL-based approaches, while inspiring, often lack scalability, reusability, or profundity of intake information to accommodate the ever-changing capital markets. In this paper, we design and develop MSPM, a novel Multi-agent Reinforcement learning-based system with a modularized and scalable architecture for portfolio management. MSPM involves two asynchronously updated units: Evolving Agent Module (EAM) and Strategic Agent Module (SAM). A self-sustained EAM produces signal-comprised information for a specific asset using heterogeneous data inputs, and each EAM possesses its reusability to have connections to multiple SAMs. A SAM is responsible for the assets reallocation of a portfolio using profound information from the EAMs connected. With the elaborate architecture and the multi-step condensation of the volatile market information, MSPM aims to provide a customizable, stable, and dedicated solution to portfolio management that existing approaches do not. We also tackle data-shortage issue of newly-listed stocks by transfer learning, and validate the necessity of EAM. Experiments on 8-year U.S. stock markets data prove the effectiveness of MSPM in profits accumulation by its outperformance over existing benchmarks.

agent module, eam, portfolio, (12 more...)

2102.03502

Country:

Asia > Japan > Honshū > Kantō > Ibaraki Prefecture > Tsukuba (0.05)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)

Genre: Research Report (0.40)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Hammond, Lewis, Abate, Alessandro, Gutierrez, Julian, Wooldridge, Michael

Multi-Agent Reinforcement Learning with Temporal Logic Specifications

In this paper, we study the problem of learning to satisfy temporal logic specifications with a group of agents in an unknown environment, which may exhibit probabilistic behaviour. From a learning perspective these specifications provide a rich formal language with which to capture tasks or objectives, while from a logic and automated verification perspective the introduction of learning capabilities allows for practical applications in large, stochastic, unknown environments. The existing work in this area is, however, limited. Of the frameworks that consider full linear temporal logic or have correctness guarantees, all methods thus far consider only the case of a single temporal logic specification and a single agent. In order to overcome this limitation, we develop the first multi-agent reinforcement learning technique for temporal logic specifications, which is also novel in its ability to handle multiple specifications. We provide correctness and convergence guarantees for our main algorithm - ALMANAC (Automaton/Logic Multi-Agent Natural Actor-Critic) - even when using function approximation. Alongside our theoretical results, we further demonstrate the applicability of our technique via a set of preliminary experiments.

proceedings, reinforcement learning, specification, (16 more...)

2102.00582

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
(18 more...)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Zugarini, Andrea, Pasqualini, Luca, Melacci, Stefano, Maggini, Marco

Generate and Revise: Reinforcement Learning in Neural Poetry

Developing machines that reproduce artistic behaviours and learn to be creative is a long-standing goal of the scientific community in the context of Artificial Intelligence [1, 2]. Recently, several researches focused on the case of the noble art of Poetry, motivated by success of Deep Learning approaches to Natural Language Processing (NLP) and, more specifically, to Natural Language Generation [3, 4, 5, 6, 7, 8]. However, existing Machine Learning-based poem generators do not model the natural way poems are created by humans, i.e., poets usually do not create their compositions all in one breath. Usually a poet revisits, rephrases, adjusts a poetry many times, before reaching a text that perfectly conveys their intended meanings and emotions. In particular, a typical feature of poems is that the composition has also to formally respect predefined meter and rhyming schemes. With the aim of developing an artificial agent that learns to mimic this behaviour, we design a framework to generate poems that are repeatedly revisited and corrected, in order to improve the overall quality of the poem.

poem, reinforcement learning, representation, (14 more...)

2102.04114

Country:

Europe > Italy (0.05)
North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games

Wei, Chen-Yu, Lee, Chung-Wei, Zhang, Mengxiao, Luo, Haipeng

We study infinite-horizon discounted two-player zero-sum Markov games, and develop a decentralized algorithm that provably converges to the set of Nash equilibria under self-play. Our algorithm is based on running an Optimistic Gradient Descent Ascent algorithm on each state to learn the policies, with a critic that slowly learns the value of each state. To the best of our knowledge, this is the first algorithm in this setting that is simultaneously rational (converging to the opponent's best response when it uses a stationary policy), convergent (converging to the set of Nash equilibria under self-play), agnostic (no need to know the actions played by the opponent), symmetric (players taking symmetric roles in the algorithm), and enjoying a finite-time last-iterate convergence guarantee, all of which are desirable properties of decentralized algorithms.

algorithm, dist 2, onvergence, (16 more...)

2102.0454

Country:

North America > United States > California (0.14)
Europe > Russia (0.04)
Asia > Russia (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.60)

Lyu, Xueguang, Xiao, Yuchen, Daley, Brett, Amato, Christopher

Contrasting Centralized and Decentralized Critics in Multi-Agent Reinforcement Learning

Centralized Training for Decentralized Execution, where agents are trained offline using centralized information but execute in a decentralized manner online, has gained popularity in the multi-agent reinforcement learning community. In particular, actor-critic methods with a centralized critic and decentralized actors are a common instance of this idea. However, the implications of using a centralized critic in this context are not fully discussed and understood even though it is the standard choice of many algorithms. We therefore formally analyze centralized and decentralized critic approaches, providing a deeper understanding of the implications of critic choice. Because our theory makes unrealistic assumptions, we also empirically compare the centralized and decentralized critic methods over a wide set of environments to validate our theories and to provide practical advice. We show that there exist misconceptions regarding centralized critics in the current literature and show that the centralized critic design is not strictly beneficial, but rather both centralized and decentralized critics have different pros and cons that should be taken into account by algorithm designers.

agent, decentralized critic, variance, (15 more...)

2102.04402

Country: North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

Flet-Berliac, Yannis, Ferret, Johan, Pietquin, Olivier, Preux, Philippe, Geist, Matthieu

Adversarially Guided Actor-Critic

Despite definite success in deep reinforcement learning problems, actor-critic algorithms are still confronted with sample inefficiency in complex environments, particularly in tasks where efficient exploration is a bottleneck. These methods consider a policy (the actor) and a value function (the critic) whose respective losses are built using different motivations and approaches. This paper introduces a third protagonist: the adversary. While the adversary mimics the actor by minimizing the KL-divergence between their respective action distributions, the actor, in addition to learning to solve the task, tries to differentiate itself from the adversary predictions. This novel objective stimulates the actor to follow strategies that could not have been correctly predicted from previous trajectories, making its behavior innovative in tasks where the reward is extremely rare. Our experimental analysis shows that the resulting Adversarially Guided Actor-Critic (AGAC) algorithm leads to more exhaustive exploration. Notably, AGAC outperforms current state-of-the-art methods on a set of various hard-exploration and procedurally-generated tasks. Research in deep reinforcement learning (RL) has proven to be successful across a wide range of problems (Silver et al., 2014; Schulman et al., 2016; Lillicrap et al., 2016; Mnih et al., 2016).

agent, conference paper, international conference, (11 more...)

2102.04376

Country: North America > United States > Massachusetts > Hampshire County > Amherst (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment (0.46)
Education > Focused Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)