AITopics

2102.01985

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > United States > New York (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Towards a reinforcement learning de novo genome assembler

Padovani, Kleber, Xavier, Roberto, Carvalho, Andre, Reali, Anna, Chateau, Annie, Alves, Ronnie

The use of reinforcement learning has proven to be very promising for solving complex activities without human supervision during their learning process. However, their successful applications are predominantly focused on fictional and entertainment problems - such as games. Based on the above, this work aims to shed light on the application of reinforcement learning to solve this relevant real-world problem, the genome assembly. By expanding the only approach found in the literature that addresses this problem, we carefully explored the aspects of intelligent agent learning, performed by the Q-learning algorithm, to understand its suitability to be applied in scenarios whose characteristics are more similar to those faced by real genome projects. The improvements proposed here include changing the previously proposed reward system and including state space exploration optimization strategies based on dynamic pruning and mutual collaboration with evolutionary computing. These investigations were tried on 23 new environments with larger inputs than those used previously. All these environments are freely available on the internet for the evolution of this research by the scientific community. The results suggest consistent performance progress using the proposed improvements, however, they also demonstrate the limitations of them, especially related to the high dimensionality of state and action spaces. We also present, later, the paths that can be traced to tackle genome assembly efficiently in real scenarios considering recent, successfully reinforcement learning applications - including deep reinforcement learning - from other domains dealing with high-dimensional inputs.

agent, algorithm, reinforcement, (17 more...)

2102.02649

Country:

South America > Brazil > São Paulo (0.04)
Europe > France > Occitanie > Hérault > Montpellier (0.04)
South America > Brazil > Pará > Belém (0.04)
(6 more...)

Genre: Research Report > New Finding (0.66)

Industry:

Leisure & Entertainment > Games (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Taylor, Matthew E., Nissen, Nicholas, Wang, Yuan, Navidi, Neda

Improving Reinforcement Learning with Human Assistance: An Argument for Human Subject Studies with HIPPO Gym

Reinforcement learning (RL) is a popular machine learning paradigm for game playing, robotics control, and other sequential decision tasks. However, RL agents often have long learning times with high data requirements because they begin by acting randomly. In order to better learn in complex tasks, this article argues that an external teacher can often significantly help the RL agent learn. OpenAI Gym is a common framework for RL research, including a large number of standard environments and agents, making RL research significantly more accessible. This article introduces our new open-source RL framework, the Human Input Parsing Platform for Openai Gym (HIPPO Gym), and the design decisions that went into its creation. The goal of this platform is to facilitate human-RL research, again lowering the bar so that more researchers can quickly investigate different ways that human teachers could assist RL agents, including learning from demonstrations, learning from feedback, or curriculum learning.

agent, demonstration, student, (15 more...)

2102.02639

Country:

North America > Canada > Alberta (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Minnesota > Ramsey County > Saint Paul (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Education (1.00)
Information Technology > Services (0.68)
Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.45)

Ling, Chun Kai, Brown, Noam

Safe Search for Stackelberg Equilibria in Extensive-Form Games

Stackelberg equilibrium is a solution concept in two-player games where the leader has commitment rights over the follower. In recent years, it has become a cornerstone of many security applications, including airport patrolling and wildlife poaching prevention. Even though many of these settings are sequential in nature, existing techniques pre-compute the entire solution ahead of time. In this paper, we present a theoretically sound and empirically effective way to apply search, which leverages extra online computation to improve a solution, to the computation of Stackelberg equilibria in general-sum games. Instead of the leader attempting to solve the full game upfront, an approximate "blueprint" solution is first computed offline and is then improved online for the particular subgames encountered in actual play. We prove that our search technique is guaranteed to perform no worse than the pre-computed blueprint strategy, and empirically demonstrate that it enables approximately solving significantly larger games compared to purely offline methods. We also show that our search operation may be cast as a smaller Stackelberg problem, making our method complementary to existing algorithms based on strategy generation.

follower, information, subgame, (17 more...)

2102.01775

Country:

North America > United States > Texas (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)

Near-Optimal Offline Reinforcement Learning via Double Variance Reduction

Yin, Ming, Bai, Yu, Wang, Yu-Xiang

We consider the problem of offline reinforcement learning (RL) -- a well-motivated setting of RL that aims at policy optimization using only historical data. Despite its wide applicability, theoretical understandings of offline RL, such as its optimal sample complexity, remain largely open even in basic settings such as \emph{tabular} Markov Decision Processes (MDPs). In this paper, we propose Off-Policy Double Variance Reduction (OPDVR), a new variance reduction based algorithm for offline RL. Our main result shows that OPDVR provably identifies an $\epsilon$-optimal policy with $\widetilde{O}(H^2/d_m\epsilon^2)$ episodes of offline data in the finite-horizon stationary transition setting, where $H$ is the horizon length and $d_m$ is the minimal marginal state-action distribution induced by the behavior policy. This improves over the best known upper bound by a factor of $H$. Moreover, we establish an information-theoretic lower bound of $\Omega(H^2/d_m\epsilon^2)$ which certifies that OPDVR is optimal up to logarithmic factors. Lastly, we show that OPDVR also achieves rate-optimal sample complexity under alternative settings such as the finite-horizon MDPs with non-stationary transitions and the infinite horizon MDPs with discounted rewards.

algorithm, probability 1, sample complexity, (14 more...)

2102.01748

Country:

North America > United States (0.05)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.86)

Chen, Zaiwei, Maguluri, Siva Theja, Shakkottai, Sanjay, Shanmugam, Karthikeyan

A Lyapunov Theory for Finite-Sample Guarantees of Asynchronous Q-Learning and TD-Learning Variants

arXiv.org Machine LearningFeb-2-2021

Reinforcement Learning (RL) is a promising approach to solve sequential decision making problems in complex and stochastic systems [38]. Despite the empirical successes of RL [32, 33], the convergence properties of RL algorithms are not well understood. In particular, even in the basic tabular setting (i.e., without using function approximation), finite-sample convergence guarantees of many popular RL algorithms are in general not established. Most of the value-based RL algorithms can be viewed as Stochastic Approximation (SA) algorithms for solving suitable Bellman's equations. Due to the nature of sampling in RL, many such algorithms inevitably perform the so-called asynchronous update. That is, in each iteration, only a subset of the components of the vector-valued iterate is updated. Moreover, the components being updated are usually selected in a stochastic manner along a single trajectory based on an underlying Markov chain. Handling such asynchronous updates is one of the main challenges in analyzing the behavior of RL algorithms.

algorithm, assumption 2, convergence, (15 more...)

arXiv.org Machine Learning

2102.01567

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Texas > Travis County > Austin (0.04)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

Dutta, Hrishikesh, Biswas, Subir

Towards Multi-agent Reinforcement Learning for Wireless Network Protocol Synthesis

This paper proposes a multi-agent reinforcement learning based medium access framework for wireless networks. The access problem is formulated as a Markov Decision Process (MDP), and solved using reinforcement learning with every network node acting as a distributed learning agent. The solution components are developed step by step, starting from a single-node access scenario in which a node agent incrementally learns to control MAC layer packet loads for reining in self-collisions. The strategy is then scaled up for multi-node fully-connected scenarios by using more elaborate reward structures. It also demonstrates preliminary feasibility for more general partially connected topologies. It is shown that by learning to adjust MAC layer transmission probabilities, the protocol is not only able to attain theoretical maximum throughput at an optimal load, but unlike classical approaches, it can also retain that maximum throughput at higher loading conditions. Additionally, the mechanism is agnostic to heterogeneous loading while preserving that feature. It is also shown that access priorities of the protocol across nodes can be parametrically adjusted. Finally, it is also shown that the online learning feature of reinforcement learning is able to make the protocol adapt to time-varying loading conditions.

dmrl-mac, node, throughput, (14 more...)

2102.01611

Country:

Asia > China (0.04)
North America > United States > Michigan > Ingham County > Lansing (0.04)
North America > United States > Michigan > Ingham County > East Lansing (0.04)
Europe > United Kingdom (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Approximately Solving Mean Field Games via Entropy-Regularized Deep Reinforcement Learning

Cui, Kai, Koeppl, Heinz

The recent mean field game (MFG) formalism facilitates otherwise intractable computation of approximate Nash equilibria in many-agent settings. In this paper, we consider discrete-time finite MFGs subject to finite-horizon objectives. We show that all discrete-time finite MFGs with non-constant fixed point operators fail to be contractive as typically assumed in existing MFG literature, barring convergence via fixed point iteration. Instead, we incorporate entropy-regularization and Boltzmann policies into the fixed point iteration. As a result, we obtain provable convergence to approximate fixed points where existing methods fail, and reach the original goal of approximate Nash equilibria. All proposed methods are evaluated with respect to their exploitability, on both instructive examples with tractable exact solutions and high-dimensional problems where exact methods become intractable. In high-dimensional scenarios, we apply established deep reinforcement learning methods and empirically combine fictitious play with our approximations.

convergence, iteration, mean field, (15 more...)

2102.01585

Country:

Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > France (0.04)

Genre: Research Report (0.63)

Industry:

Health & Medicine > Therapeutic Area > Immunology (0.67)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Lan, Charline Le, Bellemare, Marc G., Castro, Pablo Samuel

Metrics and continuity in reinforcement learning

In most practical applications of reinforcement learning, it is untenable to maintain direct estimates for individual states; in continuous-state systems, it is impossible. Instead, researchers often leverage state similarity (whether explicitly or implicitly) to build models that can generalize well from a limited set of samples. The notion of state similarity used, and the neighbourhoods and topologies they induce, is thus of crucial importance, as it will directly affect the performance of the algorithms. Indeed, a number of recent works introduce algorithms assuming the existence of "well-behaved" neighbourhoods, but leave the full specification of such topologies for future work. In this paper we introduce a unified formalism for defining these topologies through the lens of metrics. We establish a hierarchy amongst these metrics and demonstrate their theoretical implications on the Markov Decision Process specifying the reinforcement learning problem. We complement our theoretical results with empirical evaluations showcasing the differences between the metrics considered.

continuity, metric, topology, (17 more...)

2102.01514

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > New Jersey (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Mqirmi, Pierre El, Belardinelli, Francesco, León, Borja G.

An Abstraction-based Method to Verify Multi-Agent Deep Reinforcement-Learning Behaviours

Multi-agent reinforcement learning (RL) often struggles to ensure the safe behaviours of the learning agents, and therefore it is generally not adapted to safety-critical applications. To address this issue, we present a methodology that combines formal verification with (deep) RL algorithms to guarantee the satisfaction of formally-specified safety constraints both in training and testing. The approach we propose expresses the constraints to verify in Probabilistic Computation Tree Logic (PCTL) and builds an abstract representation of the system to reduce the complexity of the verification step. This abstract model allows for model checking techniques to identify a set of abstract policies that meet the safety constraints expressed in PCTL. Then, the agents' behaviours are restricted according to these safe abstract policies. We provide formal guarantees that by using this method, the actions of the agents always meet the safety constraints, and provide a procedure to generate an abstract model automatically. We empirically evaluate and show the effectiveness of our method in a multi-agent environment.

agent, bisimulation, constraint, (17 more...)

2102.01434

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.14)
Europe > United Kingdom > England > Greater London > London (0.04)
(3 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)