AITopics

1910.01708

Country:

North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report > New Finding (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

#artificialintelligence Oct-2-2019, 19:58:29 GMT

Best Deep Reinforcement Learning Research of 2019 So Far

The scale of Internet-connected systems has increased considerably, and these systems are more exposed to cyberattacks than ever. The complexity and dynamics of cyberattacks require protection mechanisms that are responsive, adaptive, and scalable. Machine learning methods, and deep reinforcement learning (DRL) in particular, have been widely proposed to address these issues. By incorporating deep learning into traditional reinforcement learning (RL), DRL is highly capable of solving complex, dynamic, and especially high-dimensional cyber defense problems. This paper presents a survey of DRL approaches developed for cyber security.

best deep reinforcement learning research, cyber security, cyberattack

#artificialintelligence

Genre: Overview (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

#artificialintelligence Oct-2-2019, 19:57:10 GMT

Ready, Set, Algorithms! Teams Learn AI by Racing Cars

Anyone with an Amazon Web Services account can participate in the league. Teams or individuals can compete online in "virtual" races or in person at events worldwide. Teams build and train AI algorithms using Amazon SageMaker software, deploy them to self-driving cars measuring about 10 inches, then race them around a track of roughly 17 feet by 26 feet. "It's actually having practical applications," said James Rhodes, chief technology officer of investment research firm Morningstar. Thanks to the training, the company expects to have dozens of projects based on reinforcement learning and other machine-learning techniques in deployment by the end of 2020, he said.

algorithm, reinforcement, team learn ai

#artificialintelligence

Country: North America > United States > Texas > Travis County > Austin (0.06)

Industry:

Banking & Finance (1.00)
Leisure & Entertainment > Sports > Motorsports > Formula One (0.40)
Transportation > Passenger (0.37)
Information Technology > Software (0.35)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Preiss, James A., Arnold, Sébastien M. R., Wei, Chen-Yu, Kloft, Marius

Analyzing the Variance of Policy Gradient Estimators for the Linear-Quadratic Regulator

arXiv.org Machine Learning Oct-2-2019

We study the variance of the REINFORCE policy gradient estimator in environments with continuous state and action spaces, linear dynamics, quadratic cost, and Gaussian noise. These simple environments allow us to derive bounds on the estimator variance in terms of the environment and noise parameters. We compare the predictions of our bounds to the empirical variance in simulation experiments.
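The setting in this abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code: it assumes a scalar system x' = a*x + b*u + w, a linear Gaussian policy u = k*x + eps, and the plain total-cost REINFORCE estimator with no baseline. All parameter names and default values (`a`, `b`, `q`, `r`, `policy_std`, `noise_std`, `horizon`) are hypothetical choices made for illustration.

```python
import random
import statistics


def reinforce_sample(k, a=0.9, b=0.5, q=1.0, r=0.1,
                     policy_std=0.5, noise_std=0.1, horizon=20, rng=random):
    """Return one REINFORCE gradient sample for a scalar LQR.

    Assumed setup (illustrative only): policy u = k*x + eps with
    eps ~ N(0, policy_std^2), dynamics x' = a*x + b*u + w with
    w ~ N(0, noise_std^2), and per-step cost q*x^2 + r*u^2.
    """
    x = 1.0
    score = 0.0  # running sum of d/dk log pi(u_t | x_t)
    cost = 0.0   # running sum of the quadratic cost
    for _ in range(horizon):
        eps = rng.gauss(0.0, policy_std)
        u = k * x + eps
        # For a Gaussian policy, d/dk log pi = (u - k*x) * x / policy_std^2.
        score += eps * x / policy_std ** 2
        cost += q * x * x + r * u * u
        x = a * x + b * u + rng.gauss(0.0, noise_std)
    # Total-cost REINFORCE estimator: (sum of scores) * (sum of costs).
    return score * cost


def gradient_stats(k, n_rollouts, seed=0, **kwargs):
    """Empirical mean and sample variance of the estimator over rollouts."""
    rng = random.Random(seed)
    samples = [reinforce_sample(k, rng=rng, **kwargs) for _ in range(n_rollouts)]
    return statistics.fmean(samples), statistics.variance(samples)


if __name__ == "__main__":
    for horizon in (5, 20, 50):
        mean, var = gradient_stats(-0.5, 2000, horizon=horizon)
        print(f"horizon={horizon:3d}  mean={mean:10.3f}  variance={var:12.3f}")
```

Sweeping `horizon`, `policy_std`, or `noise_std` in this way gives an empirical variance that could be compared against analytical bounds of the kind the abstract describes.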

algorithm, experiment, variance

arXiv.org Machine Learning

1910.01249

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > Germany > Rhineland-Palatinate > Kaiserslautern (0.04)
North America > United States > Rhode Island > Providence County > Providence (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

arXiv.org Machine Learning Oct-2-2019

Formal Language Constraints for Markov Decision Processes

Quint, Eleanor, Xu, Dong, Dogan, Haluk, Hakguder, Zeynep, Scott, Stephen, Dwyer, Matthew

In order to satisfy safety conditions, a reinforcement learning (RL) agent may be constrained from acting freely, e.g., to prevent trajectories that might cause unwanted behavior or physical damage in a robot. We propose a general framework for augmenting a Markov decision process (MDP) with constraints that are described in formal languages over sequences of MDP states and agent actions. Hard constraints are enforced by filtering the allowed action set, and soft constraints by applying potential-based reward shaping. We instantiate this framework using deterministic finite automata to encode constraints and propose methods of augmenting MDP observations with the state of the constraint automaton for learning. We empirically evaluate these methods with a variety of constraints by training Deep Q-Networks in Atari games as well as Proximal Policy Optimization in MuJoCo environments. We experimentally find that our approaches are effective in significantly reducing or eliminating constraint violations with either minimal negative or, depending on the constraint, a clear positive impact on final performance.
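The two enforcement modes named in this abstract can be sketched with a toy automaton. This is an illustrative sketch under stated assumptions, not the authors' implementation: the `ConstraintDFA` class, the "never fire twice in a row" constraint, the `None` wildcard key, and the potential function (minus `penalty` in violating states, zero elsewhere) are all hypothetical.

```python
class ConstraintDFA:
    """A deterministic finite automaton over agent actions (hypothetical helper).

    `transitions` maps (state, symbol) to the next state; the key
    (state, None) is an assumed wildcard meaning "any other symbol".
    """

    def __init__(self, transitions, start, violating):
        self.transitions = transitions
        self.start = start
        self.violating = set(violating)
        self.state = start

    def peek(self, symbol):
        """Return the state the DFA would reach on `symbol`, without moving."""
        key = (self.state, symbol)
        if key in self.transitions:
            return self.transitions[key]
        return self.transitions[(self.state, None)]

    def step(self, symbol):
        self.state = self.peek(symbol)
        return self.state


def no_double_fire():
    """Toy constraint: the action "fire" must never occur twice in a row."""
    transitions = {
        (0, "fire"): 1, (0, None): 0,  # 0: last action was not "fire"
        (1, "fire"): 2, (1, None): 0,  # 1: last action was "fire"
        (2, None): 2,                  # 2: violating, absorbing
    }
    return ConstraintDFA(transitions, start=0, violating={2})


def allowed_actions(dfa, actions):
    """Hard enforcement: drop actions that would enter a violating state."""
    return [a for a in actions if dfa.peek(a) not in dfa.violating]


def shaping_reward(dfa, action, gamma=0.99, penalty=1.0):
    """Soft enforcement: potential-based shaping term gamma*phi(q') - phi(q)."""
    def phi(q):
        return -penalty if q in dfa.violating else 0.0
    return gamma * phi(dfa.peek(action)) - phi(dfa.state)


def augment(observation, dfa):
    """Expose the automaton state to the learner alongside the observation."""
    return (observation, dfa.state)
```

For example, after `dfa = no_double_fire()` and `dfa.step("fire")`, the call `allowed_actions(dfa, ["noop", "fire", "left"])` omits `"fire"`, while `shaping_reward(dfa, "fire")` returns a negative term that a soft-constrained agent would add to its environment reward.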

constraint, constraint violation, violation

arXiv.org Machine Learning

1910.01074

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

arXiv.org Machine Learning Oct-2-2019

Unsupervised Doodling and Painting with Improved SPIRAL

Mellor, John F. J., Park, Eunbyung, Ganin, Yaroslav, Babuschkin, Igor, Kulkarni, Tejas, Rosenbaum, Dan, Ballard, Andy, Weber, Theophane, Vinyals, Oriol, Eslami, S. M. Ali

We investigate using reinforcement learning agents as generative models of images (extending arXiv:1804.01118). A generative agent controls a simulated painting environment and is trained with rewards provided by a discriminator network, which is simultaneously trained to assess the realism of the agent's samples, whether unconditional samples or reconstructions. Compared to prior work, we make a number of improvements to the architectures of the agents and discriminators that lead to intriguing and at times surprising results. We find that when sufficiently constrained, generative agents can learn to produce images with a degree of visual abstraction, despite having only ever seen real photographs (no human brush strokes). And given enough time with the painting environment, they can produce images with considerable realism. These results show that, under the right circumstances, some aspects of human drawing can emerge from simulated embodiment, without the need for external supervision, imitation or social cues. Finally, we note the framework's potential for use in creative applications.

agent, discriminator, spiral

arXiv.org Machine Learning

1910.01007

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Vision (0.93)

Schwartz, Erez, Tennenholtz, Guy, Tessler, Chen, Mannor, Shie

Natural Language State Representation for Reinforcement Learning

Recent advances in Reinforcement Learning have highlighted the difficulties in learning within complex high-dimensional domains. We argue that one of the main reasons that current approaches do not perform well is that the information is represented sub-optimally. A natural way to describe what we observe is through natural language. In this paper, we implement a natural language state representation to learn and complete tasks. Our experiments suggest that natural-language-based agents are more robust, converge faster and perform better than vision-based agents, showing the benefit of using natural language representations for Reinforcement Learning.

agent, natural language, representation

1910.02789

Country: Europe > Hungary (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Deep Reinforcement Learning for Single-Shot Diagnosis and Adaptation in Damaged Robots

Verma, Shresth, Nair, Haritha S., Agarwal, Gaurav, Dhar, Joydip, Shukla, Anupam

Robotics has proved to be an indispensable tool in many industrial as well as social applications, such as warehouse automation, manufacturing, and disaster robotics. In most of these scenarios, damage to the agent while accomplishing mission-critical tasks can result in failure. To enable robotic adaptation in such situations, the agent needs to adopt policies which are robust to a diverse set of damages and must do so with minimum computational complexity. We thus propose a damage-aware control architecture which diagnoses the damage prior to gait selection while also incorporating domain randomization in the damage space for learning a robust policy. To implement damage awareness, we use a supervised learning network based on Long Short-Term Memory (LSTM) that diagnoses the damage and predicts its type. The main novelty of this approach is that only a single policy is trained to adapt to a wide variety of damages, and the diagnosis is done in a single trial at the time of damage.

learning, reinforcement learning, robot

1910.0124

Country:

Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Netherlands (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)

Genre: Research Report (0.51)

Industry: Energy > Power Industry > Utilities > Nuclear (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Task-Relevant Adversarial Imitation Learning

Zolna, Konrad, Reed, Scott, Novikov, Alexander, Colmenarej, Sergio Gomez, Budden, David, Cabi, Serkan, Denil, Misha, de Freitas, Nando, Wang, Ziyu

We show that a critical problem in adversarial imitation from high-dimensional sensory data is the tendency of discriminator networks to distinguish agent and expert behaviour using task-irrelevant features beyond the control of the agent. We analyze this problem in detail and propose a solution as well as several baselines that outperform standard Generative Adversarial Imitation Learning (GAIL). Our proposed solution, Task-Relevant Adversarial Imitation Learning (TRAIL), uses a constrained optimization objective to overcome task-irrelevant features. Comprehensive experiments show that TRAIL can solve challenging manipulation tasks from pixels by imitating human operators, where other agents such as behaviour cloning (BC), standard GAIL, improved GAIL variants including our newly proposed baselines, and Deterministic Policy Gradients from Demonstrations (DPGfD) fail to find solutions, even when the other agents have access to task reward.

arxiv preprint arxiv, demonstration, discriminator

1910.01077

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > Canada (0.04)
Europe > Poland (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Tessler, Chen, Merlis, Nadav, Mannor, Shie

Stabilizing Off-Policy Reinforcement Learning with Conservative Policy Gradients

In recent years, advances in deep learning have enabled the application of reinforcement learning algorithms in complex domains. However, these algorithms lack the theoretical guarantees which are present in the tabular setting and suffer from many stability and reproducibility problems (Henderson et al., 2018). In this work, we suggest a simple approach for improving stability and providing probabilistic performance guarantees in off-policy actor-critic deep reinforcement learning regimes. Experiments on continuous action spaces, in the MuJoCo control suite, show that our proposed method reduces the variance of the process and improves the overall performance.

algorithm, arxiv preprint arxiv, evaluation

1910.01062

Country:

Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)