AITopics

2103.03662

Country:

Europe > Netherlands > South Holland > Delft (0.04)
Europe > Netherlands > South Holland > Leiden (0.04)

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Okawa, Yoshihiro, Sasaki, Tomotake, Iwane, Hidenao

Automatic Exploration Process Adjustment for Safe Reinforcement Learning with Joint Chance Constraint Satisfaction

arXiv.org Artificial Intelligence, Mar-5-2021

In reinforcement learning (RL) algorithms, exploratory control inputs are used during learning to acquire knowledge for decision making and control while the true dynamics of the controlled object are unknown. However, this exploratory behavior sometimes causes undesired situations by violating constraints on the state of the controlled object. In this paper, we propose an automatic exploration process adjustment method for safe RL in continuous state and action spaces that uses a linear nominal model of the controlled object. Specifically, at each time step our method automatically selects whether or not to apply the exploratory input, depending on the current state and its predicted value, and adjusts the variance-covariance matrix of the Gaussian policy used for exploration. We also show that our exploration process adjustment method theoretically guarantees that the constraints are satisfied with a pre-specified probability, that is, that a joint chance constraint is satisfied at every time step. Finally, we illustrate the validity and effectiveness of our method through numerical simulation.
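A minimal one-step sketch of the idea, under assumptions that are not taken from the paper: a scalar input, a single linear state constraint h^T x <= d, a known nominal model x_{t+1} = A x_t + B u_t with no model error, and a check on the next step only, rather than the joint chance constraint over time that the authors guarantee. The helper names (predict_mean, constraint_prob, adjust_exploration) are hypothetical. The function keeps the Gaussian exploratory input if the predicted state satisfies the constraint with probability at least 1 - delta, otherwise shrinks the variance to the largest value that still meets that target, and switches exploration off when the mean input leaves no margin at all.

```python
from statistics import NormalDist


def _dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))


def predict_mean(A, B, x, u):
    # Hypothetical nominal model x_{t+1} = A x_t + B u_t (B is a column, u a scalar input)
    return [_dot(row, x) + b_i * u for row, b_i in zip(A, B)]


def constraint_prob(h, d, mean, B, var):
    # P(h^T x_{t+1} <= d) when u ~ N(mu, var), so x_{t+1} ~ N(mean, var * B B^T)
    std = abs(_dot(h, B)) * var ** 0.5
    margin = d - _dot(h, mean)
    if std == 0.0:
        return 1.0 if margin >= 0 else 0.0
    return NormalDist().cdf(margin / std)


def adjust_exploration(A, B, x, mu, var, h, d, delta):
    """Return (use_exploration, variance) for the Gaussian input u ~ N(mu, variance).

    Assumes 0 < delta < 0.5. Checks only the next step, not a joint constraint."""
    mean = predict_mean(A, B, x, mu)
    if constraint_prob(h, d, mean, B, var) >= 1 - delta:
        return True, var                      # current exploration is already safe enough
    margin = d - _dot(h, mean)
    gain = abs(_dot(h, B))
    if margin <= 0 or gain == 0.0:
        return False, 0.0                     # no margin left: disable exploration
    z = NormalDist().inv_cdf(1 - delta)
    max_std = margin / (gain * z)             # largest std giving P(violation) = delta
    return True, max_std ** 2
```

Shrinking the variance instead of dropping exploration outright keeps some exploration near the constraint boundary; the 0 < delta < 0.5 assumption ensures the Gaussian quantile z is positive.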

adjustment method, chance constraint, constraint

2103.03656

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial Intelligence, Mar-5-2021

Unsupervised Learning for Robust Fitting: A Reinforcement Learning Approach

Truong, Giang, Le, Huu, Suter, David, Zhang, Erchuan, Gilani, Syed Zulqarnain

Robust model fitting is a core algorithm in a large number of computer vision applications. However, solving this problem efficiently for datasets heavily contaminated with outliers remains challenging because of its underlying computational complexity. Recent literature has focused on learning-based algorithms, but most of these approaches are supervised and therefore require a large amount of labelled training data. In this paper, we introduce a novel unsupervised learning framework that learns to solve robust model fitting directly. Unlike other methods, our approach is agnostic to the underlying input features and generalizes easily to a wide variety of LP-type problems with quasi-convex residuals. We show empirically that our method outperforms existing unsupervised learning approaches and achieves results competitive with traditional methods on several important computer vision problems.
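To make the problem setting concrete, the sketch below shows the maximum-consensus objective commonly used in robust fitting (count the points whose residual is within a threshold eps) together with RANSAC, a classical random-sampling baseline of the kind that learned methods are typically compared against. It is not the authors' reinforcement learning method; the line model y = a x + b, the threshold, and the function names are illustrative assumptions.

```python
import random


def residual(model, point):
    # Vertical residual |y - (a x + b)| of a point with respect to the line model (a, b)
    a, b = model
    x, y = point
    return abs(y - (a * x + b))


def consensus(model, points, eps):
    # Maximum-consensus objective: number of points with residual <= eps
    return sum(1 for p in points if residual(model, p) <= eps)


def ransac_line(points, eps, iters=200, seed=0):
    """Classical RANSAC baseline: fit lines to random minimal samples
    (two points) and keep the one with the largest consensus set."""
    rng = random.Random(seed)
    best_model, best_score = None, -1
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # a vertical pair cannot define y = a x + b
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        score = consensus((a, b), points, eps)
        if score > best_score:
            best_model, best_score = (a, b), score
    return best_model, best_score
```

The residual here is a simple quasi-convex (in fact convex) function of the model parameters, which is the kind of residual the abstract's LP-type formulation covers.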

agent, application, estimation

2103.03501

Country:

Oceania > Australia (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

#artificialintelligence, Mar-4-2021, 18:19:01 GMT

Universal Trading for Order Execution with Oracle Policy Distillation - Microsoft Research

As a fundamental problem in algorithmic trading, order execution aims to fulfill a specific trading order, either liquidation or acquisition, for a given instrument. In pursuit of effective execution strategies, recent years have seen a shift from the analytical view, built on model-based market assumptions, to a model-free perspective, i.e., reinforcement learning, owing to its nature as sequential decision optimization. However, the noisy and imperfect market information available to the policy makes it quite challenging to build sample-efficient reinforcement learning methods for effective order execution. In this paper, we propose a novel universal trading policy optimization framework to bridge the gap between the noisy, imperfect market states and the optimal action sequences for order execution. In particular, the framework uses a policy distillation method in which an oracle teacher with perfect information, approximating the optimal trading strategy, guides the learning of the common policy toward practically optimal execution.
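A minimal sketch of a distillation term, assuming a discrete set of trading actions per time step (for example, a few volume fractions of the remaining order) and a negative log-likelihood loss that pushes the student policy, which sees only noisy market states, toward the action the oracle teacher chose with perfect information. The weighting lam and all function names are hypothetical, and the exact loss used in the paper may differ.

```python
import math


def log_softmax(logits):
    # Numerically stable log-probabilities for a vector of logits
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def distillation_loss(student_logits, teacher_actions):
    """Mean negative log-likelihood of the teacher's actions under the student.

    student_logits: one list of action logits per time step (student sees the noisy state).
    teacher_actions: index of the action the oracle teacher took at each step."""
    nll = [-log_softmax(logits)[a] for logits, a in zip(student_logits, teacher_actions)]
    return sum(nll) / len(nll)


def total_loss(rl_loss, student_logits, teacher_actions, lam=0.5):
    # Hypothetical combination: the usual RL loss plus a weighted distillation term
    return rl_loss + lam * distillation_loss(student_logits, teacher_actions)
```

A student that is uniform over four actions pays log 4 per step, while one that already agrees confidently with the teacher pays almost nothing, so the term only pulls on steps where the two policies disagree.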

microsoft research, oracle policy distillation, universal trading

Industry: Banking & Finance > Trading (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)

#artificialintelligence, Mar-4-2021, 14:56:02 GMT

Reinforcement learning and reasoning

Reinforcement learning has seen a lot of progress in recent years: from DeepMind's success in teaching machines to play Atari games, through AlphaGo beating world champions at Go, to OpenAI's recent progress on Dota 2, a multiplayer game in which players divided into two teams compete against each other. The common thread is an artificial agent operating in a virtual world where the prize is clear (e.g. …). On the other hand, people are also experimenting with AI agents operating in the real world. Each Boston Dynamics clip gets a lot of press, showing robots performing amazing stunts.

mathematics, reasoning, theorem

Industry:

Leisure & Entertainment > Games > Computer Games (0.55)
Leisure & Entertainment > Games > Go (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.77)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.56)

#artificialintelligence, Mar-4-2021, 14:55:52 GMT

Required Reading: Breaking Down Conversational Artificial Intelligence

In scientific research, conversational AI is generally restricted to systems trained using statistical, data-driven methods such as reinforcement learning …

conversational artificial intelligence

Industry: Media > News (0.71)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.55)

#artificialintelligence, Mar-4-2021, 11:32:04 GMT

This AI Thrashes the Hardest Atari Games by Memorizing Its Best Moves

Learning from rewards seems like the simplest thing. I make coffee, I sip coffee, I'm happy. My brain registers "brewing coffee" as an action that leads to a reward. That's the guiding insight behind deep reinforcement learning, a family of algorithms that famously smashed most of Atari's gaming catalog and triumphed over humans in strategy games like Go. Here, an AI "agent" explores the game, trying out different actions and registering the ones that let it win.
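The keywords point to Go-Explore, which "memorizes" promising states in an archive and returns to them before exploring further. Below is a minimal sketch of such an archive, assuming game states have already been mapped to hashable "cells" (here, tuples): for each cell it keeps the best trajectory found so far, preferring a higher score and then a shorter trajectory, and it picks cells to revisit with weights that favor rarely visited ones. The class and function names and the 1/sqrt(visits + 1) weighting are illustrative assumptions, not the published implementation.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class CellRecord:
    score: float                                    # best cumulative reward seen at this cell
    trajectory: list = field(default_factory=list)  # actions that reached the cell with that score
    visits: int = 0                                 # how many times the cell has been reached


def update_archive(archive, cell, score, trajectory):
    """Record a visit to `cell` and keep the new trajectory if it is better.

    Better means a higher score, or an equal score reached in fewer steps.
    Returns True when the stored trajectory was created or replaced."""
    rec = archive.get(cell)
    if rec is None:
        archive[cell] = CellRecord(score, list(trajectory), visits=1)
        return True
    rec.visits += 1
    better = score > rec.score or (
        score == rec.score and len(trajectory) < len(rec.trajectory)
    )
    if better:
        rec.score, rec.trajectory = score, list(trajectory)
    return better


def choose_cell(archive, rng):
    # Rarely visited cells get larger weights, so exploration returns to the frontier
    cells = list(archive)
    weights = [1.0 / math.sqrt(archive[c].visits + 1) for c in cells]
    return rng.choices(cells, weights=weights, k=1)[0]


archive = {}
update_archive(archive, (0, 0), 0.0, [])
update_archive(archive, (1, 0), 10.0, ["RIGHT", "RIGHT"])
update_archive(archive, (1, 0), 10.0, ["RIGHT"])  # same score, shorter: replaces
start = choose_cell(archive, random.Random(0))
```

Replaying the stored trajectory to reach `start` and then exploring from there is what lets the agent avoid re-discovering the same progress from scratch in sparse-reward games.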

algorithm, go-explore, sparse reward

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.72)

Nekoei, Hadi, Badrinaaraayanan, Akilesh, Courville, Aaron, Chandar, Sarath

Continuous Coordination As a Realistic Scenario for Lifelong Learning

arXiv.org Artificial Intelligence, Mar-4-2021

Current deep reinforcement learning (RL) algorithms are still highly task-specific and lack the ability to generalize to new environments. Lifelong learning (LLL), by contrast, aims to solve multiple tasks sequentially by efficiently transferring and reusing knowledge between tasks. Despite a surge of interest in lifelong RL in recent years, the lack of a realistic testbed makes robust evaluation of LLL algorithms difficult. Multi-agent RL (MARL), on the other hand, can be seen as a natural scenario for lifelong RL because of its inherent non-stationarity, since the agents' policies change over time. In this work, we introduce a multi-agent lifelong learning testbed that supports both zero-shot and few-shot settings. Our setup is based on Hanabi, a partially observable, fully cooperative multi-agent game that has been shown to be challenging for zero-shot coordination. Its large strategy space makes it a desirable environment for lifelong RL tasks. We evaluate several recent MARL methods and benchmark state-of-the-art LLL algorithms in limited memory and computation regimes to shed light on their strengths and weaknesses. This continual learning paradigm also provides a pragmatic way of going beyond centralized training, which is the most commonly used training protocol in MARL. We show empirically that the agents trained in our setup are able to coordinate well with unseen agents, without any of the additional assumptions made by previous works.

artificial intelligence, machine learning, reinforcement learning

2103.03216

Country: North America > Canada > Quebec > Montreal (0.04)

Genre:

Instructional Material (0.57)
Research Report (0.50)
Overview (0.46)

Industry:

Education > Educational Setting > Continuing Education (0.83)
Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Olin-Ammentorp, Wilkie, Sokolov, Yury, Bazhenov, Maxim

A Dual-Memory Architecture for Reinforcement Learning on Neuromorphic Platforms

arXiv.org Artificial Intelligence, Mar-4-2021

Reinforcement learning (RL) is a foundation of learning in biological systems and provides a framework for addressing numerous challenges in real-world artificial intelligence applications. Efficient implementations of RL techniques could allow agents deployed in edge use cases to gain novel abilities, such as improved navigation, understanding of complex situations, and critical decision making. Toward this goal, we describe a flexible architecture for carrying out reinforcement learning on neuromorphic platforms. The architecture was implemented on an Intel neuromorphic processor and was demonstrated solving a variety of tasks using spiking dynamics. Our study proposes a usable, energy-efficient solution for real-world RL applications and demonstrates the applicability of neuromorphic platforms to RL problems.

implementation, information, representation

2103.0478

Country:

North America > United States > California > San Diego County > San Diego (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Industry:

Health & Medicine (0.68)
Leisure & Entertainment (0.68)
Information Technology (0.68)
Energy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Azize, Achraf, Gaizi, Othman

Conservative Optimistic Policy Optimization via Multiple Importance Sampling

arXiv.org Machine Learning, Mar-4-2021

Reinforcement Learning (RL) has been able to solve hard problems such as playing Atari games or the game of Go with a unified approach. Yet modern deep RL approaches are still not widely used in real-world applications. One reason could be the lack of guarantees on the performance of the intermediate policies executed during learning, compared to an existing (already working) baseline policy. In this paper, we propose an online model-free algorithm that addresses conservative exploration in the policy optimization problem. We show that the regret of the proposed approach is bounded by $\tilde{\mathcal{O}}(\sqrt{T})$ for both discrete and continuous parameter spaces.
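Multiple importance sampling lets data gathered under several past policies be reused to estimate the return of a new one. The sketch below uses the balance heuristic with one-dimensional Gaussian hyperpolicies over a policy parameter theta, plus a simple conservative acceptance rule. Both the Gaussian parameterization and the rule (accept a candidate only if its point estimate keeps at least (1 - alpha) of the baseline return) are assumptions for illustration; the optimistic part of the authors' algorithm and its confidence bounds are not shown.

```python
import math


def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def mis_estimate(target, behaviours, samples):
    """Balance-heuristic multiple importance sampling estimate of E_target[return].

    target: (mu, sigma) of the Gaussian hyperpolicy to evaluate.
    behaviours: list of (mu, sigma) hyperpolicies that generated the data.
    samples: samples[j] is a list of (theta, return) pairs drawn from behaviours[j]."""
    n = [len(s) for s in samples]
    N = sum(n)
    total = 0.0
    for batch in samples:
        for theta, ret in batch:
            # Mixture density of all behaviours, weighted by their sample counts
            mixture = sum(nj / N * gauss_pdf(theta, mu, sd) for nj, (mu, sd) in zip(n, behaviours))
            total += ret * gauss_pdf(theta, *target) / mixture
    return total / N


def conservative_accept(candidate_value, baseline_value, alpha=0.1):
    # Hypothetical safety rule (assumes positive returns): switch only if the
    # candidate keeps at least (1 - alpha) of the baseline's value
    return candidate_value >= (1 - alpha) * baseline_value
```

Dividing by the mixture of all behaviour densities, rather than by the density of the one policy that drew each sample, keeps the importance weights bounded whenever some behaviour policy is close to the target.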

algorithm, conservative optimistic policy optimization, multiple importance sampling


2103.03307

Country: North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games > Computer Games (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)