AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Taming an autonomous surface vehicle for path following and collision avoidance using deep reinforcement learning

Meyer, Eivind, Robinson, Haakon, Rasheed, Adil, San, Omer

arXiv.org Artificial Intelligence | Dec-18-2019

Eivind Meyer is currently working on his Master's thesis, completing his five-year integrated Master's degree in Cybernetics and Robotics at the Norwegian University of Science and Technology (NTNU) in Trondheim. Having specialized in Real Time Systems, he focuses his research on adopting state-of-the-art Artificial Intelligence methods for Autonomous Vehicle Control. Haakon Robinson is a PhD candidate at the Norwegian University of Science and Technology (NTNU). He received a Bachelor's degree in Physics in 2015 and completed a Master's degree in Cybernetics and Robotics in 2019, both at NTNU. His current work investigates the overlap between modern machine learning techniques and established methods within modelling and control, with a focus on improving the interpretability and behavioural guarantees of hybrid models that combine first principle models and data-driven components.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

1912.08578

Country:

North America > United States > Oklahoma (0.28)
Europe > Norway > Central Norway > Trøndelag > Trondheim (0.24)

Genre: Research Report (0.82)

Industry:

Transportation (1.00)
Leisure & Entertainment > Games (1.00)
Education > Educational Setting > Higher Education (0.54)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

[1905.08233v1] Few-Shot Adversarial Learning of Realistic Neural Talking Head Models

#artificialintelligence | Dec-17-2019, 19:55:28 GMT

few-shot adversarial learning, head model, realistic neural, (2 more...)

#artificialintelligence

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.52)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.52)

AI experts urge machine learning researchers to tackle climate change

#artificialintelligence | Dec-17-2019, 17:48:27 GMT

At the Tackling Climate Change workshop at this year's NeurIPS conference, some of the top minds in machine learning came together to discuss the effects of climate change on life on Earth, how AI can tackle the urgent problem, and why and how the machine learning community should join the fight. The panel included Yoshua Bengio, MILA director and University of Montreal professor; Jeff Dean, Google's AI chief; Andrew Ng, cofounder of Google Brain and founder of Landing.ai; and Cornell University professor and Institute for Computational Sustainability director Carla Gomes. The Tackling Climate Change workshop explored a wide range of topics, from the use of deep reinforcement learning to improve performance for ride-hailing services like Uber and Lyft to the application of deep learning to predict wildfire risk, detect avalanche deposits, improve plane efficiency with better wind forecasts, and conduct a global census of solar farms. The workshop is put together by Climate Change AI, a group that hosts workshops at AI research conferences and a forum for collaboration between machine learning practitioners and people from other fields. One essential step in better addressing the world's pressing challenges, says Bengio, is changing the way AI research is valued.

climate change, research community, workshop, (13 more...)

#artificialintelligence

Country:

North America > Canada > Quebec > Montreal (0.25)
Asia > India (0.05)
Africa > Ethiopia > Addis Ababa > Addis Ababa (0.05)

Genre: Personal (0.47)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (0.89)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Rule of thumb: Which AI / ML algorithms to apply to business problems

#artificialintelligence | Dec-17-2019, 05:00:14 GMT

Supervised learning: You know how to classify the input data and the type of behavior you want to predict, but you need the algorithm to calculate it for you on new data.

Unsupervised learning: You do not know how to classify the data, and you want the algorithm to find patterns and classify the data for you.

Reinforcement learning: An algorithm that learns by trial and error by interacting with the environment. You use it when you don't have a lot of training data, you cannot clearly define the ideal end state, or the only way to learn about the environment is to interact with it.
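To make "learning by trial and error" concrete, here is a minimal sketch of an epsilon-greedy learner on a multi-armed bandit. It is an illustration only, not something taken from the article: the function name `run_epsilon_greedy`, the reward means, and the noise level are all assumptions chosen for the example. The learner is never told which action is best; it only sees noisy rewards for the actions it tries.

```python
import random

def run_epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Estimate the value of each action purely from sampled rewards."""
    rng = random.Random(seed)
    counts = [0] * len(true_means)       # how often each action was tried
    estimates = [0.0] * len(true_means)  # running average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(len(true_means))
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(len(true_means)), key=lambda a: estimates[a])
        # The environment answers with a noisy reward (hypothetical Gaussian noise).
        reward = rng.gauss(true_means[action], 1.0)
        counts[action] += 1
        # Incremental update of the running average.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

# Hypothetical environment with three actions; the last has the highest mean reward.
estimates, counts = run_epsilon_greedy([0.2, 0.5, 0.9])
best_action = max(range(len(counts)), key=lambda a: counts[a])
print(best_action, [round(e, 2) for e in estimates])
```

The small `epsilon` keeps the learner sampling every action occasionally, which is what lets it correct an unlucky early estimate.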

ai ml algorithm, algorithm, business problem, (7 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

r/MachineLearning - [R] Provably Efficient Exploration in Policy Optimization

#artificialintelligence | Dec-16-2019, 22:50:02 GMT

While policy-based reinforcement learning (RL) achieves tremendous successes in practice, it is significantly less understood in theory, especially compared with value-based RL. In particular, it remains elusive how to design a provably efficient policy optimization algorithm that incorporates exploration. To bridge such a gap, this paper proposes an Optimistic variant of the Proximal Policy Optimization algorithm (OPPO), which follows an "optimistic version" of the policy gradient direction. This paper proves that, in the problem of episodic Markov decision process with linear function approximation, unknown transition, and adversarial reward with full-information feedback, OPPO achieves $O(\sqrt{d^3 H^3 T})$ regret. Here d is the feature dimension, H is the episode horizon, and T is the total number of steps.
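One way to read a regret bound of this form is to divide it by the number of steps. The short derivation below uses only the quantities d, H, and T named in the summary, with d and H held fixed; the notation Regret(T) is an assumption introduced here for illustration.

```latex
\[
\frac{\mathrm{Regret}(T)}{T}
  \;=\; O\!\left(\frac{\sqrt{d^{3} H^{3} T}}{T}\right)
  \;=\; O\!\left(\sqrt{\frac{d^{3} H^{3}}{T}}\right)
  \;\longrightarrow\; 0
  \quad \text{as } T \to \infty .
\]
```

Because the total regret grows only like the square root of T, the average regret per step vanishes as T grows, which is the sense in which such a bound is called sublinear.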

machinelearning, policy optimization, provably efficient exploration, (2 more...)

#artificialintelligence

Industry: Media > News (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Communications > Social Media (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)

Managing your Cryptofolio - science2innovation

#artificialintelligence | Dec-16-2019, 13:28:13 GMT

Portfolio management is the act of deciding how to allocate your funds across a collection of assets for optimal dollar results. When those assets are cryptocurrencies, the question becomes how to allocate funds to digital assets in order to maximise some crypto investment goal, for example accumulating Bitcoin. In this paper, a reinforcement learning approach is built using historical data from the crypto exchange website Poloniex with the goal of optimising investor gains over a set period. This model is then benchmarked against standard portfolio strategies used by traders, such as buy and hold. The results show that the reinforcement learning approach is extremely effective as an investment optimisation strategy, but the authors warn that historical data is not always a valid way to predict the market.
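For reference, the buy-and-hold baseline mentioned above can be sketched in a few lines: funds are split across assets once and never rebalanced. This is a hedged illustration, not the paper's benchmark code; the function name `buy_and_hold_value` and the price series are made up for the example and are not real market data.

```python
def buy_and_hold_value(prices, weights, initial_cash=1.0):
    """Final portfolio value when funds are split once and never rebalanced."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    value = 0.0
    for asset, weight in weights.items():
        series = prices[asset]
        # Units bought at the first price and held to the last price.
        units = initial_cash * weight / series[0]
        value += units * series[-1]
    return value

# Hypothetical price series for illustration only.
prices = {"BTC": [100.0, 110.0, 120.0], "ETH": [10.0, 9.0, 12.0]}
weights = {"BTC": 0.5, "ETH": 0.5}
final_value = buy_and_hold_value(prices, weights)
print(final_value)
```

A learned allocation policy would be compared against this number on the same price history and the same starting cash.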

cryptofolio, historical data

#artificialintelligence

Industry: Banking & Finance > Trading (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)

Reflections on NeurIPS 2019

#artificialintelligence | Dec-16-2019, 09:38:02 GMT

There is a huge push among the researchers here for accountability. I was presenting a poster on "Objective Mismatch in Model-based Reinforcement Learning" at the Deep RL Workshop, and the crowd was very receptive to the idea that some of our underlying assumptions about how RL works may be flawed. I also happened to be presenting my poster next to a researcher at Google pushing for more metrics of reliability in RL algorithms. That means asking how consistent the reported performance is across environments and random seeds when a paper claims a new "state-of-the-art". This realistic robustness may be the key to making these algorithms more useful in real applications (such as robotics, which I will always bring up as a great interpretable platform for RL).
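A minimal sketch of what reporting consistency across random seeds could look like is given below. It is only an illustration of the idea: the helper name `summarize_across_seeds` and the per-seed returns are hypothetical, and real reliability metrics for RL are more involved than a mean and a standard deviation.

```python
import statistics

def summarize_across_seeds(returns_by_seed):
    """Report spread as well as average, so one lucky seed cannot hide instability."""
    values = list(returns_by_seed.values())
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

# Hypothetical final returns of one algorithm trained with five different seeds.
returns_by_seed = {0: 210.0, 1: 195.0, 2: 60.0, 3: 205.0, 4: 200.0}
summary = summarize_across_seeds(returns_by_seed)
print(summary)
```

In this made-up data, four seeds perform similarly and one collapses; quoting only the best run would hide that.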

algorithm, neurip 2019, reflection

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.62)

Self-Play Learning Without a Reward Metric

Schmidt, Dan, Moran, Nick, Rosenfeld, Jonathan S., Rosenthal, Jonathan, Yedidia, Jonathan

arXiv.org Machine Learning | Dec-16-2019

The AlphaZero algorithm for the learning of strategy games via self-play, which has produced superhuman ability in the games of Go, chess, and shogi, uses a quantitative reward function for game outcomes, requiring the users of the algorithm to explicitly balance different components of the reward against each other, such as the game winner and margin of victory. We present a modification to the AlphaZero algorithm that requires only a total ordering over game outcomes, obviating the need to perform any quantitative balancing of reward components. We demonstrate that this system learns optimal play in a comparable amount of time to AlphaZero on a sample game.
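The notion of "only a total ordering over game outcomes" can be illustrated with a small sketch. This is not the paper's algorithm; it only shows how outcomes with several components (here a result and a margin) can be ranked without assigning numeric weights to each component. The `Outcome` type, its fields, and `rank_outcomes` are hypothetical names chosen for the example.

```python
from typing import NamedTuple

class Outcome(NamedTuple):
    result: int  # 0 = loss, 1 = draw, 2 = win (from the player's perspective)
    margin: int  # signed score difference; larger is better

def rank_outcomes(outcomes):
    """Map each distinct outcome to its position in the total order."""
    # Tuples compare lexicographically: result first, margin as the tie-breaker.
    ordered = sorted(set(outcomes))
    return {outcome: rank for rank, outcome in enumerate(ordered)}

outcomes = [
    Outcome(2, 1), Outcome(0, -5), Outcome(2, 7), Outcome(1, 0), Outcome(0, -1),
]
ranks = rank_outcomes(outcomes)
best = max(outcomes)
print(best, ranks[best])
```

The comparison never needs to say how many points of margin a win is "worth"; it only needs to say which of two outcomes is preferred.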

algorithm, game outcome, reward function, (13 more...)

arXiv.org Machine Learning

1912.07557

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)

Coordination in Adversarial Sequential Team Games via Multi-Agent Deep Reinforcement Learning

Celli, Andrea, Ciccone, Marco, Bongo, Raffaele, Gatti, Nicola

arXiv.org Artificial Intelligence | Dec-16-2019

Many real-world applications involve teams of agents that have to coordinate their actions to reach a common goal against potential adversaries. This paper focuses on zero-sum games where a team of players faces an opponent, as is the case, for example, in Bridge, collusion in poker, and collusion in bidding. The possibility for the team members to communicate before gameplay---that is, coordinate their strategies ex ante---makes the use of behavioral strategies unsatisfactory. We introduce Soft Team Actor-Critic (STAC) as a solution to the team's coordination problem that does not require any prior domain knowledge. STAC allows team members to effectively exploit ex ante communication via exogenous signals that are shared among the team. STAC reaches near-optimal coordinated strategies both in perfectly observable and partially observable games, where previous deep RL algorithms fail to reach optimal coordinated behaviors.
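The value of ex ante coordination through a shared signal can be seen in a toy setting. The sketch below is not STAC; it is a hand-written illustration, with the hypothetical function `match_rate`, of why teammates who randomize independently cannot reproduce the correlated play that a common signal allows. Two teammates each pick one of two actions, and each action should stay equally likely so that an opponent cannot predict it.

```python
import random

def match_rate(shared_signal, episodes=10000, seed=0):
    """Fraction of episodes in which both teammates choose the same action."""
    rng = random.Random(seed)
    matches = 0
    for _ in range(episodes):
        if shared_signal:
            # One random signal is drawn per episode and seen by both teammates.
            signal = rng.randrange(2)
            action_a, action_b = signal, signal
        else:
            # Each teammate randomizes on their own, with no common signal.
            action_a = rng.randrange(2)
            action_b = rng.randrange(2)
        matches += action_a == action_b
    return matches / episodes

coordinated = match_rate(shared_signal=True)
independent = match_rate(shared_signal=False)
print(coordinated, independent)
```

In both cases each individual action is uniformly random, but only the shared signal keeps the two teammates aligned in every episode.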

coordination, learning, team member, (13 more...)

arXiv.org Artificial Intelligence

1912.07712

Country:

Europe > Sweden > Stockholm > Stockholm (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Netherlands (0.04)
Europe > Belgium (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

To Follow or not to Follow: Selective Imitation Learning from Observations

Lee, Youngwoon, Hu, Edward S., Yang, Zhengyu, Lim, Joseph J.

arXiv.org Artificial Intelligence | Dec-16-2019

Learning from demonstrations is a useful way to transfer a skill from one agent to another. While most imitation learning methods aim to mimic an expert skill by following the demonstration step-by-step, imitating every step in the demonstration often becomes infeasible when the learner and its environment are different from the demonstration. In this paper, we propose a method that can imitate a demonstration composed solely of observations, which may not be reproducible with the current agent. Our method, dubbed selective imitation learning from observations (SILO), selects reachable states in the demonstration and learns how to reach the selected states. Our experiments on both simulated and real robot environments show that our method reliably performs a new task by following a demonstration. Videos and code are available at https://clvrai.com/silo.
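The idea of following only the reachable parts of a demonstration can be sketched as follows. This is a simplified illustration and not the SILO implementation: states are single numbers, the learned selection policy is replaced by a fixed distance rule, and the low-level controller is assumed to always reach the chosen state. The names `follow_demo`, `window`, and `max_step` are hypothetical.

```python
def follow_demo(demo_states, start_state, window=2, max_step=1.5):
    """Return the indices of the demonstration states the learner chooses to follow."""
    def reachable(state, target):
        # Stand-in for a learned reachability judgment: a fixed distance threshold.
        return abs(target - state) <= max_step

    state, index, visited = start_state, 0, []
    while index < len(demo_states):
        # Look a few steps ahead and take the earliest state that is reachable.
        candidates = range(index, min(index + window, len(demo_states)))
        target = next((i for i in candidates if reachable(state, demo_states[i])), None)
        if target is None:
            break  # nothing within the window can be reached
        state = demo_states[target]  # assume the low-level controller gets there
        visited.append(target)
        index = target + 1
    return visited

# Hypothetical 1-D demonstration; the state 5.0 is out of reach for this learner.
visited = follow_demo([1.0, 5.0, 2.0, 3.0], start_state=0.0)
print(visited)
```

The learner skips the demonstration state it cannot reproduce and resumes from the next one it can, instead of failing at the first infeasible step.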

demonstration, low-level policy, meta policy, (16 more...)

arXiv.org Artificial Intelligence

1912.0767

Country:

North America > United States > California (0.14)
Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)
