AITopics

2011.0729

Country:

Europe > Belgium (0.04)
Europe > Ireland (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.46)

Industry:

Education (0.68)
Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

#artificialintelligence Nov-13-2020, 17:50:35 GMT

OpenAI proposes using reciprocity to encourage AI agents to work together

Many real-world problems require complex coordination between multiple agents -- e.g., people or algorithms. A machine learning technique called multi-agent reinforcement learning (MARL) has proven successful at this kind of coordination, mainly in two-team games like Go, DOTA 2, Starcraft, hide-and-seek, and capture the flag. But the human world is far messier than games. That's because humans face social dilemmas at multiple scales, from the interpersonal to the international, and they must decide not only how to cooperate but when to cooperate. To address this challenge, researchers at OpenAI propose training AI agents with what they call randomized uncertain social preferences (RUSP), an augmentation that expands the distribution of environments in which reinforcement learning agents train.

agent, encourage ai agent, reciprocity, (9 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.62)

#artificialintelligence Nov-13-2020, 14:21:11 GMT

Gym Tutorial: The Frozen Lake

In this article, we are going to learn how to create and explore the Frozen Lake environment using the Gym library, an open source project created by OpenAI and used for reinforcement learning experiments. The Gym library defines a uniform interface for environments, which makes the integration between algorithms and environments easier for developers. Among many ready-to-use environments, the default installation includes a text-mode version of the Frozen Lake game, used as an example in our last post. The first step to create the game is to import the Gym library and create the environment. The next line calls the method gym.make() to create the Frozen Lake environment and then we call the method env.reset() to put it in its initial state.
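The uniform interface mentioned above can be illustrated without installing anything. The sketch below is a toy stand-in written for this summary, not the Gym library or its Frozen Lake implementation: the class name ToyFrozenLake, the helper run_episode, and the deterministic (non-slippery) movement are all assumptions made for illustration. It mirrors the classic convention in which reset() returns the initial state and step() returns a (state, reward, done, info) tuple.

```python
# Toy stand-in for a Gym-style environment; NOT the Gym library itself.
# Layout follows the familiar 4x4 map: S start, F frozen, H hole, G goal.
MAP = ["SFFF", "FHFH", "FFFH", "HFFG"]
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3
MOVES = {LEFT: (0, -1), DOWN: (1, 0), RIGHT: (0, 1), UP: (-1, 0)}


class ToyFrozenLake:
    """Deterministic grid world exposing reset() and step() (hypothetical class)."""

    def __init__(self, grid=MAP):
        self.grid = grid
        self.n_rows, self.n_cols = len(grid), len(grid[0])
        self.pos = (0, 0)

    def reset(self):
        """Put the agent back on the start tile and return the state index."""
        self.pos = (0, 0)
        return self._state()

    def step(self, action):
        """Apply one action and return (state, reward, done, info)."""
        d_row, d_col = MOVES[action]
        # Moves that would leave the grid keep the agent at the border.
        row = min(max(self.pos[0] + d_row, 0), self.n_rows - 1)
        col = min(max(self.pos[1] + d_col, 0), self.n_cols - 1)
        self.pos = (row, col)
        tile = self.grid[row][col]
        done = tile in "HG"                    # episode ends in a hole or at the goal
        reward = 1.0 if tile == "G" else 0.0   # reward only for reaching the goal
        return self._state(), reward, done, {}

    def _state(self):
        # Flatten (row, col) into a single integer, row by row.
        return self.pos[0] * self.n_cols + self.pos[1]


def run_episode(env, actions):
    """Reset the environment, replay a fixed action list, return (final state, total reward)."""
    state = env.reset()
    total = 0.0
    for action in actions:
        state, reward, done, _ = env.step(action)
        total += reward
        if done:
            break
    return state, total


if __name__ == "__main__":
    env = ToyFrozenLake()
    print(run_episode(env, [DOWN, DOWN, RIGHT, RIGHT, DOWN, RIGHT]))  # (15, 1.0)
    print(run_episode(env, [RIGHT, DOWN, DOWN]))                      # (5, 0.0)
```

Because any agent only needs reset() and step(), the same loop in run_episode would work unchanged with a different environment that follows the same convention. Newer releases of the real library may return different tuples from these methods, so check the installed version before adapting this pattern.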

frozen lake environment, frozen lake game, reinforcement, (12 more...)

Genre: Instructional Material (0.37)

Industry: Leisure & Entertainment (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.41)

Towards Human-Level Learning of Complex Physical Puzzles

Ota, Kei, Jha, Devesh K., Romeres, Diego, van Baar, Jeroen, Smith, Kevin A., Semitsu, Takayuki, Oiki, Tomoaki, Sullivan, Alan, Nikovski, Daniel, Tenenbaum, Joshua B.

Humans quickly solve tasks in novel systems with complex dynamics, without requiring much interaction. While deep reinforcement learning algorithms have achieved tremendous success in many complex tasks, these algorithms need a large number of samples to learn meaningful policies. In this paper, we present a task for navigating a marble to the center of a circular maze. While this system is very intuitive and easy for humans to solve, it can be very difficult and inefficient for standard reinforcement learning algorithms to learn meaningful policies. We present a model that learns to move a marble in the complex environment within minutes of interacting with the real system. Learning consists of initializing a physics engine with parameters estimated using data from the real system. The error in the physics engine is then corrected using Gaussian process regression, which is used to model the residual between real observations and physics engine simulations. The physics engine equipped with the residual model is then used to control the marble in the maze environment using model-predictive feedback over a receding horizon. We contrast the learning behavior against the time taken by humans to solve the problem to show comparable behavior. To the best of our knowledge, this is the first time that a hybrid model consisting of a full physics engine along with a statistical function approximator has been used to control a complex physical system in real-time using nonlinear model-predictive control (NMPC). Code for the simulation environment can be downloaded at https://www.merl.com/research/license/CME. A video describing our method can be found at https://youtu.be/xaxNCXBovpc.

artificial intelligence, computer game, physics engine, (17 more...)

2011.07193

Country:

North America > United States > Massachusetts (0.14)
Europe > Sweden (0.14)

Genre: Research Report (0.64)

Industry:

Energy > Oil & Gas (0.67)
Leisure & Entertainment > Games > Computer Games (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Zhou, Wenxuan, Bajracharya, Sujay, Held, David

PLAS: Latent Action Space for Offline Reinforcement Learning

The goal of offline reinforcement learning is to learn a policy from a fixed dataset, without further interactions with the environment. This setting will be an increasingly important paradigm for real-world applications of reinforcement learning such as robotics, in which data collection is slow and potentially dangerous. Existing off-policy algorithms have limited performance on static datasets due to extrapolation errors from out-of-distribution actions. This leads to the challenge of constraining the policy to select actions within the support of the dataset during training. We propose to simply learn the Policy in the Latent Action Space (PLAS) such that this requirement is naturally satisfied. We evaluate our method on continuous control benchmarks in simulation and a deformable object manipulation task with a physical robot. We demonstrate that our method provides competitive performance consistently across various continuous control tasks and different types of datasets, outperforming existing offline reinforcement learning methods with explicit constraints. Videos and code are available at https://sites.google.com/view/latent-policy.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2011.07213

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report (0.64)

Industry: Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Phoebe: Reuse-Aware Online Caching with Reinforcement Learning for Emerging Storage Models

Wu, Nan, Li, Pengcheng

With data durability, high access speed, low power consumption and byte addressability, NVMe and SSD, which are acknowledged representatives of emerging storage technologies, have been applied broadly in many areas. However, one key issue with high-performance adoption of these technologies is how to properly define intelligent cache layers such that the performance gap between emerging technologies and main memory can be well bridged. To this end, we propose Phoebe, a reuse-aware reinforcement learning framework for optimal online caching that is applicable to a wide range of emerging storage models. By continuously interacting with the cache environment and the data stream, Phoebe is capable of extracting critical temporal data dependency and relative positional information from a single trace, becoming ever smarter over time. To reduce training overhead during online learning, we utilize periodic training to amortize costs. Phoebe is evaluated on a set of Microsoft cloud storage workloads. Experimental results show that Phoebe closes the gap in cache miss rate between LRU and Belady's optimal policy by 70.3%, and between a state-of-the-art online-learning-based cache policy and Belady's optimal policy by 52.6%.
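For readers unfamiliar with the two reference points named in the abstract, the following minimal sketch counts cache misses for LRU and for Belady's optimal policy on a small access trace. It is not Phoebe and does not reproduce the paper's experiments; the function names lru_misses and belady_misses, the trace, and the cache capacity are made up for illustration.

```python
from collections import OrderedDict


def lru_misses(trace, capacity):
    """Count misses when the least recently used key is evicted."""
    cache = OrderedDict()
    misses = 0
    for key in trace:
        if key in cache:
            cache.move_to_end(key)         # mark as most recently used
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used key
            cache[key] = True
    return misses


def belady_misses(trace, capacity):
    """Count misses when the key reused farthest in the future is evicted.

    This needs the whole trace in advance, which is why it serves as an
    offline lower bound rather than a deployable policy.
    """
    cache = set()
    misses = 0
    for i, key in enumerate(trace):
        if key in cache:
            continue
        misses += 1
        if len(cache) >= capacity:
            def next_use(k):
                # Position of the next access to k, or infinity if never reused.
                try:
                    return trace.index(k, i + 1)
                except ValueError:
                    return float("inf")
            cache.remove(max(cache, key=next_use))
        cache.add(key)
    return misses


if __name__ == "__main__":
    trace = [1, 2, 3, 1, 2, 4, 1, 2, 3, 4]  # illustrative access trace
    print(lru_misses(trace, 3))     # 6
    print(belady_misses(trace, 3))  # 5
```

The difference between the two miss counts is the kind of gap that a learned caching policy tries to narrow.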

cache, information, reuse distance, (16 more...)

2011.0716

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Santa Clara County > Sunnyvale (0.04)
North America > United States > California > Santa Clara County > Santa Clara (0.04)
(2 more...)

Genre: Research Report (0.70)

Industry:

Education > Educational Setting > Online (0.55)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Beattie, Charles, Köppe, Thomas, Duéñez-Guzmán, Edgar A., Leibo, Joel Z.

DeepMind Lab2D

We present DeepMind Lab2D, a scalable environment simulator for artificial intelligence research that facilitates researcher-led experimentation with environment design. DeepMind Lab2D was built with the specific needs of multi-agent deep reinforcement learning researchers in mind, but it may also be useful beyond that particular subfield.

agent, deepmind lab2d, learning, (14 more...)

2011.07027

Country: Europe > United Kingdom (0.04)

Genre: Research Report (0.40)

Industry:

Leisure & Entertainment > Games > Computer Games (0.69)
Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Nguyen, Phuong D. H., Eppe, Manfred, Wermter, Stefan

Robotic self-representation improves manipulation skills and transfer learning

Cognitive science suggests that self-representation is critical for learning and problem-solving. However, there is a lack of computational methods that relate this claim to cognitively plausible robots and reinforcement learning. In this paper, we bridge this gap by developing a model, named multimodal BidAL, that learns bidirectional action-effect associations to encode the representations of body schema and the peripersonal space from multisensory information. Through three different robotic experiments, we demonstrate that this approach significantly stabilizes learning-based problem-solving under noisy conditions and that it improves transfer learning of robotic manipulation skills.

international conference, representation, robot, (15 more...)

2011.06985

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Germany > Hamburg (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.61)

#artificialintelligence Nov-12-2020, 22:46:07 GMT

Reinforcement Learning: 10 Real Reward & Punishment Applications

In Reinforcement Learning (RL), agents are trained on a reward and punishment mechanism. The agent is rewarded for correct moves and punished for the wrong ones. In doing so, the agent tries to minimize wrong moves and maximize the right ones. Various papers have proposed Deep Reinforcement Learning for autonomous driving. In self-driving cars, there are various aspects to consider, such as speed limits at various places, drivable zones, avoiding collisions -- just to mention a few.

application, reinforcement, reinforcement learning, (12 more...)

Country:

North America > United States > Ohio (0.05)
North America > United States > Maryland (0.05)
North America > United States > Colorado (0.05)
Asia > China (0.05)

Industry:

Transportation > Ground > Road (0.55)
Information Technology > Robotics & Automation (0.55)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

#artificialintelligence Nov-12-2020, 17:35:53 GMT

Baseline for Policy Gradients that All Deep Learning Enthusiasts Must Know

Deep reinforcement learning offers a variety of algorithms that solve many types of complex problems in various situations. One class of these algorithms is policy gradient (PG), which applies to a wide range of problems in both discrete and continuous action spaces. Applying it naively is inefficient, however, because its poor sample complexity and high variance result in slower learning. To mitigate this, we can use a baseline. The cause of the high variance problem is the reward scale. We think of policy gradient as increasing the probability of taking good actions and decreasing it for bad actions, but mostly this is not the case. Imagine a situation where the "good" episode return was 10 and the "bad" one was 5: the probabilities of the actions in both episodes will be increased, which is not what we want. This is the problem that baselines can solve. Mathematically, a baseline is a function that, when added to an expectation, does not change the expected value (that is, it does not introduce bias), but can significantly affect the variance. Following this definition, we want a baseline for the policy gradient that reduces its high variance without changing its direction. A natural choice is to increase the probability of the actions that are better than average and decrease the probability of the actions that are worse than average. This is implemented by calculating the average reward over the trajectory and subtracting it from the reward at the current timestep; this kind of baseline is called the average reward baseline. Now, we will show how baselines do not change the expected value, and we can choose any baselines we want.
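The effect described above can be checked on a tiny example. The sketch below is an illustration under stated assumptions, not the article's own code: it uses a one-step problem with two actions, a single policy parameter theta passed through a sigmoid, and the returns 10 and 5 from the text. The baseline is the expected return under the current policy, standing in for the average reward baseline, and the function name gradient_stats is hypothetical. Since there are only two actions, the mean and variance of the one-sample gradient estimate can be computed exactly instead of by sampling.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gradient_stats(theta, returns, baseline):
    """Exact mean and variance of the estimator (R - b) * d/dtheta log pi(a).

    The policy picks action 1 with probability sigmoid(theta) and action 0 otherwise.
    """
    p = sigmoid(theta)
    probs = {1: p, 0: 1.0 - p}
    score = {1: 1.0 - p, 0: -p}  # d/dtheta log pi(a) for each action
    samples = {a: (returns[a] - baseline) * score[a] for a in (0, 1)}
    mean = sum(probs[a] * samples[a] for a in (0, 1))
    var = sum(probs[a] * (samples[a] - mean) ** 2 for a in (0, 1))
    return mean, var


returns = {1: 10.0, 0: 5.0}  # "good" and "bad" returns from the text
theta = 0.0                  # both actions equally likely
p = sigmoid(theta)
average_return = p * returns[1] + (1.0 - p) * returns[0]  # 7.5

no_baseline = gradient_stats(theta, returns, 0.0)
with_baseline = gradient_stats(theta, returns, average_return)

if __name__ == "__main__":
    print(no_baseline)    # (1.25, 14.0625)
    print(with_baseline)  # (1.25, 0.0)
```

Both estimators have the same expected gradient, so the baseline introduces no bias, while the variance drops from 14.0625 to 0 in this deliberately simple setting. Without the baseline both returns are positive, so each sampled action gets its probability pushed up; with it, the action returning 5 receives a negative weight.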

baseline, equation, policy gradient, (9 more...)

Country:

Africa > Sudan > Khartoum State > Khartoum (0.05)
Africa > Sudan > Khartoum (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.58)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)