Reinforcement Learning


An experimental design perspective on model-based reinforcement learning

AIHub

We evaluate BARL on the TQRL setting in five environments that span a variety of reward function types, dimensionalities, and amounts of required data, estimating the minimum amount of data each algorithm needs to learn a controller. The environments include the standard underactuated pendulum swing-up task, a cartpole swing-up task, the standard 2-DOF reacher task, a navigation problem where the agent must find a path across pools of lava, and a simulated nuclear fusion control problem where the agent must modulate the power injected into the plasma to achieve a target pressure. To assess how quickly BARL solves these MDPs, we compare against reinforcement learning algorithms that represent the state of the art for continuous MDPs: the model-based methods PILCO [7], PETS [2], model-predictive control with a GP (MPC), and uncertainty sampling with a GP, as well as the model-free methods SAC [3], TD3 [8], and PPO [9].
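To make the uncertainty-sampling baseline concrete, here is a minimal sketch, assuming a GP dynamics model fit with scikit-learn on a synthetic one-dimensional system; the toy dataset, kernel, and candidate grid are illustrative assumptions, not details from the post.

```python
# Minimal sketch of uncertainty sampling with a GP dynamics model:
# query the state-action pair where the model is least certain.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Toy 1-D dynamics dataset: x = (state, action), y = next state. (Assumed.)
X = rng.uniform(-1, 1, size=(20, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.01 * rng.standard_normal(20)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-4)
gp.fit(X, y)

# Evaluate posterior std over candidates and pick the most uncertain
# point as the next query to the environment.
candidates = rng.uniform(-1, 1, size=(500, 2))
_, std = gp.predict(candidates, return_std=True)
next_query = candidates[np.argmax(std)]
print("next state-action to query:", next_query)
```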


Should I use offline RL or imitation learning?

AIHub

Figure 1: Summary of our recommendations for when a practitioner should use BC and various imitation-learning-style methods, and when they should use offline RL approaches. Offline reinforcement learning allows learning policies from previously collected data, which has profound implications for applying RL in domains where running trial-and-error learning is impractical or dangerous, such as safety-critical settings like autonomous driving or medical treatment planning. In such scenarios, online exploration is simply too risky, but offline RL methods can learn effective policies from logged data collected by humans or heuristically designed controllers. Prior learning-based control methods have also approached learning from existing data as imitation learning: if the data is generally "good enough," simply copying the behavior in the data can lead to good results, and if it's not good enough, then filtering or reweighting the data and then copying can work well. Several recent works suggest that this is a viable alternative to modern offline RL methods.
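To make the "filter, then copy" recipe concrete, here is a minimal sketch of filtered behavior cloning; the dataset format, the 10% return cutoff, and the network size are illustrative assumptions rather than details from the article.

```python
# Minimal sketch of filtered behavior cloning: keep only the
# highest-return trajectories, then do plain supervised learning.
import numpy as np
import torch
import torch.nn as nn

def filtered_bc(trajectories, keep_frac=0.1, epochs=100):
    # trajectories: list of dicts with "states" (T, S), "actions" (T, A),
    # and "return" (scalar total reward). (Assumed format.)
    cutoff = np.quantile([t["return"] for t in trajectories], 1 - keep_frac)
    kept = [t for t in trajectories if t["return"] >= cutoff]

    states = torch.tensor(np.concatenate([t["states"] for t in kept]),
                          dtype=torch.float32)
    actions = torch.tensor(np.concatenate([t["actions"] for t in kept]),
                           dtype=torch.float32)

    policy = nn.Sequential(nn.Linear(states.shape[1], 64), nn.ReLU(),
                           nn.Linear(64, actions.shape[1]))
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
    for _ in range(epochs):
        loss = ((policy(states) - actions) ** 2).mean()  # copy the kept data
        opt.zero_grad(); loss.backward(); opt.step()
    return policy
```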


Designing societally beneficial Reinforcement Learning (RL) systems

Robohub

Deep reinforcement learning (DRL) is transitioning from a research field focused on game playing to a technology with real-world applications. Notable examples include DeepMind's work on controlling a nuclear reactor and on improving YouTube video compression, and Tesla's attempt to use a method inspired by MuZero for autonomous vehicle behavior planning. But the exciting potential for real-world applications of RL should also come with a healthy dose of caution; for example, RL policies are well known to be vulnerable to exploitation, and methods for safe and robust policy development are an active area of research. Alongside the emergence of powerful RL systems in the real world, the public and researchers alike are expressing an increased appetite for fair, aligned, and safe machine learning systems. The focus of these research efforts to date has been to account for shortcomings of datasets or supervised learning practices that can harm individuals.


Artificial Intelligence Free-Structure Metasurface Optimization

#artificialintelligence

Metasurface refers to a nano-optical device that achieves unprecedented properties of light using a structure much smaller than the wavelength of light. Nano-optical devices control the characteristics of light at the micro level and can be used for LiDAR beam-steering devices in autonomous driving, ultra-high-resolution imaging, control of the optical properties of light-emitting devices used in displays, and hologram generation. Recently, as performance expectations for nano-optical devices have risen, interest has grown in optimizing devices with a free structure to achieve performance far exceeding that of past device structures. This is the first case of applying reinforcement learning to solve a problem with a design space as large as that of a free structure.


Deep reinforcement learning for self-tuning laser source of dissipative solitons - Scientific Reports

#artificialintelligence

The increasing complexity of modern laser systems, largely originating from the nonlinear dynamics of radiation, makes control of their operation more and more challenging, calling for the development of new approaches in laser engineering. Machine learning methods, which provide proven tools for identification, control, and data analytics of various complex systems, have recently been applied to mode-locked fiber lasers with a special focus on three key areas: self-starting, system optimization, and characterization. However, developing machine learning algorithms for a particular laser system, while an interesting research problem, is a demanding task requiring arduous effort and the tuning of a large number of hyper-parameters in the laboratory. It is not obvious that this learning can be smoothly transferred to systems that differ from the specific laser used for algorithm development, whether by design or by varying environmental parameters. Here we demonstrate that a deep reinforcement learning (DRL) approach, based on trial and error and sequential decisions, can successfully control the generation of dissipative solitons in a mode-locked fiber laser system. We show that a deep Q-learning algorithm can generalize knowledge about the laser system in order to find conditions for stable pulse generation. The region of stable generation was varied by changing the pumping power of the laser cavity, while a tunable spectral filter was used as the control tool. The deep Q-learning algorithm learns a trajectory of spectral-filter adjustments that reaches a stable pulsed regime, relying on the state of the output radiation. Our results confirm the potential of deep reinforcement learning to control a nonlinear laser system with feedback. We also demonstrate that fiber mode-locked laser systems, which generate data at high speed, provide fruitful photonic test-beds for various machine learning concepts based on large datasets.
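To illustrate the kind of controller described, here is a minimal sketch of a deep Q-learning loop in which a small network maps features of the output radiation to discrete spectral-filter adjustments; the feature dimension, action grid, and hyper-parameters are assumptions for illustration, not the paper's actual setup.

```python
# Minimal sketch of deep Q-learning for spectral-filter control.
# N_FEATURES and ACTIONS are illustrative assumptions.
import random
import torch
import torch.nn as nn

N_FEATURES = 8  # e.g. spectrum/autocorrelation descriptors (assumed)
# Discrete steps for filter (center wavelength, bandwidth), incl. "stay".
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]

qnet = nn.Sequential(nn.Linear(N_FEATURES, 64), nn.ReLU(),
                     nn.Linear(64, len(ACTIONS)))
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)
gamma, eps = 0.99, 0.1

def select_action(state):
    # epsilon-greedy over Q-values for the current radiation state
    if random.random() < eps:
        return random.randrange(len(ACTIONS))
    with torch.no_grad():
        return int(qnet(state).argmax())

def td_update(state, action, reward, next_state, done):
    # one-step TD update toward r + gamma * max_a' Q(s', a')
    q = qnet(state)[action]
    with torch.no_grad():
        target = reward + (0.0 if done else gamma * qnet(next_state).max())
    loss = (q - target) ** 2
    opt.zero_grad(); loss.backward(); opt.step()
```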


Learning Locomotion Skills Safely in the Real World

#artificialintelligence

Posted by Jimmy (Tsung-Yen) Yang, Student Researcher, Robotics at Google. The promise of deep reinforcement learning (RL) in solving comp...


Machine learning program for games inspires development of groundbreaking scientific tool

#artificialintelligence

We learn new skills through repetition and reinforcement learning. Through trial and error, we repeat actions that lead to good outcomes, try to avoid bad outcomes, and seek to improve those in between. Researchers are now designing algorithms based on a form of artificial intelligence that uses reinforcement learning, applying them to automate chemical synthesis and drug discovery and even to play games like chess and Go. Scientists at the U.S. Department of Energy's (DOE) Argonne National Laboratory have developed a reinforcement learning algorithm for yet another application.
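The trial-and-error principle described above can be made concrete in a few lines of tabular Q-learning on a toy chain environment; this sketch is entirely illustrative and unrelated to the Argonne tool itself.

```python
# Minimal sketch of trial-and-error learning: tabular Q-learning on a
# toy chain where reward only comes at the rightmost state. (Illustrative.)
import random

N_STATES, ACTIONS = 4, [0, 1]   # actions: move left / move right
alpha, gamma, eps = 0.5, 0.9, 0.3
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    s2 = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0), s2 == N_STATES - 1

for _ in range(200):            # episodes of trial and error
    s, done = 0, False
    while not done:
        a = (random.choice(ACTIONS) if random.random() < eps
             else max(ACTIONS, key=lambda a: Q[(s, a)]))
        s2, r, done = step(s, a)
        # reinforce actions that led to good outcomes
        best_next = max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

print(Q)  # right-moving actions end up with the highest values
```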


Offline RL made easier: no TD learning, advantage reweighting, or transformers

AIHub

A demonstration of the RvS policy we learn with just supervised learning and a depth-two MLP. It uses no TD learning, advantage reweighting, or Transformers! Offline reinforcement learning (RL) is conventionally approached using value-based methods based on temporal difference (TD) learning. A simpler alternative, RvS, instead learns conditional policies with plain supervised learning, conditioning on goal states (Lynch et al., 2019; Ghosh et al., 2021), reward-to-go (Kumar et al., 2019; Chen et al., 2021), or language descriptions of the task (Lynch and Sermanet, 2021). We find the simplicity of these methods quite appealing.
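For intuition, here is a minimal sketch of an RvS-style update: a depth-two MLP conditioned on the state and an outcome (reward-to-go here), trained with a plain supervised loss. The dimensions and toy batch are placeholder assumptions, not the paper's setup.

```python
# Minimal sketch of RvS: supervised learning with outcome conditioning,
# no TD targets, no advantage reweighting, no Transformers.
import torch
import torch.nn as nn

STATE_DIM, ACT_DIM = 17, 6   # assumed dimensions

# Depth-two MLP; input is the state concatenated with reward-to-go.
policy = nn.Sequential(
    nn.Linear(STATE_DIM + 1, 256), nn.ReLU(),
    nn.Linear(256, ACT_DIM),
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def rvs_update(states, actions, rtg):
    # states: (B, STATE_DIM), actions: (B, ACT_DIM), rtg: (B, 1)
    pred = policy(torch.cat([states, rtg], dim=-1))
    loss = ((pred - actions) ** 2).mean()   # plain supervised loss
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Toy batch just to show the call signature.
B = 32
print(rvs_update(torch.randn(B, STATE_DIM), torch.randn(B, ACT_DIM),
                 torch.rand(B, 1)))
```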


IBM's AutoAI Has The Smarts To Make Data Scientists A Lot More Productive – But What's Scary Is That It's Getting A Whole Lot Smarter

#artificialintelligence

I recently had the opportunity to discuss current IBM artificial intelligence developments with Dr. Lisa Amini, an IBM Distinguished Engineer and the Director of IBM Research Cambridge, home to the MIT-IBM Watson AI Lab. Dr. Amini was previously Director of Knowledge & Reasoning Research in the Cognitive Computing group at IBM's TJ Watson Research Center in New York. Dr. Amini earned her Ph.D. degree in Computer Science from Columbia University. Dr. Amini and her team are part of IBM Research tasked with creating the next generation of Automated AI and data science. I was interested in automation's impact on the lifecycles of artificial intelligence and machine learning and centered our discussion around next-generation capabilities for AutoAI. AutoAI automates the highly complex process of finding and optimizing the best ML model, features, and model hyperparameters for your data.


Deep Reinforcement Learning for Solving Rubik's Cube

#artificialintelligence

The Rubik's Cube is a famous 3-D puzzle toy. A regular Rubik's Cube has six faces, each with nine coloured stickers, and the puzzle is solved when each face shows a single colour. If we count one quarter turn (90°) as one move and two quarter turns (a 180° half turn) as two moves, the best human-devised algorithms can solve any instance of the cube in at most 26 moves. My goal is to let the computer learn how to solve the Rubik's Cube without feeding it any human knowledge, such as the symmetry of the cube. The most challenging part is that the Rubik's Cube has 43,252,003,274,489,856,000 possible permutations.
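That count can be verified from the standard factorization of the cube's configuration space into corner and edge permutations and orientations, with a parity constraint halving the raw product:

```python
# Verify the number of reachable Rubik's Cube configurations.
from math import factorial

corners = factorial(8) * 3**7    # 8 corner positions, 7 free orientations
edges = factorial(12) * 2**11    # 12 edge positions, 11 free orientations
total = corners * edges // 2     # permutation parity links corners and edges
print(f"{total:,}")              # 43,252,003,274,489,856,000
```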