Machine learning program for games inspires development of groundbreaking scientific tool


We learn new skills by repetition and reinforcement learning. Through trial and error, we repeat actions leading to good outcomes, try to avoid bad outcomes and seek to improve those in between. Researchers are now designing algorithms based on a form of artificial intelligence that uses reinforcement learning. They are applying them to automate chemical synthesis and drug discovery, and even to play games like chess and Go. Scientists at the U.S. Department of Energy's (DOE) Argonne National Laboratory have developed a reinforcement learning algorithm for yet another application.

IBM's AutoAI Has The Smarts To Make Data Scientists A Lot More Productive – But What's Scary Is That It's Getting A Whole Lot Smarter


I recently had the opportunity to discuss current IBM artificial intelligence developments with Dr. Lisa Amini, an IBM Distinguished Engineer and the Director of IBM Research Cambridge, home to the MIT-IBM Watson AI Lab. Dr. Amini was previously Director of Knowledge & Reasoning Research in the Cognitive Computing group at IBM's TJ Watson Research Center in New York. Dr. Amini earned her Ph.D. degree in Computer Science from Columbia University. Dr. Amini and her team are part of IBM Research tasked with creating the next generation of Automated AI and data science. I was interested in automation's impact on the lifecycles of artificial intelligence and machine learning and centered our discussion around next-generation capabilities for AutoAI. AutoAI automates the highly complex process of finding and optimizing the best ML model, features, and model hyperparameters for your data.

Pieter Abbeel wins ACM Prize in Computing


Congratulations to Pieter Abbeel, who has been awarded the ACM Prize in Computing for his contributions to robot learning, including learning from demonstrations and deep reinforcement learning for robotic control. Pieter Abbeel is a Professor of Computer Science and Electrical Engineering at the University of California, Berkeley and the Co-Founder, President and Chief Scientist at Covariant, an AI robotics company. He also hosts The Robot Brains podcast. The ACM Prize in Computing recognizes an early- to mid-career fundamental, innovative contribution in computing that, through its depth, impact and broad implications, exemplifies the greatest achievements in the discipline. The award carries a prize of $250,000.

Combinatorial PurgedKFold Cross-Validation for Deep Reinforcement Learning


Originally published on Towards AI. This article is written by Berend Gort & Bruce Yang, core team members of the open-source project AI4Finance. The project is an open-source community that shares AI tools for finance and is part of Columbia University in New York. Our previous article described the Combinatorial PurgedKFold Cross-Validation method in detail for classifiers (or regressors) with regular predictions.
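As a rough illustration of the idea named in the title, the sketch below enumerates combinatorial train/test splits over contiguous groups and drops training samples that sit next to a test group. It is not the authors' code: the function name `combinatorial_purged_splits` and the fixed-width `purge` parameter are assumptions made for this example, and real implementations usually purge based on label overlap and add an embargo period rather than using a symmetric fixed window.

```python
from itertools import combinations


def combinatorial_purged_splits(n_samples, n_groups, n_test_groups, purge=0):
    """Yield (train_idx, test_idx) for every choice of test groups.

    Hypothetical helper: samples are assumed to be time-ordered and are
    cut into `n_groups` contiguous groups.
    """
    # Group g covers the half-open index range [bounds[g], bounds[g + 1]).
    bounds = [g * n_samples // n_groups for g in range(n_groups + 1)]
    for test_groups in combinations(range(n_groups), n_test_groups):
        test_idx, blocked = [], set()
        for g in test_groups:
            start, stop = bounds[g], bounds[g + 1]
            test_idx.extend(range(start, stop))
            # Block the test span plus `purge` samples on each side so that
            # neighbouring training samples cannot leak into the test set.
            blocked.update(range(max(0, start - purge),
                                 min(n_samples, stop + purge)))
        train_idx = [i for i in range(n_samples) if i not in blocked]
        yield train_idx, test_idx


splits = list(combinatorial_purged_splits(
    n_samples=12, n_groups=4, n_test_groups=2, purge=1))
```

With 4 groups and 2 test groups per split, this produces one split per pair of groups, so every group appears in several different test sets instead of only one as in ordinary k-fold.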

Q&A: Cathy Wu on developing algorithms to safely integrate robots into our world


Cathy Wu is the Gilbert W. Winslow Assistant Professor of Civil and Environmental Engineering and a member of the MIT Institute for Data, Systems, and Society. As an undergraduate, Wu won MIT's toughest robotics competition, and as a graduate student took the University of California at Berkeley's first-ever course on deep reinforcement learning. Now back at MIT, she's working to improve the flow of robots in Amazon warehouses under the Science Hub, a new collaboration between the tech giant and the MIT Schwarzman College of Computing. Outside of the lab and classroom, Wu can be found running, drawing, pouring lattes at home, and watching YouTube videos on math and infrastructure via 3Blue1Brown and Practical Engineering. She recently took a break from all of that to talk about her work.

Computational Benefits of Intermediate Rewards for Goal-Reaching Policy Learning

Journal of Artificial Intelligence Research

Experiments on many goal-reaching reinforcement learning (RL) tasks have empirically verified that rewarding the agent on subgoals improves convergence speed and practical performance. We attempt to provide a theoretical framework to quantify the computational benefits of rewarding the completion of subgoals, in terms of the number of synchronous value iterations. In particular, we consider subgoals as one-way intermediate states, which can only be visited once per episode, and propose two settings that consider these one-way intermediate states: the one-way single-path (OWSP) and the one-way multi-path (OWMP) settings. In both OWSP and OWMP settings, we demonstrate that adding intermediate rewards to subgoals is more computationally efficient than only rewarding the agent once it completes the goal of reaching a terminal state. We also reveal a trade-off between computational complexity and the pursuit of the shortest path in the OWMP setting: adding intermediate rewards significantly reduces the computational complexity of reaching the goal but the agent may not find the shortest path, whereas with sparse terminal rewards, the agent finds the shortest path at a significantly higher computational cost. We also corroborate our theoretical results with extensive experiments on the MiniGrid environments using Q-learning and some popular deep RL algorithms.
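The effect described in this abstract can be seen on a toy example. The sketch below is not the paper's OWSP or OWMP construction; it assumes a deterministic one-way chain with two actions (forward or stay), a reward of 1 for entering each rewarded state, and ties broken toward "stay". It counts synchronous value-iteration sweeps until the greedy policy moves forward in every state. The function name and all parameter values are illustrative assumptions.

```python
def sweeps_until_goal_reaching(n_states, rewarded, gamma=0.9, max_sweeps=1000):
    """Count synchronous value-iteration sweeps until greedy = always forward.

    States 0..n_states-1 are non-terminal and state n_states is terminal.
    Entering any state listed in `rewarded` pays 1 (toy assumption).
    """
    values = [0.0] * (n_states + 1)  # the terminal value stays at 0
    for sweep in range(max_sweeps + 1):
        q_forward = [(1.0 if s + 1 in rewarded else 0.0) + gamma * values[s + 1]
                     for s in range(n_states)]
        q_stay = [gamma * values[s] for s in range(n_states)]
        # Ties go to "stay", so a state only advances once reward
        # information has propagated back to it.
        if all(f > st for f, st in zip(q_forward, q_stay)):
            return sweep
        # Synchronous backup: every state is updated from the old values.
        values = [max(f, st) for f, st in zip(q_forward, q_stay)] + [0.0]
    raise RuntimeError("greedy policy never became goal-reaching")


sparse = sweeps_until_goal_reaching(12, rewarded={12})        # terminal reward only
dense = sweeps_until_goal_reaching(12, rewarded={4, 8, 12})   # plus two subgoals
```

In this toy chain the terminal-only reward needs 11 sweeps while the version with subgoal rewards needs 3, because value only has to propagate across the longest segment between rewarded states rather than across the whole chain.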

How to Train your Decision-Making AIs


The combination of deep learning and decision learning has led to several impressive stories in decision-making AI research, including AIs that can play a variety of games (Atari video games, board games, the complex real-time strategy game StarCraft II), control robots (in simulation and in the real world), and even fly a weather balloon. These are examples of sequential decision tasks, in which the AI agent needs to make a sequence of decisions to achieve its goal. Today, the two main approaches for training such agents are reinforcement learning (RL) and imitation learning (IL). In reinforcement learning, humans provide rewards for completing discrete tasks, with the rewards typically being delayed and sparse. For example, 100 points are given for solving the first room of Montezuma's Revenge (Fig. 1). In the imitation learning setting, humans can transfer knowledge and skills through step-by-step action demonstrations (Fig. 2), and the agent then learns to mimic human actions.

Research advances technology of AI assistance for anesthesiologists


A new study by researchers at MIT and Massachusetts General Hospital (MGH) suggests the day may be approaching when advanced artificial intelligence systems could assist anesthesiologists in the operating room. In a special edition of Artificial Intelligence in Medicine, the team of neuroscientists, engineers, and physicians demonstrated a machine learning algorithm for continuously automating dosing of the anesthetic drug propofol. Using an application of deep reinforcement learning, in which the software's neural networks simultaneously learned how its dosing choices maintain unconsciousness and how to critique the efficacy of its own actions, the algorithm outperformed more traditional software in sophisticated, physiology-based simulations of patients. It also closely matched the performance of real anesthesiologists when showing what it would do to maintain unconsciousness given recorded data from nine real surgeries. The algorithm's advances increase the feasibility for computers to maintain patient unconsciousness with no more drug than is needed, thereby freeing up anesthesiologists for all the other responsibilities they have in the operating room, including making sure patients remain immobile, experience no pain, remain physiologically stable, and receive adequate oxygen, say co-lead authors Gabe Schamberg and Marcus Badgeley.

Efficient Policy Space Response Oracles

Artificial Intelligence

The Policy Space Response Oracle (PSRO) method provides a general solution to Nash equilibrium in two-player zero-sum games but suffers from two problems: (1) computational inefficiency due to consistently evaluating current populations by simulations; and (2) exploration inefficiency due to learning best responses against a fixed meta-strategy at each iteration. In this work, we propose Efficient PSRO (EPSRO), which largely improves the efficiency of the above two steps. Central to our development is the newly introduced subroutine of minimax optimization on unrestricted-restricted (URR) games. By solving URR at each step, one can evaluate the current game and compute the best response in one forward pass with no need for game simulations. Theoretically, we prove that the solution procedures of EPSRO offer a monotonic improvement on exploitability. Moreover, a desirable property of EPSRO is that it is parallelizable, which allows for efficient exploration in the policy space that induces behavioral diversity. We test EPSRO on three classes of games, and report a 50x speedup in wall-time, 10x data efficiency, and similar exploitability as existing PSRO methods on Kuhn and Leduc Poker games.
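For readers unfamiliar with the exploitability metric mentioned in this abstract, the sketch below computes it for a small two-player zero-sum matrix game. It is not EPSRO or PSRO; it only illustrates the quantity those methods try to drive down. The function name and the use of rock-paper-scissors are assumptions for the example, and some papers divide the sum by two, so conventions differ.

```python
def exploitability(payoff, row_strategy, col_strategy):
    """Sum of both players' best-response gains in a zero-sum matrix game.

    `payoff[i][j]` is the row player's payoff; the column player gets its
    negation. A value of 0 means the profile is a Nash equilibrium.
    """
    # Row player's expected payoff for each pure row against col_strategy.
    row_values = [sum(p * q for p, q in zip(row, col_strategy))
                  for row in payoff]
    # Row player's expected payoff under row_strategy for each pure column.
    col_values = [sum(row_strategy[i] * payoff[i][j] for i in range(len(payoff)))
                  for j in range(len(payoff[0]))]
    # Best the row player can get, minus the least the column player can concede.
    return max(row_values) - min(col_values)


# Rock-paper-scissors, rows and columns ordered rock, paper, scissors.
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]
uniform = [1 / 3, 1 / 3, 1 / 3]
always_rock = [1.0, 0.0, 0.0]

at_equilibrium = exploitability(RPS, uniform, uniform)
rock_vs_uniform = exploitability(RPS, always_rock, uniform)
```

The uniform profile is the equilibrium of rock-paper-scissors, so its exploitability is zero, while a row player who always plays rock can be exploited for a full point by a column player who switches to paper.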

PRIMA: Planner-Reasoner Inside a Multi-task Reasoning Agent

Artificial Intelligence

We consider the problem of multi-task reasoning (MTR), where an agent can solve multiple tasks via (first-order) logic reasoning. This capability is essential for human-like intelligence due to its strong generalizability and simplicity for handling multiple tasks. However, a major challenge in developing effective MTR is the intrinsic conflict between reasoning capability and efficiency. An MTR-capable agent must master a large set of "skills" to tackle diverse tasks, but executing a particular task at the inference stage requires only a small subset of immediately relevant skills. How can we maintain broad reasoning capability while also achieving efficient task-specific performance? To address this problem, we propose a Planner-Reasoner framework that achieves state-of-the-art MTR capability with high efficiency. The Reasoner models shareable (first-order) logic deduction rules, from which the Planner selects a subset to compose into efficient reasoning paths. The entire model is trained in an end-to-end manner using deep reinforcement learning, and experimental studies over a variety of domains validate its effectiveness.