
Reinforcement Learning: AI-Alerts

Apple Secures Another Autonomous Vehicle Patent


Apple has secured another patent related to autonomous vehicles even as it remains tight-lipped about its AV plans. Patent number 11,243,532 from the U.S. Patent and Trademark Office relates to machine learning systems and algorithms for reasoning, decision-making and motion-planning for controlling the motion of autonomous or partially autonomous vehicles. First unearthed by Patently Apple, the patent, titled "evaluating varying-sized action spaces using reinforcement learning," details a system that evaluates actions using a reinforcement learning model to help direct the movements of a vehicle. "A set of actions corresponding to a particular state of the environment of a vehicle is identified. A respective encoding is generated for different actions of the set, using elements such as distinct colors to distinguish attributes such as target lane segments," the abstract reads.
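The core idea the abstract describes -- encoding each member of a varying-sized action set and evaluating the encodings with a learned model -- can be sketched in a toy form. This is not Apple's patented system; the feature encoding, the linear scorer, and all the numbers below are invented for illustration. The point is only that scoring each action independently handles action sets of any size.

```python
# Toy illustration (not Apple's patented system): scoring a
# varying-sized set of candidate driving actions with a value
# function. Encoding scheme and weights are invented for the example.

def encode_action(action):
    """Encode a candidate action as a feature vector."""
    return [action["target_lane"], action["speed_delta"]]

def score(features, weights):
    """Linear value estimate; a real system would use a learned model."""
    return sum(f * w for f, w in zip(features, weights))

def best_action(actions, weights):
    """Evaluate every action in the (arbitrarily sized) set, pick the best."""
    return max(actions, key=lambda a: score(encode_action(a), weights))

candidates = [
    {"target_lane": 0, "speed_delta": 0.0},
    {"target_lane": 1, "speed_delta": -0.5},
    {"target_lane": 2, "speed_delta": 0.3},
]
weights = [0.2, 1.0]
choice = best_action(candidates, weights)
```

Because each action is scored on its own, the same model works whether the vehicle has three candidate maneuvers or thirty.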

Amazon Research Introduces Deep Reinforcement Learning For NLU Ranking Tasks


In recent years, voice-based virtual assistants such as Google Assistant and Amazon Alexa have grown popular, presenting both opportunities and challenges for natural language understanding (NLU) systems. The production systems behind these devices are typically trained with supervised learning and rely heavily on annotated data. But data annotation is costly and time-consuming, and model updates using offline supervised learning can take a long time and miss trending requests.

AI Weekly: AI research still has a reproducibility problem


Many systems, such as autonomous vehicle fleets and drone swarms, can be modeled as multi-agent reinforcement learning (MARL) tasks, which deal with how multiple machines learn to collaborate, coordinate, compete, and learn collectively. Machine learning algorithms -- particularly reinforcement learning algorithms -- have been shown to be well suited to MARL tasks, but it is often challenging to scale them efficiently to hundreds or even thousands of machines. One solution is a technique called centralized training and decentralized execution (CTDE), which allows an algorithm to train on data from multiple machines but make predictions for each machine individually (e.g., when a driverless car should turn left).
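The CTDE pattern can be sketched in a toy setting. The environment (two agents on a number line, rewarded for reaching 0) and the update rule are invented for illustration; the point is the structural split -- a critic that sees the joint state during training, and actors that see only their own observations at execution time.

```python
import random

# Minimal sketch of centralized training, decentralized execution (CTDE).
# The environment and update rule are toy assumptions: two agents on a
# 1-D line are rewarded for both reaching position 0.

random.seed(0)

def reward(joint_state):
    a, b = joint_state
    return -abs(a) - abs(b)          # best when both agents reach 0

# Centralized critic: value estimates indexed by the JOINT state,
# available only during training.
critic = {}

# Decentralized actor: each agent's policy sees only its OWN position.
def actor(own_pos):
    return -1 if own_pos > 0 else (1 if own_pos < 0 else 0)

# --- training phase: the critic consumes joint experience ---
for _ in range(200):
    joint = (random.randint(-3, 3), random.randint(-3, 3))
    r = reward(joint)
    v = critic.get(joint, 0.0)
    critic[joint] = v + 0.5 * (r - v)   # simple running update

# --- execution phase: each agent acts on local observation only ---
state = [3, -2]
for _ in range(5):
    state = [p + actor(p) for p in state]

print(state)   # both agents have moved to 0
```

In this sketch the critic only records values; in real CTDE methods (MADDPG is one well-known example) the centralized critic's gradients shape each decentralized actor's policy during training, and the critic is discarded at deployment.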

Reinforcement learning competition pushes the boundaries of embodied AI


The next time you go to a supermarket, consider how easily you can find your way through the aisles, tell different products apart, reach for and pick up items, place them in your basket or cart, and choose an efficient path. And you do all this without access to segmentation and depth maps, while reading items from a crumpled handwritten note in your pocket. This highlights the complexity of human vision and agency. Experiments show that hybrid AI models combining reinforcement learning with symbolic planners are better suited to solving the ThreeDWorld Transport Challenge. The TDW-Transport Challenge is currently accepting submissions.

Novel deep learning framework for symbolic regression


Lawrence Livermore National Laboratory (LLNL) computer scientists have developed a new framework and an accompanying visualization tool that leverages deep reinforcement learning for symbolic regression problems, outperforming baseline methods on benchmark problems. The paper was recently accepted as an oral presentation at the International Conference on Learning Representations (ICLR 2021), one of the top machine learning conferences in the world. The conference takes place virtually May 3-7. In the paper, the LLNL team describes applying deep reinforcement learning to discrete optimization--problems that deal with discrete "building blocks" that must be combined in a particular order or configuration to optimize a desired property. The team focused on a type of discrete optimization called symbolic regression--finding short mathematical expressions that fit data gathered from an experiment.
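The framing above -- symbolic regression as discrete optimization over expression "building blocks" -- can be shown in a toy loop. The LLNL work trains a recurrent policy with deep reinforcement learning to emit expression tokens; here plain random sampling over a tiny invented grammar stands in for the learned policy, just to show the sample-score-keep-best structure of the search.

```python
import random

# Hedged sketch: symbolic regression as discrete search. Random
# sampling over a tiny grammar stands in for the deep RL policy used
# in the actual LLNL work; grammar and data are invented.

random.seed(1)

# Data generated by the hidden target expression x**2 + x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [x**2 + x for x in xs]

TERMS = ["x", "x**2", "1"]

def sample_expression():
    """Sample a short expression: one or two distinct terms summed."""
    picked = random.sample(TERMS, k=random.choice([1, 2]))
    return " + ".join(picked)

def mse(expr):
    """Mean squared error of the candidate expression against the data."""
    return sum((eval(expr, {"x": x}) - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)

best = min((sample_expression() for _ in range(100)), key=mse)
print(best, mse(best))
```

A learned policy replaces the random sampler with one that assigns higher probability to token sequences that scored well before, which is what lets the RL approach scale past toy grammars like this one.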

Algorithm helps artificial intelligence systems dodge "adversarial" inputs


In a perfect world, what you see is what you get. If this were the case, the job of artificial intelligence systems would be refreshingly straightforward. Take collision avoidance systems in self-driving cars. If visual input to on-board cameras could be trusted entirely, an AI system could directly map that input to an appropriate action -- steer right, steer left, or continue straight -- to avoid hitting a pedestrian that its cameras see in the road. But what if there's a glitch in the cameras that slightly shifts an image by a few pixels?

AI smashes video game high scores by remembering its past success

New Scientist

Montezuma's Revenge is one of the most challenging Atari games. An artificial intelligence that can remember its previous successes and use them to create new strategies has achieved record high scores on some of the hardest video games on classic Atari consoles. Many AI systems use reinforcement learning, in which an algorithm is given positive or negative feedback on its progress towards a particular goal after each step it takes, encouraging it towards a particular solution. This technique was used by AI firm DeepMind to train AlphaGo, which beat a world champion Go player in 2016. Adrien Ecoffet at Uber AI Labs and OpenAI in California and his colleagues hypothesised that such algorithms often stumble upon encouraging avenues but then jump to another area in the hunt for something more promising, leaving better solutions overlooked. "What do you do when you don't know anything about your task?" says Ecoffet. "If you just wave your arms around, it's unlikely that you're ever going to make a coffee."
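The "remember previous successes" idea can be sketched with a toy archive. This is only a caricature of the approach Ecoffet's team developed (the environment below is a one-dimensional number line invented for illustration), but it shows the core loop: remember every state reached and how it was reached, then restart exploration from remembered states instead of always from scratch.

```python
import random

# Sketch of the "remember and return" exploration idea (toy
# environment invented for illustration): archive every state reached
# together with the action sequence that got there, then explore
# onward from archived states rather than from the start.

random.seed(0)

GOAL = 10

def step(state, action):
    return state + action           # deterministic toy dynamics

# archive: state -> shortest known action sequence reaching it
archive = {0: []}

while GOAL not in archive:
    # return: pick a remembered state (here, the one closest to the goal)
    state = max(archive)
    trajectory = archive[state]
    # explore: take one random action from there
    action = random.choice([-1, 1])
    nxt = step(state, action)
    if nxt not in archive or len(trajectory) + 1 < len(archive[nxt]):
        archive[nxt] = trajectory + [action]

print(len(archive[GOAL]))   # number of steps in the discovered route
```

Because promising states are never forgotten, a lucky discovery is a permanent foothold rather than something the agent may wander away from -- the failure mode the article describes.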

Modern Reinforcement Learning: Deep Q Learning in PyTorch


You will then learn how to implement these algorithms in Pythonic, concise PyTorch code that can be extended to include future deep Q learning algorithms. These algorithms will be used to solve a variety of environments from the OpenAI Gym Atari library, including Pong, Breakout, and BankHeist. You will learn the key to making these deep Q learning algorithms work: how to modify the OpenAI Gym Atari environments to meet the specifications of the original Deep Q Learning papers. Also included is a mini-course in deep learning using the PyTorch framework, geared toward students who are familiar with the basic concepts of deep learning but not the specifics, or who are comfortable with deep learning in another framework such as TensorFlow or Keras.
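Two of the environment modifications the original Deep Q Learning papers call for -- repeating each chosen action over several frames ("frame skip") and stacking the most recent processed frames so the network can infer motion -- can be sketched without any Gym dependency. The frame data below is fake, and real code would wrap a Gym environment and also grayscale and downsample each frame; this only shows the bookkeeping.

```python
from collections import deque

# Sketch of frame skipping and frame stacking, two preprocessing steps
# from the original Deep Q Learning papers. Frame data is fake; real
# code would wrap an OpenAI Gym Atari environment.

STACK, SKIP = 4, 4

def preprocess(frame):
    """Stand-in for grayscale conversion and downsampling to 84x84."""
    return frame  # real code would resize/convert here

class FrameStacker:
    def __init__(self):
        self.frames = deque(maxlen=STACK)

    def reset(self, first_frame):
        f = preprocess(first_frame)
        for _ in range(STACK):          # fill the stack with the first frame
            self.frames.append(f)
        return list(self.frames)

    def step(self, new_frames):
        """new_frames: the SKIP raw frames produced by repeating one action."""
        self.frames.append(preprocess(new_frames[-1]))  # keep the last one
        return list(self.frames)

stacker = FrameStacker()
obs = stacker.reset("f0")
obs = stacker.step(["f1", "f2", "f3", "f4"])
print(obs)   # ['f0', 'f0', 'f0', 'f4']
```

The deque with `maxlen=STACK` silently drops the oldest frame on each append, which is exactly the sliding-window behaviour the stacked observation needs.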

Robots learn to get back up after a fall in an unfamiliar environment

New Scientist - News

Robots can pick themselves up after a fall, even in an unfamiliar environment, thanks to an artificially intelligent controller that can adapt to new scenarios. It could make four-legged robots more useful in responding to natural disasters, such as earthquakes. Zhibin (Alex) Li at the University of Edinburgh, UK, and his colleagues used an AI technique called deep reinforcement learning to teach four-legged robots a set of basic skills, such as trotting, steering and fall recovery. This involves the robots experimenting with different ways of moving, being rewarded with a numerical score for achieving a certain goal, such as standing up after a fall, and penalised for failing. This lets the AI recognise which actions are desired and repeat them in similar situations in the future.
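The reward-and-penalty scheme described above amounts to a scalar scoring function. The exact terms and weights the Edinburgh team used are not given in the article, so everything below is invented for illustration; it only shows the shape of such a function, where desirable outcomes score higher than undesirable ones.

```python
# Hedged sketch of the kind of scalar reward described in the article.
# The terms and weights are invented; a real controller's reward would
# combine many sensor-derived quantities.

def recovery_reward(body_height, uprightness, fallen):
    """body_height in metres, uprightness in [0, 1], fallen: bool."""
    if fallen:
        return -1.0                       # penalty for failing
    return 0.5 * body_height + 0.5 * uprightness

standing = recovery_reward(0.4, 1.0, False)   # upright at full height
on_side  = recovery_reward(0.1, 0.2, False)   # partially recovered
crashed  = recovery_reward(0.0, 0.0, True)    # still down
```

Trial-and-error learning then consists of the robot trying motions and preferring those whose resulting states score higher under this function.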

A robot triumphs in a curling match against elite humans


A robot equipped with artificial intelligence (AI) can excel at the Olympic sport of curling -- and even beat top-level human teams. Success requires precision and strategy, but the game is less complex than other real-world applications of robotics. That makes curling a useful test case for AI technologies, which often perform well in simulations but falter in real-world scenarios with changing conditions. Using a method called adaptive deep reinforcement learning, Seong-Whan Lee and his colleagues at Korea University in Seoul created an algorithm that learns through trial and error to adjust a robot's throws to account for changing conditions, such as the ice surface and the positions of stones. The team's robot, nicknamed Curly, needed a few test throws to calibrate itself to the curling rink where it was to compete.
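The calibration step described above -- a few test throws to adapt to the rink -- can be caricatured as error-driven correction of a throw parameter. The team's actual adaptive deep reinforcement learning method is far richer; the toy physics, gain value, and friction numbers below are all invented just to show how a handful of test throws can tune a parameter to unknown conditions.

```python
# Toy sketch of calibrating a throw to unknown ice conditions via a
# few test throws. Physics, gain, and friction values are invented.

TARGET = 10.0          # desired stopping distance, metres

def throw(power, ice_friction):
    """Toy physics: stopping distance scales inversely with friction."""
    return power / ice_friction

def calibrate(ice_friction, n_test_throws=5, gain=0.8):
    power = TARGET      # initial guess assumes friction of exactly 1.0
    for _ in range(n_test_throws):
        error = TARGET - throw(power, ice_friction)
        power += gain * error * ice_friction   # proportional correction
    return power

power = calibrate(ice_friction=1.3)
final = throw(power, ice_friction=1.3)
```

Each test throw shrinks the remaining error by a constant factor (here 1 - gain = 0.2), so a handful of throws suffices -- which matches the article's observation that Curly needed only a few test throws before competing.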