Integration of reinforcement learning with unmanned aerial vehicles (UAVs) to achieve autonomous flight has been an active research area in recent years. An important part focuses on obstacle detection and avoidance for UAVs navigating through an environment. Exploration in an unseen environment can be tackled with Deep Q-Network (DQN). However, value exploration with uniform sampling of actions may lead to redundant states, where often the environments inherently bear sparse rewards. To resolve this, we present two techniques for improving exploration for UAV obstacle avoidance. The first is a convergence-based approach that uses convergence error to iterate through unexplored actions and temporal threshold to balance exploration and exploitation. The second is a guidance-based approach using a Domain Network which uses a Gaussian mixture distribution to compare previously seen states to a predicted next state in order to select the next action. Performance and evaluation of these approaches were implemented in multiple 3-D simulation environments, with variation in complexity. The proposed approach demonstrates a two-fold improvement in average rewards compared to state of the art.
Original video games of the 1970s contained very little, if any, Artificial Intelligence (AI). Game code in these early days was made up of rather complex "if" statements that allowed for a fixed (and not always spontaneous) number of game choices and scenarios. Today's video games work using the same fundamental concepts that games created in the early 1980s and 1990s used; they're just scaled with more data and more processing power. That's not to say that the games themselves have not changed since 1982. Today's games have extraordinary graphics, sound, and stories compared to earlier trailblazers.
CSE Assistant Professor Rose Yu, who recently arrived from Northeastern University in Boston, is developing physics-guided machine learning techniques to model spatiotemporal data. She investigates traffic flows, human mobility and fluid dynamics, but her passion for computer science began more humbly. "I think it was because of my love for computer video games," said Yu. "I played a lot of World of Warcraft in high school." That pastime sparked an early interest in computers and later in machine learning. Yu earned her PhD at USC, where she was honored for best dissertation.
A course that will help you implement reinforcement learning in your projects!! In the last few years, we heard about Google's AlphaGo defeating the GO champion; we heard that the latest AIs are now playing Super Mario or Dota2, or even AI-powered self-driving cars (Tesla) have started carrying passengers without human assistance. If all this sounds crazy, then brace yourself for the future because development in AI is increasing at a pace like never before. Reinforcement learning is one such development in AI that has opened a whole new world. To help you learn this concept, we are set to launch an entire curation dedicated to Reinforcement Learning.
I recently started a new newsletter focus on AI education. TheSequence is a no-BS( meaning no hype, no news etc) AI-focused newsletter that takes 5 minutes to read. The goal is to keep you up to date with machine learning projects, research papers and concepts. Deep reinforcement leaning(DRL) has been at the center of some of the most important artificial intelligence(AI) breakthroughs of the last decade. Given its dependency on interactions with an environment, DRL is regularly applied to many real world scenarios such as self-driving vehicles that operate in really complex environments.
Recent successes combine reinforcement learning algorithms and deep neural networks, despite reinforcement learning not being widely applied to robotics and real world scenarios. This can be attributed to the fact that current state-of-the-art, end-to-end reinforcement learning approaches still require thousands or millions of data samples to converge to a satisfactory policy and are subject to catastrophic failures during training. Conversely, in real world scenarios and after just a few data samples, humans are able to either provide demonstrations of the task, intervene to prevent catastrophic actions, or simply evaluate if the policy is performing correctly. This research investigates how to integrate these human interaction modalities to the reinforcement learning loop, increasing sample efficiency and enabling real-time reinforcement learning in robotics and real world scenarios. This novel theoretical foundation is called Cycle-of-Learning, a reference to how different human interaction modalities, namely, task demonstration, intervention, and evaluation, are cycled and combined to reinforcement learning algorithms. Results presented in this work show that the reward signal that is learned based upon human interaction accelerates the rate of learning of reinforcement learning algorithms and that learning from a combination of human demonstrations and interventions is faster and more sample efficient when compared to traditional supervised learning algorithms. Finally, Cycle-of-Learning develops an effective transition between policies learned using human demonstrations and interventions to reinforcement learning. The theoretical foundation developed by this research opens new research paths to human-agent teaming scenarios where autonomous agents are able to learn from human teammates and adapt to mission performance metrics in real-time and in real world scenarios.
We were delighted to be joined by Lex Fridman at the San Francisco edition of the Deep Learning Summit, taking part in both a'Deep Dive' session, allowing for a great amount of attendee interaction and collaboration, alongside a fireside chat with OpenAI Co-Founder & Chief Scientist, Ilya Sutskever. The MIT Researcher shared his thoughts on recent developments in AI and its current standing, highlighting its growth in recent years. Lex then referenced, Lee Sedol, the South Korean 9th Dan GO player, whom at this time is the only human to ever beat AI at a video game, which has since become somewhat of an impossible task, describing this feat as a seminal moment and one which changed the course of not only deep learning but also reinforcement learning, increasing the social belief in the subsection of AI. Since then, of course, we have seen video games and tactically based games, including Starcraft become imperative in the development of AI. The comparison of Reinforcement Learning to Human Learning is something which we often come across, referenced by Lex as something which needed addressing, with humans seemingly learning through "very few examples" as opposed to the heavy data sets needed in AI, but why is that?
Behavior Trees (BTs) were invented as a tool to enable modular AI in computer games, but have received an increasing amount of attention in the robotics community in the last decade. With rising demands on agent AI complexity, game programmers found that the Finite State Machines (FSM) that they used scaled poorly and were difficult to extend, adapt and reuse. In BTs, the state transition logic is not dispersed across the individual states, but organized in a hierarchical tree structure, with the states as leaves. This has a significant effect on modularity, which in turn simplifies both synthesis and analysis by humans and algorithms alike. These advantages are needed not only in game AI design, but also in robotics, as is evident from the research being done. In this paper we present a comprehensive survey of the topic of BTs in Artificial Intelligence and Robotic applications. The existing literature is described and categorized based on methods, application areas and contributions, and the paper is concluded with a list of open research challenges.
We consider the challenging problem of high speed autonomous racing in a realistic Formula One environment. DeepRacing is a novel end-to-end framework, and a virtual testbed for training and evaluating algorithms for autonomous racing. The virtual testbed is implemented using the realistic F1 series of video games, developed by Codemasters, which many Formula One drivers use for training. This virtual testbed is released under an open-source license both as a standalone C++ API and as a binding to the popular Robot Operating System 2 (ROS2) framework. This open-source API allows anyone to use the high fidelity physics and photo-realistic capabilities of the F1 game as a simulator, and without hacking any game engine code. We use this framework to evaluate several neural network methodologies for autonomous racing. Specifically, we consider several fully end-to-end models that directly predict steering and acceleration commands for an autonomous race car as well as a model that predicts a list of waypoints to follow in the car's local coordinate system, with the task of selecting a steering/throttle angle left to a classical control algorithm. We also present a novel method of autonomous racing by training a deep neural network to predict a parameterized representation of a trajectory rather than a list of waypoints. We evaluate these models performance in our open-source simulator and show that trajectory prediction far outperforms end-to-end driving. Additionally, we show that open-loop performance for an end-to-end model, i.e. root-mean-square error for a model's predicted control values, does not necessarily correlate with increased driving performance in the closed-loop sense, i.e. actual ability to race around a track. Finally, we show that our proposed model of parameterized trajectory prediction outperforms both end-to-end control and waypoint prediction.
A team of Uber AI researchers has developed a set of algorithms, Go-Explore, that reportedly beats any Atari 2600 game with "superhuman" scores, including ones where AI previously had trouble besting its organic rivals. The key is a system that takes care to remember promising states and returns to those states before it sets out exploring. Go-Explore saw improvement by "orders of magnitude" in some games. It was the first to beat every level in Montezuma's Revenge, and got a "near-perfect" Pitfall score -- both of these are particularly challenging for reinforcement learning systems like this. DeepMind's Agent57 reached a similar benchmark, according to the team's Jeff Clune, but through "entirely different methods."