Reinforcement Learning


Little Known Artificial Intelligence Secrets: What Unsupervised Learning Really Means - insideBIGDATA

#artificialintelligence

The ambiguities surrounding artificial intelligence are legion. Most enterprise proclamations of AI are simply applications of machine learning. Although this technology encompasses supervised learning, unsupervised learning, and reinforcement learning, misconceptions about these terms (and about their use throughout the enterprise) abound. Many of these misapprehensions stem from the names of these forms of statistical AI. For example, some believe that simply using machine learning in a feedback loop amounts to reinforcement learning.


New milestones in embodied AI

#artificialintelligence

To accomplish a task like checking whether you locked the front door or retrieving a cell phone that's ringing in an upstairs bedroom, AI assistants of the future must learn to plan their route, navigate effectively, look around their physical environment, listen to what's happening around them, and build memories of the 3D space. These smarter assistants will require new advances in embodied AI, which seeks to teach machines to understand and interact with the complexities of the physical world as people do. Today, we're announcing several new milestones that introduce important capabilities to push the limits of embodied agents even further. Among them is the first audio-visual platform for embodied AI, with which researchers can train AI agents in 3D environments with highly realistic acoustics.


AI Defeats Human Pilot in DARPA Organized Dogfight

#artificialintelligence

It's official: the robots are taking over. In a significant development on August 20, an artificial intelligence (AI) program defeated a human F-16 pilot in simulated dogfights. The AI program, designed by tech firm Heron Systems, was pitted against the human pilot in an environment resembling an elaborate video game during the third and final event of the AlphaDogfight Trials organized by the U.S. Defense Advanced Research Projects Agency (DARPA). Heron Systems' website notes that the program was based on deep reinforcement learning (an AI technique that combines insights from behavioral psychology with how the human cortex is structured and functions), along with unspecified innovations. The defeated pilot, publicly known only by the callsign "Banger," was reported to have been trained at the Weapons School at Nellis Air Force Base in Nevada.


Reimagining City Configuration: Automated Urban Planning via Adversarial Learning

arXiv.org Artificial Intelligence

Urban planning refers to the effort of designing land-use configurations. Effective urban planning can help mitigate the operational and social vulnerabilities of an urban system, such as high taxes, crime, traffic congestion and accidents, pollution, depression, and anxiety. Due to the high complexity of urban systems, such tasks are mostly completed by professional planners. However, human planners take a long time. Recent advances in deep learning motivate us to ask: can machines learn, with human-level capability, to automatically and quickly calculate land-use configurations, so that human planners can then adjust the machine-generated plans for specific needs? To this end, we formulate automated urban planning as a task of learning to configure land uses given the surrounding spatial contexts. To set up the task, we define a land-use configuration as a longitude-latitude-channel tensor, where each channel is a category of POIs (points of interest) and the value of each entry is the number of POIs. We then propose an adversarial learning framework that can automatically generate such a tensor for an unplanned area. In particular, we first characterize the contexts surrounding an unplanned area by learning representations from spatial graphs using geographic and human mobility data. Second, we combine each unplanned area and its surrounding context representation into a tuple, and categorize all tuples into positive samples (well-planned areas) and negative samples (poorly planned areas). Third, we develop an adversarial land-use configuration approach, in which the surrounding context representation is fed into a generator to produce a land-use configuration, and a discriminator learns to distinguish between positive and negative samples.
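To make the longitude-latitude-channel tensor concrete, here is a minimal Python sketch that bins POI records into a count grid with one channel per POI category. The input format (a list of `(lon, lat, category)` tuples), the function name `poi_tensor`, and the uniform grid over a rectangular bounding box are assumptions for illustration; the abstract does not say how cells are laid out.

```python
def poi_tensor(pois, categories, lon_range, lat_range, shape):
    """Build a longitude x latitude x channel tensor of POI counts (nested lists).

    Hypothetical helper, assuming:
      pois       -- iterable of (lon, lat, category) tuples
      categories -- ordered category names; channel k holds counts for categories[k]
      lon_range, lat_range -- (min, max) bounds of the area; max is exclusive
      shape      -- (n_lon_cells, n_lat_cells) grid resolution
    """
    n_lon, n_lat = shape
    channel = {c: k for k, c in enumerate(categories)}
    tensor = [[[0] * len(categories) for _ in range(n_lat)] for _ in range(n_lon)]
    (lon_min, lon_max), (lat_min, lat_max) = lon_range, lat_range
    for lon, lat, cat in pois:
        if cat not in channel:
            continue  # category is not modelled as a channel
        if not (lon_min <= lon < lon_max and lat_min <= lat < lat_max):
            continue  # POI lies outside the area being configured
        # Map the coordinate to a cell index; clamp guards against float round-off.
        i = min(int((lon - lon_min) / (lon_max - lon_min) * n_lon), n_lon - 1)
        j = min(int((lat - lat_min) / (lat_max - lat_min) * n_lat), n_lat - 1)
        tensor[i][j][channel[cat]] += 1
    return tensor
```

A generator in the adversarial framework would output a tensor of this shape for an unplanned area; real POI data would need a projection from geographic coordinates to the grid, which this sketch skips.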


DSP: A Differential Spatial Prediction Scheme for Comprehensive real industrial datasets

arXiv.org Machine Learning

Inverse Distance Weighted (IDW) models have been widely used for predicting and modeling multidimensional space in multimodal industrial processes. However, the more complex the structure of the multidimensional space, the lower the performance of IDW models, and real industrial datasets tend to have complex spatial structure. To solve this problem, a new framework for spatial prediction and modeling based on a deep reinforcement learning network is proposed. In the proposed framework, the internal relationship between state and action is strengthened by reusing the state values in the Q-network, and the convergence rate and stability of the deep reinforcement learning network are improved. The improved network is then used to search for and learn the hyperparameters of each sample point in the IDW model. These hyperparameters reflect, to some extent, the spatial structure of the current industrial dataset. A spatial distribution of hyperparameters is then constructed from the learned values. Each interpolation point obtains its hyperparameters from this distribution and plugs them into the classical IDW model for prediction, thus achieving differential spatial prediction and modeling. Simulation results show that the proposed framework suits real industrial datasets with complex spatial structure and is more accurate than current IDW models in spatial prediction.
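Classical IDW predicts the value at a query point x as a weighted average of the observed values z_i, with weights w_i = 1 / d(x, x_i)^p. The Python sketch below implements that formula, plus a hypothetical "differential" variant in which each query point takes its exponent from the nearest entry of a learned hyperparameter map. Treating the power p as the learned hyperparameter and using a nearest-neighbor lookup are assumptions: the abstract does not name the hyperparameters, and the deep reinforcement learning search that would fill the map is not shown.

```python
import math


def idw(query, points, values, power=2.0):
    """Classical inverse distance weighting with one global power exponent."""
    num = den = 0.0
    for p, z in zip(points, values):
        d = math.dist(query, p)
        if d == 0.0:
            return z  # query coincides with a sample: return its observed value
        w = d ** -power
        num += w * z
        den += w
    return num / den


def differential_idw(query, points, values, hyper_points, hyper_powers):
    """Hypothetical differential IDW: the exponent used at `query` comes from the
    nearest location in a learned hyperparameter map (hyper_points -> hyper_powers)
    instead of a single value shared by every query."""
    nearest = min(range(len(hyper_points)), key=lambda i: math.dist(query, hyper_points[i]))
    return idw(query, points, values, power=hyper_powers[nearest])
```

For example, with samples 0.0 at (0, 0) and 10.0 at (1, 0) and a map assigning p = 1 near (0, 0), the query (0.25, 0) uses p = 1 and evaluates to 2.5; a larger p would pull the estimate more strongly toward the nearer sample.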


Maze solver using Naive Reinforcement Learning for beginners

#artificialintelligence

Q-learning is centered on the Bellman equation and on finding the Q-value of each action in the current state. Finding an optimal policy involves applying this equation repeatedly until the Q-values settle. Only the parts of the Bellman equation relevant to this implementation are explained in this article. Who wants to be in a 2D world anyway? Well… let's put a smile on that face, shall we?
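As a hedged illustration, the following is a minimal tabular Q-learning sketch in Python on a hypothetical 4x4 maze. Each step applies the Bellman-based update Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)). The maze layout, the reward of -1 per move, and the values of alpha, gamma, and epsilon are illustrative choices, not taken from the article.

```python
import random

# Hypothetical maze: S = start, G = goal, # = wall, . = free cell.
MAZE = ["S.#.",
        "..#.",
        ".#..",
        "...G"]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def find(ch):
    """Return the (row, col) of the first cell containing `ch`."""
    for r, row in enumerate(MAZE):
        if ch in row:
            return (r, row.index(ch))
    raise ValueError(f"{ch!r} not in maze")


START, GOAL = find("S"), find("G")


def step(state, action):
    """Deterministic move; walking into a wall or off the grid leaves the agent in place."""
    r, c = state[0] + ACTIONS[action][0], state[1] + ACTIONS[action][1]
    if 0 <= r < len(MAZE) and 0 <= c < len(MAZE[0]) and MAZE[r][c] != "#":
        return (r, c)
    return state


def greedy(q, state):
    """Highest-valued action; unseen pairs default to 0.0, ties go to the lowest index."""
    return max(range(len(ACTIONS)), key=lambda a: q.get((state, a), 0.0))


def train(episodes=1000, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = {}  # (state, action) -> estimated Q-value
    for _ in range(episodes):
        s = START
        for _ in range(max_steps):
            # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
            a = rng.randrange(len(ACTIONS)) if rng.random() < epsilon else greedy(q, s)
            s2 = step(s, a)
            reward = -1.0  # every move costs 1, so shorter routes score higher
            done = s2 == GOAL
            # Bellman target r + gamma * max_a' Q(s', a'); the goal is terminal (value 0).
            target = reward if done else reward + gamma * q.get((s2, greedy(q, s2)), 0.0)
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + alpha * (target - old)
            s = s2
            if done:
                break
    return q


def greedy_path(q, max_steps=50):
    """Follow the learned greedy policy from START until GOAL (or give up)."""
    path = [START]
    while path[-1] != GOAL and len(path) <= max_steps:
        path.append(step(path[-1], greedy(q, path[-1])))
    return path


if __name__ == "__main__":
    print(greedy_path(train()))
```

Because every move costs -1 and the goal is terminal, the greedy policy that emerges follows a shortest route. Initializing unseen Q-values to 0, which is above any reachable negative return, also nudges the agent toward actions it has not tried yet.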


An AI Just Beat a Human F-16 Pilot In a Dogfight -- Again

#artificialintelligence

The never-ending saga of machines outperforming humans has a new chapter. An AI algorithm has again beaten a human fighter pilot in a virtual dogfight. The contest was the finale of the U.S. military's AlphaDogfight challenge, an effort to "demonstrate the feasibility of developing effective, intelligent autonomous agents capable of defeating adversary aircraft in a dogfight." Last August, the Defense Advanced Research Projects Agency, or DARPA, selected eight teams, ranging from large, traditional defense contractors like Lockheed Martin to small groups like Heron Systems, to compete in a series of trials in November and January. In the final on Thursday, Heron Systems emerged as the victor over the seven other teams after two days of old-school dogfights in which the agents went after each other using nose-aimed guns only. Heron then faced off against a human fighter pilot sitting in a simulator and wearing a virtual reality helmet, and won five rounds to zero. The other winner in Thursday's event was deep reinforcement learning, in which AI algorithms try out a task in a virtual environment over and over again, sometimes very quickly, until they develop something like understanding. Deep reinforcement learning played a key role in Heron Systems' agent, as well as in Lockheed Martin's, the runner-up. Matt Tarascio, vice president of artificial intelligence, and Lee Ritholtz, director and chief architect of artificial intelligence, both of Lockheed Martin, told Defense One that getting an algorithm to perform well in air combat is very different from teaching software simply "to fly," or to maintain a particular direction, altitude, and speed. Software begins with a complete lack of understanding of even very basic flight tasks, Ritholtz explained, which puts it at a disadvantage against any human, at first.
"You don't have to teach a human [that] it shouldn't crash into the ground… They have basic instincts that the algorithm doesn't have." In terms of training, "that means dying a lot."


Model-Free Episodic Control with State Aggregation

arXiv.org Artificial Intelligence

Episodic control provides a highly sample-efficient method for reinforcement learning, but it imposes high memory and computational requirements. This work proposes a simple heuristic for reducing these requirements and presents an application to Model-Free Episodic Control (MFEC). Experiments on Atari games show that this heuristic successfully reduces MFEC's computational demands with no significant loss of performance when conservative hyperparameter choices are used. Consequently, episodic control becomes a more feasible option for reinforcement learning tasks.
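MFEC keeps, for each action, a table mapping states to the highest return observed after taking that action, and estimates unseen states by averaging their k nearest stored neighbors. The abstract does not say what the proposed heuristic is, so the Python sketch below assumes one plausible form of state aggregation: snapping states to a grid of width `resolution` so that nearby states share a single entry, which limits table growth. The class name, the grid snapping, and the parameter values are hypothetical.

```python
import math


class AggregatedEpisodicTable:
    """MFEC-style episodic table with a hypothetical state-aggregation step.

    States are snapped to a grid of width `resolution`, so nearby states share
    one entry instead of each being stored separately (the assumed heuristic).
    """

    def __init__(self, n_actions, resolution=0.5, k=3):
        self.resolution = resolution
        self.k = k
        self.tables = [dict() for _ in range(n_actions)]  # per action: cell key -> best return

    def _key(self, state):
        # Aggregation: map a continuous state to its grid cell.
        return tuple(round(x / self.resolution) for x in state)

    def update(self, state, action, ret):
        # MFEC rule: keep the highest return ever seen for this (state, action).
        table = self.tables[action]
        key = self._key(state)
        table[key] = max(table.get(key, -math.inf), ret)

    def estimate(self, state, action):
        table = self.tables[action]
        key = self._key(state)
        if key in table:
            return table[key]
        if not table:
            return 0.0  # nothing stored yet for this action
        # Unseen cell: average the k nearest stored cells (squared distance between keys).
        nearest = sorted(table, key=lambda kk: sum((a - b) ** 2 for a, b in zip(kk, key)))[: self.k]
        return sum(table[kk] for kk in nearest) / len(nearest)

    def size(self):
        return sum(len(t) for t in self.tables)
```

With `resolution=0.5`, the states (0.1, 0.1) and (0.2, 0.2) fall into the same cell, so two updates create one entry holding the larger return. A coarser grid saves more memory at the cost of merging states whose returns genuinely differ, which mirrors the trade-off the abstract describes for aggressive hyperparameter choices.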


Inverse Reinforcement Learning with Natural Language Goals

arXiv.org Artificial Intelligence

Humans generally use natural language to communicate task requirements to one another. It would be desirable for autonomous machines (e.g., robots) to work the same way, so that humans could convey goals or assign tasks more easily. However, understanding natural language goals and mapping them to sequences of states and actions is challenging. Previous research has had difficulty generalizing learned policies to new natural language goals and environments. In this paper, we propose an adversarial inverse reinforcement learning algorithm that learns a language-conditioned policy and reward function. To improve the generalization of the learned policy and reward function, we use a variational goal generator that relabels trajectories and samples diverse goals during training. Our algorithm outperforms baselines by a large margin on a vision-based natural language instruction-following dataset, a promising advance toward giving natural language instructions to agents without relying on instruction templates.
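The abstract does not give the algorithm's equations. Assuming it follows the usual adversarial inverse reinforcement learning (AIRL) formulation, the discriminator has the form D = exp(f) / (exp(f) + pi(a | s)), and the policy is trained on the reward log D - log(1 - D), which simplifies to f - log pi(a | s). The Python sketch below computes both quantities from a scalar score f and a log-probability. Treating f and pi as also taking the natural language goal g as input (which is what would make them language-conditioned) is an assumption, and the networks that produce these numbers are not shown.

```python
import math


def airl_discriminator(f_value, log_pi):
    """AIRL-style discriminator D = exp(f) / (exp(f) + pi).

    Dividing numerator and denominator by exp(f) gives 1 / (1 + exp(log pi - f)),
    which avoids computing exp(f) directly.
    f_value: learned score f(s, a, g) for state s, action a, goal g (hypothetical inputs)
    log_pi:  log-probability of action a under the current goal-conditioned policy
    """
    return 1.0 / (1.0 + math.exp(log_pi - f_value))


def airl_reward(f_value, log_pi):
    """Reward passed to the policy: log D - log(1 - D), which equals f - log pi."""
    d = airl_discriminator(f_value, log_pi)
    return math.log(d) - math.log(1.0 - d)
```

When f equals log pi, the discriminator outputs 0.5, meaning it cannot tell the policy's behavior apart from the demonstrations; at that point, f can be read as a learned reward for the goal.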


Biomechanic Posture Stabilisation via Iterative Training of Multi-policy Deep Reinforcement Learning Agents

arXiv.org Artificial Intelligence

It is not until we become senior citizens that we recognise how much we took maintaining a simple standing posture for granted. It is truly fascinating to observe the degree of control the human brain exercises, in real time, to activate and deactivate the lower-body muscles and solve a multi-link 3D inverted pendulum problem in order to maintain a stable standing posture. This realisation becomes even more apparent when training an artificial intelligence (AI) agent to maintain the standing posture of a digital musculoskeletal avatar, because of the error propagation problem. In this work we address the error propagation problem by introducing an iterative training procedure for deep reinforcement learning, which allows the agent to learn a finite set of actions and how to coordinate between them in order to achieve a stable standing posture. The proposed training approach increased the agent's standing duration from 4 seconds with the traditional training method to 348 seconds. The proposed method also allowed the agent to generalise, accommodating perception and actuation noise for almost 108 seconds.
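The abstract frames standing as an inverted pendulum problem. The Python sketch below simulates a single-link inverted pendulum to show why that problem demands continuous active control: without torque, a 0.05 rad tilt grows until the pendulum falls, while a simple proportional-derivative (PD) feedback law keeps it upright. The PD controller is a swapped-in stand-in for a learned policy, not the paper's iterative multi-policy training procedure, and the single link, unit point mass, gains, and time step are all assumptions.

```python
import math


def simulate(kp=0.0, kd=0.0, theta0=0.05, g=9.81, length=1.0, dt=0.01, t_end=10.0):
    """Single-link inverted pendulum with a unit point mass; theta is measured from upright.

    Dynamics: theta'' = (g / length) * sin(theta) + torque / length**2.
    Returns (time at which |theta| first exceeded pi/2, or None; largest |theta| seen).
    """
    theta, omega = theta0, 0.0
    max_abs = abs(theta)
    for k in range(int(round(t_end / dt))):
        torque = -kp * theta - kd * omega  # PD feedback; zero gains mean no control
        alpha = (g / length) * math.sin(theta) + torque / length ** 2
        omega += alpha * dt  # semi-implicit Euler: update velocity first,
        theta += omega * dt  # then position with the new velocity
        max_abs = max(max_abs, abs(theta))
        if abs(theta) > math.pi / 2:
            return (k + 1) * dt, max_abs  # the pendulum has fallen over
    return None, max_abs
```

Calling `simulate()` with zero gains reports a fall within a couple of seconds, while `simulate(kp=30.0, kd=8.0)` stays upright for the full 10 seconds. A multi-link musculoskeletal model couples several such joints, so a small error at one joint propagates to the others, which is the difficulty the paper's training procedure targets.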