Agents
New milestones in embodied AI
To accomplish a task like checking to see whether you locked the front door or retrieving a cell phone that's ringing in an upstairs bedroom, AI assistants of the future must learn to plan their route, navigate effectively, look around their physical environment, listen to what's happening around them, and build memories of the 3D space. These smarter assistants will require new advances in embodied AI, which seeks to teach machines to understand and interact with the complexities of the physical world as people do. Today, we're announcing several new milestones that introduce important capabilities to push the limits of embodied agents even further. Among them is the first audio-visual platform for embodied AI, with which researchers can train AI agents in 3D environments with highly realistic acoustics.
iCVI-ARTMAP: Accelerating and improving clustering using adaptive resonance theory predictive mapping and incremental cluster validity indices
da Silva, Leonardo Enzo Brito, Rayapati, Nagasharath, Wunsch, Donald C. II
This paper presents an adaptive resonance theory predictive mapping (ARTMAP) model which uses incremental cluster validity indices (iCVIs) to perform unsupervised learning, namely iCVI-ARTMAP. Incorporating iCVIs into the decision-making and many-to-one mapping capabilities of ARTMAP can improve the choices of clusters to which samples are incrementally assigned. These improvements are accomplished by intelligently performing the operations of swapping sample assignments between clusters, splitting and merging clusters, and caching the values of variables when iCVI values need to be recomputed. Using recursive formulations enables iCVI-ARTMAP to considerably reduce the computational burden associated with cluster validity index (CVI)-based offline clustering. Depending on the iCVI and the data set, it can achieve running times up to two orders of magnitude shorter than when using batch CVI computations. In this work, the incremental versions of Calinski-Harabasz, WB-index, Xie-Beni, Davies-Bouldin, Pakhira-Bandyopadhyay-Maulik, and negentropy increment were integrated into fuzzy ARTMAP. Experimental results show that, with proper choice of iCVI, iCVI-ARTMAP outperformed fuzzy adaptive resonance theory (ART), dual vigilance fuzzy ART, k-means, spectral clustering, Gaussian mixture models and hierarchical agglomerative clustering algorithms on most of the synthetic benchmark data sets. It also performed competitively on real-world image benchmark data sets when clustering on projections and on latent spaces generated by a deep clustering model. Naturally, the performance of iCVI-ARTMAP is subject to the selected iCVI and its suitability to the data at hand; fortunately, it is a general model wherein other iCVIs can be easily embedded.
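To illustrate why recursive formulations cut the cost of CVI-based clustering, here is a minimal sketch of an incrementally updated Calinski-Harabasz index. It is not the authors' implementation: the class name IncrementalCH, the helper batch_ch, and the toy data are hypothetical, and the sketch only handles adding samples (the swap, split, and merge operations described in the abstract are omitted). Each cluster keeps a count, a mean, and a within-cluster sum of squares, updated with a Welford-style recursion, so the index can be refreshed without revisiting earlier samples.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


class IncrementalCH:
    """Running Calinski-Harabasz index from per-cluster sufficient statistics.

    Hypothetical helper for illustration only; supports sample additions.
    """

    def __init__(self, dim):
        self.dim = dim
        self.count = {}   # label -> number of samples in the cluster
        self.mean = {}    # label -> cluster mean
        self.wss = {}     # label -> within-cluster sum of squares
        self.total = 0
        self.global_sum = [0.0] * dim

    def add(self, x, label):
        if label not in self.count:
            self.count[label] = 0
            self.mean[label] = [0.0] * self.dim
            self.wss[label] = 0.0
        self.count[label] += 1
        n = self.count[label]
        old = self.mean[label]
        new = [m + (xi - m) / n for xi, m in zip(x, old)]
        # Welford-style update: no pass over previously seen samples.
        self.wss[label] += sum(
            (xi - mo) * (xi - mn) for xi, mo, mn in zip(x, old, new)
        )
        self.mean[label] = new
        self.total += 1
        self.global_sum = [g + xi for g, xi in zip(self.global_sum, x)]

    def value(self):
        k = len(self.count)
        wgss = sum(self.wss.values())
        if k < 2 or self.total <= k or wgss == 0.0:
            return float("nan")  # index undefined in these cases
        g = [s / self.total for s in self.global_sum]
        bgss = sum(self.count[c] * sq_dist(self.mean[c], g) for c in self.count)
        return (bgss / (k - 1)) / (wgss / (self.total - k))


def batch_ch(points, labels):
    """Reference batch computation that revisits every sample."""
    n, dim = len(points), len(points[0])
    g = [sum(p[d] for p in points) / n for d in range(dim)]
    clusters = {}
    for p, c in zip(points, labels):
        clusters.setdefault(c, []).append(p)
    k = len(clusters)
    bgss = wgss = 0.0
    for members in clusters.values():
        mu = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        bgss += len(members) * sq_dist(mu, g)
        wgss += sum(sq_dist(p, mu) for p in members)
    return (bgss / (k - 1)) / (wgss / (n - k))


# Toy data (made up for this sketch): two well-separated groups.
points = [(0.0, 0.1), (0.2, -0.1), (5.0, 5.1), (5.2, 4.9), (0.1, 0.0), (4.9, 5.0)]
labels = [0, 0, 1, 1, 0, 1]

tracker = IncrementalCH(dim=2)
for p, c in zip(points, labels):
    tracker.add(p, c)
```

After the loop, tracker.value() should agree with batch_ch(points, labels) up to floating-point error, while each add call costs time proportional to the dimension rather than to the number of samples seen so far.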
AI Slays Top F-16 Pilot In DARPA Dogfight Simulation
WASHINGTON: In a 5 to 0 sweep, an AI 'pilot' developed by Heron Systems beat one of the Air Force's top F-16 fighter pilots in DARPA's simulated aerial dogfight contest today. "It's a giant leap," said DARPA's Justin (call sign "Glock") Mock, who served as a commentator on the trials. AI still has a long way to go before Air Force pilots would be ready to hand over the stick to an artificial intelligence during combat, DARPA officials said during today's live broadcast of the AlphaDogfight trials. But the three-day trials show that AI systems can credibly maneuver an aircraft in a simple, one-on-one combat scenario and shoot its forward guns in a classic, WWII-style dogfight. On the other hand, they said, it was an impressive showing by an AI agent after only a year of development.
Robust and Efficient Swarm Communication Topologies for Hostile Environments
Mann, Vipul, Sivaram, Abhishek, Das, Laya, Venkatasubramanian, Venkat
Swarm intelligence-based optimization techniques combine systematic exploration of the search space with information available from neighbors and rely strongly on communication among agents. These algorithms are typically employed to solve problems where the function landscape is not adequately known and there are multiple local optima that could result in premature convergence for other algorithms. Applications of such algorithms can be found in communication systems involving design of networks for efficient information dissemination to a target group, targeted drug delivery where drug molecules search for the affected site before diffusing, and high-value target localization with a network of drones. In several such applications, the agents face a hostile environment that can result in loss of agents during the search. Such a loss changes the communication topology of the agents and hence the information available to agents, ultimately influencing the performance of the algorithm. In this paper, we present a study of the impact of loss of agents on the performance of such algorithms as a function of the initial network configuration. We use particle swarm optimization to optimize an objective function with multiple sub-optimal regions in a hostile environment and study its performance for a range of network topologies with loss of agents. The results reveal interesting trade-offs between efficiency, robustness, and performance for different topologies that are subsequently leveraged to discover general properties of networks that maximize performance. Moreover, networks with small-world properties are seen to maximize performance under hostile conditions.
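The setup described above can be sketched as a neighborhood-based particle swarm optimizer in which agents are randomly lost during the search. This is a minimal illustration under stated assumptions, not the paper's experimental code: the Rastrigin test function, the per-iteration loss probability p_loss, the inertia and acceleration coefficients, and the function names are all choices made for this sketch. Each agent only learns from neighbors that are still alive, so losing agents thins the communication topology as the run proceeds.

```python
import math
import random


def rastrigin(x):
    """Multi-modal test function (assumed here); global minimum 0 at the origin."""
    return 10 * len(x) + sum(xi * xi - 10 * math.cos(2 * math.pi * xi) for xi in x)


def ring_topology(n, k=1):
    """Each agent talks to its k nearest neighbors on each side of a ring."""
    return [sorted({(i + d) % n for d in range(-k, k + 1)} - {i}) for i in range(n)]


def full_topology(n):
    """Every agent talks to every other agent."""
    return [[j for j in range(n) if j != i] for i in range(n)]


def pso_with_loss(neighbors, dim=2, iters=200, p_loss=0.005, seed=0,
                  w=0.72, c1=1.49, c2=1.49, bound=5.12):
    """Local-best PSO where each surviving agent is lost with probability
    p_loss per iteration. Returns (best value known to survivors, survivors)."""
    rng = random.Random(seed)
    n = len(neighbors)
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n)]
    vel = [[0.0] * dim for _ in range(n)]
    pbest = [p[:] for p in pos]
    pbest_f = [rastrigin(p) for p in pos]
    alive = [True] * n

    for _ in range(iters):
        # Hostile environment: agents drop out and stop sharing information.
        for i in range(n):
            if alive[i] and rng.random() < p_loss:
                alive[i] = False
        for i in range(n):
            if not alive[i]:
                continue
            # Informants are the agent itself plus its surviving neighbors.
            informants = [j for j in neighbors[i] if alive[j]] + [i]
            best = min(informants, key=lambda j: pbest_f[j])
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (pbest[best][d] - pos[i][d]))
                pos[i][d] = max(-bound, min(bound, pos[i][d] + vel[i][d]))
            f = rastrigin(pos[i])
            if f < pbest_f[i]:
                pbest_f[i], pbest[i] = f, pos[i][:]

    survivors = [i for i in range(n) if alive[i]]
    best_f = min((pbest_f[i] for i in survivors), default=float("inf"))
    return best_f, len(survivors)
```

Calling pso_with_loss(ring_topology(20)) and pso_with_loss(full_topology(20)) over several seeds and loss rates gives a rough feel for the efficiency versus robustness trade-off; the outcome depends heavily on the assumed parameters and should not be read as reproducing the paper's findings.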
Audio-Visual Waypoints for Navigation
Chen, Changan, Majumder, Sagnik, Al-Halah, Ziad, Gao, Ruohan, Ramakrishnan, Santhosh Kumar, Grauman, Kristen
In audio-visual navigation, an agent intelligently travels through a complex, unmapped 3D environment using both sights and sounds to find a sound source (e.g., a phone ringing in another room). Existing models learn to act at a fixed granularity of agent motion and rely on simple recurrent aggregations of the audio observations. We introduce a reinforcement learning approach to audio-visual navigation with two key novel elements: 1) audio-visual waypoints that are dynamically set and learned end-to-end within the navigation policy, and 2) an acoustic memory that provides a structured, spatially grounded record of what the agent has heard as it moves. Both new ideas capitalize on the synergy of audio and visual data for revealing the geometry of an unmapped space. We demonstrate our approach on the challenging Replica environments of real-world 3D scenes. Our model improves the state of the art by a substantial margin, and our experiments reveal that learning the links between sights, sounds, and space is essential for audio-visual navigation.
Multi-Agent Reinforcement Learning with Graph Clustering
Zhou, Tianze, Zhang, Fubiao, Wang, Chenfei
In this paper, we introduce the group concept into multi-agent reinforcement learning (MARL). In this method, agents are divided into several groups and each group completes a specific subtask so that agents can cooperate to complete the main task. Existing methods use a communication vector to exchange information between agents, which can lead to communication redundancy. To solve this problem, we propose a MARL method based on graph clustering. It allows agents to adaptively learn group features and replaces the communication operation. In our method, agent features are divided into two types: in-group features and individual features. They represent the commonalities and differences between agents, respectively. Based on the graph attention network (GAT), we introduce the graph clustering method as a penalty to optimize agent group features. These features are then used to generate individual Q values. To overcome the consistency problem brought by GAT, we introduce a split loss to distinguish agent features. Our method is easy to convert into the CTDE framework by using the Kullback-Leibler divergence. Empirical results are evaluated on a challenging set of StarCraft II micromanagement tasks. The results show that our method outperforms existing multi-agent reinforcement learning methods and that its performance improves as the number of agents increases.
Algorithmic Transparency with Strategic Users
Wang, Qiaochu, Huang, Yan, Jasin, Stefanus, Singh, Param Vir
Should firms that apply machine learning algorithms in their decision-making make their algorithms transparent to the users they affect? Despite growing calls for algorithmic transparency, most firms have kept their algorithms opaque, citing potential gaming by users that may negatively affect the algorithm's predictive power. We develop an analytical model to compare firm and user surplus with and without algorithmic transparency in the presence of strategic users and present novel insights. We identify a broad set of conditions under which making the algorithm transparent benefits the firm. We show that, in some cases, even the predictive power of machine learning algorithms may increase if the firm makes them transparent. By contrast, users may not always be better off under algorithmic transparency. The results hold even when the predictive power of the opaque algorithm comes largely from correlational features and the cost for users to improve on them is close to zero. Overall, our results show that firms should not view manipulation by users as bad. Rather, they should use algorithmic transparency as a lever to motivate users to invest in more desirable features.
Learning excursion sets of vector-valued Gaussian random fields for autonomous ocean sampling
Fossum, Trygve Olav, Travelletti, Cédric, Eidsvik, Jo, Ginsbourger, David, Rajan, Kanna
Improving and optimizing oceanographic sampling is a crucial task for marine science and maritime resource management. Faced with limited resources in understanding processes in the water column, the combination of statistics and autonomous systems provides new opportunities for experimental design. In this work we develop efficient spatial sampling methods for characterizing regions defined by simultaneous exceedances above prescribed thresholds of several responses, with an application focus on mapping coastal ocean phenomena based on temperature and salinity measurements. Specifically, we define a design criterion based on uncertainty in the excursions of vector-valued Gaussian random fields, and derive tractable expressions for the expected integrated Bernoulli variance reduction in such a framework. We demonstrate how this criterion can be used to prioritize sampling efforts at locations that are ambiguous, making exploration more effective. We use simulations to study and compare properties of the considered approaches, followed by results from field deployments with an autonomous underwater vehicle as part of a study mapping the boundary of a river plume. The results demonstrate the potential of combining statistical methods and robotic platforms to effectively inform and execute data-driven environmental sampling.
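The notion of an ambiguous location can be made concrete with a small sketch. For a location whose two responses follow a bivariate Gaussian, the excursion probability p is the probability that both exceed their thresholds, and the Bernoulli variance p(1 - p) is largest when p is near 0.5. The code below estimates p by Monte Carlo and ranks a few candidate locations. It is only an illustration of the pointwise quantity: the paper derives closed-form expressions for the expected integrated variance reduction, which this sketch does not attempt, and the function names, candidate locations, and standardized units are all made up for the example.

```python
import math
import random


def excursion_probability(mean, cov, thresholds, n_samples=50_000, seed=0):
    """Monte Carlo estimate of P(Z1 > t1 and Z2 > t2) for a bivariate Gaussian.

    Assumes cov is a positive-definite 2x2 matrix given as nested lists.
    """
    rng = random.Random(seed)
    # Cholesky factor of the 2x2 covariance, written out by hand.
    l11 = math.sqrt(cov[0][0])
    l21 = cov[1][0] / l11
    l22 = math.sqrt(cov[1][1] - l21 * l21)
    hits = 0
    for _ in range(n_samples):
        u1, u2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z1 = mean[0] + l11 * u1
        z2 = mean[1] + l21 * u1 + l22 * u2
        if z1 > thresholds[0] and z2 > thresholds[1]:
            hits += 1
    return hits / n_samples


def bernoulli_variance(p):
    """Variance of the excursion indicator; peaks at p = 0.5."""
    return p * (1.0 - p)


# Hypothetical candidate locations in standardized (temperature, salinity) units.
thresholds = (0.0, 0.0)
cov = [[1.0, 0.5], [0.5, 1.0]]
candidates = {
    "inside": (2.0, 2.0),     # both responses well above the thresholds
    "boundary": (0.0, 0.0),   # both responses near the thresholds
    "outside": (-2.0, -2.0),  # both responses well below the thresholds
}

scores = {
    name: bernoulli_variance(excursion_probability(mean, cov, thresholds))
    for name, mean in candidates.items()
}
most_ambiguous = max(scores, key=scores.get)
```

In this toy setting the location whose mean sits on the thresholds receives the highest score, which matches the intuition that sampling effort is best spent where excursion-set membership is least certain.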
Towards Closing the Sim-to-Real Gap in Collaborative Multi-Robot Deep Reinforcement Learning
Zhao, Wenshuai, Queralta, Jorge Peña, Qingqing, Li, Westerlund, Tomi
Current research directions in deep reinforcement learning include bridging the simulation-reality gap, improving sample efficiency of experiences in distributed multi-agent reinforcement learning, and developing methods that are robust against adversarial agents in distributed learning, among many others. In this work, we are particularly interested in analyzing how multi-agent reinforcement learning can bridge the gap to reality in distributed multi-robot systems where the operation of the different robots is not necessarily homogeneous. These variations can happen due to sensing mismatches, inherent errors in the calibration of the mechanical joints, or simple differences in accuracy. While our results are simulation-based, we introduce the effect of sensing, calibration, and accuracy mismatches in distributed reinforcement learning with proximal policy optimization (PPO). We discuss how both the different types of perturbations and the number of agents experiencing them affect the collaborative learning effort. The simulations are carried out using a Kuka arm model in the Bullet physics engine. This is, to the best of our knowledge, the first work exploring the limitations of PPO in multi-robot systems when considering that different robots might be exposed to different environments where their sensors or actuators have induced errors. With the conclusions of this work, we set the initial point for future work on designing and developing methods to achieve robust reinforcement learning in the presence of real-world perturbations that might differ within a multi-robot system.
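A rough sketch of the two ingredients mentioned above may help: a per-robot observation perturbation and the standard PPO clipped surrogate objective. The perturbation model (an additive bias for calibration error plus Gaussian noise for sensing accuracy), the function names, and the robot configuration are assumptions made for illustration; the abstract does not specify how the mismatches were implemented. The clipped objective follows the usual PPO definition, the mean over samples of min(r * A, clip(r, 1 - eps, 1 + eps) * A).

```python
import random


def perturb_observation(obs, bias=0.0, noise_std=0.0, rng=random):
    """Apply a hypothetical sensing mismatch to one robot's observation.

    bias models a fixed calibration offset, noise_std models sensor accuracy.
    """
    return [o + bias + rng.gauss(0.0, noise_std) for o in obs]


def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Standard PPO clipped surrogate objective, averaged over samples.

    ratios are new-policy / old-policy action probabilities.
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped = max(1.0 - eps, min(1.0 + eps, r))
        total += min(r * a, clipped * a)
    return total / len(ratios)


# Hypothetical fleet: only some robots experience perturbed sensing.
robot_configs = {
    "robot_0": {"bias": 0.0, "noise_std": 0.0},    # nominal robot
    "robot_1": {"bias": 0.05, "noise_std": 0.0},   # calibration offset
    "robot_2": {"bias": 0.0, "noise_std": 0.02},   # noisier sensor
}

rng = random.Random(0)
true_observation = [0.10, -0.30, 0.75]
observations = {
    name: perturb_observation(true_observation, rng=rng, **cfg)
    for name, cfg in robot_configs.items()
}
```

All robots would then contribute their differently perturbed experiences to a shared PPO update, and varying how many entries in robot_configs are perturbed is one simple way to study the effect on collaborative learning.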
Multi-Agent Deep Reinforcement Learning enabled Computation Resource Allocation in a Vehicular Cloud Network
Xu, Shilin, Guo, Caili, Hu, Rose Qingyang, Qian, Yi
In this paper, we investigate the computational resource allocation problem in a distributed ad hoc vehicular network with no centralized infrastructure support. To support the ever-increasing computational needs in such a vehicular network, a distributed virtual cloud network (VCN) is formed, based on which a computational resource sharing scheme through offloading among nearby vehicles is proposed. In view of the time-varying computational resources in the VCN, the statistical distribution characteristics of the computational resources are analyzed in detail. On that basis, a resource-aware combinatorial optimization objective is proposed. To alleviate the non-stationarity caused by the inherently multi-agent environment in the VCN, we adopt a centralized training and decentralized execution framework. In addition, we model the optimization problem as a Markov game and propose a DRL-based multi-agent deep deterministic policy gradient (MADDPG) algorithm to solve it. Interestingly, to overcome the lack of a real central control unit in the VCN, the allocation is actually completed on the vehicles in a distributed manner. Simulation results are presented to demonstrate our scheme's effectiveness.