Transformer Guided Coevolution: Improved Team Formation in Multiagent Adversarial Games
Rajbhandari, Pranav, Dasgupta, Prithviraj, Sofge, Donald
We consider the problem of team formation within multiagent adversarial games. Researchers have addressed the team selection problem in multiagent team formation using evolutionary computation-based approaches [14, 31], albeit for non-adversarial settings like search and reconnaissance. In this paper, we consider the use of a transformer-based neural network to predict the set of agents which form a team. We propose BERTeam, a novel algorithm that uses a transformer-based deep neural network with Masked Language Model training to select the best team of players from a trained population, and integrate it with coevolutionary deep reinforcement learning. We investigate the suitability of this technique for team formation in multiagent adversarial games.
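As a hedged illustration of the masked-model idea above, the sketch below fills masked team slots one at a time by sampling from a small transformer's per-slot distribution over a player population. The population size, team size, architecture, and sampling order are illustrative assumptions, not the BERTeam implementation.

```python
# Minimal sketch of masked team selection, assuming a population of trained
# players indexed 0..N_PLAYERS-1 and a fixed team size; all hyperparameters
# are illustrative, not BERTeam's.
import torch
import torch.nn as nn

N_PLAYERS, TEAM_SIZE, D_MODEL = 32, 3, 64
MASK = N_PLAYERS  # extra token id used for masked team slots

class TeamBuilder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(N_PLAYERS + 1, D_MODEL)  # +1 for [MASK]
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D_MODEL, N_PLAYERS)  # per-slot player logits

    def forward(self, team_tokens):               # (B, TEAM_SIZE) token ids
        h = self.encoder(self.embed(team_tokens))
        return self.head(h)                       # (B, TEAM_SIZE, N_PLAYERS)

def sample_team(model):
    """Start fully masked; unmask one slot at a time, masked-LM style."""
    team = torch.full((1, TEAM_SIZE), MASK, dtype=torch.long)
    for slot in range(TEAM_SIZE):
        logits = model(team)[0, slot]
        team[0, slot] = torch.distributions.Categorical(logits=logits).sample()
    return team[0].tolist()

print(sample_team(TeamBuilder()))  # e.g. [7, 21, 3] before any training
```

In the full method, the transformer's training signal would come from the coevolutionary game outcomes; here the untrained model simply illustrates the slot-by-slot sampling interface.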
Reward Shaping for Improved Learning in Real-time Strategy Game Play
Kliem, John, Dasgupta, Prithviraj
We investigate the effect of reward shaping in improving the performance of reinforcement learning in the context of a real-time strategy, capture-the-flag game. The game is characterized by sparse rewards that are associated with infrequently occurring events such as grabbing or capturing the flag, or tagging the opposing player. We show that appropriately designed reward shaping functions applied to different game events can significantly improve the player's performance and the training time of the player's learning algorithm. We have validated our reward shaping functions within a simulated environment for playing a marine capture-the-flag game between two players. Our experimental results demonstrate that reward shaping can be used as an effective means to understand the importance of different sub-tasks during game-play towards winning the game, to encode secondary objectives such as energy efficiency into a player's game-playing behavior, and to learn generalizable policies that perform well against opponents of different skill levels.
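To make the shaping idea concrete, below is a minimal sketch of an event-based shaping function for one game step; the event names, bonus magnitudes, and energy penalty are illustrative assumptions rather than the paper's tuned shaping functions.

```python
# Hedged sketch: add dense bonuses for sparse capture-the-flag events and an
# energy term encoding a secondary objective; all values are illustrative.
def shaped_reward(base_reward, events, energy_used):
    bonuses = {
        "grabbed_flag":    0.5,   # dense signal for an infrequent event
        "captured_flag":   1.0,
        "tagged_opponent": 0.3,
        "got_tagged":     -0.3,
    }
    shaped = base_reward + sum(bonuses.get(e, 0.0) for e in events)
    shaped -= 0.01 * energy_used  # secondary objective: energy efficiency
    return shaped

# A step where the player grabbed the flag and used 2 units of energy.
print(shaped_reward(0.0, ["grabbed_flag"], energy_used=2.0))  # -> 0.48
```

Varying the relative bonus magnitudes is also how one can probe the importance of each sub-task towards winning, as the abstract describes.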
Divide and Repair: Using Options to Improve Performance of Imitation Learning Against Adversarial Demonstrations
Dasgupta, Prithviraj
We consider the problem of learning to perform a task from demonstrations given by teachers or experts, when some of the experts' demonstrations might be adversarial and demonstrate an incorrect way to perform the task. We propose a novel technique that can identify parts of demonstrated trajectories that have not been significantly modified by the adversary and utilize them for learning, using temporally extended policies or options. We first define a trajectory divergence measure based on the spatial and temporal features of demonstrated trajectories to detect and discard parts of trajectories that have been significantly modified by an adversarial expert and could degrade the learner's performance if used for learning. We then use an options-based algorithm that partitions trajectories and learns only from the parts of trajectories that have been determined to be admissible. We provide theoretical results showing that repairing partial trajectories improves the sample efficiency of the demonstrations without degrading the learner's performance. We then evaluate the proposed algorithm for learning to play an Atari-like, computer-based game called LunarLander in the presence of different types and degrees of adversarial attacks on demonstrated trajectories. Our experimental results show that our technique can identify adversarially modified parts of the demonstrated trajectories and successfully prevent the learning performance from degrading due to adversarial demonstrations.
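The sketch below illustrates one plausible reading of the divide-and-repair step: score fixed-length trajectory windows by a divergence against a reference and keep only admissible windows as option-level demonstrations. The Euclidean state distance, window length, and threshold are illustrative assumptions, not the paper's divergence measure.

```python
# Hedged sketch: partition a demonstration into windows and keep those whose
# divergence from a reference trajectory stays below a threshold.
import numpy as np

def divergence(segment, reference):
    """Mean state-wise distance between aligned segments (illustrative)."""
    n = min(len(segment), len(reference))
    return float(np.mean(np.linalg.norm(segment[:n] - reference[:n], axis=1)))

def admissible_parts(trajectory, reference, window=10, threshold=1.0):
    kept = []
    for start in range(0, len(trajectory) - window + 1, window):
        seg = trajectory[start:start + window]
        ref = reference[start:start + window]
        if divergence(seg, ref) < threshold:
            kept.append((start, seg))  # (option start index, partial demo)
    return kept

rng = np.random.default_rng(0)
ref = rng.normal(size=(100, 4))
demo = ref.copy()
demo[40:60] += 5.0  # an adversarially perturbed stretch
print([s for s, _ in admissible_parts(demo, ref)])  # windows 40-59 dropped
```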
Synthetically Generating Human-like Data for Sequential Decision Making Tasks via Reward-Shaped Imitation Learning
Brandt, Bryan, Dasgupta, Prithviraj
We consider the problem of synthetically generating data that closely resembles human decisions made in the context of an interactive human-AI system like a computer game. We propose a novel algorithm that can generate synthetic, human-like decision-making data while starting from a very small set of decision-making data collected from humans. Our proposed algorithm integrates the concept of reward shaping with an imitation learning algorithm to generate the synthetic data. We have validated our synthetic data generation technique by using the synthetically generated data as a surrogate for human interaction data to solve three sequential decision-making tasks of increasing complexity within a small, computer game-like setup. Different empirical and statistical analyses of our results show that the synthetically generated data can substitute for the human data and perform the game-playing tasks almost indistinguishably from a human performing the same tasks, with very low divergence.
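One way to picture the pipeline, under loose assumptions, is a policy cloned from a tiny human dataset whose rollouts are filtered by a shaping-style human-likeness score; the nearest-neighbour "policy", the score, and the acceptance threshold below are hypothetical stand-ins for the paper's reward-shaped imitation learner.

```python
# Hedged sketch: clone decisions from a small human dataset, then keep only
# synthetic state-action pairs that score as human-like; all components are
# illustrative stand-ins.
import numpy as np

rng = np.random.default_rng(1)
human_states = rng.normal(size=(20, 3))           # very small human dataset
human_actions = (human_states.sum(axis=1) > 0).astype(int)

def cloned_policy(state):
    """1-nearest-neighbour imitation of the human decisions."""
    idx = np.argmin(np.linalg.norm(human_states - state, axis=1))
    return human_actions[idx]

def human_likeness(state, action):
    """Shaping-style score: 1 if the action matches the nearest human choice."""
    return 1.0 if action == cloned_policy(state) else 0.0

def generate_synthetic(n=100, accept=0.5):
    data = []
    for _ in range(n):
        s = rng.normal(size=3)
        a = cloned_policy(s) if rng.random() < 0.9 else int(rng.integers(2))
        if human_likeness(s, a) >= accept:        # keep human-like pairs only
            data.append((s, a))
    return data

print(len(generate_synthetic()))  # most of the 100 rollouts pass the filter
```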
Improving Costs and Robustness of Machine Learning Classifiers Against Adversarial Attacks via Self Play of Repeated Bayesian Games
Dasgupta, Prithviraj (U.S. Naval Research Laboratory) | Collins, Joseph B. (U.S. Naval Research Laboratory) | McCarrick, Michael (U.S. Naval Research Laboratory)
We consider the problem of adversarial machine learning where an adversary performs evasion attacks on a classifier-based learner by sending it queries with adversarial data of different attack strengths. The learner is unaware whether a query sent to it is clean or adversarial. The objective of the learner is to mitigate the adversary's attacks by reducing its classification errors on adversarial data. To address this problem, we propose a technique where the learner maintains multiple classifiers that are trained with clean as well as adversarial data of different attack strengths. We then describe a game-theoretic framework based on a 2-player repeated Bayesian game, called Repeated Bayesian Sequential Game (RBSG) with self play, that enables the learner to determine an appropriate classifier to deploy so that the likelihood of correctly classifying the query and preventing the evasion attack does not deteriorate, while reducing the costs of deploying the classifiers. Experimental results of our proposed approach with adversarial text data show that our RBSG with self play-based technique maintains classifier accuracies comparable with those of an individual, powerful, and costly classifier, while strategically using multiple lower-cost but less powerful classifiers to reduce the overall classification costs.
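The core selection loop can be sketched, under simplifying assumptions, as a Bayesian belief over the adversary's attack strength plus a cost-aware choice among classifiers; the accuracy matrix, costs, and belief update below are illustrative numbers, not the paper's RBSG solution.

```python
# Hedged sketch: deploy the cheapest classifier whose expected accuracy under
# the current belief over attack strengths stays acceptable.
import numpy as np

# accuracy[c][k]: accuracy of classifier c on attack strength k (0 = clean)
accuracy = np.array([[0.95, 0.60, 0.40],   # cheap, clean-data classifier
                     [0.93, 0.85, 0.70],   # mid-cost, adversarially trained
                     [0.92, 0.90, 0.88]])  # costly, strongest classifier
cost = np.array([1.0, 2.0, 4.0])
belief = np.array([0.8, 0.1, 0.1])         # prior over attack strengths

def choose_classifier(belief, min_acc=0.85):
    expected = accuracy @ belief
    ok = np.flatnonzero(expected >= min_acc)
    # Fall back to the most accurate classifier if none meets the bar.
    return int(ok[np.argmin(cost[ok])]) if len(ok) else int(np.argmax(expected))

def update_belief(belief, strength_likelihood):
    post = belief * strength_likelihood    # Bayes update from observed queries
    return post / post.sum()

print(choose_classifier(belief))                        # -> 0: cheap suffices
belief = update_belief(belief, np.array([0.1, 0.3, 0.6]))
print(choose_classifier(belief))                        # -> 2: attacks rose
```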
Multi-Robot Informative Path Planning in Unknown Environments Through Continuous Region Partitioning
Dutta, Ayan (University of North Florida) | Bhattacharya, Amitabh (University of North Florida) | Kreidl, O. Patrick (University of North Florida) | Ghosh, Anirban (University of North Florida) | Dasgupta, Prithviraj (University of Nebraska at Omaha)
Information collection is an important application of multi-robot systems, especially in environments that are difficult for humans to operate in. The objective of the robots is to maximize the information collected from the environment while remaining within their path-length budgets. In this paper, we propose a novel multi-robot information collection algorithm that uses a continuous region partitioning approach to efficiently divide an unknown environment among the robots based on the discovered obstacles in the area, for better load balancing. Our algorithm gracefully handles situations when some of the robots cannot communicate with other robots due to limited communication ranges.
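A minimal sketch of the partitioning step, assuming a simple grid world: each free cell is assigned to the nearest robot, Voronoi-style, while discovered obstacle cells are excluded. The grid, obstacle set, and plain Euclidean metric are illustrative assumptions, not the paper's continuous partitioning algorithm.

```python
# Hedged sketch: Voronoi-style split of free cells among robots for
# load balancing; obstacle cells discovered so far are left unassigned.
import numpy as np

robots = np.array([[2.0, 2.0], [7.0, 7.0], [2.0, 8.0]])
obstacles = {(4, 4), (4, 5), (5, 4)}       # discovered obstacle cells

def partition(grid_size=10):
    regions = {i: [] for i in range(len(robots))}
    for x in range(grid_size):
        for y in range(grid_size):
            if (x, y) in obstacles:
                continue
            owner = int(np.argmin(np.linalg.norm(robots - (x, y), axis=1)))
            regions[owner].append((x, y))
    return regions

regions = partition()
print({i: len(cells) for i, cells in regions.items()})  # per-robot workload
```

Repartitioning as new obstacles are discovered is what rebalances the workload among the robots.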
Efficient Real-Time Robot Navigation Using Incremental State Discovery Via Clustering
Saha, Olimpiya (University of Nebraska, Omaha) | Dasgupta, Prithviraj (University of Nebraska, Omaha)
We consider the problem of robot path planning in an initially unknown environment where the robot does not have access to an a priori map of its environment but is aware of some common obstacle patterns, along with the paths that enable it to circumnavigate these obstacles. In order to autonomously improve its navigation performance, the robot should be able to identify significant obstacle patterns and learn corresponding obstacle avoidance maneuvers as it navigates through different environments to solve its tasks. To achieve this objective, we propose a novel online algorithm called Incremental State Discovery Via Clustering (ISDC), which enables a robot to dynamically determine important obstacle patterns in its environments and their best representations as combinations of initially available basic obstacle patterns. Our results show that ISDC, when combined with our previously proposed navigation technique, was able to identify significant obstacle patterns in different environments in a time-effective manner, which reduced the overall path planning and navigation times for the robots.
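The sketch below shows the incremental-clustering core of the idea: each new obstacle feature either refines the nearest existing pattern or founds a new one; the distance threshold and 2-D features are illustrative assumptions, not ISDC's representation.

```python
# Hedged sketch: incremental discovery of obstacle patterns via an online
# running-mean clusterer with a fixed distance threshold.
import numpy as np

class IncrementalClusterer:
    def __init__(self, threshold=1.5):
        self.centers, self.counts, self.threshold = [], [], threshold

    def observe(self, feature):
        if self.centers:
            d = [np.linalg.norm(c - feature) for c in self.centers]
            i = int(np.argmin(d))
            if d[i] < self.threshold:          # refine an existing pattern
                self.counts[i] += 1
                self.centers[i] += (feature - self.centers[i]) / self.counts[i]
                return i
        self.centers.append(feature.astype(float))  # discover a new pattern
        self.counts.append(1)
        return len(self.centers) - 1

rng = np.random.default_rng(2)
isdc = IncrementalClusterer()
for f in np.vstack([rng.normal(0, 0.3, (10, 2)),     # wall-like features
                    rng.normal(5, 0.3, (10, 2))]):   # corner-like features
    isdc.observe(f)
print(len(isdc.centers))  # -> 2 discovered obstacle patterns
```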
Simultaneous Configuration Formation and Information Collection by Modular Robotic Systems
Dutta, Ayan, Dasgupta, Prithviraj
We consider the configuration formation problem in modular robotic systems, where a set of singleton modules that are spatially distributed in an environment are required to assume appropriate positions so that they can configure into a new, user-specified target configuration, while simultaneously maximizing the amount of information collected as they navigate from their initial to final positions. Each module has a limited energy budget to expend while moving from its initial to goal location. To solve this problem, we propose a budget-limited, heuristic search-based algorithm that finds a path maximizing the entropy of the expected information along the path. We have analytically proved that our proposed approach converges within finite time. Experimental results show that our planning approach has a lower run-time than an auction-based allocation algorithm for selecting the modules' target positions.
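To illustrate the budget-limited search, the sketch below enumerates grid paths within an energy budget and keeps the one with the highest summed cell entropy; the grid values, 4-connected moves, and exhaustive enumeration are illustrative assumptions standing in for the paper's heuristic search.

```python
# Hedged sketch: among paths from start to goal that respect the move budget,
# pick the one maximizing accumulated expected information (entropy).
import numpy as np

entropy = np.array([[0.1, 0.9, 0.1],
                    [0.2, 0.8, 0.7],
                    [0.1, 0.2, 0.3]])  # expected information per cell

def best_path(start, goal, budget):
    best_p, best_g = None, -1.0
    stack = [(start, [start], float(entropy[start]))]
    while stack:
        cell, path, gain = stack.pop()
        if cell == goal and gain > best_g:
            best_p, best_g = path, gain
        if len(path) - 1 >= budget:
            continue                      # energy budget exhausted
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nxt[0] < 3 and 0 <= nxt[1] < 3 and nxt not in path:
                stack.append((nxt, path + [nxt], gain + float(entropy[nxt])))
    return best_p, best_g

path, gain = best_path((0, 0), (2, 2), budget=6)
print(path, round(gain, 2))  # detours through high-entropy cells
```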
Fast Path Planning Using Experience Learning from Obstacle Patterns
Saha, Olimpiya (University of Nebraska at Omaha) | Dasgupta, Prithviraj (University of Nebraska at Omaha)
We consider the problem of robot path planning in an environment where the location and geometry of obstacles are initially unknown, while reusing relevant knowledge about collision avoidance learned from robots' previous navigational experience. Our main hypothesis in this paper is that the path planning times for a robot can be reduced if it can refer to previous maneuvers it used to avoid collisions with obstacles during earlier missions, and adapt that information to avoid obstacles during its current navigation. To verify this hypothesis, we propose an algorithm called LearnerRRT that first uses a feature matching algorithm called Sample Consensus Initial Alignment (SAC-IA) to efficiently match currently encountered obstacle features with past obstacle features, and then uses an experience-based learning technique to adapt previously recorded robot obstacle avoidance trajectories, corresponding to the matched feature, to the current scenario. The feature matching and machine learning techniques are integrated into the robot's path planner so that the robot can rapidly and seamlessly update its path to circumvent an obstacle it encounters, in real time, and continue to move towards its goal. We have conducted several experiments using a simulated Coroware Corobot robot within the Webots simulator to verify the performance of our proposed algorithm, with different start and goal locations and different obstacle geometries and placements, and have compared our approach to a state-of-the-art sampling-based path planner. Our results show that the proposed LearnerRRT algorithm performs much better than Informed RRT*: when given the same time, our algorithm finished its task successfully, whereas Informed RRT* could only achieve 10-20 percent of the optimal distance.
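A minimal sketch of the experience-reuse step, assuming stored (feature, maneuver) pairs: the sensed obstacle feature is matched to the closest stored pattern and its recorded maneuver is shifted into the robot's frame. The toy feature vectors and the rigid translation below stand in for the SAC-IA alignment the paper actually uses.

```python
# Hedged sketch: match the sensed obstacle feature to the nearest stored
# pattern and adapt its recorded avoidance waypoints to the robot's pose.
import numpy as np

# Experience base: pattern name -> (feature vector, waypoints in local frame)
experience = {
    "wall":   (np.array([1.0, 0.0]), np.array([[0.5, 0.5], [1.0, 1.0]])),
    "corner": (np.array([0.0, 1.0]), np.array([[-0.5, 0.5], [-1.0, 1.0]])),
}

def match_and_adapt(sensed_feature, robot_pose):
    name, (_feat, maneuver) = min(
        experience.items(),
        key=lambda kv: np.linalg.norm(kv[1][0] - sensed_feature))
    return name, maneuver + robot_pose   # translate into the world frame

name, waypoints = match_and_adapt(np.array([0.9, 0.1]),
                                  robot_pose=np.array([3.0, 4.0]))
print(name, waypoints.tolist())  # reuses the 'wall' maneuver near (3, 4)
```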