Transformer Guided Coevolution: Improved Team Formation in Multiagent Adversarial Games
Rajbhandari, Pranav, Dasgupta, Prithviraj, Sofge, Donald
We consider the problem of team formation within multiagent adversarial games. Researchers have addressed the team selection problem in multiagent team formation using evolutionary computation-based approaches [14, 31], albeit for non-adversarial settings like search and reconnaissance. In this paper, we consider the use of a transformer-based neural network to predict the set of agents which form a team. We propose BERTeam, a novel algorithm that uses a transformer-based deep neural network with Masked Language Model training to select the best team of players from a trained population, and integrate it with coevolutionary deep reinforcement learning. We investigate the suitability of this technique for team formation in multiagent adversarial games.
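As a hedged illustration of the masked-model idea above, the sketch below fills masked team slots one at a time by sampling from a small transformer's per-slot distribution over a player population. The population size, team size, architecture, and sampling order are illustrative assumptions, not the BERTeam implementation.

```python
# Minimal sketch of masked team selection, assuming a population of trained
# players indexed 0..N_PLAYERS-1 and a fixed team size; all hyperparameters
# are illustrative, not BERTeam's.
import torch
import torch.nn as nn

N_PLAYERS, TEAM_SIZE, D_MODEL = 32, 3, 64
MASK = N_PLAYERS  # extra token id used for masked team slots

class TeamBuilder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(N_PLAYERS + 1, D_MODEL)  # +1 for [MASK]
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D_MODEL, N_PLAYERS)  # per-slot player logits

    def forward(self, team_tokens):               # (B, TEAM_SIZE) token ids
        h = self.encoder(self.embed(team_tokens))
        return self.head(h)                       # (B, TEAM_SIZE, N_PLAYERS)

def sample_team(model):
    """Start fully masked; unmask one slot at a time, masked-LM style."""
    team = torch.full((1, TEAM_SIZE), MASK, dtype=torch.long)
    for slot in range(TEAM_SIZE):
        logits = model(team)[0, slot]
        team[0, slot] = torch.distributions.Categorical(logits=logits).sample()
    return team[0].tolist()

print(sample_team(TeamBuilder()))  # e.g. [7, 21, 3] before any training
```

In the full method, the transformer's training signal would come from the coevolutionary game outcomes; here the untrained model simply illustrates the slot-by-slot sampling interface.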
Reward Shaping for Improved Learning in Real-time Strategy Game Play
Kliem, John, Dasgupta, Prithviraj
We investigate the effect of reward shaping in improving the performance of reinforcement learning in the context of a real-time strategy, capture-the-flag game. The game is characterized by sparse rewards that are associated with infrequently occurring events such as grabbing or capturing the flag, or tagging the opposing player. We show that appropriately designed reward shaping functions applied to different game events can significantly improve the player's performance and the training time of the player's learning algorithm. We have validated our reward shaping functions within a simulated environment for playing a marine capture-the-flag game between two players. Our experimental results demonstrate that reward shaping can be used as an effective means to understand the importance of different sub-tasks during game-play towards winning the game, to encode secondary objectives such as energy efficiency into a player's game-playing behavior, and to learn generalizable policies that perform well against opponents of different skill levels.
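To make the shaping idea concrete, below is a minimal sketch of an event-based shaping function for one game step; the event names, bonus magnitudes, and energy penalty are illustrative assumptions rather than the paper's tuned shaping functions.

```python
# Hedged sketch: add dense bonuses for sparse capture-the-flag events and an
# energy term encoding a secondary objective; all values are illustrative.
def shaped_reward(base_reward, events, energy_used):
    bonuses = {
        "grabbed_flag":    0.5,   # dense signal for an infrequent event
        "captured_flag":   1.0,
        "tagged_opponent": 0.3,
        "got_tagged":     -0.3,
    }
    shaped = base_reward + sum(bonuses.get(e, 0.0) for e in events)
    shaped -= 0.01 * energy_used  # secondary objective: energy efficiency
    return shaped

# A step where the player grabbed the flag and used 2 units of energy.
print(shaped_reward(0.0, ["grabbed_flag"], energy_used=2.0))  # -> 0.48
```

Varying the relative bonus magnitudes is also how one can probe the importance of each sub-task towards winning, as the abstract describes.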
Divide and Repair: Using Options to Improve Performance of Imitation Learning Against Adversarial Demonstrations
Dasgupta, Prithviraj
We consider the problem of learning to perform a task from demonstrations given by teachers or experts, when some of the experts' demonstrations might be adversarial and demonstrate an incorrect way to perform the task. We propose a novel technique that can identify parts of demonstrated trajectories that have not been significantly modified by the adversary and utilize them for learning, using temporally extended policies or options. We first define a trajectory divergence measure based on the spatial and temporal features of demonstrated trajectories to detect and discard parts of trajectories that have been significantly modified by an adversarial expert and could degrade the learner's performance if used for learning. We then use an options-based algorithm that partitions trajectories and learns only from the parts of trajectories that have been determined to be admissible. We provide theoretical results showing that repairing partial trajectories improves the sample efficiency of the demonstrations without degrading the learner's performance. We then evaluate the proposed algorithm for learning to play an Atari-like, computer-based game called LunarLander in the presence of different types and degrees of adversarial attacks on demonstrated trajectories. Our experimental results show that our technique can identify adversarially modified parts of the demonstrated trajectories and successfully prevent the learning performance from degrading due to adversarial demonstrations.
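The sketch below illustrates one plausible reading of the divide-and-repair step: score fixed-length trajectory windows by a divergence against a reference and keep only admissible windows as option-level demonstrations. The Euclidean state distance, window length, and threshold are illustrative assumptions, not the paper's divergence measure.

```python
# Hedged sketch: partition a demonstration into windows and keep those whose
# divergence from a reference trajectory stays below a threshold.
import numpy as np

def divergence(segment, reference):
    """Mean state-wise distance between aligned segments (illustrative)."""
    n = min(len(segment), len(reference))
    return float(np.mean(np.linalg.norm(segment[:n] - reference[:n], axis=1)))

def admissible_parts(trajectory, reference, window=10, threshold=1.0):
    kept = []
    for start in range(0, len(trajectory) - window + 1, window):
        seg = trajectory[start:start + window]
        ref = reference[start:start + window]
        if divergence(seg, ref) < threshold:
            kept.append((start, seg))  # (option start index, partial demo)
    return kept

rng = np.random.default_rng(0)
ref = rng.normal(size=(100, 4))
demo = ref.copy()
demo[40:60] += 5.0  # an adversarially perturbed stretch
print([s for s, _ in admissible_parts(demo, ref)])  # windows 40-59 dropped
```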
Synthetically Generating Human-like Data for Sequential Decision Making Tasks via Reward-Shaped Imitation Learning
Brandt, Bryan, Dasgupta, Prithviraj
We consider the problem of synthetically generating data that closely resembles human decisions made in the context of an interactive human-AI system like a computer game. We propose a novel algorithm that can generate synthetic, human-like decision-making data while starting from a very small set of decision-making data collected from humans. Our proposed algorithm integrates the concept of reward shaping with an imitation learning algorithm to generate the synthetic data. We have validated our synthetic data generation technique by using the synthetically generated data as a surrogate for human interaction data to solve three sequential decision-making tasks of increasing complexity within a small, computer game-like setup. Different empirical and statistical analyses of our results show that the synthetically generated data can substitute for the human data and perform the game-playing tasks almost indistinguishably from a human performing the same tasks, with very low divergence.
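One way to picture the pipeline, under loose assumptions, is a policy cloned from a tiny human dataset whose rollouts are filtered by a shaping-style human-likeness score; the nearest-neighbour "policy", the score, and the acceptance threshold below are hypothetical stand-ins for the paper's reward-shaped imitation learner.

```python
# Hedged sketch: clone decisions from a small human dataset, then keep only
# synthetic state-action pairs that score as human-like; all components are
# illustrative stand-ins.
import numpy as np

rng = np.random.default_rng(1)
human_states = rng.normal(size=(20, 3))           # very small human dataset
human_actions = (human_states.sum(axis=1) > 0).astype(int)

def cloned_policy(state):
    """1-nearest-neighbour imitation of the human decisions."""
    idx = np.argmin(np.linalg.norm(human_states - state, axis=1))
    return human_actions[idx]

def human_likeness(state, action):
    """Shaping-style score: 1 if the action matches the nearest human choice."""
    return 1.0 if action == cloned_policy(state) else 0.0

def generate_synthetic(n=100, accept=0.5):
    data = []
    for _ in range(n):
        s = rng.normal(size=3)
        a = cloned_policy(s) if rng.random() < 0.9 else int(rng.integers(2))
        if human_likeness(s, a) >= accept:        # keep human-like pairs only
            data.append((s, a))
    return data

print(len(generate_synthetic()))  # most of the 100 rollouts pass the filter
```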
Improving Costs and Robustness of Machine Learning Classifiers Against Adversarial Attacks via Self Play of Repeated Bayesian Games
Dasgupta, Prithviraj (U.S. Naval Research Laboratory) | Collins, Joseph B. (U.S. Naval Research Laboratory) | McCarrick, Michael (U.S. Naval Research Laboratory)
We consider the problem of adversarial machine learning where an adversary performs evasion attacks on a classifier-based learner by sending it queries with adversarial data of different attack strengths. The learner is unaware whether a query sent to it is clean or adversarial. The objective of the learner is to mitigate the adversary's attacks by reducing its classification errors on adversarial data. To address this problem, we propose a technique where the learner maintains multiple classifiers that are trained with clean as well as adversarial data of different attack strengths. We then describe a game-theoretic framework based on a 2-player repeated Bayesian game, called Repeated Bayesian Sequential Game (RBSG) with self play, that enables the learner to determine an appropriate classifier to deploy so that the likelihood of correctly classifying the query and preventing the evasion attack does not deteriorate, while reducing the costs of deploying the classifiers. Experimental results of our proposed approach with adversarial text data show that our RBSG with self play-based technique maintains classifier accuracies comparable with those of an individual, powerful, and costly classifier, while strategically using multiple lower-cost but less powerful classifiers to reduce the overall classification costs.
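The core selection loop can be sketched, under simplifying assumptions, as a Bayesian belief over the adversary's attack strength plus a cost-aware choice among classifiers; the accuracy matrix, costs, and belief update below are illustrative numbers, not the paper's RBSG solution.

```python
# Hedged sketch: deploy the cheapest classifier whose expected accuracy under
# the current belief over attack strengths stays acceptable.
import numpy as np

# accuracy[c][k]: accuracy of classifier c on attack strength k (0 = clean)
accuracy = np.array([[0.95, 0.60, 0.40],   # cheap, clean-data classifier
                     [0.93, 0.85, 0.70],   # mid-cost, adversarially trained
                     [0.92, 0.90, 0.88]])  # costly, strongest classifier
cost = np.array([1.0, 2.0, 4.0])
belief = np.array([0.8, 0.1, 0.1])         # prior over attack strengths

def choose_classifier(belief, min_acc=0.85):
    expected = accuracy @ belief
    ok = np.flatnonzero(expected >= min_acc)
    # Fall back to the most accurate classifier if none meets the bar.
    return int(ok[np.argmin(cost[ok])]) if len(ok) else int(np.argmax(expected))

def update_belief(belief, strength_likelihood):
    post = belief * strength_likelihood    # Bayes update from observed queries
    return post / post.sum()

print(choose_classifier(belief))                        # -> 0: cheap suffices
belief = update_belief(belief, np.array([0.1, 0.3, 0.6]))
print(choose_classifier(belief))                        # -> 2: attacks rose
```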
Multi-Robot Informative Path Planning in Unknown Environments Through Continuous Region Partitioning
Dutta, Ayan (University of North Florida) | Bhattacharya, Amitabh (University of North Florida) | Kreidl, O. Patrick (University of North Florida) | Ghosh, Anirban (University of North Florida) | Dasgupta, Prithviraj (University of Nebraska at Omaha)
Information collection is an important application of multi-robot systems, especially in environments that are difficult for humans to operate in. The objective of the robots is to maximize the information collected from the environment while remaining within their path-length budgets. In this paper, we propose a novel multi-robot information collection algorithm that uses a continuous region partitioning approach to efficiently divide an unknown environment among the robots based on the discovered obstacles in the area, for better load balancing. Our algorithm gracefully handles situations when some of the robots cannot communicate with other robots due to limited communication ranges.
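A minimal sketch of the partitioning step, assuming a simple grid world: each free cell is assigned to the nearest robot, Voronoi-style, while discovered obstacle cells are excluded. The grid, obstacle set, and plain Euclidean metric are illustrative assumptions, not the paper's continuous partitioning algorithm.

```python
# Hedged sketch: Voronoi-style split of free cells among robots for
# load balancing; obstacle cells discovered so far are left unassigned.
import numpy as np

robots = np.array([[2.0, 2.0], [7.0, 7.0], [2.0, 8.0]])
obstacles = {(4, 4), (4, 5), (5, 4)}       # discovered obstacle cells

def partition(grid_size=10):
    regions = {i: [] for i in range(len(robots))}
    for x in range(grid_size):
        for y in range(grid_size):
            if (x, y) in obstacles:
                continue
            owner = int(np.argmin(np.linalg.norm(robots - (x, y), axis=1)))
            regions[owner].append((x, y))
    return regions

regions = partition()
print({i: len(cells) for i, cells in regions.items()})  # per-robot workload
```

Repartitioning as new obstacles are discovered is what rebalances the workload among the robots.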
Efficient Real-Time Robot Navigation Using Incremental State Discovery Via Clustering
Saha, Olimpiya (University of Nebraska, Omaha) | Dasgupta, Prithviraj (University of Nebraska, Omaha)
We consider the problem of robot path planning in an initially unknown environment where the robot does not have access to an a priori map of its environment but is aware of some common obstacle patterns, along with the paths that enable it to circumnavigate these obstacles. In order to autonomously improve its navigation performance, the robot should be able to identify significant obstacle patterns and learn corresponding obstacle avoidance maneuvers as it navigates through different environments to solve its tasks. To achieve this objective, we propose a novel online algorithm called Incremental State Discovery Via Clustering (ISDC), which enables a robot to dynamically determine important obstacle patterns in its environments and their best representations as combinations of initially available basic obstacle patterns. Our results show that ISDC, when combined with our previously proposed navigation technique, was able to identify significant obstacle patterns in different environments in a time-effective manner, which reduced the overall path planning and navigation times for the robots.
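The sketch below shows the incremental-clustering core of the idea: each new obstacle feature either refines the nearest existing pattern or founds a new one; the distance threshold and 2-D features are illustrative assumptions, not ISDC's representation.

```python
# Hedged sketch: incremental discovery of obstacle patterns via an online
# running-mean clusterer with a fixed distance threshold.
import numpy as np

class IncrementalClusterer:
    def __init__(self, threshold=1.5):
        self.centers, self.counts, self.threshold = [], [], threshold

    def observe(self, feature):
        if self.centers:
            d = [np.linalg.norm(c - feature) for c in self.centers]
            i = int(np.argmin(d))
            if d[i] < self.threshold:          # refine an existing pattern
                self.counts[i] += 1
                self.centers[i] += (feature - self.centers[i]) / self.counts[i]
                return i
        self.centers.append(feature.astype(float))  # discover a new pattern
        self.counts.append(1)
        return len(self.centers) - 1

rng = np.random.default_rng(2)
isdc = IncrementalClusterer()
for f in np.vstack([rng.normal(0, 0.3, (10, 2)),     # wall-like features
                    rng.normal(5, 0.3, (10, 2))]):   # corner-like features
    isdc.observe(f)
print(len(isdc.centers))  # -> 2 discovered obstacle patterns
```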
Simultaneous Configuration Formation and Information Collection by Modular Robotic Systems
Dutta, Ayan, Dasgupta, Prithviraj
We consider the configuration formation problem in modular robotic systems, where a set of singleton modules that are spatially distributed in an environment are required to assume appropriate positions so that they can configure into a new, user-specified target configuration, while simultaneously maximizing the amount of information collected as they navigate from their initial to final positions. Each module has a limited energy budget to expend while moving from its initial to goal location. To solve this problem, we propose a budget-limited, heuristic search-based algorithm that finds a path maximizing the entropy of the expected information along the path. We have analytically proved that our proposed approach converges within finite time. Experimental results show that our planning approach has a lower run-time than an auction-based allocation algorithm for selecting the modules' target positions.
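To illustrate the budget-limited search, the sketch below enumerates grid paths within an energy budget and keeps the one with the highest summed cell entropy; the grid values, 4-connected moves, and exhaustive enumeration are illustrative assumptions standing in for the paper's heuristic search.

```python
# Hedged sketch: among paths from start to goal that respect the move budget,
# pick the one maximizing accumulated expected information (entropy).
import numpy as np

entropy = np.array([[0.1, 0.9, 0.1],
                    [0.2, 0.8, 0.7],
                    [0.1, 0.2, 0.3]])  # expected information per cell

def best_path(start, goal, budget):
    best_p, best_g = None, -1.0
    stack = [(start, [start], float(entropy[start]))]
    while stack:
        cell, path, gain = stack.pop()
        if cell == goal and gain > best_g:
            best_p, best_g = path, gain
        if len(path) - 1 >= budget:
            continue                      # energy budget exhausted
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nxt[0] < 3 and 0 <= nxt[1] < 3 and nxt not in path:
                stack.append((nxt, path + [nxt], gain + float(entropy[nxt])))
    return best_p, best_g

path, gain = best_path((0, 0), (2, 2), budget=6)
print(path, round(gain, 2))  # detours through high-entropy cells
```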
Fast Path Planning Using Experience Learning from Obstacle Patterns
Saha, Olimpiya (University of Nebraska at Omaha) | Dasgupta, Prithviraj (University of Nebraska at Omaha)
We consider the problem of robot path planning in an environment where the location and geometry of obstacles are initially unknown, while reusing relevant knowledge about collision avoidance learned from robots' previous navigational experience. Our main hypothesis in this paper is that the path planning times for a robot can be reduced if it can refer to previous maneuvers it used to avoid collisions with obstacles during earlier missions, and adapt that information to avoid obstacles during its current navigation. To verify this hypothesis, we propose an algorithm called LearnerRRT that first uses a feature matching algorithm called Sample Consensus Initial Alignment (SAC-IA) to efficiently match currently encountered obstacle features with past obstacle features, and then uses an experience-based learning technique to adapt previously recorded robot obstacle avoidance trajectories, corresponding to the matched feature, to the current scenario. The feature matching and machine learning techniques are integrated into the robot's path planner so that the robot can rapidly and seamlessly update its path to circumvent an obstacle it encounters, in real time, and continue to move towards its goal. We have conducted several experiments using a simulated Coroware Corobot robot within the Webots simulator to verify the performance of our proposed algorithm, with different start and goal locations and different obstacle geometries and placements, and have compared our approach to a state-of-the-art sampling-based path planner. Our results show that the proposed LearnerRRT algorithm performs much better than Informed RRT*: when given the same time, our algorithm finished its task successfully, whereas Informed RRT* could only achieve 10-20 percent of the optimal distance.
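A minimal sketch of the experience-reuse step, assuming stored (feature, maneuver) pairs: the sensed obstacle feature is matched to the closest stored pattern and its recorded maneuver is shifted into the robot's frame. The toy feature vectors and the rigid translation below stand in for the SAC-IA alignment the paper actually uses.

```python
# Hedged sketch: match the sensed obstacle feature to the nearest stored
# pattern and adapt its recorded avoidance waypoints to the robot's pose.
import numpy as np

# Experience base: pattern name -> (feature vector, waypoints in local frame)
experience = {
    "wall":   (np.array([1.0, 0.0]), np.array([[0.5, 0.5], [1.0, 1.0]])),
    "corner": (np.array([0.0, 1.0]), np.array([[-0.5, 0.5], [-1.0, 1.0]])),
}

def match_and_adapt(sensed_feature, robot_pose):
    name, (_feat, maneuver) = min(
        experience.items(),
        key=lambda kv: np.linalg.norm(kv[1][0] - sensed_feature))
    return name, maneuver + robot_pose   # translate into the world frame

name, waypoints = match_and_adapt(np.array([0.9, 0.1]),
                                  robot_pose=np.array([3.0, 4.0]))
print(name, waypoints.tolist())  # reuses the 'wall' maneuver near (3, 4)
```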