Nguyen, Thanh Hong
ComaDICE: Offline Cooperative Multi-Agent Reinforcement Learning with Stationary Distribution Shift Regularization
Bui, The Viet, Nguyen, Thanh Hong, Mai, Tien
Offline reinforcement learning (RL) has garnered significant attention for its ability to learn effective policies from pre-collected datasets without the need for further environmental interactions. While promising results have been demonstrated in single-agent settings, offline multi-agent reinforcement learning (MARL) presents additional challenges due to the large joint state-action space and the complexity of multi-agent behaviors. A key issue in offline RL is the distributional shift, which arises when the target policy being optimized deviates from the behavior policy that generated the data. This problem is exacerbated in MARL due to the interdependence between agents' local policies and the expansive joint state-action space. Prior approaches have primarily addressed this challenge by incorporating regularization in the space of either Q-functions or policies. In this work, we introduce a regularizer in the space of stationary distributions to better handle distributional shift. Our algorithm, ComaDICE, offers a principled framework for offline cooperative MARL by incorporating stationary distribution regularization for the global learning policy, complemented by a carefully structured multi-agent value decomposition strategy to facilitate multi-agent training. Through extensive experiments on the multi-agent MuJoCo and StarCraft II benchmarks, we demonstrate that ComaDICE achieves superior performance compared to state-of-the-art offline MARL methods across nearly all tasks.
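To make the idea of stationary-distribution regularization concrete, here is a minimal, illustrative sketch of a DICE-style offline objective in PyTorch. The network name `NuNet`, the chi-square-like convex conjugate, and the coefficient `alpha` are assumptions for illustration; ComaDICE's actual formulation additionally decomposes the global value function across agents via a mixing network, which is not shown here.

```python
import torch
import torch.nn as nn

class NuNet(nn.Module):
    """Lagrangian value network nu(s) used in DICE-style objectives (illustrative)."""
    def __init__(self, state_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s):
        return self.net(s).squeeze(-1)

def dice_loss(nu_net, s0, s, r, s_next, done, gamma=0.99, alpha=1.0):
    """Sketch of a stationary-distribution-regularized objective:
    (1 - gamma) * E[nu(s0)] + alpha * E_D[ f*(e_nu(s, a) / alpha) ],
    where e_nu is the Bellman residual and f* is the convex conjugate of the
    chosen f-divergence (a chi-square-like form is used here for illustration)."""
    e = r + gamma * (1.0 - done) * nu_net(s_next) - nu_net(s)   # Bellman residual e_nu
    y = e / alpha
    f_star = torch.clamp(y + 0.5 * y ** 2, min=0.0)             # illustrative conjugate; exact form depends on f
    return (1.0 - gamma) * nu_net(s0).mean() + alpha * f_star.mean()
```

Minimizing such a loss over `nu_net` on the offline dataset is the inner problem of the regularized objective; the induced stationary-distribution ratios can then be used to extract a policy, which is the general DICE recipe this abstract builds on.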
Generative Modelling of Stochastic Actions with Arbitrary Constraints in Reinforcement Learning
Chen, Changyu, Karunasena, Ramesha, Nguyen, Thanh Hong, Sinha, Arunesh, Varakantham, Pradeep
Many problems in Reinforcement Learning (RL) seek an optimal policy in large, discrete, multidimensional yet unordered action spaces; these include problems in randomized allocation of resources, such as placements of multiple security resources and emergency response units. A challenge in this setting is that the underlying action space is categorical (discrete and unordered) and large, for which existing RL methods do not perform well. Moreover, these problems require validity of the realized action (allocation); this validity constraint is often difficult to express compactly in closed mathematical form. The allocation nature of the problem also favors stochastic optimal policies, when they exist. In this work, we address these challenges by (1) applying a (state) conditional normalizing flow to compactly represent the stochastic policy -- the compactness arises because the network produces only one sampled action and the corresponding log probability of the action, which is then used by an actor-critic method; and (2) employing an invalid action rejection method (via a valid action oracle) to update the base policy. The action rejection is enabled by a modified policy gradient that we derive. Finally, we conduct extensive experiments to show the scalability of our approach compared to prior methods and its ability to enforce arbitrary state-conditional constraints on the support of the action distribution in any state.
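The invalid-action-rejection idea can be sketched independently of the flow architecture: sample from the state-conditional policy, query the validity oracle, and resample on rejection. The function names `policy` and `is_valid` and the retry budget are placeholders; the modified policy gradient that corrects for the rejection step is derived in the paper and not reproduced here.

```python
def sample_valid_action(policy, state, is_valid, max_tries=1000):
    """Rejection-sample until the validity oracle accepts (illustrative sketch).

    `policy(state)` is assumed to return (action, log_prob) from a
    state-conditional generative model such as a normalizing flow;
    `is_valid(state, action)` is the valid-action oracle.
    """
    for _ in range(max_tries):
        action, log_prob = policy(state)
        if is_valid(state, action):
            return action, log_prob
    raise RuntimeError("no valid action sampled within the retry budget")
```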
Inverse Factorized Q-Learning for Cooperative Multi-agent Imitation Learning
Bui, The Viet, Mai, Tien, Nguyen, Thanh Hong
This paper concerns imitation learning (IL), i.e., the problem of learning to mimic expert behaviors from demonstrations, in cooperative multi-agent systems. The learning problem under consideration poses several challenges, characterized by high-dimensional state and action spaces and intricate inter-agent dependencies. In the single-agent setting, IL can be performed efficiently via an inverse soft-Q learning process given expert demonstrations. However, extending this framework to a multi-agent context introduces the need to simultaneously learn both local value functions, to capture local observations and individual actions, and a joint value function for exploiting centralized learning. In this work, we introduce a novel multi-agent IL algorithm designed to address these challenges. Our approach enables centralized learning by leveraging mixing networks to aggregate decentralized Q functions. A main advantage of this approach is that the weights of the mixing networks can be trained using information derived from global states. We further establish conditions on the mixing networks under which the multi-agent objective function exhibits convexity within the Q function space. We present extensive experiments conducted on several challenging competitive and cooperative multi-agent game environments, including an advanced version of the StarCraft multi-agent challenge (i.e., SMACv2), which demonstrate the effectiveness of our proposed algorithm compared to existing state-of-the-art multi-agent IL algorithms.
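A common way to realize "mixing networks whose weights are generated from the global state" is a QMIX-style hypernetwork mixer; the sketch below is one such illustrative construction. The layer sizes and the use of abs() to keep the generated weights non-negative are assumptions, chosen so the mixer is monotone in each local Q-value.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixingNetwork(nn.Module):
    """Sketch of a QMIX-style mixer: per-agent Q-values are combined with
    weights produced by hypernetworks conditioned on the global state."""
    def __init__(self, n_agents, state_dim, embed=32):
        super().__init__()
        self.n_agents, self.embed = n_agents, embed
        self.w1 = nn.Linear(state_dim, n_agents * embed)
        self.b1 = nn.Linear(state_dim, embed)
        self.w2 = nn.Linear(state_dim, embed)
        self.b2 = nn.Linear(state_dim, 1)

    def forward(self, agent_qs, state):                 # agent_qs: [B, n_agents]
        B = agent_qs.size(0)
        w1 = torch.abs(self.w1(state)).view(B, self.n_agents, self.embed)
        h = F.elu(torch.bmm(agent_qs.unsqueeze(1), w1) + self.b1(state).unsqueeze(1))
        w2 = torch.abs(self.w2(state)).view(B, self.embed, 1)
        return (torch.bmm(h, w2) + self.b2(state).unsqueeze(1)).view(B)
```

Keeping the state-generated weights non-negative is one simple way to make the mixed value monotone in each local Q-value; the conditions the paper establishes for convexity of the objective in the Q-function space are more general than this sketch.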
Mimicking To Dominate: Imitation Learning Strategies for Success in Multiagent Competitive Games
Bui, The Viet, Mai, Tien, Nguyen, Thanh Hong
Training agents in multi-agent competitive games presents significant challenges due to their intricate nature. These challenges are exacerbated by dynamics influenced not only by the environment but also by opponents' strategies. Existing methods often struggle with slow convergence and instability. To address this, we harness the potential of imitation learning to comprehend and anticipate opponents' behavior, aiming to mitigate uncertainty about the game dynamics. Our key contributions include: (i) a new multi-agent imitation learning model for predicting the opponents' next moves -- our model works with hidden opponent actions and local observations; (ii) a new multi-agent reinforcement learning algorithm that combines our imitation learning model and policy training into a single training process; and (iii) extensive experiments in three challenging game environments, including an advanced version of the StarCraft multi-agent challenge (i.e., SMACv2). Experimental results show that our approach achieves superior performance compared to existing state-of-the-art multi-agent RL algorithms.
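One way to read "combining the imitation learning model and policy training into a single training process" is a joint loss that adds an opponent-prediction term to the usual RL loss; the sketch below only illustrates that shape. `actor_critic.loss`, `opponent_model`, the batch fields, and the weight `lambda_il` are placeholders, and since the paper treats opponents' true actions as hidden, the directly supervised target used here is purely a stand-in for its actual imitation objective.

```python
import torch.nn.functional as F

def joint_update(batch, actor_critic, opponent_model, optimizer, lambda_il=0.1):
    """One illustrative training step: RL loss + weighted opponent-imitation loss."""
    rl_loss = actor_critic.loss(batch)                         # e.g., an actor-critic/PPO loss
    logits = opponent_model(batch.obs)                         # predicted next opponent moves
    il_loss = F.cross_entropy(logits, batch.opponent_actions)  # placeholder imitation term
    loss = rl_loss + lambda_il * il_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```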
CounterNet: End-to-End Training of Prediction Aware Counterfactual Explanations
Guo, Hangzhi, Nguyen, Thanh Hong, Yadav, Amulya
This work presents CounterNet, a novel end-to-end learning framework which integrates Machine Learning (ML) model training and the generation of corresponding counterfactual (CF) explanations into a single pipeline. Counterfactual explanations offer a contrastive case, i.e., they attempt to find the smallest modification to the feature values of an instance that changes the prediction of the ML model on that instance to a predefined output. Prior techniques for generating CF explanations suffer from two major limitations: (i) all of them are post-hoc methods designed for use with proprietary ML models -- as a result, their procedure for generating CF explanations is uninformed by the training of the ML model, which leads to misalignment between model predictions and explanations; and (ii) most of them rely on solving separate, time-intensive optimization problems to find CF explanations for each input data point, which negatively impacts their runtime. In a departure from this prevalent post-hoc paradigm, CounterNet optimizes the generation of CF explanations jointly with the training of the predictive model, only once. We adopt a block-wise coordinate descent procedure that helps train CounterNet's network effectively. Our extensive experiments on multiple real-world datasets show that CounterNet generates high-quality predictions, consistently achieves 100% CF validity and low proximity scores (thereby achieving a well-balanced cost-invalidity trade-off) for any new input instance, and runs 3x faster than existing state-of-the-art baselines.
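A minimal sketch of the end-to-end, block-wise coordinate descent idea follows: one shared encoder feeds both a prediction head and a CF-generator head, and each training step alternates a gradient step on the prediction loss with a step on CF validity and proximity losses. The architecture, loss forms, and two-optimizer setup are illustrative assumptions, not CounterNet's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CounterNetSketch(nn.Module):
    """Shared encoder with a prediction head and a counterfactual-generator head."""
    def __init__(self, d_in, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d_in, hidden), nn.ReLU())
        self.predictor = nn.Linear(hidden, 1)       # prediction logit
        self.cf_head = nn.Linear(hidden, d_in)      # residual counterfactual

    def forward(self, x):
        z = self.encoder(x)
        return self.predictor(z).squeeze(-1), x + self.cf_head(z)

def train_step(model, x, y, opt_pred, opt_cf):
    """Block-wise coordinate descent: alternate the prediction and CF objectives.
    `opt_pred` and `opt_cf` are assumed to hold the corresponding parameter blocks."""
    # Block 1: prediction loss on (x, y)
    logit, _ = model(x)
    pred_loss = F.binary_cross_entropy_with_logits(logit, y)
    opt_pred.zero_grad(); pred_loss.backward(); opt_pred.step()

    # Block 2: CF validity (flip the predicted label) + proximity (stay near x)
    logit, x_cf = model(x)
    flipped = 1.0 - torch.sigmoid(logit).round().detach()
    logit_cf, _ = model(x_cf)
    cf_loss = F.binary_cross_entropy_with_logits(logit_cf, flipped) + (x_cf - x).pow(2).mean()
    opt_cf.zero_grad(); cf_loss.backward(); opt_cf.step()
    return pred_loss.item(), cf_loss.item()
```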
Conquering Adversary Behavioral Uncertainty in Security Games: An Efficient Modeling Robust Based Algorithm
Nguyen, Thanh Hong (University of Southern California) | Sinha, Arunesh (University of Southern California) | Tambe, Milind (University of Southern California)
Stackelberg Security Games (SSGs) have been widely applied to solving real-world security problems, with a significant research emphasis on modeling attackers' behaviors to handle their bounded rationality. However, access to real-world data (used for learning an accurate behavioral model) is often limited, leading to uncertainty about attackers' behaviors during modeling. This paper therefore focuses on addressing behavioral uncertainty in SSGs with the following main contributions: 1) we present a new uncertainty game model that integrates uncertainty intervals into a behavioral model to capture behavioral uncertainty; and 2) based on this game model, we propose a novel robust algorithm that approximately computes the defender's optimal strategy in the worst-case scenario of uncertainty, with a bound guarantee on its solution quality.
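To illustrate the worst-case reasoning over behavioral uncertainty, the sketch below evaluates a defender mixed strategy against a quantal-responding attacker whose rationality parameter is only known to lie in an interval, taking the minimum over a grid. The quantal-response stand-in, the grid approximation, and the variable names are assumptions for illustration; the paper places intervals on the parameters of its behavioral model and gives an algorithm with a solution-quality bound rather than a grid search.

```python
import numpy as np

def defender_eu_qr(x, R_a, P_a, R_d, P_d, lam):
    """Defender expected utility when the attacker quantal-responds (parameter lam)
    to the coverage vector x; all payoff arrays are per target."""
    attacker_u = (1.0 - x) * R_a + x * P_a
    q = np.exp(lam * (attacker_u - attacker_u.max()))   # numerically stable softmax
    q /= q.sum()
    return float(np.sum(q * (x * R_d + (1.0 - x) * P_d)))

def worst_case_eu(x, R_a, P_a, R_d, P_d, lam_interval, grid=50):
    """Worst-case utility over an uncertainty interval on lam (grid approximation)."""
    lo, hi = lam_interval
    return min(defender_eu_qr(x, R_a, P_a, R_d, P_d, lam)
               for lam in np.linspace(lo, hi, grid))
```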
Protecting Wildlife under Imperfect Observation
Nguyen, Thanh Hong (University of Southern California) | Sinha, Arunesh (University of Southern California) | Gholami, Shahrzad (University of Southern California) | Plumptre, Andrew (Wildlife Conservation Society) | Joppa, Lucas (Microsoft Research) | Tambe, Milind (University of Southern California) | Driciru, Margaret (Uganda Wildlife Authority) | Wanyama, Fred (Uganda Wildlife Authority) | Rwetsiba, Aggrey (Uganda Wildlife Authority) | Critchlow, Rob (The University of York) | Beale, Colin (The University of York)
Wildlife poaching presents a serious extinction threat to many animal species. In order to save wildlife in designated wildlife parks, park rangers conduct patrols over the park area to combat such illegal activities. An important aspect of the rangers' patrolling activity is to anticipate where poachers are likely to catch animals and then respond accordingly. Previous work has applied defender-attacker Stackelberg Security Games (SSGs) to the problem of wildlife protection, wherein attacker behavioral models are used to predict the behaviors of the poachers. However, these behavioral models have several limitations that reduce their accuracy in predicting poachers' behavior. First, existing models fail to account for the rangers' imperfect observations w.r.t. poaching activities (due to the rangers' limited capability to patrol thoroughly over a vast geographical area). Second, these models are built upon discrete choice models that assume a single agent choosing targets, while it is infeasible to obtain information about every single attacker in wildlife protection. Third, these models do not consider the effect of past poachers' actions on current poachers' activities, one of the key factors affecting the poachers' behaviors. In this work, we attempt to address these limitations while providing three main contributions. First, we propose a novel hierarchical behavioral model, HiBRID, to predict the poachers' behaviors wherein the rangers' imperfect detection of poaching signs is taken into account, a significant advance over existing behavioral models in security games. Furthermore, HiBRID incorporates the temporal effect on the poachers' behaviors. The model also does not require a known number of attackers. Second, we provide two new heuristics, parameter separation and target abstraction, to reduce the computational complexity of learning the model parameters. Finally, we use real-world data collected in Queen Elizabeth National Park (QENP) in Uganda over 12 years to evaluate the prediction accuracy of our new model.
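The imperfect-observation idea (a poaching sign enters the data only if poaching occurred and rangers detected it) can be written as a simple per-cell likelihood, as in the toy sketch below. This is an illustrative stand-in, not HiBRID's hierarchical likelihood; the array names and the independence assumptions are mine.

```python
import numpy as np

def sign_likelihood(attack_prob, detect_prob, observed_sign):
    """Toy imperfect-observation likelihood per grid cell: a sign is recorded
    only if an attack occurred AND rangers detected it. Inputs are per-cell
    arrays; `observed_sign` is boolean. Illustrative stand-in for HiBRID."""
    p_obs = attack_prob * detect_prob
    return np.where(observed_sign, p_obs, 1.0 - p_obs)
```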
Regret-Based Optimization and Preference Elicitation for Stackelberg Security Games with Uncertainty
Nguyen, Thanh Hong (University of Southern California) | Yadav, Amulya (University of Southern California) | An, Bo (Nanyang Technological University) | Tambe, Milind (University of Southern California) | Boutilier, Craig (University of Toronto)
Stackelberg security games (SSGs) have been deployed in a number of real-world domains. One key challenge in these applications is the assessment of attacker payoffs, which may not be perfectly known. Previous work has studied SSGs with uncertain payoffs modeled by interval uncertainty and provided maximin-based robust solutions. In contrast, in this work we propose the use of the less conservative minimax regret decision criterion for such payoff-uncertain SSGs and present the first algorithms for computing minimax regret for SSGs. We also address the challenge of preference elicitation, using minimax regret to develop the first elicitation strategies for SSGs. Experimental results validate the effectiveness of our approaches.
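The minimax regret criterion itself is easy to state in code: the regret of a strategy under one payoff realization is its gap to the best strategy for that realization, and minimax regret minimizes the worst such gap over the uncertainty set. The brute-force sketch below works over a finite set of candidate strategies and sampled payoff realizations; it is only meant to show the criterion, whereas the paper's algorithms compute minimax regret over interval uncertainty without enumeration.

```python
import numpy as np

def defender_utility(x, R_d, P_d, R_a, P_a):
    """Defender utility when the attacker best-responds to coverage x (per-target payoffs)."""
    t = int(np.argmax((1.0 - x) * R_a + x * P_a))    # attacked target
    return float(x[t] * R_d[t] + (1.0 - x[t]) * P_d[t])

def max_regret(x, candidates, payoff_samples):
    """Worst-case regret of x over sampled payoff realizations (brute force)."""
    regret = 0.0
    for R_d, P_d, R_a, P_a in payoff_samples:
        best = max(defender_utility(c, R_d, P_d, R_a, P_a) for c in candidates)
        regret = max(regret, best - defender_utility(x, R_d, P_d, R_a, P_a))
    return regret

def minimax_regret_strategy(candidates, payoff_samples):
    """Pick the candidate strategy minimizing its maximum regret."""
    return min(candidates, key=lambda x: max_regret(x, candidates, payoff_samples))
```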
Analyzing the Effectiveness of Adversary Modeling in Security Games
Nguyen, Thanh Hong (University of Southern California) | Yang, Rong (University of Southern California) | Azaria, Amos (Bar-Ilan University) | Kraus, Sarit (Bar-Ilan University and University of Maryland) | Tambe, Milind (University of Southern California)
Recent deployments of Stackelberg security games (SSG) have led to two competing approaches to handling boundedly rational human adversaries: (1) integrating models of human (adversary) decision-making into the game-theoretic algorithms, and (2) applying robust optimization techniques that avoid adversary modeling. A recent algorithm (MATCH) based on the second approach was shown to outperform the leading modeling-based algorithm even in the presence of a significant amount of data. Is there then any value in using human behavior models in solving SSGs? Through extensive experiments with 547 human subjects playing 11,102 games in total, we emphatically answer the question in the affirmative, while providing the following key contributions: (i) we show that our algorithm, SU-BRQR, based on a novel integration of a human behavior model with a subjective utility function, significantly outperforms both MATCH and its improvements; (ii) we are the first to present experimental results with security intelligence experts, and find that even though the experts are more rational than the Amazon Turk workers, SU-BRQR still outperforms an approach assuming perfect rationality (and, to a more limited extent, MATCH); (iii) we show the advantage of SU-BRQR in a new, large game setting and demonstrate that sufficient data enables it to improve its performance over MATCH.
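The subjective utility quantal response model underlying SU-BRQR replaces expected utility with a weighted sum of target features (coverage probability, attacker reward, attacker penalty) inside a logit choice rule. The sketch below shows only that choice rule; the feature weights would be fit to human-subject data, and the variable names are illustrative.

```python
import numpy as np

def su_qr_attack_probs(x, R_a, P_a, w):
    """SUQR-style attack distribution: P(attack t) is proportional to
    exp(w1 * x_t + w2 * R_a_t + w3 * P_a_t), with w = (w1, w2, w3)
    typically learned from human-subject data."""
    su = w[0] * x + w[1] * R_a + w[2] * P_a
    e = np.exp(su - su.max())               # numerically stable softmax
    return e / e.sum()
```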