AITopics | Wang, Baoxiang

Collaborating Authors

Wang, Baoxiang

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning

Li, Jiahui, Kuang, Kun, Wang, Baoxiang, Liu, Furui, Chen, Long, Wu, Fei, Xiao, Jun

arXiv.org Artificial IntelligenceJun-1-2021

Centralized Training with Decentralized Execution (CTDE) has been a popular paradigm in cooperative Multi-Agent Reinforcement Learning (MARL) settings and is widely used in many real applications. One of the major challenges in the training process is credit assignment, which aims to deduce the contributions of each agent according to the global rewards. Existing credit assignment methods focus on either decomposing the joint value function into individual value functions or measuring the impact of local observations and actions on the global value function. These approaches lack a thorough consideration of the complicated interactions among multiple agents, leading to an unsuitable assignment of credit and subsequently mediocre results on MARL. We propose Shapley Counterfactual Credit Assignment, a novel method for explicit credit assignment which accounts for the coalition of agents. Specifically, Shapley Value and its desired properties are leveraged in deep MARL to credit any combinations of agents, which grants us the capability to estimate the individual credit for each agent. Despite this capability, the main technical difficulty lies in the computational complexity of Shapley Value who grows factorially as the number of agents. We instead utilize an approximation method via Monte Carlo sampling, which reduces the sample complexity while maintaining its effectiveness. We evaluate our method on StarCraft II benchmarks across different scenarios. Our method outperforms existing cooperative MARL algorithms significantly and achieves the state-of-the-art, with especially large margins on tasks with more severe difficulties.

agent, artificial intelligence, computer game, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3447548.3467420

2106.00285

Country: Asia > China (0.46)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Add feedback

Combinatorial Bandits under Strategic Manipulations

Dong, Jing, Li, Ke, Li, Shuai, Wang, Baoxiang

arXiv.org Artificial IntelligenceFeb-25-2021

We study the problem of combinatorial multi-armed bandits (CMAB) under strategic manipulations of rewards, where each arm can modify the emitted reward signals for its own interest. Our setting elaborates a more realistic model of adaptive arms that imposes relaxed assumptions compared to adversarial corruptions and adversarial attacks. Algorithms designed under strategic arms gain robustness in real applications while avoiding being overcautious and hampering the performance. We bridge the gap between strategic manipulations and adversarial attacks by investigating the optimal colluding strategy among arms under the MAB problem. We then propose a strategic variant of the combinatorial UCB algorithm, which has a regret of at most $O(m\log T + m B_{max})$ under strategic manipulations, where $T$ is the time horizon, $m$ is the number of arms, and $B_{max}$ is the maximum budget. We further provide lower bounds on the strategic budgets for attackers to incur certain regret of the bandit algorithm. Extensive experiments corroborate our theoretical findings on robustness and regret bounds, in a variety of regimes of manipulation budgets.

algorithm, artificial intelligence, big data, (19 more...)

arXiv.org Artificial Intelligence

2102.12722

Country:

Asia (0.93)
Europe (0.93)
North America > United States > California > Los Angeles County (0.14)

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (0.89)
Government > Military (0.55)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.90)
Information Technology > Security & Privacy (0.89)

Add feedback

Recurrent Existence Determination Through Policy Optimization

Wang, Baoxiang

arXiv.org Artificial IntelligenceJun-3-2019

Binary determination of the presence of objects is one of the problems where humans perform extraordinarily better than computer vision systems, in terms of both speed and preciseness. One of the possible reasons is that humans can skip most of the clutter and attend only on salient regions. Recurrent attention models (RAM) are the first computational models to imitate the way humans process images via the REINFORCE algorithm. Despite that RAM is originally designed for image recognition, we extend it and present recurrent existence determination, an attention-based mechanism to solve the existence determination. Our algorithm employs a novel $k$-maximum aggregation layer and a new reward mechanism to address the issue of delayed rewards, which would have caused the instability of the training process. The experimental analysis demonstrates significant efficiency and accuracy improvement over existing approaches, on both synthetic and real-world datasets.

agent, deep learning, neural network, (20 more...)

arXiv.org Artificial Intelligence

1905.13551

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area (0.69)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Private Q-Learning with Functional Noise in Continuous Spaces

Wang, Baoxiang, Hegde, Nidhi

arXiv.org Machine LearningJan-29-2019

We consider privacy-preserving algorithms for deep reinforcement learning. State-of-the-art methods that guarantee differential privacy are not extendable to very large state spaces because the noise level necessary to ensure privacy would scale to infinity. We address the problem of providing differential privacy in Q-learning where a function approximation through a neural network is used for parametrization. We develop a rigorous and efficient algorithm by inspecting the reproducing kernel Hilbert space in which the neural network is embedded. Our approach uses functional noise to guarantee privacy, while the noise level scales linearly with the complexity of the neural network architecture. There are no known theoretical guarantees on the performance of deep reinforcement learning, but we gain some insight by providing a utility analysis under the discrete space setting.

artificial intelligence, exp, reinforcement learning, (18 more...)

arXiv.org Machine Learning

1901.10634

Country: North America > Canada (0.14)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Beyond Winning and Losing: Modeling Human Motivations and Behaviors Using Inverse Reinforcement Learning

Wang, Baoxiang, Sun, Tongfang, Zheng, Sam Xianjun

arXiv.org Artificial IntelligenceJul-1-2018

In recent years, reinforcement learning (RL) methods have been applied to model gameplay with great success, achieving super-human performance in various environments, such as Atari, Go, and Poker. However, those studies mostly focus on winning the game and have largely ignored the rich and complex human motivations, which are essential for understanding different players' diverse behaviors. In this paper, we present a novel method called Multi-Motivation Behavior Modeling (MMBM) that takes the multifaceted human motivations into consideration and models the underlying value structure of the players using inverse RL. Our approach does not require the access to the dynamic of the system, making it feasible to model complex interactive environments such as massively multiplayer online games. MMBM is tested on the World of Warcraft Avatar History dataset, which recorded over 70,000 users' gameplay spanning three years period. Our model reveals the significant difference of value structures among different player groups. Using the results of motivation modeling, we also predict and explain their diverse gameplay behaviors and provide a quantitative assessment of how the redesign of the game environment impacts players' behaviors.

artificial intelligence, computer game, motivation, (21 more...)

arXiv.org Artificial Intelligence

1807.00366

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Metatrace: Online Step-size Tuning by Meta-gradient Descent for Reinforcement Learning Control

Young, Kenny, Wang, Baoxiang, Taylor, Matthew E.

arXiv.org Artificial IntelligenceMay-10-2018

Reinforcement learning (RL) has had many successes in both "deep" and "shallow" settings. In both cases, significant hyperparameter tuning is often required to achieve good performance. Furthermore, when nonlinear function approximation is used, non-stationarity in the state representation can lead to learning instability. A variety of techniques exist to combat this --- most notably large experience replay buffers or the use of multiple parallel actors. These techniques come at the cost of moving away from the online RL problem as it is traditionally formulated (i.e., a single agent learning online without maintaining a large database of training examples). Meta-learning can potentially help with both these issues by tuning hyperparameters online and allowing the algorithm to more robustly adjust to non-stationarity in a problem. This paper applies meta-gradient descent to derive a set of step-size tuning algorithms specifically for online RL control with eligibility traces. Our novel technique, Metatrace, makes use of an eligibility trace analogous to methods like $TD(\lambda)$. We explore tuning both a single scalar step-size and a separate step-size for each learned parameter. We evaluate Metatrace first for control with linear function approximation in the classic mountain car problem and then in a noisy, non-stationary version. Finally, we apply Metatrace for control with nonlinear function approximation in 5 games in the Arcade Learning Environment where we explore how it impacts learning speed and robustness to initial step-size choice. Results show that the meta-step-size parameter of Metatrace is easy to set, Metatrace can speed learning, and Metatrace can allow an RL algorithm to deal with non-stationarity in the learning task.

artificial intelligence, educational setting, metatrace, (18 more...)

arXiv.org Artificial Intelligence

1805.04514

Country: North America > Canada > Alberta (0.14)

Genre:

Research Report > New Finding (0.48)
Research Report > Promising Solution (0.34)

Industry: Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.71)

Add feedback

Policy Optimization with Second-Order Advantage Information

Li, Jiajin, Wang, Baoxiang

arXiv.org Artificial IntelligenceMay-9-2018

Policy optimization on high-dimensional continuous control tasks exhibits its difficulty caused by the large variance of the policy gradient estimators. We present the action subspace dependent gradient (ASDG) estimator which incorporates the Rao-Blackwell theorem (RB) and Control Variates (CV) into a unified framework to reduce the variance. To invoke RB, our proposed algorithm (POSA) learns the underlying factorization structure among the action space based on the second-order advantage information. POSA captures the quadratic information explicitly and efficiently by utilizing the wide & deep architecture. Empirical studies show that our proposed approach demonstrates the performance improvements on high-dimensional synthetic settings and OpenAI Gym's MuJoCo continuous control tasks.

deep learning, estimator, neural network, (18 more...)

arXiv.org Artificial Intelligence

1805.03586

Genre: Research Report (0.40)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback