AITopics | Wang, Baoxiang

Collaborating Authors

Wang, Baoxiang

Semantically Aligned Task Decomposition in Multi-Agent Reinforcement Learning

Li, Wenhao, Qiao, Dan, Wang, Baoxiang, Wang, Xiangfeng, Jin, Bo, Zha, Hongyuan

arXiv.org Artificial Intelligence, Sep-30-2023

Assigning credit appropriately is especially difficult in cooperative MARL with sparse rewards, because of the temporal and structural scales involved at the same time. Automatic subgoal generation (ASG) has recently emerged as a viable MARL approach, inspired by the use of subgoals in intrinsically motivated reinforcement learning. However, learning complex task planning end to end from sparse rewards without prior knowledge requires a massive number of training samples. Moreover, the diversity-promoting nature of existing ASG methods can lead to the "over-representation" of subgoals: they generate numerous spurious subgoals of limited relevance to the actual task reward, which decreases the sample efficiency of the algorithm. To address this problem, and inspired by disentangled representation learning, we propose a novel "disentangled" decision-making method, Semantically Aligned task decomposition in MARL (SAMA). SAMA prompts pretrained language models with chain-of-thought to suggest potential goals, provide suitable goal decomposition and subgoal allocation, and perform self-reflection-based replanning. Additionally, SAMA incorporates language-grounded RL to train each agent's subgoal-conditioned policy. SAMA demonstrates considerable advantages in sample efficiency over state-of-the-art ASG methods on two challenging sparse-reward tasks, Overcooked and MiniRTS.

machine learning, onion, reinforcement learning

arXiv.org Artificial Intelligence

2305.10865

Country: Asia > China (0.46)

Genre: Research Report > New Finding (0.45)

Industry:

Information Technology (0.67)
Leisure & Entertainment > Games > Computer Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)

DPMAC: Differentially Private Communication for Cooperative Multi-Agent Reinforcement Learning

Zhao, Canzhe, Ze, Yanjie, Dong, Jing, Wang, Baoxiang, Li, Shuai

arXiv.org Artificial Intelligence, Aug-19-2023

Communication lays the foundation for cooperation in human society and in multi-agent reinforcement learning (MARL). Humans also desire to maintain their privacy when communicating with others, yet such privacy concerns have not been considered in existing MARL works. To this end, we propose the differentially private multi-agent communication (DPMAC) algorithm, which protects the sensitive information of individual agents by equipping each agent with a local message sender that has a rigorous $(\epsilon, \delta)$-differential privacy (DP) guarantee. In contrast to directly perturbing the messages with predefined DP noise, as is commonly done in privacy-preserving scenarios, we adopt a stochastic message sender for each agent and incorporate the DP requirement into the sender, which automatically adjusts the learned message distribution to alleviate the instability caused by DP noise. Further, we prove the existence of a Nash equilibrium in cooperative MARL with privacy-preserving communication, which suggests that this problem is game-theoretically learnable. Extensive experiments demonstrate a clear advantage of DPMAC over baseline methods in privacy-preserving scenarios.
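The abstract contrasts DPMAC with the common practice of directly perturbing messages with predefined DP noise. As a hedged illustration of that baseline only (not of DPMAC's learned sender), the sketch below clips a message to an L2 bound and adds Gaussian noise with the standard Gaussian-mechanism calibration $\sigma = \Delta\sqrt{2\ln(1.25/\delta)}/\epsilon$ for $0 < \epsilon < 1$. The function names, the clipping bound and the sensitivity $\Delta = 2 \cdot$ `clip` are illustrative assumptions.

```python
import math

import numpy as np


def gaussian_sigma(epsilon: float, delta: float, l2_sensitivity: float) -> float:
    """Noise scale of the classic Gaussian mechanism (stated for 0 < epsilon < 1)."""
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("the classic bound needs 0 < epsilon < 1 and 0 < delta < 1")
    return l2_sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def perturb_message(message: np.ndarray, clip: float, epsilon: float, delta: float,
                    rng: np.random.Generator) -> np.ndarray:
    """Hypothetical baseline: clip a message to L2 norm <= clip, then add fixed DP noise."""
    norm = float(np.linalg.norm(message))
    if norm > clip:
        message = message * (clip / norm)
    # Assumption: any two clipped messages differ by at most 2 * clip in L2 norm.
    sigma = gaussian_sigma(epsilon, delta, 2.0 * clip)
    return message + rng.normal(0.0, sigma, size=message.shape)


rng = np.random.default_rng(0)
noisy = perturb_message(np.array([3.0, 4.0]), clip=1.0, epsilon=0.5, delta=1e-5, rng=rng)
print(noisy.shape)  # (2,)
```

Here the noise is applied after the fact and does not depend on what the agent learns; DPMAC instead folds the DP requirement into each agent's stochastic sender so that the learned message distribution adapts to the noise.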

artificial intelligence, dpmac, machine learning

arXiv.org Artificial Intelligence

2308.09902

Country: Asia > China (0.28)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Taming the Exponential Action Set: Sublinear Regret and Fast Convergence to Nash Equilibrium in Online Congestion Games

Dong, Jing, Wu, Jingyu, Wang, Siwei, Wang, Baoxiang, Chen, Wei

arXiv.org Artificial Intelligence, Jun-18-2023

The congestion game is a powerful model that encompasses a range of engineering systems such as traffic networks and resource allocation. It describes the behavior of a group of agents who share a common set of $F$ facilities and whose actions are subsets of $k$ facilities. In this work, we study the online formulation of congestion games, where agents participate in the game repeatedly and observe feedback with randomness. We propose CongestEXP, a decentralized algorithm that applies the classic exponential weights method. By maintaining weights at the facility level, CongestEXP attains a regret bound that avoids exponential dependence on the number of possible facility sets, i.e., $\binom{F}{k} \approx F^k$, and scales only linearly with $F$. Specifically, we show that CongestEXP attains a regret upper bound of $O(kF\sqrt{T})$ for every individual player, where $T$ is the time horizon. In addition, exploiting the exponential growth of the weights enables CongestEXP to converge fast: if a strict Nash equilibrium exists, we show that CongestEXP converges to the strict Nash policy almost exponentially fast, in $O(F\exp(-t^{1-\alpha}))$, where $t$ is the number of iterations and $\alpha \in (1/2, 1)$.
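The key idea of keeping weights per facility rather than per $k$-subset can be sketched as follows. This is a minimal sketch, not CongestEXP itself: it assumes full-information cost feedback for every facility, samples $k$ distinct facilities in proportion to their weights, and uses fixed toy costs, whereas in a real congestion game `cost_fn` would depend on how many other players use each facility. The function name and the sampling rule are hypothetical.

```python
import math

import numpy as np


def facility_exp_weights(cost_fn, F: int, k: int, T: int, eta: float, seed: int = 0):
    """Exponential weights over F facilities for a player who picks k of them per round.

    Simplifying assumptions (not from the paper): full-information cost feedback for
    every facility, and sampling k distinct facilities in proportion to their weights.
    """
    rng = np.random.default_rng(seed)
    log_w = np.zeros(F)  # one log-weight per facility, not one per k-subset
    actions = []
    for _ in range(T):
        p = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
        p = np.maximum(p / p.sum(), 1e-12)  # tiny floor keeps k entries nonzero for sampling
        p /= p.sum()
        action = rng.choice(F, size=k, replace=False, p=p)
        costs = np.asarray(cost_fn(action), dtype=float)  # one cost per facility
        log_w -= eta * costs  # multiplicative update: w_f <- w_f * exp(-eta * cost_f)
        actions.append(sorted(int(f) for f in action))
    return log_w, actions


# Toy run: fixed facility costs, so the two cheapest facilities should dominate.
F, k = 6, 2
log_w, actions = facility_exp_weights(lambda a: np.linspace(0.0, 1.0, F), F, k, T=200, eta=0.5)
print(F, math.comb(F, k))  # 6 15: six facility weights instead of 15 subset weights
print(sorted(np.argsort(log_w)[-k:].tolist()))  # [0, 1]
```

The gap between $F$ and $\binom{F}{k}$ widens quickly as $F$ and $k$ grow, which is why a facility-level parameterization keeps both memory and regret linear in $F$.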

artificial intelligence, data mining, machine learning

arXiv.org Artificial Intelligence

2306.13673

Country: Asia > China (0.28)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (0.88)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.46)

Online Influence Maximization under Decreasing Cascade Model

Kong, Fang, Xie, Jize, Wang, Baoxiang, Yao, Tao, Li, Shuai

arXiv.org Artificial Intelligence, May-19-2023

We study online influence maximization (OIM) under a new model of decreasing cascade (DC). This model generalizes the independent cascade (IC) model to account for the common phenomenon of market saturation. In DC, the chance that an influence attempt succeeds decreases with previous failures. This effect has been neglected by previous OIM works under the IC and linear threshold models. We propose the DC-UCB algorithm to solve this problem, which achieves a regret bound of the same order as state-of-the-art works on the IC model. Extensive experiments on both synthetic and real datasets show the effectiveness of our algorithm.
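To make the decreasing-cascade idea concrete, the sketch below simulates a single cascade in which every failed attempt on a node lowers the success chance of later attempts on it. It does not implement DC-UCB, which learns the unknown probabilities online; it only simulates the diffusion such a learner interacts with. The geometric `decay` rule and all names are hypothetical, since the abstract only states that the chance decreases with previous failures; with `decay == 1` the sketch reduces to the IC model.

```python
import random
from collections import deque


def simulate_dc(graph, base_prob, seeds, decay, rng):
    """One cascade under an assumed decreasing-cascade rule.

    graph: dict mapping node -> list of out-neighbours.
    base_prob: dict mapping edge (u, v) -> base activation probability p_uv.
    decay: factor in [0, 1]; decay == 1 recovers the independent cascade (IC) model.
    """
    active = set(seeds)
    failures = {}  # node -> number of failed influence attempts on it so far
    frontier = deque(seeds)  # nodes in the order they became active
    while frontier:
        u = frontier.popleft()
        for v in graph.get(u, []):
            if v in active:
                continue
            # Assumed rule: every earlier failure on v shrinks the success chance by `decay`.
            p = base_prob[(u, v)] * decay ** failures.get(v, 0)
            if rng.random() < p:
                active.add(v)
                frontier.append(v)
            else:
                failures[v] = failures.get(v, 0) + 1
    return active


graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
print(simulate_dc(graph, {e: 1.0 for e in edges}, [0], decay=0.5, rng=random.Random(0)))  # {0, 1, 2, 3}
print(simulate_dc(graph, {e: 0.0 for e in edges}, [0], decay=0.5, rng=random.Random(0)))  # {0}
```

In this toy graph, node 3 receives two attempts; if the first one (from node 1) fails, the second one (from node 2) succeeds with probability $p_{23} \cdot$ `decay` instead of $p_{23}$, which is the saturation effect the IC model ignores.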

artificial intelligence, data mining, machine learning

arXiv.org Artificial Intelligence

2305.15428

Country: Asia > China (0.28)

Genre: Research Report (0.40)

Industry: Information Technology (0.31)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining (0.94)

Learning from Good Trajectories in Offline Multi-Agent Reinforcement Learning

Tian, Qi, Kuang, Kun, Liu, Furui, Wang, Baoxiang

arXiv.org Artificial Intelligence, Mar-1-2023

Offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets, which is an important step toward deploying multi-agent systems in real-world applications. In practice, however, the individual behavior policies that generate multi-agent joint trajectories usually perform at different levels; e.g., one agent follows a random policy while the other agents follow medium-quality policies. In a cooperative game with a global reward, an agent learned by existing offline MARL methods often inherits this random policy, jeopardizing the performance of the entire team. In this paper, we investigate offline MARL with explicit consideration of the diversity of agent-wise trajectories and propose a novel framework called Shared Individual Trajectories (SIT) to address this problem. Specifically, an attention-based reward decomposition network assigns credit to each agent through a differentiable key-value memory mechanism in an offline manner. These decomposed credits are then used to reconstruct the joint offline datasets into prioritized experience replay with individual trajectories, after which agents can share their good trajectories and conservatively train their policies with a critic based on a graph attention network (GAT). We evaluate our method in both discrete control (i.e., StarCraft II and the multi-agent particle environment) and continuous control (i.e., multi-agent MuJoCo). The results indicate that our method achieves significantly better results on complex and mixed offline multi-agent datasets, especially when the difference in data quality between individual trajectories is large.
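The two steps named above, splitting a global reward into per-agent credits and then replaying individual trajectories by priority, can be sketched roughly as follows. This is a simplification under stated assumptions: single-head scaled dot-product attention with plain arrays stands in for SIT's learned key-value memory network, and the shift-to-positive priority rule is hypothetical. Function names are illustrative, not from the paper.

```python
import numpy as np


def decompose_reward(global_reward, agent_feats, query, w_key):
    """Split one global reward across agents with softmax attention weights.

    agent_feats: (n_agents, d) per-agent features; query: (d_k,); w_key: (d, d_k).
    In SIT these would come from learned networks; here they are plain arrays (assumption).
    """
    keys = agent_feats @ w_key                        # (n_agents, d_k)
    scores = keys @ query / np.sqrt(keys.shape[1])    # scaled dot-product scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                          # softmax over agents
    return global_reward * weights                    # per-agent credits sum to the reward


def sample_agent_trajectories(credits, n_samples, rng):
    """Prioritised sampling of individual trajectories, favouring higher credit (assumed rule)."""
    priorities = credits - credits.min() + 1e-6       # shift so every priority is positive
    return rng.choice(len(credits), size=n_samples, p=priorities / priorities.sum())


rng = np.random.default_rng(0)
credits = decompose_reward(3.0, rng.normal(size=(4, 5)), rng.normal(size=3),
                           rng.normal(size=(5, 3)))
print(round(float(credits.sum()), 6))  # 3.0
idx = sample_agent_trajectories(credits, 8, rng)
```

Because the softmax weights sum to one, the credits always add up to the team reward, so an agent whose individual trajectory contributed little receives a small share and is replayed less often.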

artificial intelligence, dataset, machine learning

arXiv.org Artificial Intelligence

2211.15612

Country: Asia > China (1.00)

Genre: Research Report (0.82)

Industry:

Education > Educational Setting > Online (0.46)
Leisure & Entertainment > Games > Computer Games (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Diverse Policy Optimization for Structured Action Space

Li, Wenhao, Wang, Baoxiang, Yang, Shanchao, Zha, Hongyuan

arXiv.org Artificial Intelligence, Feb-23-2023

Enhancing the diversity of policies is beneficial for robustness, exploration, and transfer in reinforcement learning (RL). In this paper, we seek diverse policies in an under-explored setting, namely RL tasks with structured action spaces that have two properties: composability and local dependencies. The complex action structure, non-uniform reward landscape, and subtle hyperparameter tuning that result from these properties prevent existing approaches from scaling well. We propose a simple and effective RL method, Diverse Policy Optimization (DPO), that models the policies in a structured action space as energy-based models (EBMs), following the probabilistic RL framework. GFlowNet, a recently proposed and powerful generative model, is introduced as an efficient sampler of diverse EBM-based policies. DPO follows a joint optimization framework: the outer layer uses the diverse policies sampled by the GFlowNet to update the EBM-based policies, which in turn support GFlowNet training in the inner layer. Experiments on the ATSC and Battle benchmarks demonstrate that DPO can efficiently discover surprisingly diverse policies in challenging scenarios and substantially outperform existing state-of-the-art methods.
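An energy-based policy over a structured action space assigns each joint action a probability proportional to $\exp(-E(a)/\tau)$. The sketch below computes that distribution exactly by enumerating every joint action of a tiny problem, with unary energies per sub-action (composability) and pairwise energies between neighbouring parts (local dependencies). It is an assumption-laden toy, not DPO: the energy form and names are hypothetical, and exhaustive enumeration grows as `n_choices ** n_parts`, which is precisely why DPO relies on a GFlowNet to sample from such distributions instead.

```python
import itertools

import numpy as np


def ebm_policy(unary, pairwise, edges, temperature=1.0):
    """Exact energy-based policy over a small structured action space.

    unary: (n_parts, n_choices) energy of each sub-action (composability).
    pairwise: (n_choices, n_choices) energy shared by neighbouring parts in `edges`
    (local dependencies). Lower energy means higher probability.
    """
    n_parts, n_choices = unary.shape
    actions = list(itertools.product(range(n_choices), repeat=n_parts))
    energies = np.array([
        sum(unary[i, a[i]] for i in range(n_parts))
        + sum(pairwise[a[i], a[j]] for i, j in edges)
        for a in actions
    ])
    logits = -energies / temperature
    probs = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return actions, probs / probs.sum()


unary = np.array([[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]])
pairwise = np.zeros((2, 2))
actions, probs = ebm_policy(unary, pairwise, edges=[(0, 1), (1, 2)])
print(len(actions))                      # 8 joint actions = 2 ** 3
print(actions[int(np.argmax(probs))])    # (0, 0, 0)
```

Unlike an argmax policy, this distribution keeps non-zero mass on every low-energy joint action, which is the source of the policy diversity the abstract targets.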

artificial intelligence, machine learning, reinforcement learning

arXiv.org Artificial Intelligence

2302.11917

Country:

Europe (0.46)
Asia > China (0.29)

Genre: Research Report (1.00)

Industry:

Transportation (0.70)
Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Improved Regret Bounds for Linear Adversarial MDPs via Linear Optimization

Kong, Fang, Zhang, Xiangcheng, Wang, Baoxiang, Li, Shuai

arXiv.org Artificial Intelligence, Feb-14-2023

Reinforcement learning (RL) describes the interaction between a learning agent and an unknown environment, where the agent aims to maximize the cumulative reward through trial and error [Sutton and Barto, 2018]. It has achieved great success in many real applications, such as games [Mnih et al., 2013; Silver et al., 2016], robotics [Kober et al., 2013; Lillicrap et al., 2015], autonomous driving [Kiran et al., 2021], and recommendation systems [Afsar et al., 2022; Lin et al., 2021]. The interaction in RL is commonly modeled by Markov decision processes (MDPs). Most existing works study the stochastic setting, where the reward is sampled from a fixed distribution [Azar et al., 2017; Jin et al., 2018; Simchowitz and Jamieson, 2019; Yang et al., 2021]. RL in real applications is generally more challenging than the stochastic setting, as the environment can be nonstationary and the reward function can adapt to the agent's policy. For example, scheduling algorithms may be deployed among self-interested parties, and recommendation algorithms face strategic users. To design robust algorithms that work in non-stationary environments, a line of work focuses on the adversarial setting, where the reward function can be chosen arbitrarily by an adversary [Yu et al., 2009; Rosenberg and Mansour, 2019; Jin et al., 2020a; Chen et al., 2021; Luo et al., 2021a].

artificial intelligence, machine learning, reinforcement learning

arXiv.org Artificial Intelligence

2302.06834

Country: Asia > China (0.28)

Genre: Research Report (0.50)

Industry: Transportation (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Algorithms and Theory for Supervised Gradual Domain Adaptation

Dong, Jing, Zhou, Shiji, Wang, Baoxiang, Zhao, Han

arXiv.org Artificial Intelligence, Nov-13-2022

Data distributions that evolve over time have been observed in a range of applications, which calls for adaptive learning algorithms. We thus study the problem of supervised gradual domain adaptation, where labeled data from shifting distributions are available to the learner along the trajectory, and we aim to learn a classifier on a target data distribution of interest. Under this setting, we provide the first generalization upper bound on the learning error under mild assumptions. Our results are algorithm-agnostic, general for a range of loss functions, and depend only linearly on the averaged learning error across the trajectory. This is a significant improvement over the previous upper bound for unsupervised gradual domain adaptation, where the learning error on the target domain depends exponentially on the initial error on the source domain. Compared with the offline setting of learning from multiple domains, our results also suggest the potential benefits of the temporal structure among different domains in adapting to the target one. Our theoretical results further imply that, in practice, learning proper representations across the domains will effectively mitigate learning errors. Motivated by these theoretical insights, we propose a min-max learning objective to learn the representation and classifier simultaneously. Experimental results on both semi-synthetic and large-scale real datasets corroborate our findings and demonstrate the effectiveness of our objectives.

adaptation, artificial intelligence, machine learning

arXiv.org Artificial Intelligence

2204.11644

Country: Asia > China (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Education > Educational Setting (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Multi-agent Communication with Graph Information Bottleneck under Limited Bandwidth

Tian, Qi, Kuang, Kun, Wang, Baoxiang, Liu, Furui, Wu, Fei

arXiv.org Artificial Intelligence, Dec-29-2021

Recent studies have shown that introducing communication between agents can significantly improve overall performance in cooperative multi-agent reinforcement learning (MARL). In many real-world scenarios, communication can be expensive and the bandwidth of the multi-agent system is subject to certain constraints. Redundant messages that occupy communication resources can block the transmission of informative messages and thus jeopardize performance. In this paper, we aim to learn the minimal sufficient communication messages. First, we initiate communication between agents with a complete graph. Then we introduce the graph information bottleneck (GIB) principle into this complete graph and derive the optimization over graph structures. Based on this optimization, we propose a novel multi-agent communication module, called CommGIB, which effectively compresses the structure information and node information in the communication graph to handle bandwidth-constrained settings. We conduct extensive experiments in Traffic Control and StarCraft II. The results indicate that the proposed method achieves better performance in bandwidth-restricted settings than state-of-the-art algorithms, with especially large margins in large-scale multi-agent tasks.
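The setting above starts from a complete communication graph and must fit a bandwidth limit. The sketch below illustrates only that setting, not the graph information bottleneck: it takes a hypothetical, fixed matrix of edge scores and keeps the `budget` highest-scoring directed edges of the complete graph. In CommGIB, which structure and node information to keep is learned through the GIB objective rather than by a fixed top-B rule; the function name and scores are assumptions.

```python
import numpy as np


def budgeted_comm_graph(scores: np.ndarray, budget: int) -> np.ndarray:
    """Keep the `budget` highest-scoring directed edges of a complete communication graph.

    scores[i, j]: assumed usefulness of a message from agent i to agent j.
    Returns a boolean adjacency matrix without self-loops.
    """
    n = scores.shape[0]
    complete = ~np.eye(n, dtype=bool)                 # complete graph, no self-loops
    budget = min(budget, int(complete.sum()))
    masked = np.where(complete, scores, -np.inf)      # self-loops can never be selected
    keep = np.argsort(masked, axis=None)[::-1][:budget]
    adj = np.zeros((n, n), dtype=bool)
    adj[np.unravel_index(keep, adj.shape)] = True
    return adj


adj = budgeted_comm_graph(np.arange(16, dtype=float).reshape(4, 4), budget=3)
print(adj.sum())  # 3 messages sent instead of the 12 in the complete graph
```

With $n$ agents the complete graph carries $n(n-1)$ directed messages per step, so even this crude budget shows how quickly pruning matters as the number of agents grows.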

artificial intelligence, communication, information

arXiv.org Artificial Intelligence

2112.10374

Country:

North America > United States (0.46)
Asia (0.28)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.34)

Edge Rewiring Goes Neural: Boosting Network Resilience via Policy Gradient

Yang, Shanchao, Ma, Kaili, Wang, Baoxiang, Zha, Hongyuan

arXiv.org Artificial Intelligence, Oct-18-2021

Improving the resilience of a network protects the system from natural disasters and malicious attacks. This is typically achieved by introducing new edges, which, however, may exceed the maximum number of connections a node can sustain. Many studies therefore resort to the degree-preserving operation of rewiring, which swaps existing edges $AC, BD$ for new edges $AB, CD$. A significant line of studies focuses on this technique for theoretical and practical results but leaves three limitations: network utility loss, local optimality, and transductivity. In this paper, we propose ResiNet, a reinforcement learning (RL)-based framework to discover resilient network topologies against various disasters and attacks. ResiNet is objective-agnostic, which allows the utility to be balanced by incorporating it into the objective function. The local optimality, typically seen in greedy algorithms, is addressed by casting the cumulative resilience gain into a sequential decision process of step-wise rewiring. The transductivity, which refers to the necessity of running a computationally intensive optimization for each input graph, is lifted by our variant of RL with an auto-regressive, permutation-invariant, variable action space. ResiNet is equipped with our technical innovation, Filtration enhanced GNN (FireGNN), which distinguishes graphs with minor differences. It is thus possible for ResiNet to capture local structural changes and adapt its decisions among consecutive graphs, which is known to be infeasible for GNNs. Extensive experiments demonstrate that with a small number of rewiring operations, ResiNet achieves a near-optimal resilience gain on multiple graphs while balancing the utility, outperforming existing approaches by a large margin.
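The rewiring operation itself is simple to state in code. The sketch below implements only the degree-preserving swap of undirected edges $AC, BD$ for $AB, CD$ on an edge set and checks that every node keeps its degree; choosing which rewiring to apply, which is what ResiNet's RL policy and FireGNN do, is out of scope. The helper names and the validity checks (distinct endpoints, no duplicate edges) are illustrative assumptions.

```python
from collections import Counter


def rewire(edges, a, b, c, d):
    """Replace undirected edges AC and BD with AB and CD; every node keeps its degree."""
    if len({a, b, c, d}) != 4:
        raise ValueError("A, B, C, D must be distinct nodes")
    E = {frozenset(e) for e in edges}
    old = {frozenset((a, c)), frozenset((b, d))}
    new = {frozenset((a, b)), frozenset((c, d))}
    if not old <= E:
        raise ValueError("edges AC and BD must both exist")
    if new & E:
        raise ValueError("AB or CD already exists; rewiring would duplicate an edge")
    return (E - old) | new


def degrees(edges):
    """Degree of every node in an undirected edge set."""
    return Counter(v for e in edges for v in e)


before = [(0, 2), (1, 3), (0, 4)]
after = rewire(before, a=0, b=1, c=2, d=3)
print(sorted(tuple(sorted(e)) for e in after))  # [(0, 1), (0, 4), (2, 3)]
print(degrees(before) == degrees(after))        # True
```

Each of $A, B, C, D$ loses one edge and gains one, so the degree sequence and the edge count are unchanged; this is what lets rewiring respect a node's connection limit, unlike adding new edges.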

artificial intelligence, machine learning, reinforcement learning

arXiv.org Artificial Intelligence

2110.09035

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (0.88)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)