Agent Societies
Models as Agents: Optimizing Multi-Step Predictions of Interactive Local Models in Model-Based Multi-Agent Reinforcement Learning
Wu, Zifan, Yu, Chao, Chen, Chen, Hao, Jianye, Zhuo, Hankz Hankui
Research in model-based reinforcement learning has made significant progress in recent years. Compared to single-agent settings, the exponential dimension growth of the joint state-action space in multi-agent systems dramatically increases the complexity of the environment dynamics, which makes it infeasible to learn an accurate global model and thus necessitates the use of agent-wise local models. However, during multi-step model rollouts, the prediction of one local model can affect the predictions of other local models in the next step. As a result, local prediction errors can propagate to other localities and eventually give rise to considerably large global errors. Furthermore, since the models are generally used to predict for multiple steps, simply minimizing one-step prediction errors regardless of their long-term effect on other models may further aggravate the propagation of local errors. To this end, we propose Models as AGents (MAG), a multi-agent model optimization framework that reverses the usual roles, treating the local models as multi-step decision-making agents and the current policies as the dynamics during the model rollout process. In this way, the local models are able to consider their multi-step mutual effects on each other before making predictions. Theoretically, we show that the objective of MAG is approximately equivalent to maximizing a lower bound of the true environment return. Experiments on the challenging StarCraft II benchmark demonstrate the effectiveness of MAG.
DeepHive: A multi-agent reinforcement learning approach for automated discovery of swarm-based optimization policies
Ikponmwoba, Eloghosa, Owoyele, Ope
We present an approach for designing swarm-based optimizers for the global optimization of expensive black-box functions. In the proposed approach, the problem of finding efficient optimizers is framed as a reinforcement learning problem, where the goal is to find optimization policies that require few function evaluations to converge to the global optimum. The state of each agent within the swarm is defined as its current position and function value within a design space, and the agents learn to take favorable actions that maximize reward, which is based on the final value of the objective function. The proposed approach is tested on various benchmark optimization functions and compared to the performance of other global optimization strategies. Furthermore, the effect of changing the number of agents, as well as the generalization capabilities of the trained agents, are investigated. The results show superior performance compared to the other optimizers, desired scaling when the number of agents is varied, and acceptable performance even when applied to unseen functions. On a broader scale, the results show promise for the rapid development of domain-specific optimizers.
A Hierarchical Game-Theoretic Decision-Making for Cooperative Multi-Agent Systems Under the Presence of Adversarial Agents
Yang, Qin, Parasuraman, Ramviyas
Underlying relationships among Multi-Agent Systems (MAS) in hazardous scenarios can be represented as game-theoretic models. This paper proposes a new hierarchical network-based model called Game-theoretic Utility Tree (GUT), which decomposes high-level strategies into executable low-level actions for cooperative MAS decisions. It is combined with a new payoff measure based on agent needs for real-time strategy games. We present an Explore game domain, where we measure the performance of MAS achieving tasks from the perspective of balancing the success probability and system costs. We evaluate the GUT approach against state-of-the-art methods that greedily rely on rewards of the composite actions. Results from extensive numerical simulations indicate that GUT can organize more complex relationships among MAS cooperation, helping the group achieve challenging tasks with lower costs and higher winning rates. Furthermore, we demonstrate the applicability of GUT using the Robotarium simulator-hardware testbed. The results verify the effectiveness of GUT in a real robot application and validate that GUT can effectively organize MAS cooperation strategies, helping the group with fewer advantages achieve higher performance.
The challenge of redundancy on multi-agent value factorisation
Singh, Siddarth, Rosman, Benjamin
In the field of cooperative multi-agent reinforcement learning (MARL), the standard paradigm is centralised training with decentralised execution, where a central critic conditions the policies of the cooperative agents based on a central state. It has been shown that these methods become less effective in cases with large numbers of redundant agents. In a more general case, there is likely to be a larger number of agents in an environment than is required to solve the task. These redundant agents reduce performance by enlarging the dimensionality of the state space and increasing the size of the joint policy used to solve the environment. We propose leveraging layerwise relevance propagation (LRP) to instead separate the learning of the joint value function from the generation of local reward signals, and create a new MARL algorithm: relevance decomposition network (RDN). We find that although the performance of both baselines, VDN and Qmix, degrades with the number of redundant agents, RDN is unaffected.
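For context on the baselines named above: VDN is commonly described as factorising the joint action value into a sum of per-agent utilities. The sketch below illustrates only that additive form with made-up numbers; the table `per_agent_q` and the helper names are hypothetical, and it is not an implementation of RDN or of the paper's experiments.

```python
import itertools

# Hypothetical per-agent utilities Q_i(a_i): 3 agents, 2 actions each.
per_agent_q = [
    [1.0, 0.5],
    [0.2, 0.9],
    [0.0, 0.0],  # a "redundant" agent whose action does not change the value
]


def joint_q(actions):
    """VDN-style joint value: the sum of the per-agent utilities."""
    return sum(q[a] for q, a in zip(per_agent_q, actions))


def decentralised_greedy():
    """Each agent maximises its own utility without looking at the others."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in per_agent_q)


def centralised_greedy():
    """Brute-force search over the full joint action space."""
    joint_actions = itertools.product(*(range(len(q)) for q in per_agent_q))
    return max(joint_actions, key=joint_q)


def joint_action_count():
    """Size of the joint action space; it grows with every added agent."""
    count = 1
    for q in per_agent_q:
        count *= len(q)
    return count
```

Under an additive factorisation the per-agent greedy choice attains the same joint value as the brute-force search, while the redundant third agent still doubles the number of joint actions that a centralised search has to consider.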
An Agent-Based Model for Poverty and Discrimination Policy-Making
Montes, Nieves, Curto, Georgina, Osman, Nardine, Sierra, Carles
The deceleration of global poverty reduction in the last decades suggests that traditional redistribution policies are losing their effectiveness. Alternative ways to work towards the #1 United Nations Sustainable Development Goal (poverty eradication) are required. NGOs have insistently denounced the criminalization of poverty, and the social science literature suggests that discrimination against the poor (a phenomenon known as aporophobia) could act as a brake on the fight against poverty. This paper describes a proposal for an agent-based model to examine the impact that aporophobia at the institutional level has on poverty levels. This aporophobia agent-based model (AABM) will first be applied to a case study in the city of Barcelona. The regulatory environment is central to the model, since aporophobia has been identified in the legal framework. The AABM presented in this paper constitutes a cornerstone for obtaining empirical evidence, in a non-invasive way, on the causal relationship between aporophobia and poverty levels. The simulations that will be generated based on the AABM have the potential to inform a new generation of poverty reduction policies, which act not only on the redistribution of wealth but also on discrimination against the poor.
Ezekiel Elliott has narrowed down free agent decision to 3 teams
Running back Ezekiel Elliott has been searching for his next home in the NFL after the Dallas Cowboys released him. He's reportedly narrowed down his search to three teams, all of which could be Super Bowl contenders next season. The Philadelphia Eagles, New York Jets and Cincinnati Bengals are on Elliott's wish list, sources confirmed to Fox News Digital.
Towards Global Optimality in Cooperative MARL with the Transformation And Distillation Framework
Ye, Jianing, Li, Chenghao, Wang, Jianhao, Zhang, Chongjie
Decentralized execution is one core demand in cooperative multi-agent reinforcement learning (MARL). Recently, most popular MARL algorithms have adopted decentralized policies to enable decentralized execution and use gradient descent as their optimizer. However, there is hardly any theoretical analysis of these algorithms that takes the optimization method into consideration, and we find that various popular MARL algorithms with decentralized policies are suboptimal in toy tasks when gradient descent is chosen as their optimization method. In this paper, we theoretically analyze two common classes of algorithms with decentralized policies, multi-agent policy gradient methods and value-decomposition methods, and prove their suboptimality when gradient descent is used. In addition, we propose the Transformation And Distillation (TAD) framework, which reformulates a multi-agent MDP as a special single-agent MDP with a sequential structure and enables decentralized execution by distilling the learned policy on the derived "single-agent" MDP. This approach uses a two-stage learning paradigm to address the optimization problem in cooperative MARL while maintaining its performance guarantee. Empirically, we implement TAD-PPO based on PPO, which can theoretically perform optimal policy learning in finite multi-agent MDPs and significantly outperforms baselines on a large set of cooperative multi-agent tasks.
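The abstract does not spell out how the sequential single-agent MDP is built, so the following is only one plausible reading, sketched under stated assumptions: a single decision maker picks the agents' actions one at a time, and the underlying environment advances once every agent has an action. The class `SequentialWrapper`, the function `toy_joint_step`, and the zero intermediate reward are all hypothetical and not taken from the paper; the distillation stage is not shown.

```python
from typing import Callable, Tuple

State = int
JointAction = Tuple[int, ...]


def toy_joint_step(state: State, actions: JointAction) -> Tuple[State, float]:
    """Hypothetical cooperative dynamics: reward 1.0 when all agents agree."""
    reward = 1.0 if len(set(actions)) == 1 else 0.0
    return state + 1, reward


class SequentialWrapper:
    """Exposes an n-agent step function as a one-action-at-a-time process."""

    def __init__(
        self,
        n_agents: int,
        joint_step: Callable[[State, JointAction], Tuple[State, float]],
        initial_state: State,
    ) -> None:
        self.n_agents = n_agents
        self.joint_step = joint_step
        self.state = initial_state
        self.pending: JointAction = ()  # actions chosen so far in this round

    def observe(self) -> Tuple[State, JointAction]:
        """The decision maker sees the state plus the actions already chosen."""
        return self.state, self.pending

    def step(self, action: int) -> float:
        """Record the next agent's action; advance once all agents have acted."""
        self.pending += (action,)
        if len(self.pending) < self.n_agents:
            return 0.0  # assumed: no reward on intermediate sub-steps
        self.state, reward = self.joint_step(self.state, self.pending)
        self.pending = ()
        return reward
```

With two agents, the first call to `step` only records an action and returns zero reward; the second call completes the joint action, triggers `toy_joint_step`, and clears the pending actions for the next round.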
IoT trust and reputation: a survey and taxonomy
Aaqib, Muhammad, Ali, Aftab, Chen, Liming, Nibouche, Omar
IoT is one of the fastest-growing technologies, and it is estimated that more than a billion devices will be in use across the globe by the end of 2030. To maximize the capability of these connected entities, trust and reputation among IoT entities are essential. Several trust management models have been proposed for the IoT environment; however, these schemes have not fully addressed the features of IoT devices, such as device role, device type, and dynamic behavior in a smart environment. As a result, traditional trust and reputation models are insufficient to tackle these characteristics and the uncertainty risks that arise while connecting nodes to the network. Whilst continuous study has been carried out and various articles suggest promising solutions in constrained environments, research on trust and reputation is still in its infancy. In this paper, we carry out a comprehensive literature review of state-of-the-art research on the trust and reputation of IoT devices and systems. Specifically, we first propose a new taxonomy to organize the trust and reputation models based on the ways trust is managed. The proposed taxonomy comprises traditional trust management-based systems and artificial intelligence-based systems, and combines both classes, which encourages existing schemes to adopt these emerging concepts. This collaboration between conventional mathematical models and advanced ML models results in design schemes that are more robust and efficient. Then we drill down to compare and analyse the methods and applications of these systems based on community-accepted performance metrics, e.g. scalability, delay, cooperativeness and efficiency. Finally, building upon the findings of the analysis, we identify and discuss open research issues and challenges, and point out future research directions.
Presenting Multiagent Challenges in Team Sports Analytics
This paper draws correlations between several challenges and opportunities within the area of team sports analytics and key research areas within multiagent systems (MAS). We specifically consider invasion games, defined as sports where players invade the opposing team's territory and can interact anywhere on a playing surface such as ice hockey, soccer, and basketball. We argue that MAS is well-equipped to study invasion games and will benefit both MAS and sports analytics fields. Our discussion highlights areas for MAS implementation and further development along two axes: short-term in-game strategy (coaching) and long-term team planning (management).
Imitating Graph-Based Planning with Goal-Conditioned Policies
Kim, Junsu, Seo, Younggyo, Ahn, Sungsoo, Son, Kyunghwan, Shin, Jinwoo
Recently, graph-based planning algorithms have gained much attention for solving goal-conditioned reinforcement learning (RL) tasks: they provide a sequence of subgoals to reach the target-goal, and the agents learn to execute subgoal-conditioned policies. However, the sample-efficiency of such RL schemes still remains a challenge, particularly for long-horizon tasks. To address this issue, we present a simple yet effective self-imitation scheme which distills a subgoal-conditioned policy into the target-goal-conditioned policy. Our intuition here is that to reach a target-goal, an agent should pass through a subgoal, so target-goal-conditioned and subgoal-conditioned policies should be similar to each other. We also propose a novel scheme of stochastically skipping executed subgoals in a planned path, which further improves performance. Unlike prior methods that only utilize graph-based planning in an execution phase, our method transfers knowledge from a planner along with a graph into policy learning. We empirically show that our method can significantly boost the sample-efficiency of the existing goal-conditioned RL methods under various long-horizon control tasks.
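The idea of a planner that "provides a sequence of subgoals" can be illustrated with a plain breadth-first search over a small graph of landmark states. This is a minimal sketch under assumptions: the graph, the node names, and `plan_subgoals` are hypothetical, edges are unweighted, and the paper's own planner, self-imitation loss, and subgoal-skipping scheme are not reproduced here.

```python
from collections import deque


def plan_subgoals(graph, start, goal):
    """Return the nodes after `start` on a shortest path to `goal`, or None.

    `graph` maps each node to an iterable of its neighbours. The search is an
    unweighted breadth-first search, so "shortest" means fewest edges.
    """
    parents = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1][1:]  # drop `start`; the rest are subgoals
        for neighbour in graph.get(node, ()):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None  # goal is unreachable from start


# Hypothetical landmark graph.
landmarks = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
subgoals = plan_subgoals(landmarks, "A", "E")
```

In this toy graph the agent at `A` would be handed the subgoals `B`, `D`, and finally the target `E`; a subgoal-conditioned policy would be asked to reach each in turn.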