Agents
TarMAC: Targeted Multi-Agent Communication
Das, Abhishek, Gervet, Théophile, Romoff, Joshua, Batra, Dhruv, Parikh, Devi, Rabbat, Michael, Pineau, Joelle
We explore a collaborative multi-agent reinforcement learning setting where a team of agents attempts to solve cooperative tasks in partially-observable environments. In this scenario, learning an effective communication protocol is key. We propose a communication architecture that allows for targeted communication, where agents learn both what messages to send and who to send them to, solely from downstream task-specific reward without any communication supervision. Additionally, we introduce a multi-stage communication approach where the agents co-ordinate via multiple rounds of communication before taking actions in the environment. We evaluate our approach on a diverse set of cooperative multi-agent tasks, of varying difficulties, with varying numbers of agents, in a variety of environments ranging from 2D grid layouts of shapes and simulated traffic junctions to complex 3D indoor environments. We demonstrate the benefits of targeted as well as multi-stage communication. Moreover, we show that the targeted communication strategies learned by agents are both interpretable and intuitive.
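One way to picture targeted communication is receiver-side soft attention: each sender attaches a signature to its message, and the receiver weights incoming messages by how well each signature matches its own query. The abstract does not spell out the mechanism, so the sketch below is an assumption for illustration only; the function names, the scaled dot-product scoring, and the toy vectors are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def targeted_aggregate(query, signatures, values):
    """Weight each sender's message by the match between its signature and the query."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, sig) / scale for sig in signatures])
    dim = len(values[0])
    message = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, message


# Hypothetical toy data: three senders, one receiver.
signatures = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query = [4.0, 0.0]  # the receiver is mostly "listening" for the first signature

weights, message = targeted_aggregate(query, signatures, values)
print(weights)
print(message)
```

In a learned system the signatures, values, and query would be produced by each agent's network and trained end to end from task reward; here they are fixed by hand to show the weighting.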
Segmentation Analysis in Human Centric Cyber-Physical Systems using Graphical Lasso
Das, Hari Prasanna, Konstantakopoulos, Ioannis C., Manasawala, Aummul Baneen, Veeravalli, Tanya, Liu, Huihan, Spanos, Costas J.
A generalized gamification framework is introduced as a form of smart infrastructure with the potential to improve sustainability and energy efficiency by leveraging a human-in-the-loop strategy. The proposed framework enables a Human-Centric Cyber-Physical System using an interface that allows building managers to interact with occupants. The interface is designed for occupant engagement and integration, supporting the learning of occupants' preferences over resources as well as an understanding of how those preferences change as a function of external stimuli such as physical control, time or incentives. Towards intelligent and autonomous incentive design, a novel statistical learning algorithm that segments occupants' energy usage behavior is proposed. We apply the proposed algorithm, Graphical Lasso, to the occupants' energy resource usage data to obtain feature correlations and dependencies. Segmentation analysis results in characteristic clusters demonstrating different energy usage behaviors. The features and factors characterizing human decision-making are made explainable.
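For reference, the standard graphical lasso estimator (stated here in its textbook form, not necessarily the exact variant used in the paper) fits a sparse precision matrix to the empirical covariance of the features. The symbols $S$, $\Theta$, and $\lambda$ are notation introduced for this note.

```latex
\hat{\Theta} \;=\; \arg\max_{\Theta \succ 0}\;
  \log\det\Theta \;-\; \operatorname{tr}(S\,\Theta) \;-\; \lambda\,\lVert \Theta \rVert_{1}
```

Here $S$ is the empirical covariance matrix of the usage features, $\Theta$ is the estimated precision (inverse covariance) matrix, and $\lambda \ge 0$ controls sparsity. Under a Gaussian model, a zero off-diagonal entry $\Theta_{ij}$ means features $i$ and $j$ are conditionally independent given the others, which is what makes the recovered dependency structure readable.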
Finding Appropriate Traffic Regulations via Graph Convolutional Networks
Iwata, Tomoharu, Otsuka, Takuma, Shimizu, Hitoshi, Sawada, Hiroshi, Naya, Futoshi, Ueda, Naonori
Crowd simulators have been used to find appropriate regulations by simulating multiple scenarios with different regulations. However, this approach requires multiple simulation runs, which are time-consuming. In this paper, we propose a method to learn a function that outputs regulation effects given the current traffic situation as input. If the function is learned in advance from the training data of many simulation runs, we can obtain an appropriate regulation efficiently by bypassing simulations for the current situation. We use graph convolutional networks to model the function, which enables us to find regulations even for unseen areas. With the proposed method, we construct a graph for each area, where a node represents a road and an edge represents a road connection. By running crowd simulations with various regulations on various areas, we generate traffic situations and regulation effects. The graph convolutional networks are trained to output the regulation effects given the graph with the traffic situation information as input. With experiments using real-world road networks and a crowd simulator, we demonstrate that the proposed method can find a road to close that reduces the average time needed to reach the destination.
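To make the road-graph idea concrete, here is a minimal sketch of a single graph convolution step over a tiny road network, where nodes are roads and edges are road connections as described above. It uses the widely known symmetric-normalized propagation rule ReLU(Â H W); whether the paper uses this exact layer is an assumption, and the features, weights, and helper names are hypothetical.

```python
def normalized_adjacency(edges, n):
    """Build D^-1/2 (A + I) D^-1/2 for an undirected graph with n nodes."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        a[i][j] = 1.0
        a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(n)] for i in range(n)]


def matmul(x, y):
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def gcn_layer(a_hat, h, w):
    """One propagation step: mix each road's features with its neighbours', then apply ReLU."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v) for v in row] for row in z]


# Hypothetical area: three roads connected in a chain (0 - 1 - 2).
road_edges = [(0, 1), (1, 2)]
a_hat = normalized_adjacency(road_edges, 3)

traffic = [[0.9], [0.2], [0.1]]  # one made-up feature per road, e.g. density
weights = [[1.0, -1.0]]          # untrained 1x2 weight matrix, for illustration only

hidden = gcn_layer(a_hat, traffic, weights)
print(hidden)
```

Because the same weight matrix is applied at every node, a layer trained on some road graphs can be evaluated on a graph of a different size or shape, which is consistent with the claim about unseen areas.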
A Review on Learning Planning Action Models for Socio-Communicative HRI
Arora, Ankuj, Fiorino, Humbert, Pellier, Damien, Pesty, Sylvie
For social robots to be brought into more widespread use in the fields of companionship, caretaking and domestic help, they must be capable of demonstrating social intelligence. In order to be acceptable, they must exhibit socio-communicative skills. Classic approaches to programming HRI from observed human-human interactions fail to capture the subtlety of multimodal interactions as well as the key structural differences between robots and humans. The former arises from a difficulty in quantifying and coding multimodal behaviours, while the latter arises from a difference in the degrees of freedom between a robot and a human. However, reverse engineering multimodal HRI traces to learn the underlying behavioral blueprint of the robot seems an option worth exploring. In this spirit, the entire HRI can be seen as a sequence of exchanges of speech acts between the robot and human, each act treated as an action, bearing in mind that the entire sequence is goal-driven. Thus, this entire interaction can be treated as a sequence of actions propelling the interaction from its initial state to its goal state, also known as a plan in the domain of AI planning. In the same domain, the action sequence that stems from plan execution can be represented as a trace. AI techniques, such as machine learning, can be used to learn behavioral models (also known as symbolic action models in AI), intended to be reusable for AI planning, from the aforementioned multimodal traces. This article reviews recent machine learning techniques for learning planning action models which can be applied to the field of HRI with the intent of rendering robots socio-communicative.
Multi-Agent Actor-Critic with Generative Cooperative Policy Network
Ryu, Heechang, Shin, Hayong, Park, Jinkyoo
We propose an efficient multi-agent reinforcement learning approach to derive equilibrium strategies for multiple agents participating in a Markov game. We focus mainly on obtaining decentralized policies that let the agents maximize their performance on a collaborative task, which is similar to solving a decentralized Markov decision process. We propose to use two different policy networks: (1) a decentralized greedy policy network used to generate greedy actions during the training and execution periods and (2) a generative cooperative policy network (GCPN) used to generate action samples that help other agents improve their objectives during the training period. We show that the samples generated by GCPN enable other agents to explore the policy space more effectively and favorably to reach a better policy in terms of achieving the collaborative tasks.
Graph Convolutional Reinforcement Learning for Multi-Agent Cooperation
Jiang, Jiechuan, Dun, Chen, Lu, Zongqing
Learning to cooperate is crucially important in multi-agent reinforcement learning. The key is to take the influence of other agents into consideration when performing distributed decision making. However, multi-agent environments are highly dynamic, which makes it hard to learn abstract representations of influences between agents from only the low-order features that existing methods exploit. In this paper, we propose a graph convolutional model for multi-agent cooperation. The graph convolution architecture adapts to the dynamics of the underlying graph of the multi-agent environment, where the influence among agents is captured by their abstract relation representations. High-order features extracted by relation kernels of convolutional layers from gradually increased receptive fields are exploited to learn cooperative strategies. The gradient of an agent not only backpropagates to itself but also to other agents in its receptive fields to reinforce the learned cooperative strategies. Moreover, the relation representations are temporally regularized to make the cooperation more consistent. Empirically, we show that our model enables agents to develop more cooperative and sophisticated strategies than existing methods in jungle and battle games and routing in packet switching networks.
Top 10 Best Artificial Intelligence Masters Degree Programs in the World
Although the idea of Artificial Intelligence has been around for a long time, only in recent years has it risen to prominence and begun trending in virtually every industry. Now one of the technologies most favored by inventive minds around the world, Artificial Intelligence demands a mix of computer science, mathematics, cognitive psychology, and engineering. There is no doubt that the demand for experts trained in Artificial Intelligence will soon exceed supply. Although Artificial Intelligence overlaps somewhat with analytics, a capable Artificial Intelligence expert would have deep knowledge of areas such as computer vision, natural language processing, robotics automation, and machine learning. Artificial Intelligence education is still in its early days.
Malicious Web Domain Identification using Online Credibility and Performance Data by Considering the Class Imbalance Issue
Hu, Zhongyi, Chiong, Raymond, Pranata, Ilung, Bao, Yukun, Lin, Yuqing
Purpose: Malicious web domain identification is of significant importance to the security protection of Internet users. With online credibility and performance data, this paper aims to investigate the use of machine learning techniques for malicious web domain identification by considering the class imbalance issue (i.e., there are more benign web domains than malicious ones). Design/methodology/approach: We propose an integrated resampling approach to handle class imbalance by combining the Synthetic Minority Over-sampling TEchnique (SMOTE) and Particle Swarm Optimisation (PSO), a population-based meta-heuristic algorithm. We use the SMOTE for over-sampling and PSO for under-sampling. Findings: By applying eight well-known machine learning classifiers, the proposed integrated resampling approach is comprehensively examined using several imbalanced web domain datasets with different imbalance ratios. Compared to five other well-known resampling approaches, experimental results confirm that the proposed approach is highly effective. Practical implications: This study not only inspires the practical use of online credibility and performance data for identifying malicious web domains, but also provides an effective resampling approach for handling the class imbalance issue in the area of malicious web domain identification. Originality/value: Online credibility and performance data is applied to build malicious web domain identification models using machine learning techniques. An integrated resampling approach is proposed to address the class imbalance issue. The performance of the proposed approach is confirmed based on real-world datasets with different imbalance ratios.
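The over-sampling half of the approach can be illustrated with a simplified SMOTE-style routine: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. This is a minimal sketch of the interpolation idea only; the PSO-based under-sampling step is not shown, and the function name, parameters, and toy points are hypothetical.

```python
import random


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        # k nearest minority-class neighbours of the chosen base point
        neighbours = sorted(others, key=lambda p: squared_distance(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class samples (e.g. malicious domains) with two features.
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like_oversample(minority_points, n_new=5)
print(new_points)
```

Because every synthetic point lies between two real minority samples, the minority class grows without simply duplicating rows.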
Intrinsic Social Motivation via Causal Influence in Multi-Agent RL
Jaques, Natasha, Lazaridou, Angeliki, Hughes, Edward, Gulcehre, Caglar, Ortega, Pedro A., Strouse, DJ, Leibo, Joel Z., de Freitas, Nando
We derive a new intrinsic social motivation for multi-agent reinforcement learning (MARL), in which agents are rewarded for having causal influence over another agent's actions. Causal influence is assessed using counterfactual reasoning. The reward does not depend on observing another agent's reward function, and is thus a more realistic approach to MARL than taken in previous work. We show that the causal influence reward is related to maximizing the mutual information between agents' actions. We test the approach in challenging social dilemma environments, where it consistently leads to enhanced cooperation between agents and higher collective reward. Moreover, we find that rewarding influence can lead agents to develop emergent communication protocols. We therefore employ influence to train agents to use an explicit communication channel, and find that it leads to more effective communication and higher collective reward. Finally, we show that influence can be computed by equipping each agent with an internal model that predicts the actions of other agents. This allows the social influence reward to be computed without the use of a centralised controller, and as such represents a significantly more general and scalable inductive bias for MARL with independent agents.
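A rough way to read the counterfactual idea above: compare how another agent would act given the influencer's actual action against how it would act on average over the influencer's alternative actions. The sketch below measures that gap with a KL divergence; the exact reward used in the paper is not given in the abstract, so this form, the function names, and the toy policies are assumptions.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def influence_reward(conditional_policy, influencer_policy, taken_action):
    """Gap between the other agent's policy given the action actually taken
    and its counterfactual marginal policy over all influencer actions.

    conditional_policy[k][a]: probability the other agent picks a if the influencer picks k.
    influencer_policy[k]: probability the influencer picks k.
    """
    n_actions = len(conditional_policy[0])
    marginal = [
        sum(influencer_policy[k] * conditional_policy[k][a]
            for k in range(len(influencer_policy)))
        for a in range(n_actions)
    ]
    return kl_divergence(conditional_policy[taken_action], marginal)


# Hypothetical two-action example.
influencer = [0.5, 0.5]
responsive = [[0.9, 0.1], [0.1, 0.9]]   # the other agent reacts to the influencer
indifferent = [[0.5, 0.5], [0.5, 0.5]]  # the other agent ignores the influencer

reward_responsive = influence_reward(responsive, influencer, taken_action=0)
reward_indifferent = influence_reward(indifferent, influencer, taken_action=0)

# Averaging the per-action reward over the influencer's policy gives the
# mutual information between the two agents' actions.
mutual_information = sum(
    influencer[k] * influence_reward(responsive, influencer, k)
    for k in range(len(influencer))
)
print(reward_responsive, reward_indifferent, mutual_information)
```

The reward is zero when the other agent's behaviour does not depend on the influencer and grows as the dependence gets stronger, which matches the stated link to mutual information.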
Fairness for Whom? Critically reframing fairness with Nash Welfare Product
Recent studies on disparate impact in machine learning applications have sparked a debate around the concept of fairness along with attempts to formalize its different criteria. Many of these approaches focus on reducing prediction errors while maximizing the utility of the institution alone. This work seeks to reconceptualize and critically frame the existing discourse on fairness by underlining the implicit biases embedded in common understandings of fairness in the literature and how they contrast with its corresponding economic and legal definitions. This paper expands the concept of utility and fairness by bringing in concepts from established literature in welfare economics and game theory. We then translate these concepts for the algorithmic prediction domain by defining a formalization of Nash Welfare Product that seeks to expand utility by collapsing that of the institution using the prediction tool and that of the individual subject to the prediction into one function. We then apply a modulating function that makes the fairness and welfare trade-offs explicit based on designated policy goals, and apply it to a temporal model to take into account the effects of decisions beyond the scope of one-shot predictions. We apply this to a binary classification problem and present the results of a multi-epoch simulation based on the UCI Adult Income dataset, together with a test case analysis of the ProPublica recidivism dataset. The results show that expanding the concept of utility yields a fairer distribution that corrects for the biases embedded in the dataset without sacrificing classifier accuracy.
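The contrast between maximizing a single party's (or the total) utility and maximizing a Nash welfare product can be seen in a two-party toy example. This is only a sketch of the generic product-of-utilities idea; the paper's own formalization and its modulating function are not given in the abstract and are not reproduced here. The outcome names and utility values are made up.

```python
import math


def utilitarian_welfare(utilities):
    """Sum of the parties' utilities."""
    return sum(utilities)


def nash_welfare_product(utilities):
    """Product of the parties' utilities; penalizes leaving any party near zero."""
    return math.prod(utilities)


# Hypothetical (institution utility, individual utility) for two decision rules.
outcomes = {
    "favor_institution": (9.0, 1.0),
    "balanced": (5.0, 4.0),
}

best_by_sum = max(outcomes, key=lambda name: utilitarian_welfare(outcomes[name]))
best_by_product = max(outcomes, key=lambda name: nash_welfare_product(outcomes[name]))
print(best_by_sum, best_by_product)
```

With these numbers the sum prefers the lopsided outcome (10 versus 9) while the product prefers the balanced one (20 versus 9), which is the sense in which a product-based objective folds both parties' interests into one function.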