Agents


A Privacy-Preserving and Trustable Multi-agent Learning Framework

arXiv.org Artificial Intelligence

Distributed multi-agent learning enables agents to cooperatively train a model without sharing their datasets. While this setting ensures some level of privacy, it has been shown that, even when data is not directly shared, the training process is vulnerable to privacy attacks, including data reconstruction and model inversion attacks. Additionally, malicious agents that train on inverted labels or random data may arbitrarily weaken the accuracy of the global model. This paper addresses these challenges and presents Privacy-preserving and Trustable Distributed Learning (PT-DL), a fully decentralized framework that relies on Differential Privacy to guarantee strong privacy protection for the agents' data, and on Ethereum smart contracts to ensure trustability. The paper shows that, with high probability, PT-DL is resilient to collusion attacks involving up to 50% of the agents in a malicious trust model, and the experimental evaluation illustrates the benefits of the proposed model as a privacy-preserving and trustable distributed multi-agent learning system on several classification tasks.
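
The abstract does not specify how PT-DL applies Differential Privacy. A common approach, assumed here purely for illustration, is the Gaussian mechanism as used in DP-SGD: each agent clips its model update to a fixed L2 norm and adds Gaussian noise scaled to that norm before sharing it. The function name `privatize_update` and its parameters `clip_norm` and `noise_multiplier` are hypothetical, and turning `noise_multiplier` into a concrete (epsilon, delta) guarantee requires a separate privacy accountant that this sketch omits.

```python
import numpy as np

def privatize_update(update: np.ndarray, clip_norm: float,
                     noise_multiplier: float, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper: clip an agent's update, then add Gaussian noise.

    Clipping bounds each agent's contribution (its L2 sensitivity) by clip_norm;
    noise with standard deviation noise_multiplier * clip_norm masks it.
    """
    norm = np.linalg.norm(update)
    # Scale down only when the update is longer than clip_norm.
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

# Usage: an agent privatizes its local update before sharing it with peers.
rng = np.random.default_rng(0)
shared = privatize_update(np.array([3.0, 4.0]), clip_norm=1.0,
                          noise_multiplier=1.1, rng=rng)
```

Larger `noise_multiplier` values give stronger privacy at the cost of noisier global models.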


Expected Scalarised Returns Dominance: A New Solution Concept for Multi-Objective Decision Making

arXiv.org Artificial Intelligence

In many real-world scenarios, the utility of a user is derived from the single execution of a policy. In this case, to apply multi-objective reinforcement learning, the expected utility of the returns must be optimised. Various scenarios exist where a user's preferences over objectives (also known as the utility function) are unknown or difficult to specify. In such scenarios, a set of optimal policies must be learned. However, settings where the expected utility must be maximised have been largely overlooked by the multi-objective reinforcement learning community and, as a consequence, a set of optimal solutions has yet to be defined. In this paper we address this challenge by proposing first-order stochastic dominance as a criterion to build solution sets to maximise expected utility. We also propose a new dominance criterion, known as expected scalarised returns (ESR) dominance, that extends first-order stochastic dominance to allow a set of optimal policies to be learned in practice. We then define a new solution concept called the ESR set, which is a set of policies that are ESR dominant. Finally, we define a new multi-objective distributional tabular reinforcement learning (MOT-DRL) algorithm to learn the ESR set in a multi-objective multi-armed bandit setting.
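
First-order stochastic dominance compares whole return distributions rather than their means. The sketch below checks one common multivariate form of it, assumed here and not necessarily the paper's exact ESR dominance definition: distribution X dominates Y if P(X > z) >= P(Y > z) for every threshold vector z, with strict inequality for at least one z, where ">" must hold in every objective. Distributions are represented as hypothetical dictionaries mapping discrete return vectors to probabilities.

```python
from itertools import product
from typing import Dict, Tuple

Dist = Dict[Tuple[float, ...], float]  # return vector -> probability

def survival(dist: Dist, z: Tuple[float, ...]) -> float:
    """P(R > z), where R > z must hold strictly in every objective."""
    return sum(p for r, p in dist.items()
               if all(ri > zi for ri, zi in zip(r, z)))

def fsd_dominates(x: Dist, y: Dist, tol: float = 1e-12) -> bool:
    """True if x first-order stochastically dominates y (assumed definition)."""
    dims = len(next(iter(x)))
    # Survival functions only change at observed coordinate values, so checking
    # those values (plus -inf) in each objective covers every threshold z.
    grids = []
    for d in range(dims):
        values = {r[d] for r in x} | {r[d] for r in y}
        grids.append([float("-inf")] + sorted(values))
    strictly_better = False
    for z in product(*grids):
        sx, sy = survival(x, z), survival(y, z)
        if sx < sy - tol:
            return False
        if sx > sy + tol:
            strictly_better = True
    return strictly_better

# Usage: two policies' return distributions over two objectives.
risky = {(0.0, 4.0): 0.5, (2.0, 2.0): 0.5}
safe = {(0.0, 2.0): 1.0}
print(fsd_dominates(risky, safe), fsd_dominates(safe, risky))  # prints: True False
```

Because the check uses the full distribution, it remains meaningful when the user's utility function is unknown, which is the setting the abstract targets.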


Ebola Optimization Search Algorithm (EOSA): A new metaheuristic algorithm based on the propagation model of Ebola virus disease

arXiv.org Artificial Intelligence

The Ebola virus and the disease it causes tend to move individuals randomly among the susceptible, infected, quarantined, hospitalized, recovered, and dead sub-populations. Motivated by how effectively the virus propagates the disease, this paper proposes a new bio-inspired, population-based optimization algorithm: a novel metaheuristic named the Ebola Optimization Search Algorithm (EOSA). To achieve this, the study models the propagation mechanism of Ebola virus disease (EVD), covering all the states of the propagation. This propagation model is then represented as a mathematical model based on first-order differential equations, and the combined propagation and mathematical models are adapted to develop the new metaheuristic algorithm. To evaluate the proposed method's performance and capability compared with other optimization methods, the underlying propagation and mathematical models are first investigated to determine how well they simulate EVD. Furthermore, two sets of benchmark functions, consisting of forty-seven (47) classical and over thirty (30) constrained IEEE CEC-2017 benchmark functions, are investigated numerically. The results indicate that the proposed algorithm is competitive with other state-of-the-art optimization methods in scalability, convergence, and sensitivity analyses. Extensive simulation results indicate that EOSA outperforms other popular state-of-the-art metaheuristic optimization algorithms, such as Particle Swarm Optimization (PSO), the Genetic Algorithm (GA), and the Artificial Bee Colony algorithm (ABC), on some shifted, high-dimensional, and large-search-range problems.
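
The abstract states that the propagation model is expressed with first-order differential equations but does not give them. As a generic illustration only, not EOSA's actual equations, the sketch integrates a textbook SIRD compartmental model (susceptible, infected, recovered, dead) with forward Euler steps. The rates `beta`, `gamma`, and `mu` are hypothetical parameters, and EOSA's own model additionally tracks quarantined and hospitalized sub-populations.

```python
def simulate_sird(s0, i0, beta, gamma, mu, dt=0.1, steps=1000):
    """Forward-Euler integration of a generic SIRD model (illustrative only).

    dS/dt = -beta*S*I/N
    dI/dt =  beta*S*I/N - gamma*I - mu*I
    dR/dt =  gamma*I
    dD/dt =  mu*I
    """
    n = s0 + i0
    s, i, r, d = float(s0), float(i0), 0.0, 0.0
    history = [(s, i, r, d)]
    for _ in range(steps):
        new_infections = beta * s * i / n
        recoveries = gamma * i
        deaths = mu * i
        s += dt * (-new_infections)
        i += dt * (new_infections - recoveries - deaths)
        r += dt * recoveries
        d += dt * deaths
        history.append((s, i, r, d))
    return history

# Usage: 990 susceptible and 10 infected individuals.
final_state = simulate_sird(990, 10, beta=0.5, gamma=0.1, mu=0.02)[-1]
```

The four derivatives sum to zero, so the total population is conserved; a metaheuristic built on such a model reinterprets these state transitions as moves of candidate solutions through the search space.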


What Artificial Intelligence Still Can't Do

#artificialintelligence

Today's artificial intelligence remains a long way from the supple, dynamic intelligence of AI characters from popular fiction, like The Jetsons. Modern artificial intelligence is capable of wonders. It can produce breathtaking original content: poetry, prose, images, music, human faces. Last year it produced a solution to the "protein folding problem," a grand challenge in biology that has stumped researchers for half a century. Yet today's AI still has fundamental limitations. Relative to what we would expect from a truly intelligent agent (relative to that original inspiration and benchmark for artificial intelligence, human cognition), AI has a long way to go. Critics like to point to these shortcomings as evidence that the pursuit of artificial intelligence is misguided or has failed.


Field Estimation using Robotic Swarms through Bayesian Regression and Mean-Field Feedback

arXiv.org Artificial Intelligence

Recent years have seen increased interest in using mean-field, density-based modelling and control strategies to deploy robotic swarms. In this paper, we study how to dynamically deploy robots, subject to their physical constraints, to efficiently measure and reconstruct an unknown spatial field (e.g., the air pollution index over a city). Specifically, the evolution of the robots' density is modelled by mean-field partial differential equations (PDEs), which are uniquely determined by the robots' individual dynamics. Bayesian regression models are used to obtain predictions and return a variance function that represents the confidence of each prediction. We formulate a PDE-constrained optimization problem based on this variance function to dynamically generate a reference density signal that guides the robots to uncertain areas to collect new data, and we design mean-field feedback-based control laws such that the robots' density converges to this reference signal. We also show that the proposed feedback law is robust to density estimation errors in the sense of input-to-state stability. Simulations are included to verify the effectiveness of the algorithms.
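
The abstract does not say which Bayesian regression model is used. Gaussian process regression is one standard choice, assumed here for illustration: it returns a posterior mean (the field estimate) and a posterior variance that is small near measurements and large far from them, which is the kind of confidence signal a reference density can be built from. The squared-exponential kernel and its `length`, `var`, and `noise` values are hypothetical, and the PDE-constrained optimization and feedback control are not shown.

```python
import numpy as np

def rbf(a: np.ndarray, b: np.ndarray, length: float = 0.2, var: float = 1.0) -> np.ndarray:
    """Squared-exponential kernel between point sets a (n, d) and b (m, d)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d2 / length**2)

def gp_posterior(x_train, y_train, x_query, noise=1e-2, length=0.2, var=1.0):
    """Posterior mean and variance of a zero-mean GP at the query points."""
    k = rbf(x_train, x_train, length, var) + noise * np.eye(len(x_train))
    ks = rbf(x_query, x_train, length, var)
    chol = np.linalg.cholesky(k)  # k = chol @ chol.T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = ks @ alpha
    v = np.linalg.solve(chol, ks.T)
    variance = var - (v**2).sum(axis=0)  # prior variance minus explained part
    return mean, np.maximum(variance, 0.0)

# Usage: measurements clustered in one corner of the unit square.
x = np.array([[0.1, 0.1], [0.2, 0.15]])
y = np.array([1.0, 0.5])
grid = np.array([[i / 10, j / 10] for i in range(11) for j in range(11)])
mean, variance = gp_posterior(x, y, grid)
most_uncertain = grid[np.argmax(variance)]  # a region worth sending robots to
```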


The Impact of Network Connectivity on Collective Learning

arXiv.org Artificial Intelligence

In decentralised autonomous systems, it is the interactions between individual agents that govern the collective behaviours of the system. These local-level interactions are themselves often governed by an underlying network structure. These networks are particularly important for collective learning and decision-making, in which agents must gather evidence from their environment and propagate this information to other agents in the system. Models of collective behaviour often rely on the assumption of total connectivity between agents to provide effective information sharing within the system, but this assumption may be ill-advised. In this paper we investigate the impact that the underlying network has on performance in the context of collective learning. Through simulations we study small-world networks with varying levels of connectivity and randomness, and conclude that totally connected networks result in higher average error than networks with less connectivity. Furthermore, we show that networks of high regularity outperform networks with increasing levels of random connectivity.
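
Small-world networks are commonly generated with the Watts-Strogatz model; the abstract does not say which generator the authors used, so this construction is an assumption for illustration. It starts from a regular ring lattice in which each of `n` agents links to its `k` nearest neighbours (high regularity) and rewires each lattice edge with probability `p` (randomness). NetworkX offers an equivalent `watts_strogatz_graph` function; this stdlib-only version shows the mechanics.

```python
import random

def watts_strogatz(n: int, k: int, p: float, seed=None) -> dict:
    """Ring lattice of n nodes, each linked to its k nearest neighbours (k even),
    with each lattice edge rewired to a random new endpoint with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    # Regular ring lattice: connect each node to k/2 neighbours on each side.
    for v in range(n):
        for j in range(1, k // 2 + 1):
            u = (v + j) % n
            adj[v].add(u)
            adj[u].add(v)
    # Rewire: replace edge (v, u) with (v, w), avoiding self-loops and duplicates.
    for v in range(n):
        for j in range(1, k // 2 + 1):
            u = (v + j) % n
            if u in adj[v] and rng.random() < p:
                candidates = [w for w in range(n) if w != v and w not in adj[v]]
                if candidates:
                    w = rng.choice(candidates)
                    adj[v].remove(u)
                    adj[u].remove(v)
                    adj[v].add(w)
                    adj[w].add(v)
    return adj

# Usage: a fully regular lattice versus a small-world variant with the same edge count.
lattice = watts_strogatz(20, 4, 0.0)
small_world = watts_strogatz(20, 4, 0.1, seed=42)
```

Rewiring keeps the number of edges fixed, so varying `p` changes randomness independently of connectivity, while varying `k` changes connectivity.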


Large-scale, Dynamic and Distributed Coalition Formation with Spatial and Temporal Constraints

arXiv.org Artificial Intelligence

The Coalition Formation with Spatial and Temporal constraints Problem (CFSTP) is a multi-agent task allocation problem in which few agents have to perform many tasks, each with its deadline and workload. To maximize the number of completed tasks, the agents need to cooperate by forming, disbanding and reforming coalitions. The original mathematical programming formulation of the CFSTP is difficult to implement, since it is lengthy and based on the problematic Big-M method. In this paper, we propose a compact and easy-to-implement formulation. Moreover, we design D-CTS, a distributed version of the state-of-the-art CFSTP algorithm. Using public London Fire Brigade records, we create a dataset with 347,588 tasks and a test framework that simulates the mobilization of firefighters in dynamic environments. In problems with up to 150 agents and 3,000 tasks, compared to DSA-SDP, a state-of-the-art distributed algorithm, D-CTS completes 3.79% ± [42.22%, 1.96%] more tasks, and is one order of magnitude more efficient in terms of communication overhead and time complexity. D-CTS sets the first large-scale, dynamic and distributed CFSTP benchmark.
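
A core constraint in the CFSTP is that a coalition must finish a task's workload before its deadline, even though its members arrive at different times. The hypothetical helper below computes when a coalition finishes under a simplified model assumed here (each agent works at a constant, positive rate from its arrival time, at least one agent is present, and travel between tasks is ignored); it is neither the paper's formulation nor the D-CTS algorithm.

```python
from typing import Sequence

def completion_time(arrivals: Sequence[float], rates: Sequence[float],
                    workload: float) -> float:
    """Earliest time a coalition finishes a task (simplified, hypothetical model).

    Agents join at their arrival times and each contributes work at a constant rate.
    """
    events = sorted(zip(arrivals, rates))
    done, t, rate = 0.0, events[0][0], 0.0
    for arrival, r in events:
        # Could the current members finish before the next agent arrives?
        if rate > 0 and done + rate * (arrival - t) >= workload:
            return t + (workload - done) / rate
        done += rate * (arrival - t)
        t = arrival
        rate += r
    return t + (workload - done) / rate

# Usage: two firefighters, the second arriving 2 time units later.
finish = completion_time([0.0, 2.0], [1.0, 1.0], workload=4.0)
meets_deadline = finish <= 3.5
```

Forming, disbanding and reforming coalitions amounts to choosing which agents (and thus which arrival times and rates) feed into this calculation for each task.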


Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

Centralized Training with Decentralized Execution (CTDE) has been a popular paradigm in cooperative Multi-Agent Reinforcement Learning (MARL) settings and is widely used in many real applications. One of the major challenges in the training process is credit assignment, which aims to deduce the contribution of each agent from the global reward. Existing credit assignment methods focus either on decomposing the joint value function into individual value functions or on measuring the impact of local observations and actions on the global value function. These approaches do not thoroughly consider the complicated interactions among multiple agents, leading to unsuitable credit assignment and, consequently, mediocre results in MARL. We propose Shapley Counterfactual Credit Assignment, a novel method for explicit credit assignment that accounts for coalitions of agents. Specifically, the Shapley value and its desirable properties are leveraged in deep MARL to credit any combination of agents, which allows us to estimate the individual credit of each agent. However, the main technical difficulty lies in the computational complexity of the Shapley value, which grows factorially with the number of agents. We instead use an approximation method based on Monte Carlo sampling, which reduces the sample complexity while maintaining effectiveness. We evaluate our method on StarCraft II benchmarks across different scenarios. Our method significantly outperforms existing cooperative MARL algorithms and achieves state-of-the-art performance, with especially large margins on the most difficult tasks.
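
The Monte Carlo idea can be illustrated with the classic permutation-sampling estimator of the Shapley value, which averages each agent's marginal contribution over randomly ordered arrivals of agents into the coalition. This generic sketch assumes a hypothetical characteristic function `value` that scores any coalition; in the paper, coalition values come from counterfactual estimates inside deep MARL, which is not reproduced here.

```python
import random
from typing import Callable, Dict, FrozenSet, Hashable, Sequence

def mc_shapley(players: Sequence[Hashable],
               value: Callable[[FrozenSet], float],
               num_samples: int = 1000, seed=None) -> Dict[Hashable, float]:
    """Estimate Shapley values by averaging marginal contributions over random orders."""
    rng = random.Random(seed)
    phi = {p: 0.0 for p in players}
    order = list(players)
    for _ in range(num_samples):
        rng.shuffle(order)
        coalition = frozenset()
        prev = value(coalition)
        for p in order:
            coalition = coalition | {p}
            cur = value(coalition)
            phi[p] += cur - prev  # p's marginal contribution in this order
            prev = cur
    return {p: total / num_samples for p, total in phi.items()}

# Usage: a majority game where any two of three agents earn the team reward.
credits = mc_shapley(["a", "b", "c"], lambda s: 1.0 if len(s) >= 2 else 0.0,
                     num_samples=500, seed=0)
```

Each sample costs one pass over the agents instead of enumerating all n! orderings, and the estimates always sum to the value of the full coalition.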


SMASH: a Semantic-enabled Multi-agent Approach for Self-adaptation of Human-centered IoT

arXiv.org Artificial Intelligence

IoT devices now perform an expanding range of activities, from sensing and computing to acting and, beyond that, learning, reasoning, and planning. As the number of IoT applications grows, these objects are becoming increasingly ubiquitous, so they need to adapt their functionality in response to the uncertainties of their environment in order to achieve their goals. In Human-centered IoT, objects and devices interact directly with human beings and have access to online contextual information. Self-adaptation of such applications is a crucial subject that must be addressed in a way that respects human goals and human values. Hence, IoT applications must be equipped with self-adaptation techniques to manage their run-time uncertainties locally or in cooperation with each other. This paper presents SMASH, a multi-agent approach for the self-adaptation of IoT applications in human-centered environments, using the smart home as a case study of smart environments. SMASH agents are provided with a 4-layer architecture based on the BDI agent model that integrates human values with goal reasoning, planning, and acting. SMASH also takes advantage of a semantic-enabled platform called Home'In to address interoperability issues among non-identical agents and devices that use heterogeneous protocols and data formats. The approach is compared with related work in the literature and validated through a proof-of-concept scenario. The timely responses of SMASH agents show the feasibility of the proposed approach in human-centered environments.
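
The abstract does not detail the four layers, so the sketch below only illustrates the general BDI deliberation idea with a human-value check added, using hypothetical class names, goals, and action names: candidate goals (desires) whose preconditions are not believed to hold, or that would compromise a human value the agent must respect, are filtered out; the highest-priority remaining goal becomes the intention; and a stored plan supplies its actions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

@dataclass
class Goal:
    name: str
    priority: int
    precondition: Optional[str] = None                # belief that must be True
    violates: Set[str] = field(default_factory=set)  # human values it would compromise

@dataclass
class ValueAwareBDIAgent:
    beliefs: Dict[str, bool]   # what the agent currently believes about the home
    values: Set[str]           # human values the agent must respect
    plans: Dict[str, List[str]]  # goal name -> sequence of action names

    def deliberate(self, desires: List[Goal]) -> Optional[Goal]:
        """Pick an intention: applicable, value-respecting, highest priority."""
        options = [g for g in desires
                   if (g.precondition is None or self.beliefs.get(g.precondition, False))
                   and not (g.violates & self.values)]
        return max(options, key=lambda g: g.priority, default=None)

    def step(self, desires: List[Goal]) -> List[str]:
        intention = self.deliberate(desires)
        return self.plans.get(intention.name, []) if intention else []

# Usage: the higher-priority goal is rejected because it conflicts with privacy.
agent = ValueAwareBDIAgent(
    beliefs={"someone_home": True},
    values={"privacy"},
    plans={"dim_lights": ["lower_lamp_brightness"], "record_video": ["start_camera"]},
)
actions = agent.step([
    Goal("record_video", priority=5, violates={"privacy"}),
    Goal("dim_lights", priority=3, precondition="someone_home"),
])
```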


SHAQ: Incorporating Shapley Value Theory into Q-Learning for Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

Value factorisation has proved to be a very useful technique in multi-agent reinforcement learning (MARL), but the underlying mechanism is not yet fully understood. This paper explores a theoretical basis for value factorisation. We generalise the Shapley value from coalitional game theory to a Markov convex game (MCG) and use it to guide value factorisation in MARL. We show that the generalised Shapley value possesses several features, such as (1) accurate estimation of the maximum global value, (2) fairness in the factorisation of the global value, and (3) sensitivity to dummy agents. The proposed theory yields a new learning algorithm called Shapley Q-learning (SHAQ), which inherits the important merits of ordinary Q-learning but extends it to MARL. Compared with prior art, SHAQ relies on a much weaker assumption (MCG) that is more compatible with real-world problems, yet achieves superior explainability and performance in many cases. We demonstrate SHAQ and verify the theoretical claims on Predator-Prey and the StarCraft Multi-Agent Challenge (SMAC).
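
The fairness and dummy-agent properties mentioned above originate in the classical Shapley value of coalitional game theory. The sketch computes that classical value exactly by enumerating coalitions, using the standard weights |S|!(n-|S|-1)!/n!; it does not implement the paper's Markov convex game generalisation or SHAQ itself, and the example game `needs_a_and_b` is hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Hashable, Sequence

def exact_shapley(players: Sequence[Hashable],
                  value: Callable[[FrozenSet], float]) -> Dict[Hashable, float]:
    """Classical Shapley value by enumerating every coalition S that excludes i."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi

# Usage: agents a and b are jointly needed for the reward; c is a dummy agent.
def needs_a_and_b(coalition: FrozenSet) -> float:
    return 1.0 if {"a", "b"} <= coalition else 0.0

shares = exact_shapley(["a", "b", "c"], needs_a_and_b)
```

The reward is split equally between the two symmetric agents, and the dummy agent, which never changes any coalition's value, receives exactly zero.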