 Agents


Learning to Gather without Communication

arXiv.org Machine Learning

A standard belief about collective behavior is that it emerges from simple individual rules. Most mathematical research on such collective behavior starts from imperative individual rules, such as "always go to the center". But how could an (optimal) individual rule emerge during a short period within the group's lifetime, especially if communication is not available? We argue that such rules can actually emerge in a group in a short span of time via collective (multi-agent) reinforcement learning, i.e., learning via rewards and punishments. We consider the gathering problem: several agents (social animals, swarming robots, ...) must gather around the same position, which is not determined in advance. They must do so without communicating their planned decisions, just by looking at the positions of other agents. We present the first experimental evidence that a gathering behavior can be learned without communication in a partially observable environment. The learned behavior has the same properties as a self-stabilizing distributed algorithm, as processes can gather from any initial state (and thus tolerate any transient failure). Besides, we show that it is possible to tolerate the abrupt loss of up to 90% of agents without significant impact on the behavior.
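To make the hand-written baseline concrete, the following is a minimal sketch of the imperative "always go to the center" rule that the abstract contrasts with learned behavior. It is not the learned policy from the paper, and it assumes every agent can see all other agents (full observability), which the paper does not assume. The function names and the `step` parameter are illustrative.

```python
import math

def centroid(positions):
    """Return the mean (x, y) position of all agents."""
    n = len(positions)
    return (sum(x for x, _ in positions) / n,
            sum(y for _, y in positions) / n)

def step_toward_center(positions, step=1.0):
    """Move every agent at most `step` toward the current centroid.

    Assumption: full observability, unlike the partially observable
    setting described in the abstract.
    """
    cx, cy = centroid(positions)
    moved = []
    for x, y in positions:
        dx, dy = cx - x, cy - y
        dist = math.hypot(dx, dy)
        if dist <= step:
            # Close enough: land exactly on the centroid.
            moved.append((cx, cy))
        else:
            moved.append((x + step * dx / dist, y + step * dy / dist))
    return moved

def spread(positions):
    """Largest distance from any agent to the centroid."""
    cx, cy = centroid(positions)
    return max(math.hypot(x - cx, y - cy) for x, y in positions)
```

Calling `step_toward_center` repeatedly on any starting configuration shrinks `spread` until the agents coincide, which is the gathering property the learned behavior is expected to reproduce without being given the rule.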


Ordered Preference Elicitation Strategies for Supporting Multi-Objective Decision Making

arXiv.org Machine Learning

In multi-objective decision planning and learning, much attention is paid to producing optimal solution sets that contain an optimal policy for every possible user preference profile. We argue that the step that follows, i.e., determining which policy to execute by maximising the user's intrinsic utility function over this (possibly infinite) set, is under-studied. This paper aims to fill this gap. We build on previous work on Gaussian processes and pairwise comparisons for preference modelling, extend it to the multi-objective decision support scenario, and propose new ordered preference elicitation strategies based on ranking and clustering. Our main contribution is an in-depth evaluation of these strategies using computer and human-based experiments. We show that our proposed elicitation strategies outperform the currently used pairwise methods, and find that users prefer ranking most. Our experiments further show that utilising monotonicity information in GPs, by using a linear prior mean at the start and virtual comparisons to the nadir and ideal points, increases performance. We demonstrate our decision support framework in a real-world study on traffic regulation, conducted with the city of Amsterdam.


Personalized and Private Peer-to-Peer Machine Learning

arXiv.org Machine Learning

The rise of connected personal devices, together with privacy concerns, calls for machine learning algorithms capable of leveraging the data of a large number of agents to learn personalized models under strong privacy requirements. In this paper, we introduce an efficient algorithm to address the above problem in a fully decentralized (peer-to-peer) and asynchronous fashion, with provable convergence rate. We show how to make the algorithm differentially private to protect against the disclosure of information about the personal datasets, and formally analyze the trade-off between utility and privacy. Our experiments show that our approach dramatically outperforms previous work in the non-private case, and that under privacy constraints, we can significantly improve over models learned in isolation.
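As a rough illustration of the utility/privacy trade-off mentioned above, here is a sketch of the standard Laplace mechanism applied to a bounded mean. This is a generic textbook technique, not the algorithm from the paper, whose mechanism is not described in the abstract. The function names, the clipping bounds, and the choice of a mean query are all assumptions made for the example.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_mean(values, lower, upper, epsilon, rng):
    """Return an epsilon-differentially-private mean of `values`.

    Assumptions: the number of values is public, and each value is
    clipped to [lower, upper], so replacing one value changes the mean
    by at most (upper - lower) / n (the sensitivity).
    """
    clipped = [min(max(v, lower), upper) for v in values]
    n = len(clipped)
    sensitivity = (upper - lower) / n
    true_mean = sum(clipped) / n
    return true_mean + laplace_noise(sensitivity / epsilon, rng)
```

A smaller `epsilon` gives stronger privacy but a larger noise scale, so the released mean drifts further from the true one; a larger `epsilon` does the opposite.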


Artificial Intelligence, Smart Contract and Islamic Finance - IslamicBanker.com

#artificialintelligence

Accepted: January 16, 2018. Online Published: January 29, 2018. URL: https://doi.org/10.5539/ass.v14n2p145

Abstract: This study examines two important aspects of recent technology in Islamic finance: artificial intelligence (AI) and smart contracts. AI refers to the ability of machines to understand, think, and learn in a similar way to human beings, indicating the possibility of using computers to simulate human intelligence. A smart contract is computer code running on top of a blockchain that contains a set of rules under which the parties to that smart contract agree to interact with each other. The main objectives of this article are to evaluate the operations of AI and smart contracts and to compare them. This article concludes that AI and smart contracts will have a huge impact on the Islamic finance industry in the future.

Keywords: artificial intelligence (AI), smart contract, digital banking, Islamic finance

1. Preliminary

Artificial intelligence (AI) refers to intelligent machines that have the ability to think. At present, AI offers rapid technological advances that mimic human intelligence. Intelligent machines are believed to be associated with human thinking activities such as decision making, problem solving, and learning. Professor J. McCarthy (1955) established the concept of artificial intelligence at the first artificial intelligence conference, held at Dartmouth in 1956. This evolution is confirmed by Bogue (2014), who described AI as an intelligent agent system that takes actions to maximize its chances of success in a particular task. Pan (2016) revealed that AI becomes extremely critical when it is applied to technology. According to a research report on artificial intelligence, this market is expected to be worth $16.06 billion by 2022, growing at a 62.9% compound annual growth rate (CAGR) from 2016 to 2022 (Research and Markets, 2017). Between 25 April 2016 and 27 May 2016, a special report titled "Outlook on Artificial Intelligence in the Enterprise 2016" was produced by Narrative Science in collaboration with the National Business Research Institute. The report drew on an online survey with a total of 235 respondents.


Sim-To-Real Optimization Of Complex Real World Mobile Network with Imperfect Information via Deep Reinforcement Learning from Self-play

arXiv.org Machine Learning

The mobile network that millions of people use every day is one of the most complex systems in the real world. Optimizing a mobile network to meet exploding customer demand and reduce CAPEX/OPEX poses greater challenges than those addressed in prior work. Learning to solve complex real-world problems to benefit everyone and make the world better has long been the ultimate goal of AI. However, it remains an unsolved problem for deep reinforcement learning (DRL), given imperfect information in the real world, huge state/action spaces, the large amount of data needed for training and its associated time and cost, multi-agent interactions, potential negative impact on the real world, etc. To bridge this reality gap, we propose a DRL framework to directly transfer an optimal policy learned from multiple tasks in a source domain to unseen similar tasks in a target domain, without any further training in either domain. First, we distill the temporal-spatial relationships between cells and mobile users into a scalable 3D image-like tensor to best characterize a partially observed mobile network. Second, inspired by AlphaGo, we use a novel self-play mechanism to empower the DRL agent to gradually improve its intelligence by competing for the best record on multiple tasks. Third, we propose a decentralized DRL method to coordinate multiple agents to compete and cooperate as a team to maximize global reward and minimize potential negative impact. Using 7693 unseen test tasks over 160 unseen simulated mobile networks and 6 field trials over 4 commercial mobile networks in the real world, we demonstrate the capability of our approach to directly transfer learning from one simulator to another, and from simulation to the real world. This is the first time that a DRL agent has successfully transferred its learning directly from simulation to very complex real-world problems with incomplete and imperfect information, huge state/action spaces, and multi-agent interactions.


Modeling the Formation of Social Conventions in Multi-Agent Populations

arXiv.org Machine Learning

In order to understand the formation of social conventions we need to know the specific role of control and learning in multi-agent systems. To advance in this direction, we propose, within the framework of the Distributed Adaptive Control (DAC) theory, a novel Control-based Reinforcement Learning architecture (CRL) that can account for the acquisition of social conventions in multi-agent populations that are solving a benchmark social decision-making problem. Our new CRL architecture, as a concrete realization of DAC multi-agent theory, implements a low-level sensorimotor control loop handling the agent's reactive behaviors (pre-wired reflexes), along with a layer based on model-free reinforcement learning that maximizes long-term reward. We apply CRL in a multi-agent game-theoretic task in which coordination must be achieved in order to find an optimal solution. We show that our CRL architecture is able to both find optimal solutions in discrete and continuous time and reproduce human experimental data on standard game-theoretic metrics such as efficiency in acquiring rewards, fairness in reward distribution and stability of convention formation.


Mean Field Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

Existing multi-agent reinforcement learning methods are typically limited to a small number of agents. When the number of agents grows large, learning becomes intractable due to the curse of dimensionality and the exponential growth of user interactions. In this paper, we present Mean Field Reinforcement Learning, where the interactions within the population of agents are approximated by those between a single agent and the average effect from the overall population or neighboring agents; the interplay between the two entities is mutually reinforced: the learning of the individual agent's optimal policy depends on the dynamics of the population, while the dynamics of the population change according to the collective patterns of the individual policies. We develop practical mean field Q-learning and mean field Actor-Critic algorithms and analyze the convergence of the solution. Experiments on resource allocation, Ising model estimation, and battle game tasks verify the learning effectiveness of our mean field approaches in handling many-agent interactions in a population.
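The core approximation can be sketched in a few lines. The example below assumes discrete actions encoded as integers and summarizes the neighbors by the average of their one-hot action vectors; this encoding, and the function names, are illustrative choices rather than details taken from the abstract. It also contrasts the size of the joint action space with the fixed-size summary that replaces it.

```python
def mean_action(neighbor_actions, n_actions):
    """Average of the neighbors' one-hot action vectors.

    Entry k is the fraction of neighbors that chose action k, so the
    summary has length n_actions no matter how many neighbors exist.
    """
    counts = [0] * n_actions
    for action in neighbor_actions:
        counts[action] += 1
    total = len(neighbor_actions)
    return [c / total for c in counts]

def joint_action_count(n_actions, n_agents):
    """Number of joint actions a naive multi-agent Q-function must cover."""
    return n_actions ** n_agents
```

With 3 actions and 10 agents, `joint_action_count(3, 10)` is 59049, whereas the mean-field summary from `mean_action` stays a length-3 vector. A Q-function conditioned on (state, own action, mean action) therefore avoids the exponential blow-up, at the cost of discarding who did what.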


Generating Plans that Predict Themselves

arXiv.org Artificial Intelligence

Collaboration requires coordination, and we coordinate by anticipating our teammates' future actions and adapting to their plan. In some cases, our teammates' actions early on can give us a clear idea of what the remainder of their plan is, i.e. what action sequence we should expect. In others, they might leave us less confident, or even lead us to the wrong conclusion. Our goal is for robot actions to fall in the first category: we want to enable robots to select their actions in such a way that human collaborators can easily use them to correctly anticipate what will follow. While previous work has focused on finding initial plans that convey a set goal, here we focus on finding two portions of a plan such that the initial portion conveys the final one. We introduce $t$-predictability: a measure that quantifies the accuracy and confidence with which human observers can predict the remaining robot plan from the overall task goal and the observed initial $t$ actions in the plan. We contribute a method for generating $t$-predictable plans: we search for a full plan that accomplishes the task, but in which the first $t$ actions make it as easy as possible to infer the remaining ones. The result is often different from the most efficient plan, in which the initial actions might leave a lot of ambiguity as to how the task will be completed. Through an online experiment and an in-person user study with physical robots, we find that our approach outperforms a traditional efficiency-based planner in objective and subjective collaboration metrics.


Specification Inference from Demonstrations

arXiv.org Artificial Intelligence

Learning from expert demonstrations has received a lot of attention in artificial intelligence and machine learning. The goal is to infer the underlying reward function that an agent is optimizing given a set of observations of the agent's behavior over time in a variety of circumstances, the system state trajectories, and a plant model specifying the evolution of the system state for different agent actions. The system is often modeled as a Markov decision process, that is, the next state depends only on the current state and the agent's action, and the agent's choice of action depends only on the current state. While the former is a Markovian assumption on the evolution of the system state, the latter assumes that the target reward function is itself Markovian. In this work, we explore learning a class of non-Markovian reward functions, known in the formal methods literature as specifications. These specifications offer better composition, transferability, and interpretability. We then show that inferring the specification can be done efficiently without unrolling the transition system. We demonstrate on a 2-D grid world example.
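To illustrate why a specification is non-Markovian, the sketch below evaluates a property over a whole grid-world trajectory rather than over single states. The particular specification ("never enter a hazard cell, and visit the key cell before reaching the goal") and all names are hypothetical examples, not taken from the paper.

```python
def avoids(trajectory, hazards):
    """True if no visited cell is a hazard."""
    return all(cell not in hazards for cell in trajectory)

def visits_before(trajectory, first, second):
    """True if `second` is reached and `first` was visited earlier."""
    if second not in trajectory:
        return False
    return first in trajectory[:trajectory.index(second)]

def satisfies_spec(trajectory, key, goal, hazards):
    """Hypothetical specification: stay safe, and get the key before the goal.

    Whether the goal cell counts as success depends on the path taken
    to reach it, so no reward defined on the current state alone can
    express this property.
    """
    return avoids(trajectory, hazards) and visits_before(trajectory, key, goal)
```

Two trajectories that end in the same goal cell can receive different verdicts, for example when only one of them passes through the key cell, and the component checks compose with plain Boolean operators.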


Artificial Intelligence (AI) Business Directory – Adaptive Toolbox

#artificialintelligence

This is a list of key companies worldwide with products, services, and applications in fields related to Artificial Intelligence (AI) and Optimization. Feel free to submit a listing and maintain it for your own business. The listing service is free. Any additional questions and comments should be submitted here. Typical AI fields include, but are not limited to: Machine Learning (ML), Deep Learning, Cognitive Computing, Natural Language Processing (NLP), Computer Vision, Pattern Recognition, Autonomous Agents and Multi-Agent Systems, Automated Planning and Scheduling, Robotics, Predictive Analytics, etc.