Agents
Empowerment -- an Introduction
Salge, Christoph, Glackin, Cornelius, Polani, Daniel
This book chapter is an introduction to and an overview of the information-theoretic, task-independent utility function "Empowerment", which is defined as the channel capacity between an agent's actions and an agent's sensors. It quantifies how much influence and control an agent has over the world it can perceive. This book chapter discusses the general idea behind empowerment as an intrinsic motivation and showcases several previous applications of empowerment to demonstrate how empowerment can be applied to different sensor-motor configurations, and how the same formalism can lead to different observed behaviors. We also present a fast approximation for empowerment in the continuous domain.
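As a rough illustration of the definition above, the sketch below estimates one-step empowerment for a small discrete system as the channel capacity from actions to sensor states. It uses the standard Blahut-Arimoto iteration, which is our choice for the example; the abstract does not say which algorithm the chapter uses. The function name `empowerment_bits` and the table `p_s_given_a` (one row per action, one column per sensor state) are hypothetical.

```python
import math

def empowerment_bits(p_s_given_a, iters=200, tol=1e-12):
    """Channel capacity (in bits) of the action -> sensor channel.

    p_s_given_a[a][s] is the assumed-known probability of sensing state s
    after taking action a. Uses the Blahut-Arimoto iteration.
    """
    n_a = len(p_s_given_a)
    n_s = len(p_s_given_a[0])
    p_a = [1.0 / n_a] * n_a          # start from a uniform action distribution
    capacity = 0.0
    for _ in range(iters):
        # Sensor-state marginal induced by the current action distribution.
        q_s = [sum(p_a[a] * p_s_given_a[a][s] for a in range(n_a))
               for s in range(n_s)]
        # exp(KL(p(s|a) || q(s))) for every action.
        c = []
        for a in range(n_a):
            kl = sum(p * math.log(p / q_s[s])
                     for s, p in enumerate(p_s_given_a[a]) if p > 0.0)
            c.append(math.exp(kl))
        z = sum(p_a[a] * c[a] for a in range(n_a))
        new_capacity = math.log(z)   # lower bound on capacity, in nats
        p_a = [p_a[a] * c[a] / z for a in range(n_a)]
        converged = abs(new_capacity - capacity) < tol
        capacity = new_capacity
        if converged:
            break
    return capacity / math.log(2)

# Four actions, each leading to its own distinct sensor state.
full_control = [[1.0 if s == a else 0.0 for s in range(4)] for a in range(4)]
# Two actions that both lead to the same sensor state.
no_control = [[1.0, 0.0], [1.0, 0.0]]
# Two actions whose outcome is flipped with probability 0.1.
noisy_control = [[0.9, 0.1], [0.1, 0.9]]
```

Under these assumptions, `full_control` yields 2 bits (four distinguishable outcomes), `no_control` yields 0 bits, and `noisy_control` falls in between, which matches the reading of empowerment as perceivable influence over the world.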
Double four-bar crank-slider mechanism dynamic balancing by meta-heuristic algorithms
Emdadi, Habib, Yazdanian, Mahsa, Ettefagh, Mir Mohammad, Feizi-Derakhshi, Mohammad-Reza
In this paper, a new method for dynamic balancing of a double four-bar crank-slider mechanism using meta-heuristic-based optimization algorithms is proposed. For this purpose, a suitable objective function for balancing this mechanism, along with the corresponding constraints, has been obtained by dynamic modeling of the mechanism. Then the PSO, ABC, BGA and HGAPSO algorithms have been applied to minimize the defined cost function in the optimization step. The optimization results have been examined in detail by comparing the cost function, fitness, convergence speed and runtime values of the applied algorithms. It has been shown that PSO and ABC are more efficient than BGA and HGAPSO in terms of convergence speed and result quality. Also, a laboratory-scale experimental double four-bar crank-slider mechanism was provided to validate the proposed balancing method in practice.
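To make the optimization step concrete, here is a minimal particle swarm optimization (PSO) sketch. The balancing objective and its constraints are not given in the abstract, so a placeholder cost (`sphere`) stands in for them; the parameter values, the simple bound clamping, and all names are assumptions for illustration rather than the authors' settings.

```python
import random

def pso_minimize(cost, dim, bounds, n_particles=30, iters=100, seed=0,
                 inertia=0.7, c_personal=1.5, c_global=1.5):
    """Basic global-best PSO over a box [lo, hi]^dim."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # per-particle best positions
    best_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=best_cost.__getitem__)
    g_pos, g_cost = best_pos[g][:], best_cost[g]   # swarm-wide best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Clamp to the search box (a simple constraint-handling choice).
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < best_cost[i]:
                best_pos[i], best_cost[i] = pos[i][:], c
                if c < g_cost:
                    g_pos, g_cost = pos[i][:], c
    return g_pos, g_cost

def sphere(x):
    """Placeholder cost; the real balancing objective is not in the abstract."""
    return sum(v * v for v in x)

solution, value = pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
```

Swapping `sphere` for a function that evaluates the mechanism's cost at a candidate design vector would be the natural next step; comparing runs of several such optimizers by final cost and iteration count mirrors the kind of comparison the abstract describes.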
Online Learning of Dynamic Parameters in Social Networks
Shahrampour, Shahin, Rakhlin, Alexander, Jadbabaie, Ali
This paper addresses the problem of online learning in a dynamic setting. We consider a social network in which each individual observes a private signal about the underlying state of the world and communicates with her neighbors at each time period. Unlike many existing approaches, the underlying state is dynamic, and evolves according to a geometric random walk. We view the scenario as an optimization problem where agents aim to learn the true state while suffering the smallest possible loss. Based on the decomposition of the global loss function, we introduce two update mechanisms, each of which generates an estimate of the true state. We establish a tight bound on the rate of change of the underlying state, under which individuals can track the parameter with a bounded variance. Then, we characterize explicit expressions for the steady-state mean-square deviation (MSD) of the estimates from the truth, per individual. We observe that only one of the estimators recovers the optimal MSD, which underscores the impact of the objective function decomposition on the learning quality. Finally, we provide an upper bound on the regret of the proposed methods, measured as an average of errors in estimating the parameter over a finite time horizon.
Exponentially Fast Parameter Estimation in Networks Using Distributed Dual Averaging
Shahrampour, Shahin, Jadbabaie, Ali
In this paper we present an optimization-based view of distributed parameter estimation and observational social learning in networks. Agents receive a sequence of random, independent and identically distributed (i.i.d.) signals, each of which individually may not be informative about the underlying true state, but the signals together are globally informative enough to make the true state identifiable. Using an optimization-based characterization of Bayesian learning as proximal stochastic gradient descent (with Kullback-Leibler divergence from a prior as a proximal function), we show how to efficiently use a distributed, online variant of Nesterov's dual averaging method to solve the estimation problem with purely local information. When the true state is globally identifiable, and the network is connected, we prove that agents eventually learn the true parameter using a randomized gossip scheme. We demonstrate that with high probability the convergence is exponentially fast, with a rate dependent on the KL divergence of observations under the true state from observations under the second likeliest state. Furthermore, our work also highlights the possibility of learning under continuous adaptation of the network, which is a consequence of employing a constant, unit step size for the algorithm.
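The quantity that the convergence rate is said to depend on can be computed directly for a small discrete model. The sketch below is only an illustration under strong simplifications: it pools all observations into one distribution per candidate state, whereas the paper works with per-agent signals, and the names `kl_divergence`, `closest_alternative`, and the `likelihoods` table are hypothetical.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions.

    Assumes q[i] > 0 wherever p[i] > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def closest_alternative(likelihoods, true_state):
    """Return the alternative state hardest to tell apart from the truth,
    together with its KL gap from the true observation distribution."""
    p_true = likelihoods[true_state]
    gaps = {state: kl_divergence(p_true, q)
            for state, q in likelihoods.items() if state != true_state}
    closest = min(gaps, key=gaps.get)
    return closest, gaps[closest]

# Made-up observation distributions over two signal values per state.
likelihoods = {
    "theta0": [0.5, 0.5],
    "theta1": [0.6, 0.4],
    "theta2": [0.9, 0.1],
}
state, gap = closest_alternative(likelihoods, "theta0")
```

Here `theta1` is the closest competitor to `theta0`, with a gap of roughly 0.02 nats. A smaller gap means the two states generate more similar observations, which is consistent with the abstract's statement that the rate is governed by this divergence.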
A solution concept for games with altruism and cooperation
Over the years, numerous experiments have accumulated to show that cooperation is not casual and depends on the payoffs of the game. These findings suggest that humans have a natural attitude toward cooperation and that the same person may act more or less cooperatively depending on the particular payoffs. In other words, people do not act a priori as single agents, but they forecast how the game would be played if they formed coalitions and then they play according to their best forecast. In this paper we formalize this idea and we define a new solution concept for one-shot normal form games. We prove that this cooperative equilibrium exists for all finite games and that it explains a number of different experimental findings, such as (1) the rate of cooperation in the Prisoner's dilemma depends on the cost-benefit ratio; (2) the rate of cooperation in the Traveler's dilemma depends on the bonus/penalty; (3) the rate of cooperation in the Public Goods game depends on the per capita marginal return and on the number of players; (4) the rate of cooperation in the Bertrand competition depends on the number of players; (5) players tend to be fair in the bargaining problem; (6) players tend to be fair in the Ultimatum game; (7) players tend to be altruistic in the Dictator game; (8) offers in the Ultimatum game are larger than offers in the Dictator game.
Regret-Based Multi-Agent Coordination with Uncertain Task Rewards
Wu, Feng, Jennings, Nicholas R.
Many multi-agent coordination problems can be represented as distributed constraint optimization problems (DCOPs). Motivated by task allocation in disaster response, we extend standard DCOP models to consider uncertain task rewards where the outcome of completing a task depends on its current state, which is randomly drawn from unknown distributions. The goal of solving this problem is to find a solution for all agents that minimizes the overall worst-case loss. This is a challenging problem for centralized algorithms because the search space grows exponentially with the number of agents, and it is nontrivial for existing standard DCOP algorithms. To address this, we propose a novel decentralized algorithm that incorporates Max-Sum with iterative constraint generation to solve the problem by passing messages among agents. In doing so, our approach scales well and can solve instances of the task allocation problem with hundreds of agents and tasks.
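The objective of minimizing worst-case loss can be sketched on a toy instance by brute force. This is deliberately the centralized enumeration that the abstract says does not scale, not the Max-Sum-based algorithm the authors propose. It also assumes that uncertainty is represented by a small finite set of reward scenarios and that each task's reward counts once; both are assumptions made for the example, as are all names.

```python
from itertools import product

def team_reward(assignment, task_values):
    """Total reward: each distinct task counts once, however many agents pick it."""
    return sum(task_values[t] for t in set(assignment))

def minimax_regret(n_agents, tasks, scenarios):
    """Enumerate all joint assignments and pick the one whose worst-case
    regret over the scenarios is smallest."""
    assignments = list(product(tasks, repeat=n_agents))
    # Best achievable reward in each scenario, used as the regret baseline.
    best_per_scenario = [max(team_reward(a, s) for a in assignments)
                         for s in scenarios]

    def worst_regret(a):
        return max(best - team_reward(a, s)
                   for best, s in zip(best_per_scenario, scenarios))

    choice = min(assignments, key=worst_regret)
    return choice, worst_regret(choice)

# Two made-up reward scenarios for three tasks.
scenarios = [
    {"A": 10, "B": 1, "C": 4},
    {"A": 2, "B": 8, "C": 5},
]
choice, regret = minimax_regret(2, ["A", "B", "C"], scenarios)
```

In this instance the selected assignment sends the two agents to tasks A and B with a worst-case regret of 3, even though it is not the best assignment in either scenario taken alone. The enumeration visits `len(tasks) ** n_agents` assignments, which is the exponential growth the abstract refers to.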
Artificial Intelligence Based Cognitive Routing for Cognitive Radio Networks
Cognitive radio networks (CRNs) are networks of nodes equipped with cognitive radios that can optimize performance by adapting to network conditions. While CRNs are envisioned as intelligent networks, relatively little research has focused on the network-level functionality of CRNs. Although various routing protocols, incorporating varying degrees of adaptiveness, have been proposed for CRNs, it is imperative for the long-term success of CRNs that the design of cognitive routing protocols be pursued by the research community. Cognitive routing protocols are envisioned as routing protocols that fully and seamlessly incorporate AI-based techniques into their design. In this paper, we provide a self-contained tutorial on various AI and machine-learning techniques that have been, or can be, used for developing cognitive routing protocols. We also survey the application of various classes of AI techniques to CRNs in general, and to the problem of routing in particular. We discuss various decision-making techniques and learning techniques from AI and document their current and potential applications to the problem of routing in CRNs. We also highlight the various inference, reasoning, modeling, and learning subtasks that a cognitive routing protocol must solve. Finally, open research issues and future directions of work are identified.
Matching Demand with Supply in the Smart Grid using Agent-Based Multiunit Auction
Wijaya, Tri Kurniawan, Larson, Kate, Aberer, Karl
Recent work has suggested reducing electricity generation cost by cutting the peak-to-average ratio (PAR) without reducing the total amount of the loads. However, most of these proposals rely on consumers' willingness to act. In this paper, we propose an approach to cut PAR explicitly from the supply side. The resulting cut loads are then distributed among consumers by means of a multiunit auction, which is conducted by an intelligent agent on behalf of the consumer. This approach is also in line with the future vision of the smart grid to have the demand side matched with the supply side. Experiments suggest that our approach reduces overall system cost and benefits both consumers and the energy provider.
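For readers unfamiliar with the metric, the following small sketch computes the peak-to-average ratio of a load profile and shows how moving load away from the peak lowers PAR while leaving total consumption unchanged. The load values are made up, and the auction mechanism itself is not modelled here.

```python
def peak_to_average_ratio(loads):
    """PAR of a load profile: the peak load divided by the mean load."""
    return max(loads) / (sum(loads) / len(loads))

# Hypothetical loads (e.g. in kW) over four time slots.
before = [2.0, 4.0, 6.0, 4.0]
# One unit moved from the peak slot to the lowest slot; the total is unchanged.
after = [3.0, 4.0, 5.0, 4.0]

par_before = peak_to_average_ratio(before)
par_after = peak_to_average_ratio(after)
```

With these numbers PAR drops from 1.5 to 1.25 while the total load stays at 16 units.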
Computational Rationalization: The Inverse Equilibrium Problem
Waugh, Kevin, Ziebart, Brian D., Bagnell, J. Andrew
Modeling the purposeful behavior of imperfect agents from a small number of observations is a challenging task. When restricted to the single-agent decision-theoretic setting, inverse optimal control techniques assume that observed behavior is an approximately optimal solution to an unknown decision problem. These techniques learn a utility function that explains the example behavior and can then be used to accurately predict or imitate future behavior in similar observed or unobserved situations. In this work, we consider similar tasks in competitive and cooperative multi-agent domains. Here, unlike single-agent settings, a player cannot myopically maximize its reward; it must speculate on how the other agents may act to influence the game's outcome. Employing the game-theoretic notion of regret and the principle of maximum entropy, we introduce a technique for predicting and generalizing behavior.
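The game-theoretic notion of regret mentioned above can be illustrated on a two-player matrix game. The sketch below computes the row player's external regret, that is, the gain from switching to the best single fixed action, for a given distribution over joint action profiles. The abstract does not say which form of regret the technique uses, so this is one common variant chosen for illustration; the payoff numbers and names are hypothetical.

```python
def external_regret(payoff, joint):
    """Row player's external regret under a joint play distribution.

    payoff[a][b]: row player's utility when row plays a and column plays b.
    joint[a][b]:  probability that the profile (a, b) is played.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    expected = sum(joint[a][b] * payoff[a][b]
                   for a in range(n_rows) for b in range(n_cols))
    # Utility of always playing the fixed action d against the same
    # column behaviour, maximised over d.
    best_fixed = max(
        sum(joint[a][b] * payoff[d][b]
            for a in range(n_rows) for b in range(n_cols))
        for d in range(n_rows)
    )
    return best_fixed - expected

# Prisoner's dilemma payoffs for the row player; index 0 = cooperate, 1 = defect.
pd_payoff = [[3.0, 0.0],
             [5.0, 1.0]]
always_cooperate = [[1.0, 0.0], [0.0, 0.0]]   # all mass on (C, C)
always_defect = [[0.0, 0.0], [0.0, 1.0]]      # all mass on (D, D)

regret_cc = external_regret(pd_payoff, always_cooperate)
regret_dd = external_regret(pd_payoff, always_defect)
```

Mutual cooperation leaves the row player with a regret of 2 (defecting would have paid 5 instead of 3), whereas mutual defection leaves a regret of 0. Low regret for every player is the kind of rationality condition that observed behavior can be tested against.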