Agents
Stochastic Dynamic Programming Heuristics for Influence Maximization-Revenue Optimization
The well-known Influence Maximization (IM) problem has been actively studied by researchers over the past decade, with emphasis on marketing and social networks. Existing research has obtained solutions to the IM problem by computing the influence spread and exploiting the property of submodularity. This paper takes a novel approach to the IM problem geared towards optimizing clicks, and consequently revenue, within an Online Social Network (OSN). Our approach departs from existing approaches by adopting a decision-making perspective through Stochastic Dynamic Programming (SDP). We thus define a new problem, Influence Maximization-Revenue Optimization (IM-RO), and propose SDP as a method for solving it. The SDP method offers lucrative gains for an advertiser in terms of optimizing clicks and generating revenue; however, one drawback is its associated "curse of dimensionality", particularly for problems involving a large state space. We therefore introduce the Lawrence Degree Heuristic (LDH), Adaptive Hill-Climbing (AHC) and Multistage Particle Swarm Optimization (MPSO) heuristics as methods which are orders of magnitude faster than the SDP method whilst achieving near-optimal results. Through a comparative analysis on various synthetic and real-world networks, we present the AHC and LDH as heuristics well suited to the IM-RO problem in terms of their accuracy, running times and scalability under ideal model parameters. In this paper we present a compelling survey on the SDP method as a practical and lucrative method for spreading information and optimizing revenue within the context of OSNs.
Actively Estimating Crowd Annotation Consensus
Kara, Yunus Emre, Genc, Gaye, Aran, Oya, Akarun, Lale
The rapid growth of storage capacity and processing power has caused machine learning applications to increasingly rely on using immense amounts of labeled data. It has become more important than ever to have fast and inexpensive ways to annotate vast amounts of data. With the emergence of crowdsourcing services, the research direction has gravitated toward putting the wisdom of crowds to better use. Unfortunately, spammers and inattentive annotators pose a threat to the quality and trustworthiness of the consensus. Thus, high quality consensus estimation from crowd annotated data requires a meticulous choice of the candidate annotator and the sample in need of a new annotation. Due to time and budget limitations, it is of utmost importance that this choice is carried out while the annotation collection is in progress. We call this process active crowd-labeling. To this end, we propose an active crowd-labeling approach for actively estimating consensus from continuous-valued crowd annotations. Our method is based on annotator models with unknown parameters, and Bayesian inference is employed to reach a consensus in the form of ordinal, binary, or continuous values. We introduce ranking functions for choosing the candidate annotator and sample pair for requesting an annotation. In addition, we propose a penalizing method for preventing annotator domination, investigate the explore-exploit trade-off for incorporating new annotators into the system, and study the effects of inducing a stopping criterion based on consensus quality. We also introduce the crowd-labeled Head Pose Annotations datasets. Experimental results on the benchmark datasets used in the literature and the Head Pose Annotations datasets suggest that our method provides high-quality consensus by using as few as one fifth of the annotations (~80% cost reduction), thereby providing a budget and time-sensitive solution to the crowd-labeling problem.
Lenient Multi-Agent Deep Reinforcement Learning
Palmer, Gregory, Tuyls, Karl, Bloembergen, Daan, Savani, Rahul
Much of the success of single agent deep reinforcement learning (DRL) in recent years can be attributed to the use of experience replay memories (ERM), which allow Deep Q-Networks (DQNs) to be trained efficiently through sampling stored state transitions. However, care is required when using ERMs for multi-agent deep reinforcement learning (MA-DRL), as stored transitions can become outdated because agents update their policies in parallel [11]. In this work we apply leniency [23] to MA-DRL. Lenient agents map state-action pairs to decaying temperature values that control the amount of leniency applied towards negative policy updates that are sampled from the ERM. This introduces optimism in the value-function update, and has been shown to facilitate cooperation in tabular fully-cooperative multi-agent reinforcement learning problems. We evaluate our Lenient-DQN (LDQN) empirically against the related Hysteretic-DQN (HDQN) algorithm [22] as well as a modified version we call scheduled-HDQN, that uses average reward learning near terminal states. Evaluations take place in extended variations of the Coordinated Multi-Agent Object Transportation Problem (CMOTP) [8] which include fully-cooperative sub-tasks and stochastic rewards. We find that LDQN agents are more likely to converge to the optimal policy in a stochastic reward CMOTP compared to standard and scheduled-HDQN agents.
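The leniency mechanism described above can be sketched in tabular form. This is a minimal illustration rather than the authors' LDQN: the exponential leniency function, the constants `k` and `decay`, and the names `leniency` and `lenient_update` are assumptions chosen for this example, and the replay memory and neural network are left out entirely.

```python
import math
import random


def leniency(temperature, k=2.0):
    """Map a temperature to a leniency value in [0, 1).

    Assumed form: 1 - exp(-k * T). A high temperature gives high leniency.
    """
    return 1.0 - math.exp(-k * temperature)


def lenient_update(q, temps, state, action, target,
                   alpha=0.1, k=2.0, decay=0.95, rng=random):
    """Apply one lenient update to a tabular Q-value.

    q and temps are dicts keyed by (state, action). Positive updates are
    always applied. A negative update is applied only when a uniform draw
    exceeds the current leniency, so it is often ignored while the
    temperature is still high.
    """
    key = (state, action)
    delta = target - q[key]
    if delta > 0 or rng.random() > leniency(temps[key], k):
        q[key] += alpha * delta
    # The temperature decays on every visit, so leniency fades over time.
    temps[key] *= decay
    return q[key]
```

With a temperature of 1.0 and `k=2.0`, the leniency is about 0.86, so most negative updates for that state-action pair are skipped early on; as the temperature decays towards zero the rule approaches an ordinary update.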
Real-Time Bidding with Multi-Agent Reinforcement Learning in Display Advertising
Jin, Junqi, Song, Chengru, Li, Han, Gai, Kun, Wang, Jun, Zhang, Weinan
Real-time advertising allows advertisers to bid for each impression for a visiting user. To optimize a specific goal such as maximizing the revenue led by ad placements, advertisers not only need to estimate the relevance between the ads and the user's interests, but most importantly require a strategic response with respect to other advertisers bidding in the market. In this paper, we formulate bidding optimization with multi-agent reinforcement learning. To deal with a large number of advertisers, we propose a clustering method and assign each cluster a strategic bidding agent. A practical Distributed Coordinated Multi-Agent Bidding (DCMAB) method has been proposed and implemented to balance the tradeoff between competition and cooperation among advertisers. The empirical study on our industry-scale real-world data has demonstrated the effectiveness of our modeling methods. Our results show that cluster-based bidding largely outperforms single-agent and bandit approaches, and that coordinated bidding achieves better overall objectives than purely self-interested bidding agents.
Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents
Zhang, Kaiqing, Yang, Zhuoran, Liu, Han, Zhang, Tong, Başar, Tamer
We consider the problem of fully decentralized multi-agent reinforcement learning (MARL), where the agents are located at the nodes of a time-varying communication network. Specifically, we assume that the reward functions of the agents might correspond to different tasks, and are only known to the corresponding agent. Moreover, each agent makes individual decisions based on both the information observed locally and the messages received from its neighbors over the network. Within this setting, the collective goal of the agents is to maximize the globally averaged return over the network through exchanging information with their neighbors. To this end, we propose two decentralized actor-critic algorithms with function approximation, which are applicable to large-scale MARL problems where both the number of states and the number of agents are very large. Under the decentralized structure, the actor step is performed individually by each agent with no need to infer the policies of others. For the critic step, we propose a consensus update via communication over the network. Our algorithms are fully incremental and can be implemented in an online fashion. Convergence analyses of the algorithms are provided when the value functions are approximated within the class of linear functions. Extensive simulation results with both linear and nonlinear function approximations are presented to validate the proposed algorithms. Our work appears to be the first study of fully decentralized MARL algorithms for networked agents with function approximation, with provable convergence guarantees.
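The idea of a consensus update for the critic can be illustrated with a small sketch. It is not the paper's algorithm: a plain discounted TD(0) step on a linear state-value estimate stands in for the local critic update, and the fixed weight matrix, the three-agent line graph, and the names `local_td_step` and `consensus_step` are assumptions made for this example.

```python
def local_td_step(w, phi, phi_next, reward, gamma=0.9, lr=0.1):
    """One TD(0) step for a linear value estimate v(s) = w . phi(s).

    Stand-in for an agent's local critic update, using only its own reward.
    """
    v = sum(wi * pi for wi, pi in zip(w, phi))
    v_next = sum(wi * pi for wi, pi in zip(w, phi_next))
    delta = reward + gamma * v_next - v
    return [wi + lr * delta * pi for wi, pi in zip(w, phi)]


def consensus_step(params, weights):
    """Replace each agent's parameters by a weighted average over the network.

    params[j] is agent j's parameter vector. weights[i][j] is the weight
    agent i places on agent j, and is zero when j is not i or a neighbor of i.
    """
    n = len(params)
    dim = len(params[0])
    return [
        [sum(weights[i][j] * params[j][d] for j in range(n)) for d in range(dim)]
        for i in range(n)
    ]


# Hypothetical line graph 0 - 1 - 2 with a doubly stochastic weight matrix.
LINE_GRAPH_WEIGHTS = [
    [0.75, 0.25, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.25, 0.75],
]
```

Because every row and column of `LINE_GRAPH_WEIGHTS` sums to one, repeated calls to `consensus_step` keep the network-wide average of the parameters unchanged while pulling the individual vectors towards it. In a full loop each agent would alternate a local step with a consensus step, exchanging parameters only with its neighbors.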
A Multi-Disciplinary Review of Knowledge Acquisition Methods: From Human to Autonomous Eliciting Agents
This paper offers a multi-disciplinary review of knowledge acquisition methods in human activity systems. The review captures the degree of involvement of various types of agencies in the knowledge acquisition process, and proposes a classification with three categories of methods: the human agent, the human-inspired agent, and the autonomous machine agent methods. In the first two categories, the acquisition of knowledge is seen as a cognitive task analysis exercise, while in the third category knowledge acquisition is treated as an autonomous knowledge-discovery endeavour. The motivation for this classification stems from the continuous change over time of the structure, meaning and purpose of human activity systems, which is seen as the factor that has fuelled researchers' and practitioners' efforts in knowledge acquisition for more than a century. We show through this review that the knowledge acquisition (KA) field is increasingly active due to the accelerating pace of change in human activity, and conclude by discussing the emergence of a fourth category of knowledge acquisition methods, which are based on red-teaming and co-evolution.
Shaping Influence and Influencing Shaping: A Computational Red Teaming Trust-based Swarm Intelligence Model
Tang, Jiangjun, Petraki, Eleni, Abbass, Hussein
Sociotechnical systems are complex systems, where nonlinear interaction among different players can obscure causal relationships. The absence of mechanisms to help us understand how to create a change in the system makes it hard to manage these systems. Influencing and shaping are social operators acting on sociotechnical systems to design a change. However, the two operators are usually discussed in an ad-hoc manner, without proper guiding models and metrics which assist in adopting these models successfully. Moreover, both social operators rely on accurate understanding of the concept of trust. Without such understanding, neither of these operators can create the required level to create a change in a desirable direction. In this paper, we define these concepts in a concise manner suitable for modelling the concepts and understanding their dynamics. We then introduce a model for influencing and shaping and use Computational Red Teaming principles to design and demonstrate how this model operates. We validate the results computationally through a simulation environment to show social influencing and shaping in an artificial society.
Can Swarm Intelligence Solve Humanity's Biggest ...
Artificial intelligence is all the rage, but using swarm intelligence might be the best way to solve the world's biggest problems. Dr. Louis Rosenberg is the Founder & CEO of Unanimous AI, an artificial intelligence company that amplifies human intelligence by building "hive minds" modeled after biological swarms. Learn how swarm intelligence can combine the brainpower of humans and computers to solve humanity's biggest problems.
Fair Division via Social Comparison
Abebe, Rediet, Kleinberg, Jon, Parkes, David
In the classical cake cutting problem, a resource must be divided among agents with different utilities so that each agent believes they have received a fair share of the resource relative to the other agents. We introduce a variant of the problem in which we model an underlying social network on the agents with a graph, and agents only evaluate their shares relative to their neighbors' in the network. This formulation captures many situations in which it is unrealistic to assume a global view, and also exposes interesting phenomena in the original problem. Specifically, we say an allocation is locally envy-free if no agent envies a neighbor's allocation and locally proportional if each agent values her own allocation as much as the average value of her neighbors' allocations, with the former implying the latter. While global envy-freeness implies local envy-freeness, global proportionality does not imply local proportionality, or vice versa. A general result is that for any two distinct graphs on the same set of nodes and an allocation, there exists a set of valuation functions such that the allocation is locally proportional on one but not the other. We fully characterize the set of graphs for which an oblivious single-cutter protocol (a protocol that uses a single agent to cut the cake into pieces) admits a bounded protocol with $O(n^2)$ query complexity for locally envy-free allocations in the Robertson-Webb model. We also consider the price of envy-freeness, which compares the total utility of an optimal allocation to the best utility of an allocation that is envy-free. We show that a lower bound of $\Omega(\sqrt{n})$ on the price of envy-freeness for global allocations in fact holds for local envy-freeness in any connected undirected graph. Thus, sparse graphs surprisingly do not provide more flexibility with respect to the quality of envy-free allocations.
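The two local fairness notions defined above can be checked directly once each agent's value for every piece is known. The sketch below is an illustration under assumptions: `values[i][j]` is taken to be agent i's value for the piece given to agent j, "as much as" is read as "at least as much as", an agent with no neighbors is treated as trivially satisfied, and the function names are hypothetical.

```python
def locally_envy_free(values, neighbors):
    """True if no agent values a neighbor's piece above its own.

    values[i][j]: agent i's value for agent j's piece.
    neighbors: dict mapping each agent to a list of its neighbors.
    """
    return all(
        values[i][i] >= values[i][j]
        for i, nbrs in neighbors.items()
        for j in nbrs
    )


def locally_proportional(values, neighbors):
    """True if each agent values its own piece at least as much as the
    average of its values for its neighbors' pieces."""
    for i, nbrs in neighbors.items():
        if not nbrs:
            continue  # assumed: no neighbors means nothing to compare with
        average = sum(values[i][j] for j in nbrs) / len(nbrs)
        if values[i][i] < average:
            return False
    return True


# Hypothetical valuations for three agents.
VALUES = [
    [0.3, 0.2, 0.5],  # agent 0 prefers agent 2's piece to its own
    [0.3, 0.4, 0.3],
    [0.2, 0.3, 0.5],
]
PATH = {0: [1], 1: [0, 2], 2: [1]}            # graph 0 - 1 - 2
COMPLETE = {0: [1, 2], 1: [0, 2], 2: [0, 1]}  # every pair adjacent
```

On `PATH`, agent 0 never compares itself with agent 2, so the allocation passes both checks; on `COMPLETE`, which corresponds to the global notions, it fails both. This shows how the same allocation can be fair on one graph and unfair on another.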
Open AI's Algorithm Can Make These Dots Collaborate to Complete a Task
Artificial intelligence is part of humanity's future, but to get there, society needs to pursue AI responsibly. Though the age of super artificial intelligence could prove to be beneficial to humanity, there seems to be an equal chance that AI could be highly destructive. Billionaire and Tesla CEO Elon Musk has made his opinions on the future of artificial intelligence quite clear, stating in an interview, "I think we should be very careful about artificial intelligence. If I had to guess at what our biggest existential threat is, it's probably that. So we need to be very careful. I'm increasingly inclined to think that there should be some regulatory oversight, maybe at the national and international level, just to make sure that we don't do something very foolish."