 Agents


An Adaptive PID Autotuner for Multicopters with Experimental Results

arXiv.org Artificial Intelligence

This paper develops an adaptive PID autotuner for multicopters, and presents simulation and experimental results. The autotuner consists of adaptive digital control laws based on retrospective cost adaptive control implemented in the PX4 flight stack. A learning trajectory is used to optimize the autopilot during a single flight. The autotuned autopilot is then compared with the default PX4 autopilot by flying a test trajectory constructed using the second-order Hilbert curve. In order to investigate the sensitivity of the autotuner to the quadcopter dynamics, the mass of the quadcopter is varied, and the performance of the autotuned and default autopilot is compared. It is observed that the autotuned autopilot outperforms the default autopilot.
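As a rough illustration of the test trajectory, the sketch below generates the 16 grid cells visited by a second-order Hilbert curve. The function name `hilbert_d2xy` and the unit grid spacing are assumptions made for this example; the abstract does not say how the authors scale or time the trajectory.

```python
def hilbert_d2xy(order, d):
    """Map index d along a Hilbert curve of the given order to (x, y) grid cells.

    Hypothetical helper for illustration; uses the standard iterative
    index-to-coordinate conversion on a 2**order by 2**order grid.
    """
    n = 1 << order          # cells per side
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:         # rotate/flip the quadrant when needed
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


# Second-order curve: 4 x 4 grid, 16 waypoints in visiting order.
waypoints = [hilbert_d2xy(2, d) for d in range(16)]
print(waypoints)
```

Each consecutive pair of waypoints is one grid step apart, so the curve covers every cell of the grid with many sharp turns, which is presumably what makes it a demanding trajectory for an autopilot.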


[ICML 2021 Spotlight] DFAC Framework: Factorizing the Value Function via Quantile Mixture for…

#artificialintelligence

In multi-agent reinforcement learning (MARL), the environments are highly stochastic due to the partial observability of each agent and the continuously changing policies of the other agents. One popular research direction is to enhance the training procedure of fully cooperative and decentralized agents. In the past few years, a number of MARL researchers have turned their attention to centralized training with decentralized execution (CTDE). Among these CTDE approaches, value function factorization methods are especially promising in terms of their superior performance and data efficiency. Value function factorization methods introduce the assumption of individual-global-max (IGM) [1], which assumes that each agent's optimal actions result in the optimal joint actions of the entire group. Based on IGM, the total return of a group of agents can be factorized into separate utility functions for each agent.
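The IGM condition can be checked by brute force on a toy problem. The sketch below uses a plain additive mixing of per-agent utilities (in the style of value-decomposition networks), which is an assumption chosen for brevity and not the quantile-mixture factorization of DFAC; the utility table and all names are made up for illustration.

```python
import itertools

# Hypothetical per-agent utilities: utilities[i][a] is agent i's value for action a.
utilities = [
    [1.0, 3.0, 2.0],
    [0.5, -1.0, 4.0],
    [2.0, 2.5, 0.0],
]
n_agents = len(utilities)
n_actions = len(utilities[0])


def joint_value(joint_action):
    """Additive mixing (assumed here): the joint value is the sum of utilities."""
    return sum(utilities[i][a] for i, a in enumerate(joint_action))


# Decentralized execution: every agent maximizes its own utility.
greedy_joint = tuple(
    max(range(n_actions), key=lambda a, i=i: utilities[i][a])
    for i in range(n_agents)
)

# Centralized check: search over all joint actions.
best_joint = max(
    itertools.product(range(n_actions), repeat=n_agents), key=joint_value
)

print(greedy_joint, best_joint, joint_value(best_joint))
```

Because the mixing is additive, the per-agent greedy actions coincide with the best joint action, which is exactly what IGM requires.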


Modelling the transition to a low-carbon energy supply

arXiv.org Artificial Intelligence

A transition to a low-carbon electricity supply is crucial to limit the impacts of climate change. Reducing carbon emissions could help prevent the world from reaching a tipping point, where runaway emissions are likely. Runaway emissions could lead to extremes in weather conditions around the world -- especially in problematic regions unable to cope with these conditions. However, the movement to a low-carbon energy supply cannot happen instantaneously due to the existing fossil-fuel infrastructure and the requirement to maintain a reliable energy supply. Therefore, a low-carbon transition is required. However, the decisions various stakeholders should make over the coming decades to reduce these carbon emissions are not obvious. This is due to many long-term uncertainties, such as electricity, fuel and generation costs, human behaviour and the size of electricity demand. A well-choreographed low-carbon transition is, therefore, required between all of the heterogeneous actors in the system, as opposed to changing the behaviour of a single, centralised actor. The objective of this thesis is to create a novel, open-source agent-based model to better understand the manner in which the whole electricity market reacts to different factors using state-of-the-art machine learning and artificial intelligence methods. In contrast to other works, this thesis looks at both the long-term and short-term impact that different behaviours have on the electricity market by using these state-of-the-art methods.


Emergent behavior and neural dynamics in artificial agents tracking turbulent plumes

arXiv.org Artificial Intelligence

Tracking a turbulent plume to locate its source is a complex control problem because it requires multi-sensory integration and must be robust to intermittent odors, changing wind direction, and variable plume statistics. This task is routinely performed by flying insects, often over long distances, in pursuit of food or mates. Several aspects of this remarkable behavior have been examined in detail in many experimental studies. Here, we take a complementary in silico approach, using artificial agents trained with reinforcement learning to develop an integrated understanding of the behaviors and neural computations that support plume tracking. Specifically, we use deep reinforcement learning (DRL) to train recurrent neural network (RNN) agents to locate the source of simulated turbulent plumes. Interestingly, the agents' emergent behaviors resemble those of flying insects, and the RNNs learn to represent task-relevant variables, such as head direction and time since last odor encounter. Our analyses suggest an intriguing experimentally testable hypothesis for tracking plumes in changing wind direction -- that agents follow local plume shape rather than the current wind direction. While reflexive short-memory behaviors are sufficient for tracking plumes in constant wind, longer timescales of memory are essential for tracking plumes that switch direction. At the level of neural dynamics, the RNNs' population activity is low-dimensional and organized into distinct dynamical structures, with some correspondence to behavioral modules. Our in silico approach provides key intuitions for turbulent plume tracking strategies and motivates future targeted experimental and theoretical developments.


Algorithmic Information Design in Multi-Player Games: Possibility and Limits in Singleton Congestion

arXiv.org Artificial Intelligence

In today's digital economy, there are numerous situations where many players have to compete for limited resources. For instance, on ride-hailing platforms such as Uber and Lyft, drivers pick an area to go to and then compete with other drivers for ride requests in that area; on content platforms such as YouTube and TikTok, content providers choose a style/theme for their content and then compete with other providers of the same theme for Internet traffic interested in that theme; on digital markets such as Amazon and Wayfair, retailers choose a particular product category (e.g., pet supplies or home & kitchen) to focus on and compete with other retailers for sales demand in that category. All these problems share the following similarity: (1) many players make a choice (e.g., a ride-sharing area or a content theme) from multiple options, and their payoffs are subject to negative externalities from other players making the same choice due to competition; (2) players have high uncertainty about the payoffs of their choices since the entire system's demand for ride requests or Internet traffic is unknown to an individual player, whereas the system usually has much finer-grained information about these uncertainties. An important operational task common to all these applications is the following: how can the system (the sender) strategically reveal her privileged information to influence the decisions of so many players (the receivers) in order to steer their collective decisions towards a desirable social outcome? This task, also known as information design or persuasion [1, 2, 3, 4], has attracted extensive recent interest.


Fairness Maximization among Offline Agents in Online-Matching Markets

arXiv.org Artificial Intelligence

Matching markets involve heterogeneous agents (typically from two parties) who are paired for mutual benefit. During the last decade, matching markets have emerged and grown rapidly through the medium of the Internet. They have evolved into a new format, called Online Matching Markets (OMMs), with examples ranging from crowdsourcing to online recommendations to ridesharing. There are two features distinguishing OMMs from traditional matching markets. One is the dynamic arrival of one side of the market: we refer to these as online agents while the rest are offline agents. Examples of online and offline agents include keywords (online) and sponsors (offline) in Google Advertising; workers (online) and tasks (offline) in Amazon Mechanical Turk (AMT); riders (online) and drivers (offline when restricted to a short time window) in ridesharing. The second distinguishing feature of OMMs is the real-time decision-making element. However, studies have shown that the algorithms making decisions in these OMMs leave disparities in the match rates of offline agents. For example, tasks in neighborhoods of low socioeconomic status rarely get matched to gig workers, and drivers of certain races/genders get discriminated against in matchmaking. In this paper, we propose online matching algorithms which optimize for either individual or group-level fairness among offline agents in OMMs. We present two linear-programming (LP) based sampling algorithms, which achieve online competitive ratios of at least 0.725 for individual fairness maximization (IFM) and 0.719 for group fairness maximization (GFM), respectively. We conduct extensive numerical experiments, and the results show that our boosted versions of the sampling algorithms are not only conceptually easy to implement but also highly effective in practical instances of fairness-maximization-related models.


Stanford Releases Report on the Current State of AI

#artificialintelligence

Artificial intelligence (AI) has significantly advanced in the past half decade and is making major inroads across many industries and sectors worldwide. Earlier this month, Stanford University released The One Hundred Year Study on Artificial Intelligence (AI100) 2021 Study Panel Report. The new Stanford AI100 report is the second in a series following the inaugural AI100 report published five years ago in September 2016. Stanford plans to continue to publish the AI100 report once every five years for a hundred years or longer. "The field of artificial intelligence has made remarkable progress in the past five years and is having real-world impact on people, institutions and culture," the researchers wrote.


Towards a Multi-Agent System Architecture for Supply Chain Management

arXiv.org Artificial Intelligence

Individual business processes have been changing since the Internet was created, and they are now oriented towards a more distributed and collaborative business model, in an e-commerce environment that adapts itself to the competitive and changing market conditions. This paper presents a multi-agent system architecture for supply chain management, which explores different strategies and offers solutions in a distributed e-commerce environment. The system is designed to support different types of interfaces, which allow interoperating with other business models already developed. In order to show how the entire multi-agent system is being developed, the implementation of a collaborative agent is presented and explained.


A dynamic programming algorithm for informative measurements and near-optimal path-planning

arXiv.org Artificial Intelligence

An informative measurement is the most efficient way to gain information about an unknown state. We give a first-principles derivation of a general-purpose dynamic programming algorithm that returns a sequence of informative measurements by sequentially maximizing the entropy of possible measurement outcomes. This algorithm can be used by an autonomous agent or robot to decide where best to measure next, planning a path corresponding to an optimal sequence of informative measurements. This algorithm is applicable to states and controls that are continuous or discrete, and to agent dynamics that are either stochastic or deterministic, including Markov decision processes. Recent results from approximate dynamic programming and reinforcement learning, including on-line approximations such as rollout and Monte Carlo tree search, allow an agent or robot to solve the measurement task in real time. The resulting near-optimal solutions include non-myopic paths and measurement sequences that can generally outperform, sometimes substantially, commonly used greedy heuristics such as maximizing the entropy of each measurement outcome. This is demonstrated for a global search problem, where on-line planning with an extended local search is found to reduce the number of measurements in the search by half.
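For reference, the greedy heuristic named in the abstract (pick the measurement whose outcome has maximum entropy) can be sketched as follows. This is not the paper's dynamic programming algorithm; the discrete state, the noiseless threshold measurements, and the helper names are assumptions made for illustration.

```python
import math
from collections import defaultdict


def outcome_entropy(belief, measurement):
    """Shannon entropy (bits) of the outcome of a noiseless measurement.

    belief maps each candidate state to its probability; measurement maps a
    state to the outcome that would be observed (assumed deterministic).
    """
    probs = defaultdict(float)
    for state, p in belief.items():
        probs[measurement(state)] += p
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def greedy_measurement(belief, measurements):
    """Return the name of the measurement with maximum outcome entropy."""
    return max(measurements, key=lambda name: outcome_entropy(belief, measurements[name]))


# Toy search problem: an unknown state in {0, ..., 7} with a uniform prior,
# and threshold tests "is the state below k?" as candidate measurements.
belief = {s: 1 / 8 for s in range(8)}
measurements = {f"state < {k}": (lambda s, k=k: s < k) for k in range(1, 8)}

best = greedy_measurement(belief, measurements)
print(best, outcome_entropy(belief, measurements[best]))
```

On this toy problem the greedy choice is the test that splits the belief in half, yielding one bit per measurement. The abstract's point is that such one-step choices can be outperformed by non-myopic planning in harder settings.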


Worst-case Bounds on Power vs. Proportion in Weighted Voting Games with an Application to False-name Manipulation

Journal of Artificial Intelligence Research

Weighted voting games apply to a wide variety of multi-agent settings. They enable the formalization of power indices which quantify the coalitional power of players. We take a novel approach to the study of the power of big vs. small players in these games. We model small (big) players as having single (multiple) votes. The aggregate relative power of big players is measured w.r.t. their proportion of the votes. For this ratio, we show small constant worst-case bounds for the Shapley-Shubik and the Deegan-Packel indices. In sharp contrast, this ratio is unbounded for the Banzhaf index. As an application, we define a false-name strategic normal form game where each big player may split its votes between false identities, and study its various properties. Together, our results provide foundations for the implications of players’ size, modeled as their ability to split, on their relative power.
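To make the power indices concrete, the sketch below computes the Shapley-Shubik index and the normalized Banzhaf index by brute-force enumeration for a small weighted voting game. The function names, the example weights, and the quota are assumptions for illustration, and the code assumes the quota does not exceed the total weight; it does not reproduce the paper's bounds.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import factorial


def shapley_shubik(weights, quota):
    """Fraction of player orderings in which each player is pivotal."""
    n = len(weights)
    pivots = [0] * n
    for order in permutations(range(n)):
        total = 0
        for player in order:
            total += weights[player]
            if total >= quota:      # this player turns the coalition winning
                pivots[player] += 1
                break
    return [Fraction(c, factorial(n)) for c in pivots]


def banzhaf(weights, quota):
    """Normalized Banzhaf index: each player's share of all swings."""
    n = len(weights)
    swings = [0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for coalition in combinations(others, size):
                w = sum(weights[j] for j in coalition)
                if w < quota <= w + weights[i]:   # losing without i, winning with i
                    swings[i] += 1
    total = sum(swings)
    return [Fraction(s, total) for s in swings]


# One big player with two votes and two small players with one vote each.
weights, quota = [2, 1, 1], 3
print(shapley_shubik(weights, quota))
print(banzhaf(weights, quota))
```

In this example the big player holds half of the votes but has a Shapley-Shubik index of 2/3 and a normalized Banzhaf index of 3/5, so its power exceeds its vote proportion under both indices.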