Agents
Learning Model Predictive Control for Competitive Autonomous Racing
The goal of this thesis is to design a learning model predictive controller (LMPC) that allows multiple agents to race competitively on a predefined race track in real time. This thesis addresses two major shortcomings of the existing single-agent formulation. First, the single-agent controller determines a locally optimal trajectory but does not explore the state space, which may be necessary for overtaking maneuvers. Second, obstacle avoidance for LMPC has previously been achieved by using a non-convex terminal set, which increases the complexity of solving the optimization problem. The proposed algorithm for multi-agent racing explores the state space by executing the LMPC for multiple different initializations, which yields a richer terminal safe set. Furthermore, a new method for selecting states in the terminal set is developed, which preserves the convexity of the terminal safe set and allows for taking suboptimal states.
Multi-consensus Decentralized Accelerated Gradient Descent
Ye, Haishan, Luo, Luo, Zhou, Ziang, Zhang, Tong
This paper considers the decentralized optimization problem, which has applications in large-scale machine learning, sensor networks, and control theory. We propose a novel algorithm that can achieve near-optimal communication complexity, matching the known lower bound up to a logarithmic factor of the condition number of the problem. Our theoretical results give an affirmative answer to the open problem of whether there exists an algorithm that can achieve a communication complexity (nearly) matching the lower bound depending on the global condition number instead of the local one. Moreover, the proposed algorithm achieves the optimal computation complexity, matching the lower bound up to universal constants. Furthermore, to achieve a linear convergence rate, our algorithm does not require the individual functions to be (strongly) convex. Our method relies on a novel combination of known techniques including Nesterov's accelerated gradient descent, multi-consensus and gradient-tracking. The analysis is new, and may be applied to other related problems. Empirical studies demonstrate the effectiveness of our method for machine learning applications.
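To make the gradient-tracking and multi-consensus ingredients named above concrete, here is a minimal sketch of plain (non-accelerated) decentralized gradient tracking on a toy problem. It is not the paper's algorithm: Nesterov acceleration is omitted, and the scalar quadratic objectives, the mixing matrix `W`, the step size, and all function names are assumptions chosen for illustration.

```python
def mix(W, v, rounds=1):
    """Average v with neighbours by applying the mixing matrix W `rounds` times.

    rounds > 1 corresponds to several consensus steps per iteration.
    """
    n = len(v)
    for _ in range(rounds):
        v = [sum(W[i][j] * v[j] for j in range(n)) for i in range(n)]
    return v


def gradient_tracking(a, b, W, step=0.05, iters=1000, rounds=1):
    """Minimise sum_i 0.5 * a[i] * (x - b[i])**2 with one local copy of x per agent.

    Hypothetical toy setup: agent i only evaluates its own gradient and only
    exchanges values with other agents through the mixing matrix W.
    """
    n = len(a)

    def local_grads(x):
        return [a[i] * (x[i] - b[i]) for i in range(n)]

    x = [0.0] * n          # local estimates of the minimiser
    g = local_grads(x)     # most recent local gradients
    y = list(g)            # trackers of the network-average gradient
    for _ in range(iters):
        x_mixed = mix(W, x, rounds)
        x_new = [x_mixed[i] - step * y[i] for i in range(n)]
        g_new = local_grads(x_new)
        y_mixed = mix(W, y, rounds)
        # Tracking update: mix the trackers, then add the change in local gradient.
        y = [y_mixed[i] + g_new[i] - g[i] for i in range(n)]
        x, g = x_new, g_new
    return x


# Three fully connected agents with a symmetric, doubly stochastic mixing matrix.
A = [1.0, 2.0, 3.0]
B = [1.0, 2.0, 3.0]
W = [[0.50, 0.25, 0.25],
     [0.25, 0.50, 0.25],
     [0.25, 0.25, 0.50]]

estimates = gradient_tracking(A, B, W)
optimum = sum(ai * bi for ai, bi in zip(A, B)) / sum(A)
```

In this toy setting every agent's estimate should approach `optimum`, the minimiser of the summed objective, even though no agent ever sees another agent's function.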
Salesforce's AI Economist taps reinforcement learning to generate optimal tax policies
Salesforce today announced the AI Economist, a research environment designed to elucidate how economic design might be improved with techniques from the field of AI and machine learning. The goal is to help economists, governments, and others design tax policies that optimize not only productivity and conservation, but that promote widespread, whole-country social equality. Studies have shown that income inequality gaps can negatively impact economic growth, economic opportunity, and even health. For example, over-taxation can discourage people from working, leading to lower productivity. But it's difficult to experiment with tax policies in the real world, at least in part because economic theory relies on stylized assumptions that are tough to validate, like people's sensitivity to taxes. The AI Economist, then, learns the best tax policies from simulations in which citizens and a government adapt and learn.
Learning Collaborative Agents with Rule Guidance for Knowledge Graph Reasoning
Lei, Deren, Jiang, Gangrong, Gu, Xiaotao, Sun, Kexuan, Mao, Yuning, Ren, Xiang
Walk-based models have shown their unique advantages in knowledge graph (KG) reasoning by achieving state-of-the-art performance while allowing for explicit visualization of the decision sequence. However, the sparse reward signals offered by the KG during a traversal are often insufficient to guide a sophisticated reinforcement learning (RL) model. An alternative approach to KG reasoning is to use traditional symbolic methods (e.g., rule induction), which achieve high precision without learning but are hard to generalize due to the limitations of symbolic representation. In this paper, we propose to fuse these two paradigms to get the best of both worlds. Our method leverages high-quality rules generated by symbolic-based methods to provide reward supervision for walk-based agents. Due to the structure of symbolic rules with their entity variables, we can separate our walk-based agent into two sub-agents, thus allowing for additional efficiency. Experiments on public datasets demonstrate that walk-based models can benefit significantly from rule guidance.
Smart Containers With Bidding Capacity: A Policy Gradient Algorithm for Semi-Cooperative Learning
Smart modular freight containers -- as propagated in the Physical Internet paradigm -- are equipped with sensors, data storage capability and intelligence that enable them to route themselves from origin to destination without manual intervention or central governance. In this self-organizing setting, containers can autonomously place bids on transport services in a spot market. However, for individual containers it may be difficult to learn good bidding policies due to limited observations. By sharing information and costs with one another, smart containers can jointly learn bidding policies, even while competing for the same transport capacity. We replicate this behavior by learning stochastic bidding policies in a semi-cooperative multi-agent setting. To this end, we develop a reinforcement learning algorithm based on the policy gradient framework. Numerical experiments show that sharing solely bids and acceptance decisions leads to stable bidding policies. Additional system information only marginally improves performance; individual job properties suffice to place appropriate bids. Furthermore, we find that carriers may have incentives not to share information with the smart containers. The experiments give rise to several directions for follow-up research, in particular the interaction between smart containers and transport services in self-organizing logistics.
Improving Vision-and-Language Navigation with Image-Text Pairs from the Web
Majumdar, Arjun, Shrivastava, Ayush, Lee, Stefan, Anderson, Peter, Parikh, Devi, Batra, Dhruv
Following a navigation instruction such as 'Walk down the stairs and stop at the brown sofa' requires embodied AI agents to ground scene elements referenced via language (e.g. 'stairs') to visual content in the environment (pixels corresponding to 'stairs'). We ask the following question - can we leverage abundant 'disembodied' web-scraped vision-and-language corpora (e.g. Conceptual Captions [24]) to learn visual groundings (what do 'stairs' look like?) that improve performance on a relatively data-starved embodied perception task (Vision-and-Language Navigation)? Specifically, we develop VLN-BERT, a visiolinguistic transformer-based model for scoring the compatibility between an instruction ('...stop at the brown sofa') and a sequence of panoramic RGB images captured by the agent. We demonstrate that pretraining VLN-BERT on image-text pairs from the web before fine-tuning on embodied path-instruction data significantly improves performance on VLN - outperforming the prior state-of-the-art in the fully-observed setting by 4 absolute percentage points on success rate. Ablations of our pretraining curriculum show each stage to be impactful - with their combination resulting in further positive synergistic effects.
Hierarchically Fair Federated Learning
Zhang, Jingfeng, Li, Cheng, Robles-Kelly, Antonio, Kankanhalli, Mohan
Traditional machine learning techniques require agents (e.g., mobile devices, terminals, companies, etc.) to upload their data to a central server. This approach not only increases communication between agents and the central server due to the data volume but also entails privacy risks during data transfer or due to a server breach [1]. This is an important concern since data protection regulations impose constraints on the sharing of sensitive data. Federated learning, a recent distributed and decentralized machine learning scheme [2], has attracted significant attention. In federated learning, agents maintain their data locally and collaboratively learn a global machine learning model that benefits all. Specifically, each agent sends the parameters (or parameter updates) of its local model to the central server and receives the computed parameters of the global model from the central server. In this way, all agents can jointly train a global model without exposing their own data. This scheme has desirable properties such as privacy preservation, efficient communication, and decentralized data storage.
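The parameter exchange described above can be sketched as a generic federated-averaging loop. This is a minimal illustration under stated assumptions, not the hierarchically fair scheme of the paper: the one-parameter linear model, the toy data, the sample-count weighting, and the function names are all hypothetical.

```python
def local_update(w, data, lr=0.1, epochs=5):
    """Run a few gradient steps on one agent's private data.

    Assumed model: y = w * x with squared-error loss. Only the updated
    parameter w leaves the agent; the (x, y) pairs stay local.
    """
    for _ in range(epochs):
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(w_global, agents):
    """One round: broadcast w_global, collect local parameters, average them.

    The server weights each agent's parameter by its number of samples.
    """
    total = sum(len(data) for data in agents)
    updates = [(local_update(w_global, data), len(data)) for data in agents]
    return sum(w * n for w, n in updates) / total


# Two agents whose private data are both generated by y = 3 * x.
agents = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5), (1.5, 4.5), (3.0, 9.0)],
]

w = 0.0
for _ in range(20):
    w = federated_round(w, agents)
```

After these rounds `w` should be close to 3.0, the slope shared by both agents' data, although the server only ever handled parameters and sample counts.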
PeerNomination: Relaxing Exactness for Increased Accuracy in Peer Selection
Mattei, Nicholas, Turrini, Paolo, Zhydkov, Stanislav
In peer selection, agents must choose a subset of themselves for an award or a prize. As agents are self-interested, we want to design algorithms that are impartial, so that an individual agent cannot affect their own chance of being selected. This problem has broad application in resource allocation and mechanism design and has received substantial attention in the artificial intelligence literature. Here, we present a novel algorithm for impartial peer selection, PeerNomination, and provide a theoretical analysis of its accuracy. Our algorithm possesses various desirable features. In particular, it does not require an explicit partitioning of the agents, unlike previous algorithms in the literature. We show empirically that it achieves higher accuracy than the existing algorithms over several metrics.
6G White Paper on Edge Intelligence
Peltonen, Ella, Bennis, Mehdi, Capobianco, Michele, Debbah, Merouane, Ding, Aaron, Gil-Castiñeira, Felipe, Jurmu, Marko, Karvonen, Teemu, Kelanti, Markus, Kliks, Adrian, Leppänen, Teemu, Lovén, Lauri, Mikkonen, Tommi, Rao, Ashwin, Samarakoon, Sumudu, Seppänen, Kari, Sroka, Paweł, Tarkoma, Sasu, Yang, Tingting
In this white paper we provide a vision for 6G Edge Intelligence. As networks move beyond 5G towards future 6G, intelligent solutions utilizing data-driven machine learning and artificial intelligence become crucial for several real-world applications including, but not limited to, more efficient manufacturing, novel personal smart device environments and experiences, urban computing and autonomous traffic settings. We present edge computing along with other 6G enablers as a key component to establish the future 2030 intelligent Internet technologies as shown in this series of 6G White Papers. In this white paper, we focus on the domains of edge computing infrastructure and platforms, data and edge network management, software development for edge, and real-time and distributed training of ML/AI algorithms, along with security, privacy, pricing, and end-user aspects. We discuss the key enablers and challenges and identify the key research questions for the development of Intelligent Edge services. As a main outcome of this white paper, we envision a transition from Internet of Things to Intelligent Internet of Intelligent Things and provide a roadmap for the development of 6G Intelligent Edge.
Generating and Adapting to Diverse Ad-Hoc Cooperation Agents in Hanabi
Canaan, Rodrigo, Gao, Xianbo, Togelius, Julian, Nealen, Andy, Menzel, Stefan
Hanabi is a cooperative game that brings the problem of modeling other players to the forefront. In this game, coordinated groups of players can leverage pre-established conventions to great effect, but playing in an ad-hoc setting requires agents to adapt to their partners' strategies with no previous coordination. Evaluating an agent in this setting requires a diverse population of potential partners, but so far, the behavioral diversity of agents has not been considered in a systematic way. This paper proposes Quality Diversity algorithms as a promising class of algorithms to generate diverse populations for this purpose, and generates a population of diverse Hanabi agents using MAP-Elites. We also postulate that agents can benefit from a diverse population during training and implement a simple "meta-strategy" for adapting to an agent's perceived behavioral niche. We show this meta-strategy can work better than generalist strategies even outside the population it was trained with if its partner's behavioral niche can be correctly inferred, but in practice a partner's behavior depends on and interferes with the meta-agent's own behavior, suggesting an avenue for future research in characterizing another agent's behavior during gameplay.
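For readers unfamiliar with MAP-Elites, the core loop can be sketched in a few lines: keep one best solution (an "elite") per behavioral niche, and create new candidates by mutating existing elites. The example below is a generic toy version, not the Hanabi setup of the paper; the two-number solutions, the fitness function, the ten-bin behavior descriptor, and all names are assumptions made for illustration.

```python
import random


def map_elites(evaluate, describe, random_solution, mutate,
               iterations=2000, n_init=50, seed=0):
    """Minimal MAP-Elites loop returning {niche: (fitness, solution)}."""
    rng = random.Random(seed)
    archive = {}
    for i in range(iterations):
        if i < n_init or not archive:
            # Bootstrap the archive with random solutions.
            candidate = random_solution(rng)
        else:
            # Pick a random elite and perturb it.
            _, parent = rng.choice(list(archive.values()))
            candidate = mutate(parent, rng)
        niche = describe(candidate)
        fitness = evaluate(candidate)
        # Keep the candidate if its niche is empty or it beats the current elite.
        if niche not in archive or fitness > archive[niche][0]:
            archive[niche] = (fitness, candidate)
    return archive


BINS = 10  # hypothetical number of behavioral niches


def random_solution(rng):
    return [rng.random(), rng.random()]


def mutate(solution, rng):
    # Gaussian perturbation, clipped to the unit square.
    return [min(1.0, max(0.0, v + rng.gauss(0.0, 0.1))) for v in solution]


def describe(solution):
    # Behavior descriptor: which tenth of [0, 1] the first coordinate falls in.
    return min(BINS - 1, int(solution[0] * BINS))


def evaluate(solution):
    # Toy fitness: best when the second coordinate equals 0.5.
    return -(solution[1] - 0.5) ** 2


archive = map_elites(evaluate, describe, random_solution, mutate)
```

The resulting `archive` holds at most one elite per niche, so it provides a population that is diverse along the chosen behavior descriptor while each member is locally good in its own niche.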