Agents
Layers, Folds, and Semi-Neuronal Information Processing
What role does phenotypic complexity play in the systems-level function of an embodied agent? The organismal phenotype is a topologically complex structure that interacts with a genotype, developmental physics, and an informational environment. Using this observation as inspiration, we utilize a type of embodied agent that exhibits layered representational capacity: meta-brain models. Meta-brains are used to demonstrate how phenotypes process information and exhibit self-regulation from development to maturity. We focus on two candidate structures that potentially explain this capacity: folding and layering. As layering and folding can be observed in a host of biological contexts, they form the basis for our representational investigations. First, an innate starting point (genomic encoding) is described. The generative output of this encoding is a differentiation tree, which results in a layered phenotypic representation. Then we specify a formal meta-brain model of the gut, which exhibits folding and layering in development in addition to different degrees of representation of processed information. This organ topology is retained in maturity, with the potential for additional folding and representational drift in response to inflammation. Next, we consider topological remapping using the developmental Braitenberg Vehicle (dBV) as a toy model. During topological remapping, it is shown that folding of a layered neural network can introduce a number of distortions to the original model, some with functional implications. The paper concludes with a discussion on how the meta-brains method can assist us in the investigation of enactivism, holism, and cognitive processing in the context of biological simulation.
For Learning in Symmetric Teams, Local Optima are Global Nash Equilibria
Emmons, Scott, Oesterheld, Caspar, Critch, Andrew, Conitzer, Vincent, Russell, Stuart
Although it has been known since the 1970s that a globally optimal strategy profile in a common-payoff game is a Nash equilibrium, global optimality is a strict requirement that limits the result's applicability. In this work, we show that any locally optimal symmetric strategy profile is also a (global) Nash equilibrium. Furthermore, we show that this result is robust to perturbations to the common payoff and to the local optimum. Applied to machine learning, our result provides a global guarantee for any gradient method that finds a local optimum in symmetric strategy space. While this result indicates stability to unilateral deviation, we nevertheless identify broad classes of games where mixed local optima are unstable under joint, asymmetric deviations. We analyze the prevalence of instability by running learning algorithms in a suite of symmetric games, and we conclude by discussing the applicability of our results to multi-agent RL, cooperative inverse RL, and decentralized POMDPs.
A Comprehensive Framework for Learning Declarative Action Models
Aineto, Diego | Jimรฉnez, Sergio (Universitat Politรจcnica de Valรจncia) | Onaindia, Eva (Universitat Politรจcnica de Valรจncia)
A declarative action model is a compact representation of the state transitions of dynamic systems that generalizes over world objects. The specification of declarative action models is often a complex hand-crafted task. In this paper we formulate declarative action models via state constraints, and present the learning of such models as a combinatorial search. The comprehensive framework presented here allows us to connect the learning of declarative action models to well-known problem solving tasks. In addition, our framework allows us to characterize the existing work in the literature according to four dimensions: (1) the target action models, in terms of the state transitions they define; (2) the available learning examples; (3) the functions used to guide the learning process, and to evaluate the quality of the learned action models; (4) the learning algorithm. Last, the paper lists relevant successful applications of the learning of declarative actions models and discusses some open challenges with the aim of encouraging future research work.
The Self-Learning Model That Passed The Famous Pommerman Challenge
I recently started an AI-focused educational newsletter, that already has over 125,000 subscribers. TheSequence is a no-BS (meaning no hype, no news etc) ML-oriented newsletter that takes 5 minutes to read. The goal is to keep you up to date with machine learning projects, research papers and concepts. The emergence of trends such as self-driving cars or drones has helped to popularize an area of artificial intelligence(AI) research known as autonomous agents. Conceptually, autonomous agents are AI that builds knowledge in real-time based on the characteristics of their surrounding environment as well as other agents.
Plan Execution for Multi-Agent Path Finding with Indoor Quadcopters
Kulhan, Matouลก, Surynek, Pavel
We study the planning and acting phase for the problem of multi-agent path finding (MAPF) in this paper. MAPF is a problem of navigating agents from their start positions to specified individual goal positions so that agents do not collide with each other. Specifically we focus on executing MAPF plans with a group of Crazyflies, small indoor quadcopters . We show how to modify the existing continuous time conflict-based search algorithm (CCBS) to produce plans that are suitable for execution with the quadcopters. The acting phase uses the the Loco positioning system to check if the plan is executed correctly. Our finding is that the CCBS algorithm allows for extensions that can produce safe plans for quadcopters, namely cylindrical protection zone around each quadcopter can be introduced at the planning level.
Reforming an Envy-Free Matching
Ito, Takehiro, Iwamasa, Yuni, Kakimura, Naonori, Kamiyama, Naoyuki, Kobayashi, Yusuke, Nozaki, Yuta, Okamoto, Yoshio, Ozeki, Kenta
We consider the problem of reforming an envy-free matching when each agent is assigned a single item. Given an envy-free matching, we consider an operation to exchange the item of an agent with an unassigned item preferred by the agent that results in another envy-free matching. We repeat this operation as long as we can. We prove that the resulting envy-free matching is uniquely determined up to the choice of an initial envy-free matching, and can be found in polynomial time. We call the resulting matching a reformist envy-free matching, and then we study a shortest sequence to obtain the reformist envy-free matching from an initial envy-free matching. We prove that a shortest sequence is computationally hard to obtain even when each agent accepts at most four items and each item is accepted by at most three agents. On the other hand, we give polynomial-time algorithms when each agent accepts at most three items or each item is accepted by at most two agents. Inapproximability and fixed-parameter (in)tractability are also discussed.
On the Complexity of Rational Verification
Gutierrez, Julian, Najib, Muhammad, Perelli, Giuseppe, Wooldridge, Michael
Rational verification refers to the problem of checking which temporal logic properties hold of a concurrent multiagent system, under the assumption that agents in the system choose strategies that form a game-theoretic equilibrium. Rational verification can be understood as a counterpart to model checking for multiagent systems, but while classical model checking can be done in polynomial time for some temporal logic specification languages such as CTL, and polynomial space with LTL specifications, rational verification is much harder: the key decision problems for rational verification are 2EXPTIME-complete with LTL specifications, even when using explicit-state system representations. Against this background, our contributions in this paper are threefold. First, we show that the complexity of rational verification can be greatly reduced by restricting specifications to GR(1), a fragment of LTL that can represent a broad and practically useful class of response properties of reactive systems. In particular, we show that for a number of relevant settings, rational verification can be done in polynomial space and even in polynomial time. Second, we provide improved complexity results for rational verification when considering players' goals given by mean-payoff utility functions; arguably the most widely used approach for quantitative objectives in concurrent and multiagent systems. Finally, we consider the problem of computing outcomes that satisfy social welfare constraints. To this end, we consider both utilitarian and egalitarian social welfare and show that computing such outcomes is either PSPACE-complete or NP-complete.
Decentralized scheduling through an adaptive, trading-based multi-agent system
Kรถlle, Michael, Rietdorf, Lennart, Schmid, Kyrill
In multi-agent reinforcement learning systems, the actions of one agent can have a negative impact on the rewards of other agents. One way to combat this problem is to let agents trade their rewards amongst each other. Motivated by this, this work applies a trading approach to a simulated scheduling environment, where the agents are responsible for the assignment of incoming jobs to compute cores. In this environment, reinforcement learning agents learn to trade successfully. The agents can trade the usage right of computational cores to process high-priority, high-reward jobs faster than low-priority, low-reward jobs. However, due to combinatorial effects, the action and observation spaces of a simple reinforcement learning agent in this environment scale exponentially with key parameters of the problem size. However, the exponential scaling behavior can be transformed into a linear one if the agent is split into several independent sub-units. We further improve this distributed architecture using agent-internal parameter sharing. Moreover, it can be extended to set the exchange prices autonomously. We show that in our scheduling environment, the advantages of a distributed agent architecture clearly outweigh more aggregated approaches. We demonstrate that the distributed agent architecture becomes even more performant using agent-internal parameter sharing. Finally, we investigate how two different reward functions affect autonomous pricing and the corresponding scheduling.
Yann LeCun's vision for creating autonomous machines
We are excited to bring Transform 2022 back in-person July 19 and virtually July 20 - 28. Join AI and data leaders for insightful talks and exciting networking opportunities. In the midst of the heated debate about AI sentience, conscious machines and artificial general intelligence, Yann LeCun, Chief AI Scientist at Meta, published a blueprint for creating "autonomous machine intelligence." LeCun has compiled his ideas in a paper that draws inspiration from progress in machine learning, robotics, neuroscience and cognitive science. He lays out a roadmap for creating AI that can model and understand the world, reason and plan to do tasks on different timescales. While the paper is not a scholarly document, it provides a very interesting framework for thinking about the different pieces needed to replicate animal and human intelligence. It also shows how the mindset of LeCun, an award-winning pioneer of deep learning, has changed and why he thinks current approaches to AI will not get us to human-level AI.
Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning
We introduce DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered. This popular game has an enormous game tree on the order of $10^{535}$ nodes, i.e., $10^{175}$ times larger than that of Go. It has the additional complexity of requiring decision-making under imperfect information, similar to Texas hold'em poker, which has a significantly smaller game tree (on the order of $10^{164}$ nodes). Decisions in Stratego are made over a large number of discrete actions with no obvious link between action and outcome. Episodes are long, with often hundreds of moves before a player wins, and situations in Stratego can not easily be broken down into manageably-sized sub-problems as in poker. For these reasons, Stratego has been a grand challenge for the field of AI for decades, and existing AI methods barely reach an amateur level of play. DeepNash uses a game-theoretic, model-free deep reinforcement learning method, without search, that learns to master Stratego via self-play. The Regularised Nash Dynamics (R-NaD) algorithm, a key component of DeepNash, converges to an approximate Nash equilibrium, instead of 'cycling' around it, by directly modifying the underlying multi-agent learning dynamics. DeepNash beats existing state-of-the-art AI methods in Stratego and achieved a yearly (2022) and all-time top-3 rank on the Gravon games platform, competing with human expert players.