 Agents


Logical Team Q-learning: An approach towards factored policies in cooperative MARL

arXiv.org Artificial Intelligence

We address the challenge of learning factored policies in cooperative MARL scenarios. In particular, we consider the situation in which a team of agents collaborates to optimize a common cost. Our goal is to obtain factored policies that determine the individual behavior of each agent so that the resulting joint policy is optimal. In this work we make contributions to both the dynamic programming and reinforcement learning settings. In the dynamic programming case we provide a number of lemmas that prove the existence of such factored policies and we introduce an algorithm (along with proof of convergence) that provably leads to them. Then we introduce tabular and deep versions of Logical Team Q-learning, which is a stochastic version of the algorithm for the RL case. We conclude the paper by providing experiments that illustrate the claims.


Conflict-Based Search for Connected Multi-Agent Path Finding

arXiv.org Artificial Intelligence

We study a variant of the multi-agent path finding problem (MAPF) in which agents are required to remain connected to each other and to a designated base. This problem has applications in search and rescue missions where the entire execution must be monitored by a human operator. We re-visit the conflict-based search algorithm known for MAPF, and define a variant where conflicts arise from disconnections rather than collisions. We study optimizations, and give experimental results in which we compare our algorithms with the literature.
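
To make the notion of a disconnection conflict concrete, here is a minimal sketch of a connectivity check over agent paths. It is an illustration under stated assumptions, not the authors' algorithm: the `comm_edges` dictionary (which nodes can communicate with which), the wait-at-goal convention, and both function names are hypothetical.

```python
from collections import deque


def is_connected(positions, base, comm_edges):
    """Return True if every agent can reach the base through a chain of
    occupied, mutually communicating nodes.

    comm_edges is an assumed representation: node -> set of nodes it can
    communicate with.
    """
    occupied = set(positions) | {base}
    seen = {base}
    queue = deque([base])
    while queue:
        node = queue.popleft()
        for nxt in comm_edges.get(node, ()):
            # Only nodes that hold an agent (or the base) can relay.
            if nxt in occupied and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return occupied <= seen


def first_disconnection(paths, base, comm_edges):
    """Return the earliest timestep at which the team is disconnected,
    or None if it stays connected for the whole execution."""
    horizon = max(len(p) for p in paths)
    for t in range(horizon):
        # Assumption: an agent waits at its last node once its path ends.
        positions = [p[min(t, len(p) - 1)] for p in paths]
        if not is_connected(positions, base, comm_edges):
            return t
    return None


# Line graph 0-1-2-3 with the base at node 0.
comm = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
conflict_time = first_disconnection([[1, 2, 3], [1, 1, 2]], 0, comm)
```

In a conflict-based search, a timestep returned by such a check would play the role that a collision plays in standard MAPF: it identifies where a constraint has to be added before replanning.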


Spatial Action Maps for Mobile Manipulation

arXiv.org Artificial Intelligence

Typical end-to-end formulations for learning robotic navigation involve predicting a small set of steering command actions (e.g., step forward, turn left, turn right, etc.) from images of the current state (e.g., a bird's-eye view of a SLAM reconstruction). Instead, we show that it can be advantageous to learn with dense action representations defined in the same domain as the state. In this work, we present "spatial action maps," in which the set of possible actions is represented by a pixel map (aligned with the input image of the current state), where each pixel represents a local navigational endpoint at the corresponding scene location. Using ConvNets to infer spatial action maps from state images, action predictions are thereby spatially anchored on local visual features in the scene, enabling significantly faster learning of complex behaviors for mobile manipulation tasks with reinforcement learning. In our experiments, we task a robot with pushing objects to a goal location, and find that policies learned with spatial action maps achieve much better performance than traditional alternatives.
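
The idea that each pixel stands for a navigational endpoint can be sketched as follows. This is only an illustration: it assumes the network outputs one score per pixel and that the action is chosen greedily, neither of which is stated in the abstract, and `select_endpoint` is a hypothetical name.

```python
def select_endpoint(score_map):
    """Return the (row, col) of the highest-scoring pixel in a 2D map.

    score_map is an assumed network output: one score per pixel, aligned
    with the input state image. The returned pixel is interpreted as the
    location the robot should navigate to next.
    """
    best = None
    best_value = float("-inf")
    for r, row in enumerate(score_map):
        for c, value in enumerate(row):
            if value > best_value:
                best_value = value
                best = (r, c)
    return best


# Toy 3x3 map: the best endpoint is the pixel at row 1, column 2.
scores = [
    [0.1, 0.2, 0.0],
    [0.3, 0.1, 0.9],
    [0.0, 0.4, 0.2],
]
endpoint = select_endpoint(scores)
```

Because the action lives in the same pixel grid as the state image, the chosen endpoint can be read off directly as a scene location rather than decoded from an abstract steering command.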


The Importance of Open-Endedness (for the Sake of Open-Endedness)

arXiv.org Artificial Intelligence

A paper in the recent Artificial Life journal special issue on open-ended evolution (OEE) presents a simple evolving computational system that, it is claimed, satisfies all proposed requirements for OEE (Hintze, 2019). Analysis and discussion of the system are used to support the further claims that complexity and diversity are the crucial features of open-endedness, and that we should concentrate on providing proper definitions for those terms rather than engaging in "the quest for open-endedness for the sake of open-endedness" (Hintze, 2019, p. 205). While I wholeheartedly support the pursuit of precise definitions of complexity and diversity in relation to OEE research, I emphatically reject the suggestion that OEE is not a worthy research topic in its own right. In the same issue of the journal, I presented a "high-level conceptual framework to help orient the discussion and implementation of open-endedness in evolutionary systems" (Taylor, 2019). In the current brief contribution I apply my framework to Hintze's model to understand its limitations. In so doing, I demonstrate the importance of studying open-endedness for the sake of open-endedness.


Affective Conditioning on Hierarchical Networks applied to Depression Detection from Transcribed Clinical Interviews

arXiv.org Machine Learning

In this work we propose a machine learning model for depression detection from transcribed clinical interviews. Depression is a mental disorder that impacts not only the subject's mood but also the use of language. To this end we use a Hierarchical Attention Network to classify interviews of depressed subjects. We augment the attention layer of our model with a conditioning mechanism on linguistic features, extracted from affective lexica. Our analysis shows that individuals diagnosed with depression use affective language to a greater extent than non-depressed individuals. Our experiments show that external affective information improves the performance of the proposed architecture on the General Psychotherapy Corpus and the DAIC-WoZ 2017 depression datasets, achieving state-of-the-art F1 scores of 71.6 and 68.6 respectively.


The Importance of Prior Knowledge in Precise Multimodal Prediction

arXiv.org Artificial Intelligence

Roads have well-defined geometries, topologies, and traffic rules. While this has been widely exploited in motion planning methods to produce maneuvers that obey the law, little work has been devoted to utilizing these priors in perception and motion forecasting methods. In this paper we propose to incorporate these structured priors as a loss function. In contrast to imposing hard constraints, this approach allows the model to handle non-compliant maneuvers when those happen in the real world. Safe motion planning is the end goal, and thus a probabilistic characterization of the possible future developments of the scene is key to choosing the plan with the lowest expected cost. Towards this goal, we design a framework that leverages REINFORCE to incorporate non-differentiable priors over sample trajectories from a probabilistic model, thus optimizing the whole distribution. We demonstrate the effectiveness of our approach on real-world self-driving datasets containing complex road topologies and multi-agent interactions. Our motion forecasts not only exhibit better precision and map understanding, but most importantly result in safer motion plans taken by our self-driving vehicle. We emphasize that despite the importance of this evaluation, it has often been overlooked by previous perception and motion forecasting works.
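
For readers unfamiliar with REINFORCE, the general identity that lets a non-differentiable cost shape a sampling distribution is shown below. The notation is an assumption made for illustration and is not taken from the paper: $p_\theta$ is the model's distribution over trajectories $s$, and $c(s)$ is a prior-violation cost that need not be differentiable.

```latex
\nabla_\theta \, \mathbb{E}_{s \sim p_\theta}\!\left[ c(s) \right]
  = \mathbb{E}_{s \sim p_\theta}\!\left[ c(s) \, \nabla_\theta \log p_\theta(s) \right]
  \approx \frac{1}{K} \sum_{k=1}^{K} c(s_k) \, \nabla_\theta \log p_\theta(s_k),
  \qquad s_k \sim p_\theta .
```

The gradient only passes through $\log p_\theta$, so $c$ is evaluated on sampled trajectories without ever being differentiated.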


An optimizable scalar objective value cannot be objective and should not be the sole objective

arXiv.org Artificial Intelligence

The morality of algorithms and their potential for bias and discrimination are important concerns. A popular approach to machine learning and artificial intelligence is via the numerical optimization of objective functions, and adapting such an approach to handle ethics could seem natural: with a hammer in hand, everything looks like a nail. The hammer of much artificial intelligence is the optimization of objective values, so some might like to treat morality solely through such objective functions. However, relying solely on the optimization of scalar objective values is fraught with unavoidable flaws when dealing with real people.


A summary of the keynotes at AAMAS

AIHub

A virtual edition of the International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS) was held on 9-13 May. Videos of the talks are now available for public viewing, and you can also see the sessions from the various workshops. Alison is interested in how cities work and builds spatial agent-based models (ABMs) to study how people move around and how behaviour plays out in space and time. There are a number of challenges with these kinds of models and they need to be really robust if they are to be adopted by policy makers. So, why should we be interested in modelling cities?


Extending the Multiple Traveling Salesman Problem for Scheduling a Fleet of Drones Performing Monitoring Missions

arXiv.org Artificial Intelligence

In this paper we schedule the travel path of a set of drones across a graph where the nodes need to be visited multiple times at pre-defined points in time. This is an extension of the well-known multiple traveling salesman problem. The proposed formulation can be applied in several domains such as the monitoring of traffic flows in a transportation network, or the monitoring of remote locations to assist search and rescue missions. Aiming to find the optimal schedule, the problem is formulated as an Integer Linear Program (ILP). Given that the problem is highly combinatorial, computing the optimal solution is feasible only for small problem instances. Thus, a greedy algorithm is also proposed that uses a one-step look-ahead heuristic search mechanism. In a detailed evaluation, it is observed that the greedy algorithm has near-optimal performance, reaching on average 92.06% of the optimal, while it can potentially scale up to settings with hundreds of drones and locations.
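
The abstract does not describe the greedy heuristic in detail, so the following is only a sketch of one simple greedy rule for this kind of problem: process the required visits in time order and give each one to the drone that can arrive in time at the lowest travel cost. The function name, the `travel_time` callback, and the input layout are all hypothetical, and this is not claimed to be the paper's algorithm.

```python
def greedy_schedule(visits, starts, travel_time):
    """Assign timed visits to drones with a nearest-feasible-drone rule.

    visits:      list of (due_time, node) pairs that must be served.
    starts:      start node of each drone (all drones are free at time 0).
    travel_time: assumed callback, travel_time(a, b) -> time from a to b.

    Returns a list of (due_time, node, drone_index) in time order;
    drone_index is None when no drone can reach the node in time.
    """
    state = [(0, s) for s in starts]  # (time the drone is free, its node)
    schedule = []
    for due, node in sorted(visits):
        best, best_cost = None, float("inf")
        for d, (free_at, loc) in enumerate(state):
            cost = travel_time(loc, node)
            # Feasible only if the drone arrives no later than the due time.
            if free_at + cost <= due and cost < best_cost:
                best, best_cost = d, cost
        schedule.append((due, node, best))
        if best is not None:
            # The drone stays at the node until the visit time.
            state[best] = (due, node)
    return schedule


# Nodes on a line, so travel time is the absolute distance.
plan = greedy_schedule(
    visits=[(5, 3), (5, 8), (6, 4)],
    starts=[0, 10],
    travel_time=lambda a, b: abs(a - b),
)
```

A rule of this kind runs in time proportional to the number of visits times the number of drones, which is why greedy methods can handle instances far larger than an exact ILP solve, at the price of occasionally missing the optimum.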


Depth-Optimized Delay-Aware Tree (DO-DAT) for Virtual Network Function Placement

arXiv.org Artificial Intelligence

With the constant increase in demand for data connectivity, network service providers are faced with the task of reducing their capital and operational expenses while ensuring continual improvements to network performance. Although Network Function Virtualization (NFV) has been identified as a solution, several challenges must be addressed to ensure its feasibility. In this paper, we present a machine learning-based solution to the Virtual Network Function (VNF) placement problem. We propose the Depth-Optimized Delay-Aware Tree (DO-DAT) model, which uses particle swarm optimization to tune decision tree hyper-parameters. Using the Evolved Packet Core (EPC) as a use case, we evaluate the performance of the model and compare it to a previously proposed model and a heuristic placement strategy.