Agents
An Exhaustive DPLL Algorithm for Model Counting
State-of-the-art model counters are based on exhaustive DPLL algorithms, and have been successfully used in probabilistic reasoning, one of the key problems in AI. In this article, we present a new exhaustive DPLL algorithm with a formal semantics, a proof of correctness, and a modular design. The modular design is based on the separation of the core model counting algorithm from SAT solving techniques. We also show that the trace of our algorithm belongs to the language of Sentential Decision Diagrams (SDDs), which is a subset of Decision-DNNFs, the trace of existing state-of-the-art model counters. Still, our experimental analysis shows comparable results against state-of-the-art model counters. Furthermore, we obtain the first top-down SDD compiler, and show orders-of-magnitude improvements in SDD construction time against the existing bottom-up SDD compiler.
Multi-Agent Path Finding with Deadlines: Preliminary Results
Ma, Hang, Wagner, Glenn, Felner, Ariel, Li, Jiaoyang, Kumar, T. K. Satish, Koenig, Sven
We formalize the problem of multi-agent path finding with deadlines (MAPF-DL). The objective is to maximize the number of agents that can reach their given goal vertices from their given start vertices within a given deadline, without colliding with each other. We first show that the MAPF-DL problem is NP-hard to solve optimally. We then present an optimal MAPF-DL algorithm based on a reduction of the MAPF-DL problem to a flow problem and a subsequent compact integer linear programming formulation of the resulting reduced abstracted multi-commodity flow network.
An Optimal Rewiring Strategy for Reinforcement Social Learning in Cooperative Multiagent Systems
Tang, Hongyao, Wang, Li, Wang, Zan, Baarslag, Tim, Hao, Jianye
Multiagent coordination in cooperative multiagent systems (MASs) has been widely studied in both fixed-agent repeated interaction setting and the static social learning framework. However, two aspects of dynamics in real-world multiagent scenarios are currently missing in existing works. First, the network topologies can be dynamic where agents may change their connections through rewiring during the course of interactions. Second, the game matrix between each pair of agents may not be static and usually not known as a prior. Both the network dynamic and game uncertainty increase the coordination difficulty among agents. In this paper, we consider a multiagent dynamic social learning environment in which each agent can choose to rewire potential partners and interact with randomly chosen neighbors in each round. We propose an optimal rewiring strategy for agents to select most beneficial peers to interact with for the purpose of maximizing the accumulated payoff in repeated interactions. We empirically demonstrate the effectiveness and robustness of our approach through comparing with benchmark strategies. The performance of three representative learning strategies under our social learning framework with our optimal rewiring is investigated as well.
COMPASS: a new AI-driven situational awareness tool for the Pentagon?
First, the project attempts to utilize game theory to "ascertain the intent of the adversary." Interestingly, the project assumes that all agents are rational--meaning, I believe, that agents will act to maximize their respective utilities--and that normative theories about how agents will act are important. Explicitly excluded, however, are "descriptive theories that focus on โฆ intangible aspects such as human judgment, irrationality, biases, [and] cognitive limitations." While this may make sense from a streamlined game-theoretic approach, it in no way reflects the real world. Indeed, even game theorists acknowledge the importance of factors that COMPASS excludes, which is why they make ample use of theories of bounded rationality, uncertainty, imperfect information, and the very notion of "adversarial."
The rise of autonomous systems will change the world
Harald Sack is Professor for Information Services Engineering at two of the most renowned research institutions in Europe: FIZ Karlsruhe and AIFB. He is a part of SEMANTiCS' research and innovation track program committee as well as of the conference's permanent advisory board. His publications include more than 130 papers in international journals and conferences and three standard textbooks on networking technologies. In this interview he speaks about the limited capabilities of search engines, the necessity of data being open and the coffee culture in Vienna. You have been working in many research areas such as semantic web technologies, knowledge representations, multimedia analysis & retrieval.
Interactive Reinforcement Learning with Dynamic Reuse of Prior Knowledge from Human/Agent's Demonstration
Wang, Zhaodong, Taylor, Matthew E.
Reinforcement learning has enjoyed multiple successes in recent years. However, these successes typically require very large amounts of data before an agent achieves acceptable performance. This paper introduces a novel way of combating such requirements by leveraging existing (human or agent) knowledge. In particular, this paper uses demonstrations from agents and humans, allowing an untrained agent to quickly achieve high performance. We empirically compare with, and highlight the weakness of, HAT and CHAT, methods of transferring knowledge from a source agent/human to a target agent. This paper introduces an effective transfer approach, DRoP, combining the offline knowledge (demonstrations recorded before learning) with online confidence-based performance analysis. DRoP dynamically involves the demonstrator's knowledge, integrating it into the reinforcement learning agent's online learning loop to achieve efficient and robust learning.
Behavioral Cloning from Observation
Torabi, Faraz, Warnell, Garrett, Stone, Peter
Humans often learn how to perform tasks via imitation: they observe others perform a task, and then very quickly infer the appropriate actions to take based on their observations. While extending this paradigm to autonomous agents is a well-studied problem in general, there are two particular aspects that have largely been overlooked: (1) that the learning is done from observation only (i.e., without explicit action information), and (2) that the learning is typically done very quickly. In this work, we propose a two-phase, autonomous imitation learning technique called behavioral cloning from observation (BCO), that aims to provide improved performance with respect to both of these aspects. First, we allow the agent to acquire experience in a self-supervised fashion. This experience is used to develop a model which is then utilized to learn a particular task by observing an expert perform that task without the knowledge of the specific actions taken. We experimentally compare BCO to imitation learning methods, including the state-of-the-art, generative adversarial imitation learning (GAIL) technique, and we show comparable task performance in several different simulation domains while exhibiting increased learning speed after expert trajectories become available.
To shorten flights and lower emissions, scientists are discussing the birds and the bees
When bees leave their hive hoping to find a better location for their nest, they often first settle nearby, usually on a tree branch, and cluster around the queen while several dozen scouts go off in search of a new home. Each scout then returns and starts to dance, indicating the direction and distance of the site it found. The more excited they become, the more frantically they dance, signaling the others to have a look. Ultimately, a favorite location emerges from all this swarming and buzzing about -- and they all depart and fly to it. In computer science, this behavior is known as particle swarm optimization, which holds that each particle's movement not only is influenced by its own position but is guided to other good positions, all of which are updated as other particles find better positions.
Synthesizing Efficient Solutions for Patrolling Problems in the Internet Environment
Brรกzdil, Tomรกลก, Kuฤera, Antonรญn, ลehรกk, Vojtฤch
We propose an algorithm for constructing efficient patrolling strategies in the Internet environment, where the protected targets are nodes connected to the network and the patrollers are software agents capable of detecting/preventing undesirable activities on the nodes. The algorithm is based on a novel compositional principle designed for a special class of strategies, and it can quickly construct (sub)optimal solutions even if the number of targets reaches hundreds of millions.
Solving Sudoku with Ant Colony Optimisation
Sudoku is a well-known logic-based puzzle game that was first published in 1979 under the name of "Number Place". It was popularised in Japan in 1984 by the puzzle company Nikoli, and later named "Sudoku", which roughly translates to "single digits". The puzzle gained attention in the West in 2004, after The Times published its first Sudoku grid (at the instigation of Hong Kong-based judge Wayne Gould, who first encountered the puzzle in 1997, and developed a computer program to automatically generate instances). Sudoku is now a global phenomenon, and many newspapers now carry it alongside their existing crosswords (see [4] for a general history of the puzzle). The simplest variant of Sudoku uses a 9 9 grid of cells divided into nine 3 3 subgrids (Figure 1 (left)). The aim of the puzzle is to fill the grid with digits such that each row, each column, and each 3 3 subgrid contains all of the digits 1-9 (Figure 1 (right)). An instance of Sudoku provides, at the outset, a partially-completed grid, but the difficulty of any grid derives more from the range of techniques required to solve it than the number of cell values that are provided for the player. Sudoku is an NPcomplete problem [12], as first shown in [35] (via a reduction from the Latin Square Completion problem [2]).