 Agents


Macro-Action-Based Deep Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

In real-world multi-robot systems, performing high-quality, collaborative behaviors requires robots to asynchronously reason about high-level action selection at varying time durations. Macro-Action Decentralized Partially Observable Markov Decision Processes (MacDec-POMDPs) provide a general framework for asynchronous decision making under uncertainty in fully cooperative multi-agent tasks. However, multi-agent deep reinforcement learning methods have only been developed for (synchronous) primitive-action problems. This paper proposes two Deep Q-Network (DQN) based methods for learning decentralized and centralized macro-action-value functions with novel macro-action trajectory replay buffers introduced for each case. Evaluations on benchmark problems and a larger domain demonstrate the advantage of learning with macro-actions over primitive-actions and the scalability of our approaches.


Accumulator Bet Selection Through Stochastic Diffusion Search

arXiv.org Artificial Intelligence

The global sports betting market is worth an estimated $700 billion annually (Flepp et al., 2017), and association football (also known as soccer or simply football), being the world's most popular spectator sport, constitutes around 70% of this ever-growing market (Constantinou et al., 2012). The last decade has thus seen the emergence of numerous online and offline bookmakers, offering bettors the possibility to place wagers on the results of football matches in more than a hundred different leagues worldwide. The sports betting industry offers a unique and very popular betting product known as an accumulator bet. In contrast with a single bet, which consists of betting on a single event for a payout equal to the stake (i.e., the sum wagered) multiplied by the odds set by the bookmaker for that event, an accumulator bet combines more than one (and generally fewer than seven) events into a single wager that pays out only when all individual events are correctly predicted. The payout for a correct accumulator bet is the stake multiplied by the product of the odds of all its constituent wagers. However, if any one of these wagers is incorrect, the entire accumulator bet loses. Thus, this product offers both significantly higher potential payouts and higher risks than single bets, and the large pool of online bookmakers, leagues, and matches that bettors can access nowadays has increased both the complexity of selecting a set of matches to place an accumulator bet on and the number of opportunities to identify winning combinations. With the rise of sports analytics, a wide variety of statistical models for predicting the outcomes of football matches have been proposed, a good review of which can be found in Langseth (2013).
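The payout rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming decimal odds; the function names and the example odds are hypothetical and do not come from the paper.

```python
from math import prod


def single_payout(stake: float, odds: float) -> float:
    """Payout of a winning single bet: stake times the bookmaker's decimal odds."""
    return stake * odds


def accumulator_payout(stake: float, odds: list[float], outcomes: list[bool]) -> float:
    """Payout of an accumulator bet.

    `odds` holds the decimal odds of each constituent wager and `outcomes`
    says whether each wager was predicted correctly. The bet pays the stake
    times the product of all odds, but only if every wager is correct.
    """
    if len(odds) != len(outcomes):
        raise ValueError("odds and outcomes must have the same length")
    if not all(outcomes):
        return 0.0  # a single wrong prediction loses the whole bet
    return stake * prod(odds)


# Hypothetical three-match accumulator with a stake of 10.
all_correct = accumulator_payout(10.0, [2.0, 1.5, 3.0], [True, True, True])
one_wrong = accumulator_payout(10.0, [2.0, 1.5, 3.0], [True, False, True])
```

With these made-up odds, the three-match accumulator pays nine times the stake when every prediction is right and nothing otherwise, which shows the higher payout and higher risk relative to the single bets.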


Omdena Building AI Solutions Through Global Collaboration

#artificialintelligence

Omdena runs AI projects with organizations that want to get started with Artificial Intelligence, solve a real-world problem, or build deployable solutions within two months. The projects are powered by our unique Collaborative AI process, which results in fast development, innovation, and trusted solutions through a bottom-up development approach. First, an organization submits a problem or idea. Next, we publicly announce the AI project and select up to 50 engineers who work with the organization to refine the problem statement, collect the data, and build their solutions.


F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning

arXiv.org Artificial Intelligence

Traditional centralized multi-agent reinforcement learning (MARL) algorithms are sometimes impractical in complicated applications, due to non-interactivity between agents, the curse of dimensionality, and computational complexity. Hence, several decentralized MARL algorithms have been proposed. However, existing decentralized methods only handle the fully cooperative setting, where massive amounts of information need to be transmitted during training. The block coordinate gradient descent scheme they use for successive independent actor and critic steps can simplify the calculation, but it causes serious bias. In this paper, we propose a flexible fully decentralized actor-critic MARL framework, which can be combined with most actor-critic methods and can handle large-scale general cooperative multi-agent settings. A primal-dual hybrid gradient descent type algorithm framework is designed to learn individual agents separately for decentralization. From the perspective of each agent, policy improvement and value evaluation are jointly optimized, which can stabilize multi-agent policy learning. Furthermore, our framework can achieve scalability and stability in large-scale environments and reduce information transmission, by means of a parameter sharing mechanism and a novel modeling-other-agents method based on theory-of-mind and online supervised learning. Extensive experiments in the cooperative Multi-agent Particle Environment and StarCraft II show that our decentralized MARL instantiation algorithms perform competitively against conventional centralized and decentralized methods.


Trump's WHO attack accelerates breakdown in global cooperation

The Japan Times

U.S. President Donald Trump's broadside against the World Health Organization is another blow to international institutions designed to help nations confront global crises -- and may leave countries even less prepared for the next one. Trump's move on Tuesday to suspend WHO funding amid a pandemic that has cost at least 130,000 lives is the latest salvo in a broader struggle between the U.S. and China over global leadership. Both countries are courting other nations and public opinion as they cover up their own shortcomings in the pandemic and position themselves for the post-virus world. China -- widely criticized for missteps early in the outbreak -- has ramped up efforts to send medical supplies to hard-hit nations, even as reports emerged that much of that gear was faulty or expired. The U.S., meanwhile, announced $300 million in aid to countries fighting the virus but rebuffed requests for the most essential gear while receiving donations from the governments of Egypt, Taiwan and Vietnam among others.


MARLeME: A Multi-Agent Reinforcement Learning Model Extraction Library

arXiv.org Artificial Intelligence

Multi-Agent Reinforcement Learning (MARL) encompasses a powerful class of methodologies that have been applied in a wide range of fields. An effective way to further empower these methodologies is to develop libraries and tools that could expand their interpretability and explainability. In this work, we introduce MARLeME: a MARL model extraction library, designed to improve explainability of MARL systems by approximating them with symbolic models. Symbolic models offer a high degree of interpretability, well-defined properties, and verifiable behaviour. Consequently, they can be used to inspect and better understand the underlying MARL system and corresponding MARL agents, as well as to replace some or all of the agents that are particularly safety- and security-critical.


Symmetry as an Organizing Principle for Geometric Intelligence

arXiv.org Artificial Intelligence

The exploration of geometrical patterns stimulates imagination and encourages abstract reasoning, which is a distinctive feature of human intelligence. In cognitive science, Gestalt principles such as symmetry have often explained significant aspects of human perception. We present a computational technique for building artificial intelligence (AI) agents that use symmetry as the organizing principle for addressing Dehaene's test of geometric intelligence (Dehaene et al., 2006). The performance of our model is on par with extant AI models of problem solving on Dehaene's test and seems correlated with some elements of human behavior on the same test.


A non-cooperative meta-modeling game for automated third-party calibrating, validating, and falsifying constitutive laws with parallelized adversarial attacks

arXiv.org Artificial Intelligence

The evaluation of constitutive models, especially for high-risk and high-regret engineering applications, requires efficient and rigorous third-party calibration, validation and falsification. While there are numerous efforts to develop paradigms and standard procedures to validate models, difficulties may arise due to the sequential, manual and often biased nature of the commonly adopted calibration and validation processes, thus slowing down data collection, hampering progress towards discovering new physics, increasing expenses and possibly leading to misinterpretations of the credibility and application ranges of proposed models. This work introduces concepts from game theory and machine learning to overcome many of these difficulties. We introduce an automated meta-modeling game where two competing AI agents systematically generate experimental data to calibrate a given constitutive model and to explore its weaknesses, in order to improve experiment design and model robustness through competition. The two agents automatically search for the Nash equilibrium of the meta-modeling game in an adversarial reinforcement learning framework without human intervention. By capturing all possible design options of the laboratory experiments in a single decision tree, we recast the design of experiments as a game of combinatorial moves that can be resolved through deep reinforcement learning by the two competing players. Our adversarial framework emulates idealized scientific collaborations and competitions among researchers to achieve a better understanding of the application range of the learned material laws and prevent misinterpretations caused by conventional AI-based third-party validation.


Incomplete Preferences in Single-Peaked Electorates

Journal of Artificial Intelligence Research

Incomplete preferences are likely to arise in real-world preference aggregation scenarios. This paper deals with determining whether an incomplete preference profile is single-peaked. This is valuable information since many intractable voting problems become tractable given single-peaked preferences. We prove that the problem of recognizing single-peakedness is NP-complete for incomplete profiles consisting of partial orders. Despite this intractability result, we find several polynomial-time algorithms for reasonably restricted settings. In particular, we give polynomial-time recognition algorithms for weak orders, which can be viewed as preferences with indifference.
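To make the notion of single-peakedness concrete, here is a minimal sketch for the simplest case only: complete strict rankings checked against a given left-to-right axis of candidates. It is not one of the paper's algorithms, which address incomplete profiles where the axis must be found; the function names and the example candidates are hypothetical.

```python
def is_single_peaked(ranking: list[str], axis: list[str]) -> bool:
    """Check one complete strict ranking (most preferred first) against an axis.

    A ranking is single-peaked with respect to the axis when, for every k,
    its top-k candidates occupy a contiguous interval of the axis.
    """
    if len(ranking) != len(axis) or set(ranking) != set(axis):
        raise ValueError("ranking must order exactly the candidates on the axis")
    position = {candidate: i for i, candidate in enumerate(axis)}
    lo = hi = position[ranking[0]]  # interval covered so far, starting at the peak
    for candidate in ranking[1:]:
        p = position[candidate]
        if p == lo - 1:
            lo = p  # interval grows to the left
        elif p == hi + 1:
            hi = p  # interval grows to the right
        else:
            return False  # candidate skips over a less preferred neighbour
    return True


def profile_is_single_peaked(profile: list[list[str]], axis: list[str]) -> bool:
    """A profile is single-peaked on the axis if every ranking in it is."""
    return all(is_single_peaked(ranking, axis) for ranking in profile)


axis = ["a", "b", "c", "d"]
peaked = is_single_peaked(["b", "c", "a", "d"], axis)      # peak at b
not_peaked = is_single_peaked(["a", "c", "b", "d"], axis)  # c is preferred over b, which lies between a and c
```

Checking a fixed axis takes linear time per ranking. The hardness result in the abstract concerns the harder task of deciding whether some suitable axis exists when the preferences are only partial orders.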


Distributed Learning: Sequential Decision Making in Resource-Constrained Environments

arXiv.org Machine Learning

We study cost-effective communication strategies that can be used to improve the performance of distributed learning systems in resource-constrained environments. For distributed learning in sequential decision making, we propose a new cost-effective partial communication protocol. We illustrate that with this protocol the group obtains the same order of performance that it obtains with full communication. Moreover, we prove that under the proposed partial communication protocol the communication cost is $O(\log T)$, where $T$ is the time horizon of the decision-making process. This improves significantly on protocols with full communication, which incur a communication cost that is $O(T)$. We validate our theoretical results using numerical simulations.
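The abstract does not describe the protocol itself, so the following sketch only illustrates how a logarithmic number of communication rounds can arise over a horizon of $T$ steps. It assumes a doubling schedule (communicate at steps 1, 2, 4, 8, ...), which is a common device and not necessarily the authors' protocol; all names are hypothetical.

```python
def full_communication_rounds(horizon: int) -> list[int]:
    """Full communication: agents exchange information at every time step."""
    return list(range(1, horizon + 1))


def doubling_communication_rounds(horizon: int) -> list[int]:
    """Assumed partial schedule: communicate only at steps 1, 2, 4, 8, ...

    The number of such steps up to `horizon` is floor(log2(horizon)) + 1,
    so the count of communication rounds grows as O(log T).
    """
    rounds = []
    step = 1
    while step <= horizon:
        rounds.append(step)
        step *= 2
    return rounds


T = 1000
full_cost = len(full_communication_rounds(T))         # one round per step
partial_cost = len(doubling_communication_rounds(T))  # rounds at powers of two only
```

Under this assumed schedule, a horizon of 1000 steps needs 10 communication rounds instead of 1000. The performance guarantee stated in the abstract is a separate claim that this counting sketch does not address.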