AITopics

2102.05743

Country:

Europe > Finland (0.04)
North America > United States > Virginia (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Spain > Galicia > Madrid (0.04)

Genre: Research Report (0.40)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Rigter, Marc, Lacerda, Bruno, Hawes, Nick

Risk-Averse Bayes-Adaptive Reinforcement Learning

arXiv.org Artificial IntelligenceFeb-10-2021

In this work, we address risk-averse Bayesadaptive reinforcement learning. We pose the problem of optimising the conditional value at risk (CVaR) of the total return in Bayes-adaptive Markov decision processes (MDPs). We show that a policy optimising CVaR in this setting is risk-averse to both the parametric uncertainty due to the prior distribution over MDPs, and the internal uncertainty due to the inherent stochasticity of MDPs. We reformulate the problem as a two-player stochastic game and propose an approximate algorithm based on Monte Carlo tree search and Bayesian optimisation. Our experiments demonstrate that our approach significantly outperforms baseline approaches for this problem.

bamdp, optimisation, perturbation, (14 more...)

2102.05762

Genre: Research Report (0.64)

Industry:

Transportation > Ground > Road (0.69)
Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

Wei, Chen-Yu, Luo, Haipeng

Non-stationary Reinforcement Learning without Prior Knowledge: An Optimal Black-box Approach

arXiv.org Artificial IntelligenceFeb-10-2021

We propose a black-box reduction that turns a certain reinforcement learning algorithm with optimal regret in a (near-)stationary environment into another algorithm with optimal dynamic regret in a non-stationary environment, importantly without any prior knowledge on the degree of non-stationarity. By plugging different algorithms into our black-box, we provide a list of examples showing that our approach not only recovers recent results for (contextual) multi-armed bandits achieved by very specialized algorithms, but also significantly improves the state of the art for linear bandits, episodic MDPs, and infinite-horizon MDPs in various ways. Specifically, in most cases our algorithm achieves the optimal dynamic regret $\widetilde{\mathcal{O}}(\min\{\sqrt{LT}, \Delta^{1/3}T^{2/3}\})$ where $T$ is the number of rounds and $L$ and $\Delta$ are the number and amount of changes of the world respectively, while previous works only obtain suboptimal bounds and/or require the knowledge of $L$ and $\Delta$.

alg, algorithm, probability, (16 more...)

2102.05406

Country:

North America > United States > California (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Industry: Transportation > Air (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Hardt, Moritz, Recht, Benjamin

Patterns, predictions, and actions: A story about machine learning

arXiv.org Machine LearningFeb-9-2021

This graduate textbook on machine learning tells a story of how patterns in data support predictions and consequential actions. Starting with the foundations of decision making, we cover representation, optimization, and generalization as the constituents of supervised learning. A chapter on datasets as benchmarks examines their histories and scientific bases. Self-contained introductions to causality, the practice of causal inference, sequential decision making, and reinforcement learning equip the reader with concepts and tools to reason about actions and their consequences. Throughout, the text discusses historical context and societal impact. We invite readers from all backgrounds; some experience with probability, calculus, and linear algebra suffices.

law enforcement, simultaneous perturbation gradient approximation, vascular disease, (35 more...)

2102.05242

Country:

North America > United States > California (0.67)
Europe > United Kingdom > England (0.28)
North America > United States > Texas (0.13)
North America > United States > Pennsylvania (0.13)

Genre:

Workflow (1.00)
Summary/Review (1.00)
Research Report > New Finding (1.00)
(4 more...)

Industry:

Law > Criminal Law (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
(10 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(10 more...)

Ogunsina, Kolawole, Papamichalis, Marios, DeLaurentis, Daniel

Uncertainty Quantification and Propagation for Airline Disruption Management

arXiv.org Artificial IntelligenceFeb-9-2021

Disruption management during the airline scheduling process can be compartmentalized into proactive and reactive processes depending upon the time of schedule execution. The state of the art for decision-making in airline disruption management involves a heuristic human-centric approach that does not categorically study uncertainty in proactive and reactive processes for managing airline schedule disruptions. Hence, this paper introduces an uncertainty transfer function model (UTFM) framework that characterizes uncertainty for proactive airline disruption management before schedule execution, reactive airline disruption management during schedule execution, and proactive airline disruption management after schedule execution to enable the construction of quantitative tools that can allow an intelligent agent to rationalize complex interactions and procedures for robust airline disruption management. Specifically, we use historical scheduling and operations data from a major U.S. airline to facilitate the development and assessment of the UTFM, defined by hidden Markov models (a special class of probabilistic graphical models) that can efficiently perform pattern learning and inference on portions of large data sets.

airline disruption management, disruption management, utfm, (12 more...)

2102.05147

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > Florida > Hillsborough County > University (0.04)
Europe (0.04)

Genre: Research Report (0.82)

Industry:

Transportation > Passenger (1.00)
Transportation > Infrastructure & Services (1.00)
Transportation > Air (1.00)
Consumer Products & Services > Travel (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.95)

Liventsev, Vadim, Härmä, Aki, Petković, Milan

Neurogenetic Programming Framework for Explainable Reinforcement Learning

arXiv.org Artificial IntelligenceFeb-8-2021

Automatic programming, the task of generating computer programs compliant with a specification without a human developer, is usually tackled either via genetic programming methods based on mutation and recombination of programs, or via neural language models. We propose a novel method that combines both approaches using a concept of a virtual neuro-genetic programmer: using evolutionary methods as an alternative to gradient descent for neural network training}, or scrum team. We demonstrate its ability to provide performant and explainable solutions for various OpenAI Gym tasks, as well as inject expert knowledge into the otherwise data-driven search for solutions.

codebase, developer, operator, (15 more...)

2102.04231

Country:

Europe > Netherlands > North Brabant > Eindhoven (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.85)

Industry:

Health & Medicine (0.46)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Lyu, Xueguang, Xiao, Yuchen, Daley, Brett, Amato, Christopher

Contrasting Centralized and Decentralized Critics in Multi-Agent Reinforcement Learning

arXiv.org Artificial IntelligenceFeb-8-2021

Centralized Training for Decentralized Execution, where agents are trained offline using centralized information but execute in a decentralized manner online, has gained popularity in the multi-agent reinforcement learning community. In particular, actor-critic methods with a centralized critic and decentralized actors are a common instance of this idea. However, the implications of using a centralized critic in this context are not fully discussed and understood even though it is the standard choice of many algorithms. We therefore formally analyze centralized and decentralized critic approaches, providing a deeper understanding of the implications of critic choice. Because our theory makes unrealistic assumptions, we also empirically compare the centralized and decentralized critic methods over a wide set of environments to validate our theories and to provide practical advice. We show that there exist misconceptions regarding centralized critics in the current literature and show that the centralized critic design is not strictly beneficial, but rather both centralized and decentralized critics have different pros and cons that should be taken into account by algorithm designers.

agent, decentralized critic, variance, (15 more...)

2102.04402

Country: North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

Vicente, Serge, Murua, Alejandro

Determinantal consensus clustering

arXiv.org Machine LearningFeb-7-2021

Random restart of a given algorithm produces many partitions to yield a consensus clustering. Ensemble methods such as consensus clustering have been recognized as more robust approaches for data clustering than single clustering algorithms. We propose the use of determinantal point processes or DPP for the random restart of clustering algorithms based on initial sets of center points, such as k-medoids or k-means. The relation between DPP and kernel-based methods makes DPPs suitable to describe and quantify similarity between objects. DPPs favor diversity of the center points within subsets. So, subsets with more similar points have less chances of being generated than subsets with very distinct points. The current and most popular sampling technique is sampling center points uniformly at random. We show through extensive simulations that, contrary to DPP, this technique fails both to ensure diversity, and to obtain a good coverage of all data facets. These two properties of DPP are key to make DPPs achieve good performance with small ensembles. Simulations with artificial datasets and applications to real datasets show that determinantal consensus clustering outperform classical algorithms such as k-medoids and k-means consensus clusterings which are based on uniform random sampling of center points.

algorithm, configuration, dataset, (16 more...)

2102.03948

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(3 more...)

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Artificial IntelligenceFeb-7-2021

UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers

Hu, Siyi, Zhu, Fengda, Chang, Xiaojun, Liang, Xiaodan

Recent advances in multi-agent reinforcement learning have been largely limited in training one model from scratch for every new task. The limitation is due to the restricted model architecture related to fixed input and output dimensions. This hinders the experience accumulation and transfer of the learned agent over tasks with diverse levels of difficulty (e.g. 3 vs 3 or 5 vs 6 multi-agent games). In this paper, we make the first attempt to explore a universal multi-agent reinforcement learning pipeline, designing one single architecture to fit tasks with the requirement of different observation and action configurations. Unlike previous RNN-based models, we utilize a transformer-based model to generate a flexible policy by decoupling the policy distribution from the intertwined input observation with an importance weight measured by the merits of the self-attention mechanism. Compared to a standard transformer block, the proposed model, named as Universal Policy Decoupling Transformer (UPDeT), further relaxes the action restriction and makes the multi-agent task's decision process more explainable. UPDeT is general enough to be plugged into any multi-agent reinforcement learning pipeline and equip them with strong generalization abilities that enables the handling of multiple tasks at a time. Extensive experiments on large-scale SMAC multi-agent competitive games demonstrate that the proposed UPDeT-based multi-agent reinforcement learning achieves significant results relative to state-of-the-art approaches, demonstrating advantageous transfer capability in terms of both performance and training speed (10 times faster).

agent, arxiv preprint arxiv, updet, (12 more...)

2101.08001

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > China > Zhejiang Province (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry: Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Fattahi, Salar, Gomez, Andres

Scalable Inference of Sparsely-changing Markov Random Fields with Strong Statistical Guarantees

arXiv.org Machine LearningFeb-6-2021

In this paper, we study the problem of inferring time-varying Markov random fields (MRF), where the underlying graphical model is both sparse and changes sparsely over time. Most of the existing methods for the inference of time-varying MRFs rely on the regularized maximum likelihood estimation (MLE), that typically suffer from weak statistical guarantees and high computational time. Instead, we introduce a new class of constrained optimization problems for the inference of sparsely-changing MRFs. The proposed optimization problem is formulated based on the exact $\ell_0$ regularization, and can be solved in near-linear time and memory. Moreover, we show that the proposed estimator enjoys a provably small estimation error. As a special case, we derive sharp statistical guarantees for the inference of sparsely-changing Gaussian MRFs (GMRF) in the high-dimensional regime, showing that such problems can be learned with as few as one sample per time. Our proposed method is extremely efficient in practice: it can accurately estimate sparsely-changing graphical models with more than 500 million variables in less than one hour.

covariance matrix, matrix, precision matrix, (15 more...)

2102.03585

Country:

North America > United States > California (0.14)
North America > United States > Michigan (0.04)
Asia > Middle East > Oman (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Industry:

Banking & Finance > Trading (0.68)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)