AITopics

2010.1153

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.66)

Rashid, Tabish, Farquhar, Gregory, Peng, Bei, Whiteson, Shimon

Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

arXiv.org Machine LearningOct-22-2020

QMIX is a popular $Q$-learning algorithm for cooperative MARL in the centralised training and decentralised execution paradigm. In order to enable easy decentralisation, QMIX restricts the joint action $Q$-values it can represent to be a monotonic mixing of each agent's utilities. However, this restriction prevents it from representing value functions in which an agent's ordering over its actions can depend on other agents' actions. To analyse this representational limitation, we first formalise the objective QMIX optimises, which allows us to view QMIX as an operator that first computes the $Q$-learning targets and then projects them into the space representable by QMIX. This projection returns a representable $Q$-value that minimises the unweighted squared error across all joint actions. We show in particular that this projection can fail to recover the optimal policy even with access to $Q^*$, which primarily stems from the equal weighting placed on each joint action. We rectify this by introducing a weighting into the projection, in order to place more importance on the better joint actions. We propose two weighting schemes and prove that they recover the correct maximal action for any joint action $Q$-values, and therefore for $Q^*$ as well. Based on our analysis and results in the tabular setting, we introduce two scalable versions of our algorithm, Centrally-Weighted (CW) QMIX and Optimistically-Weighted (OW) QMIX and demonstrate improved performance on both predator-prey and challenging multi-agent StarCraft benchmark tasks.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2006.108

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Tejwani, Ravi, Katz, Boris, Breazeal, Cynthia

Migratable AI: Personalizing Dialog Conversations with migration context

arXiv.org Artificial IntelligenceOct-22-2020

The migration of conversational AI agents across different embodiments in order to maintain the continuity of the task has been recently explored to further improve user experience. However, these migratable agents lack contextual understanding of the user information and the migrated device during the dialog conversations with the user. This opens the question of how an agent might behave when migrated into an embodiment for contextually predicting the next utterance. We collected a dataset from the dialog conversations between crowdsourced workers with the migration context involving personal and non-personal utterances in different settings (public or private) of embodiment into which the agent migrated. We trained the generative and information retrieval models on the dataset using with and without migration context and report the results of both qualitative metrics and human evaluation. We believe that the migration dataset would be useful for training future migratable AI systems.

information retrieval, machine learning, natural language, (19 more...)

2010.12091

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Robots (0.95)
(5 more...)

Wang, Ruosong, Zhang, Hanrui, Chaplot, Devendra Singh, Garagić, Denis, Salakhutdinov, Ruslan

Planning with Submodular Objective Functions

arXiv.org Artificial IntelligenceOct-22-2020

Modern reinforcement learning and planning algorithms have achieved tremendous successes on various tasks [Mnih et al., 2015, Silver et al., 2017]. However, most of these algorithms work in the standard Markov decision process (MDP) framework where the goal is to maximize the cumulative reward and thus it can be difficult to apply them to various practical sequential decision-making problems. In this paper, we study planning in generalized MDPs, where instead of maximizing the cumulative reward, the goal is to maximize the objective value induced by a submodular function. To motivate our approach, let us consider the following scenario: a company manufactures cars, and as part of its customer service, continuously monitors the status of all cars produced by the company. Each car is equipped with a number of sensors, each of which constantly produces noisy measurements of some attribute of the car, e.g., speed, location, temperature, etc. Due to bandwidth constraints, at any moment, each car may choose to transmit data generated by a single sensor to the company. The goal is to combine the statistics collected over a fixed period of time, presumably from multiple sensors, to gather as much information about the car as possible. Perhaps one seemingly natural strategy is to transmit only data generated by the most "informative" sensor. However, as the output of a sensor remains the same between two samples, it is pointless to transmit the same data multiple times. One may alternatively try to order sensors by their "informativity" and always choose the most informative sensor that has not yet transmitted data since the last sample was generated.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

2010.11863

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > Monterey County > Monterey (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report (0.50)
Workflow (0.46)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.69)

arXiv.org Artificial IntelligenceOct-22-2020

Deep Reinforcement Learning with Stacked Hierarchical Attention for Text-based Games

Xu, Yunqiu, Fang, Meng, Chen, Ling, Du, Yali, Zhou, Joey Tianyi, Zhang, Chengqi

We study reinforcement learning (RL) for text-based games, which are interactive simulations in the context of natural language. While different methods have been developed to represent the environment information and language actions, existing RL agents are not empowered with any reasoning capabilities to deal with textual games. In this work, we aim to conduct explicit reasoning with knowledge graphs for decision making, so that the actions of an agent are generated and supported by an interpretable inference procedure. We propose a stacked hierarchical attention mechanism to construct an explicit representation of the reasoning process by exploiting the structure of the knowledge graph. We extensively evaluate our method on a number of man-made benchmark games, and the experimental results demonstrate that our method performs better than existing text-based agents.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2010.11655

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report (0.83)

Industry: Leisure & Entertainment > Games (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)

He, Jinke, Suau, Miguel, Oliehoek, Frans A.

Influence-Augmented Online Planning for Complex Environments

arXiv.org Artificial IntelligenceOct-21-2020

How can we plan efficiently in real time to control an agent in a complex environment that may involve many other agents? While existing sample-based planners have enjoyed empirical success in large POMDPs, their performance heavily relies on a fast simulator. However, real-world scenarios are complex in nature and their simulators are often computationally demanding, which severely limits the performance of online planners. In this work, we propose influence-augmented online planning, a principled method to transform a factored simulator of the entire environment into a local simulator that samples only the state variables that are most relevant to the observation and reward of the planning agent and captures the incoming influence from the rest of the environment using machine learning methods. Our main experimental results show that planning on this less accurate but much faster local simulator with POMCP leads to higher real-time planning performance than planning on the simulator that models the entire environment.

artificial intelligence, machine learning, simulator, (19 more...)

2010.11038

Country:

Europe > Netherlands > South Holland > Delft (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report (0.70)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
(2 more...)

Filstroff, Louis, Gouvert, Olivier, Févotte, Cédric, Cappé, Olivier

A Comparative Study of Temporal Non-Negative Matrix Factorization with Gamma Markov Chains

arXiv.org Machine LearningOct-20-2020

Non-negative matrix factorization (NMF) has become a well-established class of methods for the analysis of non-negative data. In particular, a lot of effort has been devoted to probabilistic NMF, namely estimation or inference tasks in probabilistic models describing the data, based for example on Poisson or exponential likelihoods. When dealing with time series data, several works have proposed to model the evolution of the activation coefficients as a non-negative Markov chain, most of the time in relation with the Gamma distribution, giving rise to so-called temporal NMF models. In this paper, we review four Gamma Markov chains of the NMF literature, and show that they all share the same drawback: the absence of a well-defined stationary distribution. We then introduce a fifth process, an overlooked model of the time series literature named BGAR(1), which overcomes this limitation. These temporal NMF models are then compared in a MAP framework on a prediction task, in the context of the Poisson likelihood.

artificial intelligence, machine learning, markov chain, (16 more...)

2006.12843

Country:

Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
North America > Canada > Quebec (0.04)
Europe > Finland (0.04)

Genre:

Research Report (0.50)
Overview (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)

Lazarus, Christopher, Lopez, James G., Kochenderfer, Mykel J.

Runtime Safety Assurance Using Reinforcement Learning

arXiv.org Artificial IntelligenceOct-20-2020

The airworthiness and safety of a non-pedigreed autopilot must be verified, but the cost to formally do so can be prohibitive. We can bypass formal verification of non-pedigreed components by incorporating Runtime Safety Assurance (RTSA) as mechanism to ensure safety. RTSA consists of a meta-controller that observes the inputs and outputs of a non-pedigreed component and verifies formally specified behavior as the system operates. When the system is triggered, a verified recovery controller is deployed. Recovery controllers are designed to be safe but very likely disruptive to the operational objective of the system, and thus RTSA systems must balance safety and efficiency. The objective of this paper is to design a meta-controller capable of identifying unsafe situations with high accuracy. High dimensional and non-linear dynamics in which modern controllers are deployed along with the black-box nature of the nominal controllers make this a difficult problem. Current approaches rely heavily on domain expertise and human engineering. We frame the design of RTSA with the Markov decision process (MDP) framework and use reinforcement learning (RL) to solve it. Our learned meta-controller consistently exhibits superior performance in our experiments compared to our baseline, human engineered approach.

controller, machine learning, reinforcement learning, (19 more...)

2010.10618

Country:

North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Transportation > Air (1.00)
Aerospace & Defense > Aircraft (0.90)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

arXiv.org Machine LearningOct-19-2020

Faster Convergence of Stochastic Gradient Langevin Dynamics for Non-Log-Concave Sampling

Zou, Difan, Xu, Pan, Gu, Quanquan

We establish a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave. At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain. Under certain conditions on the target distribution, we prove that $\tilde O(d^4\epsilon^{-2})$ stochastic gradient evaluations suffice to guarantee $\epsilon$-sampling error in terms of the total variation distance, where $d$ is the problem dimension, which improves existing results on the convergence rate of SGLD (Raginsky et al., 2017; Xu et al., 2018). We further show that provided an additional Hessian Lipschitz condition on the log-density function, SGLD is guaranteed to achieve $\epsilon$-sampling error within $\tilde O(d^{15/4}\epsilon^{-3/2})$ stochastic gradient evaluations. Our proof technique provides a new way to study the convergence of Langevin based algorithms, and sheds some light on the design of fast stochastic gradient based sampling algorithms.

algorithm 1, artificial intelligence, machine learning, (15 more...)

2010.09597

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

Dupty, Mohammed Haroon, Lee, Wee Sun

Neuralizing Efficient Higher-order Belief Propagation

arXiv.org Machine LearningOct-19-2020

Graph neural network models have been extensively used to learn node representations for graph structured data in an end-to-end setting. These models often rely on localized first order approximations of spectral graph convolutions and hence are unable to capture higher-order relational information between nodes. Probabilistic Graphical Models form another class of models that provide rich flexibility in incorporating such relational information but are limited by inefficient approximate inference algorithms at higher order. In this paper, we propose to combine these approaches to learn better node and graph representations. First, we derive an efficient approximate sum-product loopy belief propagation inference algorithm for higher-order PGMs. We then embed the message passing updates into a neural network to provide the inductive bias of the inference algorithm in end-to-end learning. This gives us a model that is flexible enough to accommodate domain knowledge while maintaining the computational advantage. We further propose methods for constructing higher-order factors that are conditioned on node and edge features and share parameters wherever necessary. Our experimental evaluation shows that our model indeed captures higher-order information, substantially outperforming state-of-the-art $k$-order graph neural networks in molecular datasets.

artificial intelligence, graph, machine learning, (18 more...)

2010.09283

Country:

Asia > Singapore (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts (0.04)
Africa > Senegal > Kolda Region > Kolda (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)