AITopics

Wang, Rose E., Wu, Sarah A., Evans, James A., Tenenbaum, Joshua B., Parkes, David C., Kleiman-Weiner, Max

Too many cooks: Coordinating multi-agent collaboration through inverse planning

arXiv.org Artificial IntelligenceMar-26-2020

Collaboration requires agents to coordinate their behavior on the fly, sometimes cooperating to solve a single task together and other times dividing it up into sub-tasks to work on in parallel. Underlying the human ability to collaborate is theory-of-mind, the ability to infer the hidden mental states that drive others to act. Here, we develop Bayesian Delegation, a decentralized multi-agent learning mechanism with these abilities. Bayesian Delegation enables agents to rapidly infer the hidden intentions of others by inverse planning. These inferences enable agents to flexibly decide in the absence of communication when to cooperate on the same sub-task and when to work on different sub-tasks in parallel. We test this model in a suite of multi-agent Markov decision processes inspired by cooking problems. To succeed, agents must coordinate both their high-level plans (e.g., what sub-task they should work on) and their low-level actions (e.g., avoiding collisions). Bayesian Delegation bridges these two levels and rapidly aligns agents' beliefs about who should work on what without any communication. When agents cooperate on the same sub-task, coordinated plans emerge that enable the group of agents to achieve tasks no agent can complete on their own. Our model outperforms lesioned agents without Bayesian Delegation or without the ability to cooperate on the same sub-task.

agent, lettuce, merge, (11 more...)

2003.11778

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Gupta, Abhishek, Haskell, William B.

Convergence of Recursive Stochastic Algorithms using Wasserstein Divergence

arXiv.org Machine LearningMar-25-2020

This paper develops a unified framework, based on iterated random operator theory, to analyze the convergence of constant stepsize recursive stochastic algorithms (RSAs) in machine learning and reinforcement learning. RSAs use randomization to efficiently compute expectations, and so their iterates form a stochastic process. The key idea is to lift the RSA into an appropriate higher-dimensional space and then express it as an equivalent Markov chain. Instead of determining the convergence of this Markov chain (which may not converge under constant stepsize), we study the convergence of the distribution of this Markov chain. To study this, we define a new notion of Wasserstein divergence. We show that if the distribution of the iterates in the Markov chain satisfy certain contraction property with respect to the Wasserstein divergence, then the Markov chain admits an invariant distribution. Inspired by the SVRG algorithm, we develop a method to convert any RSA to a variance reduced RSA that converges to the optimal solution with in almost sure sense or in probability. We show that convergence of a large family of constant stepsize RSAs can be understood using this framework. We apply this framework to ascertain the convergence of mini-batch SGD, forward-backward splitting with catalyst, SVRG, SAGA, empirical Q value iteration, synchronous Q-learning, enhanced policy iteration, and MDPs with a generative model. We also develop two new algorithms for reinforcement learning and establish their convergence using this framework.

algorithm, divergence, iteration, (15 more...)

2003.11403

Country:

North America > United States > Ohio > Franklin County > Columbus (0.04)
North America > United States > Illinois > Champaign County > Champaign (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

arXiv.org Machine LearningMar-24-2020

Multi-Lead ECG Classification via an Information-Based Attention Convolutional Neural Network

Tung, Hao, Zheng, Chao, Mao, Xinsheng, Qian, Dahong

Objective: A novel structure based on channel-wise attention mechanism is presented in this paper. Embedding with the proposed structure, an efficient classification model that accepts multi-lead electrocardiogram (ECG) as input is constructed. Methods: One-dimensional convolutional neural networks (CNN) have proven to be effective in pervasive classification tasks, enabling the automatic extraction of features while classifying targets. We implement the Residual connection and design a structure which can learn the weights from the information contained in different channels in the input feature map during the training process. An indicator named mean square deviation is introduced to monitor the performance of a particular model segment in the classification task on the two out of the five ECG classes. The data in the MIT-BIH arrhythmia database is used and a series of control experiments is conducted. Results: Utilizing both leads of the ECG signals as input to the neural network classifier can achieve better classification results than those from using single channel inputs in different application scenarios. Models embedded with the channel-wise attention structure always achieve better scores on sensitivity and precision than the plain Resnet models. The proposed model exceeds the performance of most of the state-of-the-art models in ventricular ectopic beats (VEB) classification, and achieves competitive scores for supraventricular ectopic beats (SVEB). Conclusion: Adopting more lead ECG signals as input can increase the dimensions of the input feature maps, helping to improve both the performance and generalization of the network model. Significance: Due to its end-to-end characteristics, and the extensible intrinsic for multi-lead heart diseases diagnosing, the proposed model can be used for the real-time ECG tracking of ECG waveforms for Holter or wearable devices.

artificial intelligence, classification, machine learning, (19 more...)

2003.12009

Country:

Europe > Portugal > Coimbra > Coimbra (0.04)
Asia > China > Shanghai > Shanghai (0.04)
South America > Brazil (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

De Raedt, Luc, Dumančić, Sebastijan, Manhaeve, Robin, Marra, Giuseppe

From Statistical Relational to Neuro-Symbolic Artificial Intelligence

arXiv.org Artificial IntelligenceMar-24-2020

Neuro-symbolic and statistical relational artificial intelligence both integrate frameworks for learning with logical reasoning. This survey identifies several parallels across seven different dimensions between these two fields. These cannot only be used to characterize and position neuro-symbolic artificial intelligence approaches but also to identify a number of directions for further research.

logic, reasoning, representation, (15 more...)

2003.08316

Country: Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)

Genre:

Overview (0.66)
Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

arXiv.org Machine LearningMar-24-2020

Merge-split Markov chain Monte Carlo for community detection

Peixoto, Tiago P.

We present a Markov chain Monte Carlo scheme based on merges and splits of groups that is capable of efficiently sampling from the posterior distribution of network partitions, defined according to the stochastic block model (SBM). We demonstrate how schemes based on the move of single nodes between groups systematically fail at correctly sampling from the posterior distribution even on small networks, and how our merge-split approach behaves significantly better, and improves the mixing time of the Markov chain by several orders of magnitude in typical cases. We also show how the scheme can be straightforwardly extended to nested versions of the SBM, yielding asymptotically exact samples of hierarchical network partitions.

algorithm, probability, proposal, (15 more...)

2003.0707

Country:

North America > United States > New York > New York County > New York City (0.14)
Europe > United Kingdom (0.14)
North America > United States > Massachusetts > Middlesex County > Reading (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.81)

arXiv.org Artificial IntelligenceMar-24-2020

An empirical investigation of the challenges of real-world reinforcement learning

Dulac-Arnold, Gabriel, Levine, Nir, Mankowitz, Daniel J., Li, Jerry, Paduraru, Cosmin, Gowal, Sven, Hester, Todd

Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is beginning to show some successes in real-world scenarios. However, much of the research advances in RL are hard to leverage in real-world systems due to a series of assumptions that are rarely satisfied in practice. In this work, we identify and formalize a series of independent challenges that embody the difficulties that must be addressed for RL to be commonly deployed in real-world systems. For each challenge, we define it formally in the context of a Markov Decision Process, analyze the effects of the challenge on state-of-the-art learning algorithms, and present some existing attempts at tackling it. We believe that an approach that addresses our set of proposed challenges would be readily deployable in a large number of real world problems. Our proposed challenges are implemented in a suite of continuous control environments called realworldrl-suite which we propose an as an open-source benchmark.

agent, algorithm, constraint, (12 more...)

2003.11881

Country:

Europe > Sweden > Stockholm > Stockholm (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.82)

Industry:

Transportation (0.67)
Leisure & Entertainment > Games (0.67)
Information Technology > Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial IntelligenceMar-23-2020

Deep Reinforcement Learning with Smooth Policy

Shen, Qianli, Li, Yan, Jiang, Haoming, Wang, Zhaoran, Zhao, Tuo

Deep neural networks have been widely adopted in modern reinforcement learning (RL) algorithms with great empirical successes in various domains. However, the large search space of training a neural network requires a significant amount of data, which makes the current RL algorithms not sample efficient. Motivated by the fact that many environments with continuous state space have smooth transitions, we propose to learn a smooth policy that behaves smoothly with respect to states. In contrast to policies parameterized by linear/reproducing kernel functions, where simple regularization techniques suffice to control smoothness, for neural network based reinforcement learning algorithms, there is no readily available solution to learn a smooth policy. In this paper, we develop a new training framework --- $\textbf{S}$mooth $\textbf{R}$egularized $\textbf{R}$einforcement $\textbf{L}$earning ($\textbf{SR}^2\textbf{L}$), where the policy is trained with smoothness-inducing regularization. Such regularization effectively constrains the search space of the learning algorithms and enforces smoothness in the learned policy. We apply the proposed framework to both on-policy (TRPO) and off-policy algorithm (DDPG). Through extensive experiments, we demonstrate that our method achieves improved sample efficiency.

algorithm, cumulative reward, regularizer, (14 more...)

2003.09534

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Lartigue, Thomas, Durrleman, Stanley, Allassonnière, Stéphanie

Deterministic Approximate EM Algorithm; Application to the Riemann Approximation EM and the Tempered EM

arXiv.org Machine LearningMar-23-2020

The Expectation Maximisation (EM) algorithm is widely used to optimise non-convex likelihood functions with hidden variables. Many authors modified its simple design to fit more specific situations. For instance the Expectation (E) step has been replaced by Monte Carlo (MC) approximations, Markov Chain Monte Carlo approximations, tempered approximations... Most of the well studied approximations belong to the stochastic class. By comparison, the literature is lacking when it comes to deterministic approximations. In this paper, we introduce a theoretical framework, with state of the art convergence guarantees, for any deterministic approximation of the E step. We analyse theoretically and empirically several approximations that fit into this framework. First, for cases with intractable E steps, we introduce a deterministic alternative to the MC-EM, using Riemann sums. This method is easy to implement and does not require the tuning of hyper-parameters. Then, we consider the tempered approximation, borrowed from the Simulated Annealing optimisation technique and meant to improve the EM solution. We prove that the the tempered EM verifies the convergence guarantees for a wide range of temperature profiles. We showcase empirically how it is able to escape adversarial initialisations. Finally, we combine the Riemann and tempered approximations to accomplish both their purposes.

algorithm, approximation, initialisation, (15 more...)

2003.10126

Country:

Europe (0.14)
North America > United States > New York > New York County > New York City (0.04)
Asia > Japan (0.04)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

#artificialintelligenceMar-22-2020, 03:19:42 GMT

In a Boltzmann machine, why isn't there a simple expression for the optimal edge weights in terms of correlations between variables?

If we do this by using gradient ascent on the log-likelihood function, each step of gradient ascent involves an expensive expectation estimate using MCMC (or some cheaper approximation). Conceptually the edge weights represent the "interaction strength" between variables, i.e. $w_{ij}$ represents how much $x_i$ and $x_j$ "want" to be equal. Just looking at the above we can see that when $w_{ij}$ is large and positive, $x_i$ and $x_j$ have a high probability of being equal and the when it's negative they have a higher probability of being opposite sign. What is the relationship between the empirical correlation between each $x_i$ and $x_j$ versus the optimal edge weight $w_{ij}$? It would make sense that variables that are highly positively correlated have large positive edge weights, and variables that are negatively correlated have negative edge weights.

edge weight, mathbf, optimal edge weight, (6 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.43)