AITopics

1902.03228

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(4 more...)

arXiv.org Machine LearningFeb-8-2019

Tensor Variable Elimination for Plated Factor Graphs

Obermeyer, Fritz, Bingham, Eli, Jankowiak, Martin, Chiu, Justin, Pradhan, Neeraj, Rush, Alexander, Goodman, Noah

A wide class of machine learning algorithms can be reduced to variable elimination on factor graphs. While factor graphs provide a unifying notation for these algorithms, they do not provide a compact way to express repeated structure when compared to plate diagrams for directed graphical models. To exploit efficient tensor algebra in graphs with plates of variables, we generalize undirected factor graphs to plated factor graphs and variable elimination to a tensor variable elimination algorithm that operates directly on plated factor graphs. Moreover, we generalize complexity bounds based on treewidth and characterize the class of plated factor graphs for which inference is tractable. As an application, we integrate tensor variable elimination into the Pyro probabilistic programming language to enable exact inference in discrete latent variable models with repeated structure. We validate our methods with experiments on both directed and undirected graphical models, including applications to polyphonic music modeling, animal movement modeling, and latent sentiment analysis.

algorithm, factor graph, plated factor graph, (11 more...)

1902.0321

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > Colorado (0.04)
Europe > United Kingdom (0.04)

Genre: Research Report (0.82)

Industry:

Media > Music (0.68)
Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.95)

arXiv.org Artificial IntelligenceFeb-7-2019

Rethinking the Discount Factor in Reinforcement Learning: A Decision Theoretic Approach

Pitis, Silviu

Reinforcement learning (RL) agents have traditionally been tasked with maximizing the value function of a Markov decision process (MDP), either in continuous settings, with fixed discount factor $\gamma < 1$, or in episodic settings, with $\gamma = 1$. While this has proven effective for specific tasks with well-defined objectives (e.g., games), it has never been established that fixed discounting is suitable for general purpose use (e.g., as a model of human preferences). This paper characterizes rationality in sequential decision making using a set of seven axioms and arrives at a form of discounting that generalizes traditional fixed discounting. In particular, our framework admits a state-action dependent "discount" factor that is not constrained to be less than 1, so long as there is eventual long run discounting. Although this broadens the range of possible preference structures in continuous settings, we show that there exists a unique "optimizing MDP" with fixed $\gamma < 1$ whose optimal value function matches the true utility of the optimal policy, and we quantify the difference between value and utility for suboptimal policies. Our work can be seen as providing a normative justification for (a slight generalization of) Martha White's RL task formalism (2017) and other recent departures from the traditional RL, and is relevant to task specification in RL, inverse RL and preference-based RL.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

1902.02893

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States (0.05)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

arXiv.org Machine LearningFeb-7-2019

Compatible Natural Gradient Policy Search

Pajarinen, Joni, Thai, Hong Linh, Akrour, Riad, Peters, Jan, Neumann, Gerhard

Trust-region methods have yielded state-of-the-art results in policy search. A common approach is to use KL-divergence to bound the region of trust resulting in a natural gradient policy update. We show that the natural gradient and trust region optimization are equivalent if we use the natural parameterization of a standard exponential policy distribution in combination with compatible value function approximation. Moreover, we show that standard natural gradient updates may reduce the entropy of the policy according to a wrong schedule leading to premature convergence. To control entropy reduction we introduce a new policy search method called compatible policy search (COPOS) which bounds entropy loss. The experimental results show that COPOS yields state-of-the-art results in challenging continuous control tasks and in discrete partially observable tasks.

approximation, gradient, natural gradient, (15 more...)

1902.02823

Country:

Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Lincolnshire > Lincoln (0.04)
(2 more...)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Zacher, Benedikt, Czogiel, Irina

Supervised learning improves disease outbreak detection

arXiv.org Machine LearningFeb-6-2019

The early detection of infectious disease outbreaks is a crucial task to protect population health. To this end, public health surveillance systems have been established to systematically collect and analyse infectious disease data. A variety of statistical tools are available, which detect potential outbreaks as abberations from an expected endemic level using these data. Here, we develop the first supervised learning approach based on hidden Markov models for disease outbreak detection, which leverages data that is routinely collected within a public health surveillance system. We evaluate our model using real Salmonella and Campylobacter data, as well as simulations. In comparison to a state-of-the-art approach, which is applied in multiple European countries including Germany, our proposed model reduces the false positive rate by up to 50% while retaining the same sensitivity. We see our supervised learning approach as a significant step to further develop machine learning applications for disease outbreak detection, which will be instrumental to improve public health surveillance systems.

artificial intelligence, machine learning, outbreak, (17 more...)

1902.10061

Country:

Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
Europe > Germany > Berlin (0.04)
Europe > France (0.04)
Africa > Middle East > Egypt > Cairo Governorate > Cairo (0.04)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Public Health (1.00)
Health & Medicine > Epidemiology (1.00)
Water & Waste Management > Water Management > Constituents > Bacteria (0.76)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Nguyen, Thanh Thi, Nguyen, Ngoc Duy, Nahavandi, Saeid

Deep Reinforcement Learning for Multi-Agent Systems: A Review of Challenges, Solutions and Applications

arXiv.org Machine LearningFeb-6-2019

Reinforcement learning (RL) algorithms have been around for decades and employed to solve various sequential decision-making problems. These algorithms however have faced great challenges when dealing with high-dimensional environments. The recent development of deep learning has enabled RL methods to drive optimal policies for sophisticated and capable agents, which can perform efficiently in these challenging environments. This paper addresses an important aspect of deep RL related to situations that require multiple agents to communicate and cooperate to solve complex tasks. A survey of different approaches to problems related to multi-agent deep RL (MADRL) is presented, including non-stationarity, partial observability, continuous state and action spaces, multi-agent training schemes, multi-agent transfer learning. The merits and demerits of the reviewed methods will be analyzed and discussed, with their corresponding applications explored. It is envisaged that this review provides insights about various MADRL methods and can lead to future development of more robust and highly useful multi-agent learning methods for solving real-world problems.

agent, learning, reinforcement learning, (11 more...)

1812.11794

Country:

North America > United States > Montana (0.04)
Oceania > Australia > Victoria (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(3 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.67)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Energy (1.00)
Transportation (0.93)
Information Technology (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Kawamura, Keigo, Tsuruoka, Yoshimasa

Neural Fictitious Self-Play on ELF Mini-RTS

Despite the notable successes in video games such as Atari 2600, current AI is yet to defeat human champions in the domain of real-time strategy (RTS) games. One of the reasons is that an RTS game is a multi-agent game, in which single-agent reinforcement learning methods cannot simply be applied because the environment is not a stationary Markov Decision Process. In this paper, we present a first step toward finding a game-theoretic solution to RTS games by applying Neural Fictitious Self-Play (NFSP), a game-theoretic approach for finding Nash equilibria, to Mini-RTS, a small but nontrivial RTS game provided on the ELF platform. More specifically, we show that NFSP can be effectively combined with policy gradient reinforcement learning and be applied to Mini-RTS. Experimental results also show that the scalability of NFSP can be substantially improved by pretraining the models with simple self-play using policy gradients, which by itself gives a strong strategy despite its lack of theoretical guarantee of convergence.

agent, nfsp, opponent, (16 more...)

1902.02004

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Cherapanamjeri, Yeshwanth, Bartlett, Peter L.

Testing Markov Chains without Hitting

We study the problem of identity testing of markov chains. In this setting, we are given access to a single trajectory from a markov chain with unknown transition matrix $Q$ and the goal is to determine whether $Q = P$ for some known matrix $P$ or $\text{Dist}(P, Q) \geq \epsilon$ where $\text{Dist}$ is suitably defined. In recent work by Daskalakis, Dikkala and Gravin, 2018, it was shown that it is possible to distinguish between the two cases provided the length of the observed trajectory is at least super-linear in the hitting time of $P$ which may be arbitrarily large. In this paper, we propose an algorithm that avoids this dependence on hitting time thus enabling efficient testing of markov chains even in cases where it is infeasible to observe every state in the chain. Our algorithm is based on combining classical ideas from approximation algorithms with techniques for the spectral analysis of markov chains.

algorithm, markov chain, subset, (14 more...)

1902.01999

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Afzalan, Milad, Jazizadeh, Farrokh

An Automated Spectral Clustering for Multi-scale Data

Spectral clustering algorithms typically require a priori selection of input parameters such as the number of clusters, a scaling parameter for the affinity measure, or ranges of these values for parameter tuning. Despite efforts for automating the process of spectral clustering, the task of grouping data in multi-scale and higher dimensional spaces is yet to be explored. This study presents a spectral clustering heuristic algorithm that obviates the need for an input by estimating the parameters from the data itself. Specifically, it introduces the heuristic of iterative eigengap search with (1) global scaling and (2) local scaling. These approaches estimate the scaling parameter and implement iterative eigengap quantification along a search tree to reveal dissimilarities at different scales of a feature space and identify clusters. The performance of these approaches has been tested on various real-world datasets of power variation with multi-scale nature and gene expression. Our findings show that iterative eigengap search with a PCA-based global scaling scheme can discover different patterns with an accuracy of higher than 90% in most cases without asking for a priori input information.

algorithm, dataset, spectral, (12 more...)

1902.0199

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Virginia > Montgomery County > Blacksburg (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)

Genre: Research Report > New Finding (0.86)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.88)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Middleton, Lawrence, Deligiannidis, George, Doucet, Arnaud, Jacob, Pierre E.

Unbiased Smoothing using Particle Independent Metropolis-Hastings

We consider the approximation of expectations with respect to the distribution of a latent Markov process given noisy measurements. This is known as the smoothing problem and is often approached with particle and Markov chain Monte Carlo (MCMC) methods. These methods provide consistent but biased estimators when run for a finite time. We propose a simple way of coupling two MCMC chains built using Particle Independent Metropolis-Hastings (PIMH) to produce unbiased smoothing estimators. Unbiased estimators are appealing in the context of parallel computing, and facilitate the construction of confidence intervals. The proposed scheme only requires access to off-the-shelf Particle Filters (PF) and is thus easier to implement than recently proposed unbiased smoothers. The approach is demonstrated on a L\'evy-driven stochastic volatility model and a stochastic kinetic model.

arnaud doucet, estimator, unbiased estimator, (13 more...)

1902.01781

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > New York (0.04)
Asia > Japan > Kyūshū & Okinawa > Okinawa (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.55)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)