AITopics

2001.00593

Country:

Europe > France (0.04)
Europe > Austria > Tyrol > Innsbruck (0.04)
North America > United States > New York > Richmond County > New York City (0.04)
(8 more...)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Machine LearningDec-31-2019

The Gambler's Problem and Beyond

Wang, Baoxiang, Li, Shuai, Li, Jiajin, Chan, Siu On

We analyze the Gambler's problem, a simple reinforcement learning problem where the gambler has the chance to double or lose their bets until the target is reached. This is an early example introduced in the reinforcement learning textbook by Sutton and Barto (2018), where they mention an interesting pattern of the optimal value function with high-frequency components and repeating non-smooth points. It is however without further investigation. We provide the exact formula for the optimal value function for both the discrete and the continuous cases. Though simple as it might seem, the value function is pathological: fractal, self-similar, derivative taking either zero or infinity, not smooth on any interval, and not written as elementary functions. It is in fact one of the generalized Cantor functions, where it holds a complexity that has been uncharted thus far. Our analyses could lead insights into improving value function approximation, gradient-based algorithms, and Q-learning, in real applications and implementations.

bellman equation, optimal value function, value function, (15 more...)

2001.00102

Country:

Asia > China > Hong Kong (0.04)
North America > United States > Ohio (0.04)
Europe > Hungary (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre:

Research Report (0.50)
Summary/Review (0.34)
Instructional Material (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Kaltenbach, Sebastian, Koutsourelakis, Phaedon-Stelios

Incorporating physical constraints in a deep probabilistic machine learning framework for coarse-graining dynamical systems

arXiv.org Machine LearningDec-30-2019

Data-based discovery of effective, coarse-grained (CG) models of high-dimensional dynamical systems presents a unique challenge in computational physics and particularly in the context of multiscale problems. The present paper offers a data-based, probablistic perspective that enables the quantification of predictive uncertainties. One of the outstanding problems has been the introduction of physical constraints in the probabilistic machine learning objectives. The primary utility of such constraints stems from the undisputed physical laws such as conservation of mass, energy etc that they represent. Furthermore and apart from leading to physically realistic predictions, they can significantly reduce the requisite amount of training data which for high-dimensional, multiscale systems are expensive to obtain (Small Data regime). We formulate the coarse-graining process by employing a probabilistic state-space model and account for the aforementioned equality constraints as virtual observables in the associated densities. We demonstrate how probabilistic inference tools can be employed to identify the coarse-grained variables in combination with deep neural nets and their evolution model without ever needing to define a fine-to-coarse (restriction) projection and without needing time-derivatives of state variables. The formulation adopted enables the quantification of a crucial, and often neglected, component in the CG process, i.e. the predictive uncertainty due to information loss. Furthermore, it is capable of reconstructing the evolution of the full, fine-scale system and therefore the observables of interest need not be selected a priori. We demonstrate the efficacy of the proposed framework by applying it to systems of interacting particles and an image series of a nonlinear pendulum. In both cases we identify the underlying coarse dynamics and can generate extrap-olative predicitions including the forming and propagation of a shock for the particle systems and a stable trajectory in the phase space for the pendulum. Keywords: Bayesian machine learning, virtual observables, multiscale modeling, reduced order modeling, coarse graining1. Introduction High-dimensional, nonlinear dynamical systems are ubiquitous in applied physics and engineering. The computational resources needed for their solution can grow exponentially with the dimension of the state-space as well as with the smallest timescale that needs to be resolved as this determines the discretization time-step.

constraint, equation, prediction, (15 more...)

1912.12976

Country:

North America > United States > New York (0.04)
North America > Canada > Alberta > Census Division No. 15 > Improvement District No. 9 > Banff (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Bernton, Espen, Heng, Jeremy, Doucet, Arnaud, Jacob, Pierre E.

Schr\"odinger Bridge Samplers

arXiv.org Machine LearningDec-30-2019

Consider a reference Markov process with initial distribution $\pi_{0}$ and transition kernels $\{M_{t}\}_{t\in[1:T]}$, for some $T\in\mathbb{N}$. Assume that you are given distribution $\pi_{T}$, which is not equal to the marginal distribution of the reference process at time $T$. In this scenario, Schr\"odinger addressed the problem of identifying the Markov process with initial distribution $\pi_{0}$ and terminal distribution equal to $\pi_{T}$ which is the closest to the reference process in terms of Kullback--Leibler divergence. This special case of the so-called Schr\"odinger bridge problem can be solved using iterative proportional fitting, also known as the Sinkhorn algorithm. We leverage these ideas to develop novel Monte Carlo schemes, termed Schr\"odinger bridge samplers, to approximate a target distribution $\pi$ on $\mathbb{R}^{d}$ and to estimate its normalizing constant. This is achieved by iteratively modifying the transition kernels of the reference Markov chain to obtain a process whose marginal distribution at time $T$ becomes closer to $\pi_T = \pi$, via regression-based approximations of the corresponding iterative proportional fitting recursion. We report preliminary experiments and make connections with other problems arising in the optimal transport, optimal control and physics literatures.

artificial intelligence, machine learning, schr odinger bridge, (14 more...)

1912.1317

Country: Europe > United Kingdom (0.27)

Genre:

Research Report (1.00)
Overview (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

arXiv.org Machine LearningDec-29-2019

Hierarchical Variational Imitation Learning of Control Programs

Fox, Roy, Shin, Richard, Paul, William, Zou, Yitian, Song, Dawn, Goldberg, Ken, Abbeel, Pieter, Stoica, Ion

Autonomous agents can learn by imitating teacher demonstrations of the intended behavior. Hierarchical control policies are ubiquitously useful for such learning, having the potential to break down structured tasks into simpler sub-tasks, thereby improving data efficiency and generalization. In this paper, we propose a variational inference method for imitation learning of a control policy represented by parametrized hierarchical procedures (PHP), a program-like structure in which procedures can invoke sub-procedures to perform sub-tasks. Our method discovers the hierarchical structure in a dataset of observation-action traces of teacher demonstrations, by learning an approximate posterior distribution over the latent sequence of procedure calls and terminations. Samples from this learned distribution then guide the training of the hierarchical control policy. We identify and demonstrate a novel benefit of variational inference in the context of hierarchical imitation learning: in decomposing the policy into simpler procedures, inference can leverage acausal information that is unused by other methods. Training PHP with variational inference outperforms LSTM baselines in terms of data efficiency and generalization, requiring less than half as much data to achieve a 24% error rate in executing the bubble sort algorithm, and to achieve no error in executing Karel programs.

ariational imitation learning, demonstration, procedure, (11 more...)

1912.12612

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
North America > United States > California > Orange County > Irvine (0.04)

Genre: Research Report (1.00)

Industry:

Education (0.68)
Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Demirel, Berkay, Sánchez-Fibla, Martí

Speeding up reinforcement learning by combining attention and agency features

arXiv.org Artificial IntelligenceDec-29-2019

When playing video-games we immediately detect which entity we control and we center the attention towards it to focus the learning and reduce its dimensionality. Reinforcement Learning (RL) has been able to deal with big state spaces, including states derived from pixel images in Atari games, but the learning is slow, depends on the brute force mapping from the global state to the action values (Q-function), thus its performance is severely affected by the dimensionality of the state and cannot be transferred to other games or other parts of the same game. We propose different transformations of the input state that combine attention and agency detection mechanisms which both have been addressed separately in RL but not together to our knowledge. We propose and benchmark different architectures including both global and local agency centered versions of the state and also including summaries of the surroundings. Results suggest that even a redundant global-local state network can learn faster than the global alone. Summarized versions of the state look promising to achieve input-size independence learning.

agent, architecture, global state, (13 more...)

doi: 10.3233/FAIA190111

1912.12623

Country: Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.05)

Genre: Research Report > New Finding (0.34)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Andriotis, C. P., Papakonstantinou, K. G., Chatzi, E. N.

Value of structural health monitoring quantification in partially observable stochastic environments

arXiv.org Artificial IntelligenceDec-28-2019

Sequential decision-making under uncertainty for optimal life-cycle control of deteriorating engineering systems and infrastructure entails two fundamental classes of decisions. The first class pertains to the various structural interventions, which can directly modify the existing properties of the system, while the second class refers to prescribing appropriate inspection and monitoring schemes, which are essential for updating our existing knowledge about the system states. The latter have to rely on quantifiable measures of efficiency, determined on the basis of objective criteria that, among others, consider the Value of Information (VoI) of different observational strategies, and the Value of Structural Health Monitoring (VoSHM) over the entire system life-cycle. In this work, we present general solutions for quantifying the VoI and VoSHM in partially observable stochastic domains, and although our definitions and methodology are general, we are particularly emphasizing and describing the role of Partially Observable Markov Decision Processes (POMDPs) in solving this problem, due to their advantageous theoretical and practical attributes in estimating arbitrarily well globally optimal policies. POMDP formulations are articulated for different structural environments having shared intervention actions but diversified inspection and monitoring options, thus enabling VoI and VoSHM estimation through their differentiated stochastic optimal control policies. POMDP solutions are derived using point-based solvers, which can efficiently approximate the POMDP value functions through Bellman backups at selected reachable points of the belief space. The suggested methodology is applied on stationary and non-stationary deteriorating environments, with both infinite and finite planning horizons, featuring single- or multi-component engineering systems.

algorithm, information, papakonstantinou, (14 more...)

1912.12534

Country:

Europe > Austria > Vienna (0.14)
Asia > South Korea > Seoul > Seoul (0.04)
North America > United States > Pennsylvania > Centre County > University Park (0.04)
(7 more...)

Genre: Research Report (0.50)

Industry:

Materials > Construction Materials (0.68)
Health & Medicine > Consumer Health (0.62)
Energy > Renewable > Wind (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

arXiv.org Artificial IntelligenceDec-27-2019

Observational Overfitting in Reinforcement Learning

Song, Xingyou, Jiang, Yiding, Tu, Stephen, Du, Yilun, Neyshabur, Behnam

A major component of overfitting in model-free reinforcement learning (RL) involves the case where the agent may mistakenly correlate reward with certain spurious features from the observations generated by the Markov Decision Process (MDP). We provide a general framework for analyzing this scenario, which we use to design multiple synthetic benchmarks from only modifying the observation space of an MDP. When an agent overfits to different observation spaces even if the underlying MDP dynamics is fixed, we term this observational overfitting. Our experiments expose intriguing properties especially with regards to implicit regularization, and also corroborate results from previous works in RL generalization and supervised learning (SL).

generalization, international conference, regularization, (13 more...)

1912.02975

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > Sweden > Stockholm > Stockholm (0.05)
(6 more...)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Read, Jesse, Pfahringer, Bernhard, Holmes, Geoff, Frank, Eibe

Classifier Chains: A Review and Perspectives

arXiv.org Artificial IntelligenceDec-26-2019

The family of methods collectively known as classifier chains has become a popular approach to multi-label learning problems. This approach involves linking together off-the-shelf binary classifiers in a chain structure, such that class label predictions become features for other classifiers. Such methods have proved flexible and effective and have obtained state-of-the-art empirical performance across many datasets and multi-label evaluation metrics. This performance led to further studies of how exactly it works, and how it could be improved, and in the recent decade numerous studies have explored classifier chains mechanisms on a theoretical level, and many improvements have been made to the training and inference procedures, such that this method remains among the state-of-the-art options for multi-label learning. Given this past and ongoing interest, which covers a broad range of applications and research themes, the goal of this work is to provide a review of classifier chains, a survey of the techniques and extensions provided in the literature, as well as perspectives for this approach in the domain of multi-label classification in the future. We conclude positively, with a number of recommendations for researchers and practitioners, as well as outlining a number of areas for future research.

artificial intelligence, data mining, machine learning, (19 more...)

1912.13405

Country:

North America > United States > New York > New York County > New York City (0.14)
Oceania > New Zealand > North Island > Waikato (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
(2 more...)

Genre: Overview (0.86)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(4 more...)

arXiv.org Machine LearningDec-25-2019

Unsupervised model-free representation learning

Ryabko, Daniil

Numerous control and learning problems face the situation where sequences of high-dimensional highly dependent data are available but no or little feedback is provided to the learner, which makes any inference rather challenging. To address this challenge, we formulate the following problem. Given a series of observations $X_0,\dots,X_n$ coming from a large (high-dimensional) space $\mathcal X$, find a representation function $f$ mapping $\mathcal X$ to a finite space $\mathcal Y$ such that the series $f(X_0),\dots,f(X_n)$ preserves as much information as possible about the original time-series dependence in $X_0,\dots,X_n$. We show that, for stationary time series, the function $f$ can be selected as the one maximizing a certain information criterion that we call time-series information. Some properties of this functions are investigated, including its uniqueness and consistency of its empirical estimates. Implications for the problem of optimal control are presented.

conditionally independent, information, representation, (14 more...)

doi: 10.1109/TIT.2019.2961814

1304.4806

Country:

North America > United States (0.04)
Europe > Spain > Andalusia > Granada Province > Granada (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)