AITopics

1907.06198

Country: Europe > Italy (0.15)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.54)

arXiv.org Machine Learning Jul-14-2019

On the Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost

Yang, Zhuoran, Chen, Yongxin, Hong, Mingyi, Wang, Zhaoran

Compared with the classical policy gradient algorithm (Williams, 1992), actor-critic tracks the action-value function (critic) in policy gradient in an online manner, and alternately updates the policy (actor) and the critic. On the one hand, the online update of critic significantly reduces the variance of policy gradient and hence leads to faster convergence. On the other hand, it also introduces algorithmic instability, which is often observed in practice (Islam et al., 2017) and parallels the notoriously unstable training of generative adversarial networks (Pfau and Vinyals, 2016). Such instability of actor-critic originates from several intertwining challenges, including (i) function approximation of actor and critic, (ii) improper choice of stepsizes, (iii) the noise arising from stochastic approximation, (iv) the asynchrony between actor and critic, and (v) possibly off-policy data used in the update of critic. As a result, the convergence of actor-critic remains much less well understood than that of policy gradient, which itself is open. Consequently, the practical use of actor-critic often lacks theoretical guidance. In this paper, we aim to theoretically understand the algorithmic instability of actor-critic.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1907.06246

Genre:

Research Report (0.50)
Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Saad, Feras A., Cusumano-Towner, Marco F., Schaechtle, Ulrich, Rinard, Martin C., Mansinghka, Vikash K.

Bayesian Synthesis of Probabilistic Programs for Automatic Data Modeling

arXiv.org Artificial Intelligence Jul-14-2019

We present new techniques for automatically constructing probabilistic programs for data analysis, interpretation, and prediction. These techniques work with probabilistic domain-specific data modeling languages that capture key properties of a broad class of data generating processes, using Bayesian inference to synthesize probabilistic programs in these modeling languages given observed data. We provide a precise formulation of Bayesian synthesis for automatic data modeling that identifies sufficient conditions for the resulting synthesis procedure to be sound. We also derive a general class of synthesis algorithms for domain-specific languages specified by probabilistic context-free grammars and establish the soundness of our approach for these languages. We apply the techniques to automatically synthesize probabilistic programs for time series data and multivariate tabular data. First, we show how to analyze the structure of the synthesized programs to compute, for key qualitative properties of interest, the probability that the underlying data generating process exhibits each of these properties. Second, we translate probabilistic programs in the domain-specific language into probabilistic programs in Venture, a general-purpose probabilistic programming system. The translated Venture programs are then executed to obtain predictions of new time series data and new multivariate data records. Experimental results show that our techniques can accurately infer qualitative structure in multiple real-world data sets and outperform standard data analysis methods in forecasting and predicting new data.

artificial intelligence, machine learning, natural language, (19 more...)

doi: 10.1145/3290350

1907.06249

Country:

North America > United States (0.28)
Europe > United Kingdom (0.27)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology:

Information Technology > Software Engineering (1.00)
Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science (1.00)
(5 more...)

#artificialintelligence Jul-12-2019, 06:54:35 GMT

Restricted Boltzmann Machine with Multivalued Hidden Variables

Generalization is one of the most important goals in statistical machine learning problems [1]. In various standard machine learning techniques, given a particular data set, we fit our probabilistic learning model to the empirical distribution (or the data distribution) of the data set. When our learning model is sufficiently flexible, it can fit the empirical distribution exactly via an appropriate learning method. A learning model that is too close to the empirical distribution frequently gives poor results for new data points. This situation is known as over-fitting. Over-fitting impedes generalization; therefore, techniques that can suppress over-fitting are needed to achieve good generalizations.

artificial intelligence, machine learning, rbm, (8 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.42)

Kim, Yoon, Dyer, Chris, Rush, Alexander M.

Compound Probabilistic Context-Free Grammars for Grammar Induction

arXiv.org Machine Learning Jul-12-2019

We study a formalization of the grammar induction problem that models sentences as being generated by a compound probabilistic context-free grammar. In contrast to traditional formulations which learn a single stochastic grammar, our context-free rule probabilities are modulated by a per-sentence continuous latent variable, which induces marginal dependencies beyond the traditional context-free assumptions. Inference in this grammar is performed by collapsed variational inference, in which an amortized variational posterior is placed on the continuous variable, and the latent trees are marginalized with dynamic programming. Experiments on English and Chinese show the effectiveness of our approach compared to recent state-of-the-art methods for grammar induction from words with neural language models.
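The marginalization over latent trees mentioned in this abstract can be illustrated with the classical inside algorithm. The sketch below is a generic version for a grammar in Chomsky normal form, not the authors' implementation; the function name, the dictionary layout, and the toy grammar are assumptions made for illustration. In a compound PCFG the rule probabilities would be produced per sentence from the continuous latent variable, whereas here they are fixed numbers.

```python
from collections import defaultdict


def inside_probability(words, lexical, binary, start="S"):
    """Sum the probabilities of all parse trees of `words` under a PCFG in
    Chomsky normal form, using the inside (CKY-style) dynamic program.

    lexical: dict mapping (nonterminal, word) -> probability
    binary:  dict mapping (parent, left, right) -> probability
    """
    n = len(words)
    # chart[i][j][A] = total probability that A derives words[i:j]
    chart = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for (nt, word), p in lexical.items():
            if word == w:
                chart[i][i + 1][nt] += p
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between the two children
                for (parent, left, right), p in binary.items():
                    l = chart[i][k].get(left, 0.0)
                    r = chart[k][j].get(right, 0.0)
                    if l and r:
                        chart[i][j][parent] += p * l * r
    return chart[0][n].get(start, 0.0)


# Hypothetical toy grammar: S -> S S (0.4) | "a" (0.6)
toy_lexical = {("S", "a"): 0.6}
toy_binary = {("S", "S", "S"): 0.4}
sentence_probability = inside_probability(["a", "a", "a"], toy_lexical, toy_binary)
```

The three-word toy sentence has two binary trees, and the chart sums both without enumerating them explicitly.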

artificial intelligence, machine learning, natural language, (15 more...)

1906.10225

Country: North America > United States (1.00)

Genre: Research Report > Promising Solution (0.34)

Industry:

Law (1.00)
Banking & Finance > Trading (1.00)
Government > Military (0.93)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Chiang, Hao-Tien Lewis, Hsu, Jasmine, Fiser, Marek, Tapia, Lydia, Faust, Aleksandra

RL-RRT: Kinodynamic Motion Planning via Learning Reachability Estimators from RL Policies

arXiv.org Artificial Intelligence Jul-12-2019

This paper addresses two challenges facing sampling-based kinodynamic motion planning: a way to identify good candidate states for local transitions and the subsequent computationally intractable steering between these candidate states. Through the combination of sampling-based planning, a Rapidly Exploring Randomized Tree (RRT) and an efficient kinodynamic motion planner through machine learning, we propose an efficient solution to long-range planning for kinodynamic motion planning. First, we use deep reinforcement learning to learn an obstacle-avoiding policy that maps a robot's sensor observations to actions, which is used as a local planner during planning and as a controller during execution. Second, we train a reachability estimator in a supervised manner, which predicts the RL policy's time to reach a state in the presence of obstacles. Lastly, we introduce RL-RRT, which uses the RL policy as a local planner and the reachability estimator as the distance function to bias tree growth towards promising regions. We evaluate our method on three kinodynamic systems, including physical robot experiments. Results across all three robots tested indicate that RL-RRT outperforms state-of-the-art kinodynamic planners in efficiency, and also provides a shorter path finish time than a steering-function-free method. The learned local planner policy and accompanying reachability estimator demonstrate transferability to the previously unseen experimental environments, making RL-RRT fast because the expensive computations are replaced with simple neural network inference. Video: https://youtu.be/dDMVMTOI8KY
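The structure described in this abstract, an RRT whose local planner and distance function are both pluggable, can be sketched as follows. This is a generic illustration and not the RL-RRT implementation: the function names are hypothetical, and the Euclidean distance and straight-line step are simple stand-ins for the learned reachability estimator and the RL policy.

```python
import math
import random


def rrt(start, goal, sample, distance, local_planner, goal_tol=0.5,
        max_iters=2000, rng=None):
    """Generic RRT loop with a pluggable distance and local planner.

    distance(a, b): cost estimate used to pick the tree node to extend
    local_planner(a, b): state reached when moving from a towards b,
                         or None if the move fails
    """
    rng = rng or random.Random(0)
    parent = {start: None}
    for _ in range(max_iters):
        # Goal bias: occasionally use the goal itself as the target.
        target = goal if rng.random() < 0.1 else sample(rng)
        nearest = min(parent, key=lambda node: distance(node, target))
        new = local_planner(nearest, target)
        if new is None or new in parent:
            continue
        parent[new] = nearest
        if distance(new, goal) <= goal_tol:
            # Walk back to the root to recover the path.
            path = [new]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return path[::-1]
    return None


# Stand-ins (assumptions): a learned estimator and policy would go here.
def euclidean(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


def straight_line_step(a, b, step=0.5):
    d = euclidean(a, b)
    if d == 0.0:
        return None
    t = min(1.0, step / d)
    return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))


def sample_square(rng):
    return (rng.uniform(0, 10), rng.uniform(0, 10))


path = rrt((1.0, 1.0), (9.0, 9.0), sample_square, euclidean, straight_line_step)
```

Swapping `euclidean` for a function that estimates time-to-reach changes which tree node is extended, which is the role the abstract assigns to the reachability estimator.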

artificial intelligence, machine learning, reinforcement learning, (18 more...)

1907.04799

Country: North America > United States (0.68)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Walraven, Erwin, Spaan, Matthijs T. J.

Point-Based Value Iteration for Finite-Horizon POMDPs

Journal of Artificial Intelligence Research Jul-11-2019

Partially Observable Markov Decision Processes (POMDPs) are a popular formalism for sequential decision making in partially observable environments. Since solving POMDPs to optimality is a difficult task, point-based value iteration methods are widely used. These methods compute an approximate POMDP solution, and in some cases they even provide guarantees on the solution quality, but these algorithms have been designed for problems with an infinite planning horizon. In this paper we discuss why state-of-the-art point-based algorithms cannot be easily applied to finite-horizon problems that do not include discounting. Subsequently, we present a general point-based value iteration algorithm for finite-horizon problems which provides solutions with guarantees on solution quality. Furthermore, we introduce two heuristics to reduce the number of belief points considered during execution, which lowers the computational requirements. In experiments we demonstrate that the algorithm is an effective method for solving finite-horizon POMDPs.
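For readers unfamiliar with point-based methods, the following is a minimal sketch of an undiscounted point-based backup with one vector set per time step, which is the textbook way to handle a finite horizon. It is not the algorithm proposed in the paper and gives none of its guarantees; the function names and the nested-list layout of `T`, `O`, and `R` are assumptions for illustration.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def point_based_backup(belief, next_alphas, T, O, R):
    """One undiscounted point-based backup at `belief`.

    T[a][s][s2]: transition probability
    O[a][s2][o]: observation probability
    R[a][s]:     immediate reward
    next_alphas: value vectors of the next time step
    """
    n_states = len(belief)
    best_alpha, best_value = None, float("-inf")
    for a in range(len(T)):
        alpha = list(R[a])
        for o in range(len(O[a][0])):
            # Project every next-step vector back through (a, o).
            projections = [
                [sum(T[a][s][s2] * O[a][s2][o] * nxt[s2]
                     for s2 in range(n_states))
                 for s in range(n_states)]
                for nxt in next_alphas
            ]
            # Keep the projection that is best at this belief point.
            best_proj = max(projections, key=lambda g: dot(belief, g))
            alpha = [x + y for x, y in zip(alpha, best_proj)]
        value = dot(belief, alpha)
        if value > best_value:
            best_alpha, best_value = alpha, value
    return best_alpha


def finite_horizon_pbvi(beliefs, horizon, T, O, R):
    """Backward induction keeping a separate vector set per time step."""
    stage = [[0.0] * len(beliefs[0])]  # value after the final step is zero
    stages = [stage]
    for _ in range(horizon):
        stage = [point_based_backup(b, stage, T, O, R) for b in beliefs]
        stages.append(stage)
    return stages[::-1]  # stages[t] holds the vectors for time step t
```

Because there is no discounting, the vectors at different time steps differ in scale, which is why the sketch stores them per step instead of sharing one set.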

algorithm, pomdp, value function, (13 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.1.11324

AI Access Foundation

11324

Country:

North America > Canada > British Columbia > East Kootenay Region > Fernie (0.04)
Europe > Spain (0.04)
Europe > Netherlands > South Holland > Delft (0.04)
(2 more...)

Genre: Overview (0.68)

Industry: Energy > Power Industry (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Nadjahi, Kimia, Laroche, Romain, Combes, Rémi Tachet des

Safe Policy Improvement with Soft Baseline Bootstrapping

arXiv.org Artificial Intelligence Jul-11-2019

Batch Reinforcement Learning (Batch RL) consists in training a policy using trajectories collected with another policy, called the behavioural policy. Safe policy improvement (SPI) provides guarantees with high probability that the trained policy performs better than the behavioural policy, also called the baseline in this setting. Previous work shows that the SPI objective improves mean performance compared with the basic RL objective, which boils down to solving the MDP with maximum likelihood. Here, we build on that work and improve, more precisely, the SPI with Baseline Bootstrapping algorithm (SPIBB) by allowing the policy search over a wider set of policies. Instead of binarily classifying the state-action pairs into two sets (the uncertain and the safe-to-train-on ones), we adopt a softer strategy that controls the error in the value estimates by constraining the policy change according to the local model uncertainty. The method can take more risks on uncertain actions while remaining provably safe, and is therefore less conservative than the state-of-the-art methods. We propose two algorithms (one optimal and one approximate) to solve this constrained optimization problem and empirically show a significant improvement over existing SPI algorithms both on finite MDPs and on infinite MDPs with a neural network function approximation.

algorithm, baseline, safe policy improvement, (10 more...)

1907.05079

Country:

North America > United States > New York (0.04)
North America > Canada > Quebec > Montreal (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
(2 more...)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.62)

Phan, Thomy, Gabor, Thomas, Müller, Robert, Roch, Christoph, Linnhoff-Popien, Claudia

Adaptive Thompson Sampling Stacks for Memory Bounded Open-Loop Planning

arXiv.org Artificial Intelligence Jul-11-2019

We propose Stable Yet Memory Bounded Open-Loop (SYMBOL) planning, a general memory bounded approach to partially observable open-loop planning. SYMBOL maintains an adaptive stack of Thompson Sampling bandits, whose size is bounded by the planning horizon and can be automatically adapted according to the underlying domain without any prior domain knowledge beyond a generative model. We empirically test SYMBOL in four large POMDP benchmark problems to demonstrate its effectiveness and robustness w.r.t. the choice of hyperparameters and evaluate its adaptive memory consumption. We also compare its performance with other open-loop planning algorithms and POMCP.
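The idea of a stack of Thompson Sampling bandits, one per planning step, can be sketched as below. This is a simplified illustration with a fixed stack size equal to the horizon; it does not include the adaptive sizing that the abstract attributes to SYMBOL. The class and function names, the Gaussian posterior with an N(0, 1) prior and unit-variance rewards, and the `generative_model(state, action, rng)` signature are all assumptions.

```python
import random


class GaussianThompsonBandit:
    """Thompson Sampling with an N(0, 1) prior and unit-variance rewards."""

    def __init__(self, n_actions):
        self.counts = [0] * n_actions
        self.sums = [0.0] * n_actions

    def select(self, rng):
        # Draw one sample per action from its posterior and act greedily.
        samples = [
            rng.gauss(self.sums[a] / (self.counts[a] + 1),
                      (1.0 / (self.counts[a] + 1)) ** 0.5)
            for a in range(len(self.counts))
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, action, ret):
        self.counts[action] += 1
        self.sums[action] += ret


def open_loop_plan(state, generative_model, n_actions, horizon, n_sims, rng):
    """Return the first action of an open-loop plan found by simulation."""
    stack = [GaussianThompsonBandit(n_actions) for _ in range(horizon)]
    for _ in range(n_sims):
        s, actions, rewards = state, [], []
        for bandit in stack:
            a = bandit.select(rng)
            s, r = generative_model(s, a, rng)
            actions.append(a)
            rewards.append(r)
        # Update each bandit with the return observed from its depth onward.
        ret = 0.0
        for bandit, a, r in zip(reversed(stack), reversed(actions),
                                reversed(rewards)):
            ret += r
            bandit.update(a, ret)
    root = stack[0]
    return max(range(n_actions), key=lambda a: root.counts[a])
```

Memory use here is one bandit per step regardless of how many states are visited, which is the sense in which a stack of this kind stays bounded by the planning horizon.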

artificial intelligence, data mining, machine learning, (17 more...)

1907.05861

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)

Jarboui, Firas, Gruson-daniel, Célya, Durmus, Alain, Rocchisani, Vincent, Ebongue, Sophie-helene Goulet, Depoux, Anneliese, Kirschenmann, Wilfried, Perchet, Vianney

Markov Decision Process for MOOC users behavioral inference

arXiv.org Machine Learning Jul-10-2019

Studies on users of massive open online courses (MOOCs) discuss the existence of typical profiles and their impact on the learning process of the students. However, defining the typical behaviors as well as classifying the users accordingly is a difficult task. In this paper we suggest two methods to model MOOC users' behavior given their log data. We mold their behavior into a Markov Decision Process framework. We associate the user's intentions with the MDP reward and argue that this allows us to classify them.

artificial intelligence, machine learning, markov decision process, (11 more...)

1907.04723

Country: Europe > France (0.05)

Genre: Instructional Material > Online (0.93)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (1.00)
Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)