AITopics | Undirected Networks

Provably Efficient Reinforcement Learning with Aggregated States

Dong, Shi, Van Roy, Benjamin, Zhou, Zhengyuan

arXiv.org Machine Learning, Dec-13-2019

We establish that an optimistic variant of Q-learning applied to a finite-horizon episodic Markov decision process with an aggregated state representation incurs regret $\tilde{\mathcal{O}}(\sqrt{H^5 M K} + \epsilon HK)$, where $H$ is the horizon, $M$ is the number of aggregate states, $K$ is the number of episodes, and $\epsilon$ is the largest difference between any pair of optimal state-action values associated with a common aggregate state. Notably, this regret bound does not depend on the number of states or actions. To the best of our knowledge, this is the first such result pertaining to a reinforcement learning algorithm applied with nontrivial value function approximation without any restrictions on the Markov decision process.

aggregate state, aggregation, algorithm, (13 more...)

arXiv.org Machine Learning

1912.06366

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.54)

Provably Efficient Exploration in Policy Optimization

Cai, Qi, Yang, Zhuoran, Jin, Chi, Wang, Zhaoran

arXiv.org Machine Learning, Dec-12-2019

While policy-based reinforcement learning (RL) achieves tremendous successes in practice, it is significantly less understood in theory, especially compared with value-based RL. In particular, it remains elusive how to design a provably efficient policy optimization algorithm that incorporates exploration. To bridge such a gap, this paper proposes an Optimistic variant of the Proximal Policy Optimization algorithm (OPPO), which follows an "optimistic version" of the policy gradient direction. This paper proves that, in the problem of episodic Markov decision process with linear function approximation, unknown transition, and adversarial reward with full-information feedback, OPPO achieves $\tilde{O}(\sqrt{d^3 H^3 T})$ regret. Here $d$ is the feature dimension, $H$ is the episode horizon, and $T$ is the total number of steps. To the best of our knowledge, OPPO is the first provably efficient policy optimization algorithm that explores.

algorithm 1, arxiv preprint arxiv, reward function, (10 more...)

arXiv.org Machine Learning

1912.0583

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Game Design for Eliciting Distinguishable Behavior

Yang, Fan, Leqi, Liu, Wu, Yifan, Lipton, Zachary C., Ravikumar, Pradeep, Cohen, William W., Mitchell, Tom

arXiv.org Artificial Intelligence, Dec-12-2019

The ability to infer latent psychological traits from human behavior is key to developing personalized human-interacting machine learning systems. Approaches to infer such traits range from surveys to manually-constructed experiments and games. However, these traditional games are limited because they are typically designed based on heuristics. In this paper, we formulate the task of designing behavior diagnostic games that elicit distinguishable behavior as a mutual information maximization problem, which can be solved by optimizing a variational lower bound. Our framework is instantiated by using prospect theory to model varying player traits, and Markov Decision Processes to parameterize the games. We validate our approach empirically, showing that our designed games can successfully distinguish among players with different traits, outperforming manually-designed ones by a large margin.

grid, mutual information, trajectory, (14 more...)

arXiv.org Artificial Intelligence

1912.06074

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Iowa (0.04)
North America > Canada (0.04)

Genre: Research Report (0.40)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Recurrent Transform Learning

Gupta, Megha, Majumdar, Angshul

arXiv.org Machine Learning, Dec-11-2019

The objective of this work is to improve the accuracy of building demand forecasting. This is a more challenging task than grid level forecasting. For the said purpose, we develop a new technique called recurrent transform learning (RTL). The first one (RTL) is unsupervised; this is used as a feature extraction tool that is further fed into a regression model. Forecasting experiments have been carried out on three popular publicly available datasets. Both of our proposed techniques yield results superior to the state of the art, such as long short-term memory networks, echo state networks and sparse coding regression. Index Terms: demand forecasting, dynamical model, load forecasting, transform learning. The importance of electrical load forecasting is well known. The issue has gained even more significance with the advent of smartgrids, microgrids and smart buildings. An excellent review on this topic can be found in [1].

forecasting, formulation, load forecasting, (13 more...)

arXiv.org Machine Learning

1912.05198

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.05)
Asia > India > NCT > Delhi (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > India > NCT > New Delhi (0.04)

Genre: Research Report (0.82)

Industry: Energy > Power Industry (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Sampling for Bayesian Mixture Models: MCMC with Polynomial-Time Mixing

Mou, Wenlong, Ho, Nhat, Wainwright, Martin J., Bartlett, Peter L., Jordan, Michael I.

arXiv.org Machine Learning, Dec-11-2019

Various researchers have studied posterior inference of parameters in Bayesian mixture models [24, 42, 23], so that the statistical behavior of such models is relatively well-understood. In contrast, much less is known about the efficiency of different algorithms for sampling from the posterior distributions that arise from Bayesian mixture models. A standard approach for doing so is via some form of Markov Chain Monte Carlo (MCMC). Many different types of MCMC algorithms have been introduced for various types of Bayesian mixture models, including finite Bayesian mixture models [21, 49, 50, 26, 40], Dirichlet process mixture models [37, 41, 25, 28], and hierarchical and nested Dirichlet process models [52, 47]. Despite the plethora of possible MCMC methods, upper bounds on their mixing times are often challenging to establish. We refer the reader to the papers [27, 3, 55, 48, 57] for non-asymptotic upper bounds on mixing times for certain types of Bayesian models, different from those studied in this paper. In recent years, it has been increasingly common in the Bayesian literature to make use of a fractional likelihood, meaning an ordinary likelihood raised to some fractional power. Combining such a fractional likelihood with a prior distribution in the usual way leads to a class of posteriors known as power posterior or fractional posterior distributions. The power posterior distributions have been shown to have attractive properties in terms of robustness to mis-specification in Bayesian mixture models [39], and have been used in various applications.

algorithm, inequality, power posterior, (16 more...)

arXiv.org Machine Learning

1912.05153

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.64)

Industry: Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

A spiking neural-network model of goal-directed behaviour

#artificialintelligence, Dec-10-2019, 01:59:39 GMT

In mammals, goal-directed and planning processes support flexible behaviour usable to face new situations or changed conditions that cannot be tackled through more efficient but rigid habitual behaviours. Within the Bayesian modelling approach of brain and behaviour, probabilistic models have been proposed to perform planning as a probabilistic inference. Recently, some models have started to face the important challenge met by this approach: grounding such processes on the computations implemented by brain spiking networks. Here we propose a model of goal-directed behaviour that has a probabilistic interpretation and is centred on a recurrent spiking neural network representing the world model. The model, building on previous proposals on spiking neurons and plasticity rules having a probabilistic interpretation, presents these novelties at the system level: (a) the world model is learnt in parallel with its use for planning, and an arbitration mechanism decides when to exploit the world-model knowledge with planning, or to explore, on the basis of an entropy-based confidence on the world model knowledge; (b) the world model is a hidden Markov model (HMM) able to simulate sequences of states and actions, thus planning selects actions through the same neural generative process used to predict states; (c) the world model learns the hidden causes of observations, and their temporal dependencies, through a biologically plausible unsupervised learning mechanism.

consequence, goal-directed behaviour, world model, (14 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.94)

A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation

Xu, Pan, Gu, Quanquan

arXiv.org Machine Learning, Dec-10-2019

Q-learning with neural network function approximation (neural Q-learning for short) is among the most prevalent deep reinforcement learning algorithms. Despite its empirical success, the non-asymptotic convergence rate of neural Q-learning remains virtually unknown. In this paper, we present a finite-time analysis of a neural Q-learning algorithm, where the data are generated from a Markov decision process and the action-value function is approximated by a deep ReLU neural network. We prove that neural Q-learning finds the optimal policy with an $O(1/\sqrt{T})$ convergence rate if the neural function approximator is sufficiently overparameterized, where $T$ is the number of iterations. To the best of our knowledge, our result is the first finite-time analysis of neural Q-learning under a non-i.i.d. data assumption.

algorithm, function approximation, q-learning, (15 more...)

arXiv.org Machine Learning

1912.04511

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

Before we can find a model, we must forget about perfection

Dobrev, Dimiter

arXiv.org Artificial Intelligence, Dec-10-2019

With Reinforcement Learning we assume that a model of the world does exist. We assume furthermore that the model in question is perfect (i.e. it describes the world completely and unambiguously). This article will demonstrate that it does not make sense to search for the perfect model because this model is too complicated and practically impossible to find. We will show that we should abandon the pursuit of perfection and pursue Event-Driven (ED) models instead. These models are a generalization of Markov Decision Process (MDP) models. This generalization is essential because nothing can be found without it. Rather than a single MDP, we will aim to find a raft of neat, simple ED models, each one describing a simple dependency or property. In other words, we will replace the search for a singular and complex perfect model with a search for a large number of simple models.

agent, generator, probability, (17 more...)

arXiv.org Artificial Intelligence

1912.04964

Genre:

Research Report (0.50)
Workflow (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Self-regularizing restricted Boltzmann machines

Loukas, Orestis

arXiv.org Machine Learning, Dec-9-2019

Focusing on the grand-canonical extension of the ordinary restricted Boltzmann machine, we suggest an energy-based model for feature extraction that uses a layer of hidden units with varying size. By an appropriate choice of the chemical potential and given a sufficiently large number of hidden resources, the generative model is able to efficiently deduce the optimal number of hidden units required to learn the target data with exceedingly small generalization error. The formal simplicity of the grand-canonical ensemble, combined with a rapidly converging ansatz in mean-field theory, enables us to recycle well-established numerical algorithms during training, like contrastive divergence, with only minor changes. As a proof of principle and to demonstrate the novel features of grand-canonical Boltzmann machines, we train our generative models on data from the Ising theory and MNIST.

boltzmann machine, chemical potential, rbm, (16 more...)

arXiv.org Machine Learning

1912.05634

Country:

North America > United States > California (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Austria > Vienna (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.92)

An empirical study of neural networks for trend detection in time series

Miot, Alexandre, Drigout, Gilles

arXiv.org Machine Learning, Dec-9-2019

We have derived theoretical maximum likelihood estimators of trends for standard dynamics and implemented them. We have reframed the problem of trend detection into a classification problem amenable to machine learning methods. We have shown that RNNs are, in a way, a generalization of simple moving average techniques and motivated this by theory. In a simple case, we have shown that this generalization transforms the trend estimation problem into simply locating the state vector within convex polytope cells. Finally, we have shown empirically that GRU or LSTM cells are on average the best building block to use compared to a broad range of estimators in order to detect trends in time series. Putting the emphasis on learning stylized data and then transferring to real data rather than building complex structures fitted to data is also an important takeaway of this paper. Ongoing preliminary research seems to validate our approach for financial applications. This might pave the way to building efficient market estimators protected against over-fitting.

estimator, rnn baseline, time series, (12 more...)

arXiv.org Machine Learning

1912.04009

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre:

Research Report (0.50)
Instructional Material (0.46)

Industry: Banking & Finance > Trading (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)
