
 Munos, Rémi


Error Propagation for Approximate Policy and Value Iteration

Neural Information Processing Systems

We address the question of how the approximation error/Bellman residual at each iteration of the Approximate Policy/Value Iteration algorithms influences the quality of the resulting policy. We quantify the performance loss as the Lp norm of the approximation error/Bellman residual at each iteration. Moreover, we show that the performance loss depends on the expectation of the squared Radon-Nikodym derivative of a certain distribution rather than its supremum -- as opposed to what has been suggested by previous results. Also, our results indicate that the contribution of the approximation/Bellman error to the performance loss is more prominent in the later iterations of API/AVI, and the effect of an error term in the earlier iterations decays exponentially fast.
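
To make the last claim concrete, here is a standard pointwise error-propagation decomposition for approximate value iteration, given only as a sketch of the kind of bound the paper's Lp analysis refines. Here epsilon_k is the approximation error at iteration k, T the Bellman optimality operator, and P^{pi*} the transition kernel induced by an optimal policy.

\[
V^* - V_K \;\le\; \sum_{k=0}^{K-1} \gamma^{\,K-1-k}\,\bigl(P^{\pi^*}\bigr)^{K-1-k}\,\epsilon_k
\;+\; \gamma^{K}\,\bigl(P^{\pi^*}\bigr)^{K}\,\bigl(V^* - V_0\bigr),
\qquad \epsilon_k = \mathcal{T} V_k - V_{k+1}.
\]

The weight gamma^{K-1-k} is largest for the most recent errors and decays exponentially for early ones; roughly speaking, taking an Lp norm of both sides with respect to a performance-measuring distribution is what introduces the change-of-measure (Radon-Nikodym) terms discussed in the abstract.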


Particle Filter-based Policy Gradient in POMDPs

Neural Information Processing Systems

Our setting is a Partially Observable Markov Decision Process with continuous state, observation and action spaces. Decisions are based on a Particle Filter for estimating the belief state given past observations. We consider a policy gradient approach for parameterized policy optimization. For that purpose, we investigate sensitivity analysis of the performance measure with respect to the parameters of the policy, focusing on Finite Difference (FD) techniques. We show that the naive FD is subject to variance explosion because of the non-smoothness of the resampling procedure. We propose a more sophisticated FD method which overcomes this problem and establish its consistency.
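
For reference, below is a minimal sketch of the naive finite-difference baseline the paper improves upon, on a hypothetical scalar POMDP where the action is a linear function of the particle-filter belief mean. The model, the policy parameterization, and all function names are illustrative assumptions; common random numbers are used across the two perturbed rollouts, and the resampling step is the source of the non-smoothness (and hence variance blow-up) discussed in the abstract.

import numpy as np

def particle_filter_rollout(theta, seed, n_steps=50, n_particles=100):
    """Hypothetical 1-D POMDP rollout; the action is a linear function of the
    particle-filter belief mean. Returns the cumulative reward."""
    rng = np.random.default_rng(seed)
    x = rng.normal()                                   # hidden state
    particles = rng.normal(size=n_particles)
    total = 0.0
    for _ in range(n_steps):
        a = theta[0] * particles.mean() + theta[1]     # parameterized policy
        x = 0.9 * x + a + 0.1 * rng.normal()           # state transition
        y = x + 0.1 * rng.normal()                     # observation
        total += -x ** 2                               # reward (negative cost)
        # Particle filter update: propagate, weight by likelihood, resample.
        particles = 0.9 * particles + a + 0.1 * rng.normal(size=n_particles)
        w = np.exp(-0.5 * ((y - particles) / 0.1) ** 2) + 1e-12
        particles = rng.choice(particles, size=n_particles, p=w / w.sum())
    return total

def naive_fd_gradient(theta, delta=1e-2, n_runs=20):
    """Naive central finite differences with common random numbers.
    The resampling step is not smooth in theta, which is what makes this
    estimator's variance explode as delta -> 0."""
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta)
        e[i] = delta
        diffs = [particle_filter_rollout(theta + e, seed=s)
                 - particle_filter_rollout(theta - e, seed=s)
                 for s in range(n_runs)]
        grad[i] = np.mean(diffs) / (2 * delta)
    return grad

# print(naive_fd_gradient([0.5, 0.0]))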


Compressed Least-Squares Regression

Neural Information Processing Systems

We consider the problem of learning, from K data, a regression function in a linear space of high dimension N using projections onto a random subspace of lower dimension M. From any algorithm minimizing the (possibly penalized) empirical risk, we provide bounds on the excess risk of the estimate computed in the projected subspace (compressed domain) in terms of the excess risk of the estimate built in the high-dimensional space (initial domain). We show that solving the problem in the compressed domain instead of the initial domain reduces the estimation error at the price of an increased (but controlled) approximation error. We apply the analysis to Least-Squares (LS) regression and discuss the excess risk and numerical complexity of the resulting "Compressed Least-Squares Regression" (CLSR) in terms of N, K, and M. When we choose M = O(√K), we show that CLSR has an estimation error of order O(log K/√K).
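
A minimal NumPy sketch of the compressed least-squares idea follows. The Gaussian projection matrix, its scaling, and the choice of M are illustrative assumptions, not the paper's prescription.

import numpy as np

def compressed_least_squares(Phi, y, M, rng=None):
    """Sketch of compressed least-squares regression.

    Phi : (K, N) feature matrix in the initial (high-dimensional) domain.
    y   : (K,) targets.
    M   : dimension of the random subspace (e.g. of order sqrt(K)).
    Returns a predictor that operates on rows shaped like those of Phi."""
    rng = np.random.default_rng() if rng is None else rng
    K, N = Phi.shape
    # Random projection onto an M-dimensional subspace (Gaussian entries,
    # scaled by 1/sqrt(M) as is common for random projections).
    A = rng.normal(scale=1.0 / np.sqrt(M), size=(N, M))
    Z = Phi @ A                                      # compressed-domain features
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)     # ordinary least squares
    return lambda phi_new: phi_new @ A @ beta

# Example with K = 200 samples, N = 5000 features, M of order sqrt(K):
# rng = np.random.default_rng(0)
# Phi = rng.normal(size=(200, 5000)); w = rng.normal(size=5000) / 70
# y = Phi @ w + 0.1 * rng.normal(size=200)
# predict = compressed_least_squares(Phi, y, M=14, rng=rng)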


Online Optimization in X-Armed Bandits

Neural Information Processing Systems

We consider a generalization of stochastic bandit problems where the set of arms, X, is allowed to be a generic topological space. We constrain the mean-payoff function with a dissimilarity function over X in a way that is more general than Lipschitz. We construct an arm selection policy whose regret improves upon previous results for a large class of problems. In particular, our results imply that if X is the unit hypercube in a Euclidean space and the mean-payoff function has a finite number of global maxima around which the behavior of the function is locally Hölder with a known exponent, then the expected regret is bounded up to a logarithmic factor by $\sqrt{n}$, i.e., the rate of growth of the regret is independent of the dimension of the space. Moreover, we prove the minimax optimality of our algorithm for the class of mean-payoff functions we consider.
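
Below is a compact sketch in the spirit of the hierarchical optimistic arm-selection strategy analyzed in the paper, specialized to X = [0, 1] with a binary partitioning tree. The smoothness constants nu and rho are placeholders, and B-values are only refreshed along the traversed path to keep the code short; this is an illustration, not the paper's exact algorithm.

import math, random

class Node:
    def __init__(self, lo, hi, depth):
        self.lo, self.hi, self.depth = lo, hi, depth
        self.count, self.mean = 0, 0.0
        self.children = None          # (left, right) once expanded
        self.B = float("inf")         # optimistic upper bound on payoffs in the cell

def optimistic_tree_search(payoff, n_rounds, rho=0.5, nu=1.0):
    """payoff(x) returns a stochastic reward in [0, 1] for the point x in [0, 1]."""
    root = Node(0.0, 1.0, 0)
    total_reward = 0.0
    for t in range(1, n_rounds + 1):
        # Walk down the tree, always following the child with the larger B-value.
        path, node = [root], root
        while node.children is not None:
            left, right = node.children
            node = left if left.B >= right.B else right
            path.append(node)
        # Expand the selected leaf and play a point inside its cell.
        mid = 0.5 * (node.lo + node.hi)
        node.children = (Node(node.lo, mid, node.depth + 1),
                         Node(mid, node.hi, node.depth + 1))
        reward = payoff(random.uniform(node.lo, node.hi))
        total_reward += reward
        # Update empirical means along the path, then refresh B-values bottom-up.
        for nd in path:
            nd.count += 1
            nd.mean += (reward - nd.mean) / nd.count
        for nd in reversed(path):
            u = nd.mean + math.sqrt(2.0 * math.log(t) / nd.count) + nu * rho ** nd.depth
            nd.B = u if nd.children is None else min(u, max(c.B for c in nd.children))
    return total_reward

# Example: noisy payoff with a single maximum at x = 0.3.
# print(optimistic_tree_search(lambda x: 1 - abs(x - 0.3) + random.gauss(0, 0.1), 2000))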


Algorithms for Infinitely Many-Armed Bandits

Neural Information Processing Systems

We consider multi-armed bandit problems where the number of arms is larger than the possible number of experiments. We make a stochastic assumption on the mean-reward of a newly selected arm which characterizes its probability of being a near-optimal arm. Our assumption is weaker than in previous works. We describe algorithms based on upper-confidence-bounds applied to a restricted set of randomly selected arms and provide upper bounds on the resulting expected regret. We also derive a lower bound which matches (up to a logarithmic factor) the upper bound in some cases.
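
An illustrative sketch of the "confidence bounds on a random subset of arms" idea: draw K arms at random from the reservoir and run a UCB index policy on that restricted set. Plain UCB1 is used here for brevity; the paper's confidence bounds and the tuning of K depend on the assumed distribution of new arms' mean-rewards.

import math, random

def ucb_on_random_subset(draw_new_arm, n_rounds, K):
    """draw_new_arm() returns an arm, i.e. a function of no arguments that
    yields a stochastic reward in [0, 1]. Only K randomly drawn arms are kept."""
    arms = [draw_new_arm() for _ in range(K)]
    counts, means = [0] * K, [0.0] * K
    total = 0.0
    for t in range(1, n_rounds + 1):
        if t <= K:
            i = t - 1                  # play each retained arm once
        else:
            i = max(range(K), key=lambda j: means[j]
                    + math.sqrt(2 * math.log(t) / counts[j]))
        r = arms[i]()
        total += r
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]
    return total

# Example: each new arm is Bernoulli with a mean drawn uniformly in [0, 1].
# def draw_new_arm():
#     mu = random.random()
#     return lambda: 1.0 if random.random() < mu else 0.0
# print(ucb_on_random_subset(draw_new_arm, n_rounds=10000, K=100))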


Sensitivity analysis in HMMs with application to likelihood maximization

Neural Information Processing Systems

This paper considers a sensitivity analysis in Hidden Markov Models with continuous state and observation spaces. We propose an Infinitesimal Perturbation Analysis (IPA) of the filtering distribution with respect to some parameters of the model. We describe a methodology for using any algorithm that estimates the filtering density, such as Sequential Monte Carlo methods, to design an algorithm that estimates its gradient. The resulting IPA estimator is proven to be asymptotically unbiased and consistent, and its computational complexity is linear in the number of particles. We consider an application of this analysis to the problem of identifying unknown parameters of the model given a sequence of observations. We derive an IPA estimator for the gradient of the log-likelihood, which may be used in a gradient method for the purpose of likelihood maximization. We illustrate the method with several numerical experiments.
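
For context, here is a minimal bootstrap particle filter computing the log-likelihood estimate whose gradient such an IPA estimator would target. The scalar Gaussian model and the way the parameter theta enters the transition are purely illustrative; a real implementation would also guard against weight degeneracy.

import numpy as np

def pf_log_likelihood(theta, observations, n_particles=500, obs_std=0.5, seed=0):
    """Bootstrap particle filter estimate of log p(y_1:T | theta) for a toy
    scalar HMM: x_t = theta * x_{t-1} + noise, y_t = x_t + noise."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n_particles)                       # initial particles
    log_lik = 0.0
    for y in observations:
        x = theta * x + rng.normal(size=n_particles)       # propagate
        w = np.exp(-0.5 * ((y - x) / obs_std) ** 2)        # observation likelihood
        log_lik += np.log(w.mean() / (obs_std * np.sqrt(2 * np.pi)))
        x = rng.choice(x, size=n_particles, p=w / w.sum()) # resample
    return log_lik

A gradient method for likelihood maximization would then ascend theta using the IPA gradient estimate of this quantity rather than a finite difference.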


Fitted Q-iteration in continuous action-space MDPs

Neural Information Processing Systems

We consider continuous state, continuous action batch reinforcement learning, where the goal is to learn a good policy from a sufficiently rich trajectory generated by another policy. We study a variant of fitted Q-iteration, where the greedy action selection is replaced by searching for a policy in a restricted set of candidate policies by maximizing the average action values. We provide a rigorous theoretical analysis of this algorithm, proving what we believe are the first finite-time bounds for value-function-based algorithms for continuous state- and action-space problems.
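
A compact sketch of such a variant, with extra-trees regression as the function approximator (a common choice in fitted Q-iteration) and a small, user-supplied set of candidate policies. The regressor choice, data layout, and all names are illustrative assumptions rather than the paper's exact procedure.

import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_iteration(batch, candidate_policies, n_iter=20, gamma=0.95):
    """Fitted Q-iteration with policy search over a restricted candidate set
    instead of exact greedy maximization over a continuous action space.

    batch              : list of transitions (s, a, r, s_next), with states and
                         actions given as 1-D numpy arrays.
    candidate_policies : list of functions mapping a state array to an action array."""
    SA = np.array([np.concatenate([s, a]) for s, a, _, _ in batch])
    R = np.array([r for _, _, r, _ in batch])
    S_next = [s_next for _, _, _, s_next in batch]
    q, policy = None, candidate_policies[0]
    for _ in range(n_iter):
        if q is None:
            targets = R
        else:
            # Bootstrap with the current policy's action at the next states.
            next_sa = np.array([np.concatenate([s, policy(s)]) for s in S_next])
            targets = R + gamma * q.predict(next_sa)
        q = ExtraTreesRegressor(n_estimators=50).fit(SA, targets)
        # Policy improvement restricted to the candidate set: pick the policy
        # maximizing the average action-value over the batch states.
        def avg_value(pi):
            sa = np.array([np.concatenate([s, pi(s)]) for s, _, _, _ in batch])
            return q.predict(sa).mean()
        policy = max(candidate_policies, key=avg_value)
    return q, policy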


Efficient Resources Allocation for Markov Decision Processes

Neural Information Processing Systems

Assume that we model a complex decision-making problem under uncertainty by a finite MDP. Because of the limited resources used, the parameters of the MDP (transition probabilities and rewards) are uncertain: we assume that we only know a belief state over their possible values. If we select the most probable values of the parameters, we can build an MDP and solve it to deduce the corresponding optimal policy. However, because of the uncertainty over the true parameters, this policy may not be the one that maximizes the expected cumulative rewards of the true (but partially unknown) decision-making problem. We can nevertheless use sampling techniques to estimate the expected loss of using this policy.
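
A small sketch of the sampling procedure described in the last sentence: draw MDPs from the belief over the parameters, and estimate the expected loss of running the nominal policy (the one optimal for the most probable parameter values) instead of each sampled MDP's own optimal policy. The Dirichlet belief in the usage example and the toy sizes are assumptions for illustration.

import numpy as np

def value_iteration(P, R, gamma=0.95, n_iter=1000):
    """P: (A, S, S) transition tensor, R: (S, A) rewards. Returns V and a greedy policy."""
    V = np.zeros(R.shape[0])
    for _ in range(n_iter):
        Q = R + gamma * np.einsum("asn,n->sa", P, V)
        V = Q.max(axis=1)
    return V, Q.argmax(axis=1)

def policy_value(P, R, policy, gamma=0.95):
    """Exact value of a deterministic policy by solving the linear system."""
    S = R.shape[0]
    P_pi = np.array([P[policy[s], s] for s in range(S)])
    R_pi = R[np.arange(S), policy]
    return np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)

def expected_loss(sample_mdp, nominal_P, nominal_R, n_samples=100, gamma=0.95, s0=0):
    """Monte Carlo estimate of the loss of using the nominal policy on the true MDP."""
    _, pi_nominal = value_iteration(nominal_P, nominal_R, gamma)
    losses = []
    for _ in range(n_samples):
        P, R = sample_mdp()                        # draw parameters from the belief
        V_star, _ = value_iteration(P, R, gamma)   # best achievable on this sample
        V_pi = policy_value(P, R, pi_nominal, gamma)
        losses.append(V_star[s0] - V_pi[s0])
    return float(np.mean(losses))

# Example belief: Dirichlet over each row of the transition matrix, known rewards
# (the Dirichlet mean is used here as a stand-in for the most probable values).
# rng = np.random.default_rng(0); A, S = 2, 5
# alpha = rng.uniform(0.5, 3.0, size=(A, S, S)); R = rng.uniform(size=(S, A))
# nominal_P = alpha / alpha.sum(axis=2, keepdims=True)
# sample = lambda: (np.array([[rng.dirichlet(alpha[a, s]) for s in range(S)]
#                             for a in range(A)]), R)
# print(expected_loss(sample, nominal_P, R))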



Barycentric Interpolators for Continuous Space and Time Reinforcement Learning

Neural Information Processing Systems

In order to find the optimal control of continuous state-space and time reinforcement learning (RL) problems, we approximate the value function (VF) with a particular class of functions called barycentric interpolators. We establish sufficient conditions under which an RL algorithm converges to the optimal VF, even when we use approximate models of the state dynamics and the reinforcement functions.
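
A minimal example of a barycentric interpolator on a 1-D grid: piecewise-linear interpolation is the simplest instance, with non-negative coefficients that sum to one and reproduce the query point, and the value function is stored only at the grid points. The grid and the commented backup at the end are illustrative assumptions.

import numpy as np

def barycentric_coords(x, grid):
    """Return an index i and a coefficient lam in [0, 1] such that
    x = lam * grid[i] + (1 - lam) * grid[i + 1].
    In higher dimensions the same role is played, e.g., by the vertices of a
    simplex containing x."""
    x = np.clip(x, grid[0], grid[-1])
    i = min(np.searchsorted(grid, x, side="right") - 1, len(grid) - 2)
    i = max(i, 0)
    lam = (grid[i + 1] - x) / (grid[i + 1] - grid[i])
    return i, lam

def interpolate(values, grid, x):
    """Interpolated value at x: a convex combination of neighbouring grid values."""
    i, lam = barycentric_coords(x, grid)
    return lam * values[i] + (1 - lam) * values[i + 1]

# One approximate Bellman backup at the grid points, for a hypothetical
# deterministic system x' = f(x, a) with reward r(x, a):
# V[i] <- max over a of [ r(grid[i], a) + gamma * interpolate(V, grid, f(grid[i], a)) ]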