Szepesvari, Csaba
Bayesian Optimal Control of Smoothly Parameterized Systems: The Lazy Posterior Sampling Algorithm
Abbasi-Yadkori, Yasin, Szepesvari, Csaba
We study Bayesian optimal control of a general class of smoothly parameterized Markov decision problems. Since computing the optimal control is computationally expensive, we design an algorithm that trades off performance for computational efficiency. The algorithm is a lazy posterior sampling method that maintains a distribution over the unknown parameter. The algorithm changes its policy only when the variance of the distribution is reduced sufficiently. Importantly, we analyze the algorithm and show the precise nature of the performance vs. computation tradeoff. Finally, we show the effectiveness of the method on a web server control application.
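To make the laziness concrete, here is a minimal numpy sketch of the update rule on a toy scalar system; the Gaussian observation model, the variance-halving threshold, and the plan() stand-in are illustrative assumptions, not the paper's construction.

    import numpy as np

    rng = np.random.default_rng(0)
    theta_true = 0.7                    # unknown system parameter (toy model)
    mu, var = 0.0, 1.0                  # Gaussian prior over theta
    noise_var = 0.25                    # observation model: y = theta + noise

    def plan(theta_sample):
        # Stand-in for solving the control problem for a sampled parameter.
        return lambda: theta_sample

    policy = plan(rng.normal(mu, np.sqrt(var)))
    var_at_last_switch = var

    for t in range(200):
        y = theta_true + rng.normal(0.0, np.sqrt(noise_var))
        prec = 1.0 / var + 1.0 / noise_var      # conjugate Gaussian update
        mu = (mu / var + y / noise_var) / prec
        var = 1.0 / prec
        # Lazy rule: re-plan only once the posterior variance has halved.
        if var < 0.5 * var_at_last_switch:
            policy = plan(rng.normal(mu, np.sqrt(var)))
            var_at_last_switch = var

    print(f"posterior mean {mu:.3f}, current policy parameter {policy():.3f}")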
Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions
Abbasi, Yasin, Bartlett, Peter L., Kanade, Varun, Seldin, Yevgeny, Szepesvari, Csaba
We study the problem of online learning in Markov Decision Processes (MDPs) when both the transition distributions and loss functions are chosen by an adversary. We present an algorithm that, under a mixing assumption, achieves $O(\sqrt{T\log|\Pi|}+\log|\Pi|)$ regret with respect to a comparison set of policies $\Pi$. The regret is independent of the size of the state and action spaces. When expectations over sample paths can be computed efficiently and the comparison set $\Pi$ has polynomial size, this algorithm is efficient. We also consider the episodic adversarial online shortest path problem. Here, in each episode an adversary may choose a weighted directed acyclic graph with an identified start node and finish node. The goal of the learning algorithm is to choose a path that minimizes the loss while traversing from the start node to the finish node. At the end of each episode, the loss function (given by weights on the edges) is revealed to the learning algorithm. The goal is to minimize regret with respect to a fixed policy for selecting paths. This problem is a special case of the online MDP problem. For randomly chosen graphs and adversarial losses, this problem can be solved efficiently. We show that it can also be solved efficiently for adversarial graphs and randomly chosen losses. When both graphs and losses are adversarially chosen, we present an efficient algorithm whose regret scales linearly with the number of distinct graphs. Finally, we show that designing efficient algorithms for the adversarial online shortest path problem (and hence for the adversarial MDP problem) is as hard as learning parity with noise, a notoriously difficult problem that has been used to design efficient cryptographic schemes.
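For intuition, the $O(\sqrt{T\log|\Pi|})$ dependence is the signature of exponential weighting over a finite comparison class. The numpy sketch below shows that generic Hedge-style update, not the paper's algorithm (which must additionally handle the Markovian dynamics and the mixing assumption); the per-policy loss oracle expected_loss is an assumed helper.

    import numpy as np

    def hedge(expected_loss, n_policies, T, rng):
        """expected_loss(i, t) -> loss in [0, 1] of policy i at round t."""
        eta = np.sqrt(8.0 * np.log(n_policies) / T)   # standard Hedge rate
        log_w = np.zeros(n_policies)
        total = 0.0
        for t in range(T):
            p = np.exp(log_w - log_w.max())
            p /= p.sum()
            i = rng.choice(n_policies, p=p)           # policy played this round
            losses = np.array([expected_loss(j, t) for j in range(n_policies)])
            total += losses[i]
            log_w -= eta * losses                     # full-information update
        return total

    rng = np.random.default_rng(1)
    # Toy losses: policy 2 is best on average, the rest are worse.
    loss = lambda i, t: np.clip(rng.normal(0.2 if i == 2 else 0.5, 0.1), 0, 1)
    print(hedge(loss, n_policies=5, T=1000, rng=rng))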
Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions
Abbasi-Yadkori, Yasin, Bartlett, Peter L., Szepesvari, Csaba
We study the problem of learning Markov decision processes with finite state and action spaces when the transition probability distributions and loss functions are chosen adversarially and are allowed to change with time. We introduce an algorithm whose regret with respect to any policy in a comparison class grows as the square root of the number of rounds of the game, provided the transition probabilities satisfy a uniform mixing condition. Our approach is efficient as long as the comparison class has polynomial size and we can compute expectations over sample paths for each policy. Designing an efficient algorithm with small regret for the general case remains an open problem.
Approximate Policy Iteration with Linear Action Models
Yao, Hengshuai (University of Alberta), Szepesvari, Csaba (University of Alberta)
In this paper we consider the problem of finding a good policy given some batch data. We propose a new approach, LAM-API, that first builds a so-called linear action model (LAM) from the data and then uses the learned model and the collected data in approximate policy iteration (API) to find a good policy. A natural choice for the policy evaluation step in this algorithm is the least-squares temporal difference (LSTD) learning algorithm. Empirical results on three benchmark problems show that this particular instance of LAM-API performs competitively compared with LSPI, in terms of both data and computational efficiency.
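A minimal numpy sketch of the two ingredients follows, under assumed shapes and names (phi is a d-dimensional state feature vector; the value weights w would come from the LSTD evaluation step): per-action least-squares models of the feature dynamics and reward, and one-step lookahead through those models for policy improvement.

    import numpy as np

    def fit_lam(transitions, n_actions, d, ridge=1e-3):
        """transitions: iterable of (phi, a, r, phi_next); assumes every
        action occurs at least once in the batch."""
        F, f = [], []
        for a in range(n_actions):
            rows = [(p, r, pn) for (p, aa, r, pn) in transitions if aa == a]
            Phi = np.array([p for p, _, _ in rows])
            R = np.array([r for _, r, _ in rows])
            PhiN = np.array([pn for _, _, pn in rows])
            G = Phi.T @ Phi + ridge * np.eye(d)           # ridge for stability
            F.append(np.linalg.solve(G, Phi.T @ PhiN).T)  # phi' ~ F[a] @ phi
            f.append(np.linalg.solve(G, Phi.T @ R))       # r    ~ f[a] @ phi
        return F, f

    def greedy_action(phi, F, f, w, gamma=0.99):
        # One-step lookahead through the model: r_hat + gamma * V_hat(phi').
        q = [f[a] @ phi + gamma * w @ (F[a] @ phi) for a in range(len(F))]
        return int(np.argmax(q))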
Statistical Linear Estimation with Penalized Estimators: an Application to Reinforcement Learning
Pires, Bernardo Avila, Szepesvari, Csaba
Motivated by value function estimation in reinforcement learning, we study statistical linear inverse problems, i.e., problems where the coefficients of a linear system to be solved are observed in noise. We consider penalized estimators, where performance is evaluated using a matrix-weighted two-norm of the defect of the estimator measured with respect to the true, unknown coefficients. Two objective functions are considered, depending on whether the error of the defect measured with respect to the noisy coefficients is squared or unsquared. We propose simple, yet novel and theoretically well-founded, data-dependent choices for the regularization parameters in both cases that avoid data splitting. A distinguishing feature of our analysis is that we derive deterministic error bounds in terms of the error of the coefficients, thus completely separating out the analysis of the stochastic properties of these errors. We show that our results lead to new insights and bounds for linear value function estimation in reinforcement learning.
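For concreteness, a generic squared-error instance of such a penalized estimator is $\hat{x}_\lambda \in \arg\min_x \|\hat{A}x - \hat{b}\|_2^2 + \lambda\,\mathrm{pen}(x)$, where $(\hat{A},\hat{b})$ are the noisy observations of the true coefficients $(A,b)$, and the defect is then measured as $\|A\hat{x}_\lambda - b\|_W$ for a weighting matrix $W$; this is a simplified illustration of the setup, not the paper's exact objectives.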
An Adaptive Algorithm for Finite Stochastic Partial Monitoring
Bartok, Gabor, Zolghadr, Navid, Szepesvari, Csaba
We present a new anytime algorithm that achieves near-optimal regret for any instance of finite stochastic partial monitoring. In particular, the new algorithm achieves the minimax regret, within logarithmic factors, for both "easy" and "hard" problems. For easy problems, it additionally achieves logarithmic individual regret. Most importantly, the algorithm is adaptive in the sense that if the opponent strategy lies in an "easy region" of the strategy space, then the regret grows as if the problem were easy. As an implication, we show that under some reasonable additional assumptions, the algorithm enjoys $O(\sqrt{T})$ regret in Dynamic Pricing, a problem proven to be hard by Bartok et al. (2011).
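For readers unfamiliar with the setting, the sketch below spells out the finite stochastic partial monitoring protocol itself, not the adaptive algorithm; the loss matrix L, feedback matrix H, and outcome distribution form a toy instance chosen for illustration.

    import numpy as np

    rng = np.random.default_rng(2)
    L = np.array([[0.0, 1.0],        # loss matrix: L[action, outcome]
                  [1.0, 0.0]])
    H = np.array([["a", "b"],        # feedback matrix: H[action, outcome];
                  ["c", "c"]])       # action 1 reveals nothing about the outcome
    p_outcome = np.array([0.3, 0.7]) # opponent's (unknown) outcome distribution

    total_loss = 0.0
    for t in range(1000):
        action = rng.integers(2)             # placeholder learner; the real
        outcome = rng.choice(2, p=p_outcome) # algorithm adapts to the feedback
        total_loss += L[action, outcome]     # suffered but never observed
        feedback = H[action, outcome]        # the only signal the learner sees
    print(f"total loss of the uniform learner: {total_loss:.0f}")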
Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods
Neu, Gergely, Szepesvari, Csaba
In this paper we propose a novel gradient algorithm to learn a policy from an expert's observed behavior, assuming that the expert behaves optimally with respect to some unknown reward function of a Markovian Decision Problem. The algorithm's aim is to find a reward function such that the resulting optimal policy matches the expert's observed behavior well. The main difficulty is that the mapping from the parameters to policies is both nonsmooth and highly redundant. Resorting to subdifferentials solves the first difficulty, while the second one is overcome by computing natural gradients. We tested the proposed method in two artificial domains and found it to be more reliable and efficient than some previous methods.
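The loop below is a toy numpy rendering of this idea on a two-state MDP: the reward is $r_\theta(s) = \theta^\top \phi(s)$, a Boltzmann (softmax) policy smooths the nonsmooth parameter-to-policy map, and finite differences stand in for the paper's subdifferential and natural-gradient machinery; the MDP, features, and step sizes are all illustrative assumptions.

    import numpy as np

    P = np.array([[[0.9, 0.1], [0.1, 0.9]],   # P[a, s, s']: action 0 is "sticky"
                  [[0.5, 0.5], [0.5, 0.5]]])  # action 1 moves uniformly
    phi, gamma, tau = np.eye(2), 0.9, 0.1     # one feature per state

    def boltzmann_policy(theta):
        r, V = phi @ theta, np.zeros(2)
        for _ in range(200):                  # soft value iteration
            Q = r[:, None] + gamma * np.einsum('ast,t->sa', P, V)
            m = Q.max(axis=1)
            V = m + tau * np.log(np.exp((Q - m[:, None]) / tau).sum(axis=1))
        pi = np.exp((Q - V[:, None]) / tau)
        return pi / pi.sum(axis=1, keepdims=True)

    def feature_expectations(pi):
        Ppi = np.einsum('sa,ast->st', pi, P)  # policy-induced transition matrix
        d = np.linalg.solve(np.eye(2) - gamma * Ppi.T, np.array([0.5, 0.5]))
        return phi.T @ d                      # discounted state-feature counts

    mu_E = feature_expectations(np.array([[1.0, 0.0], [1.0, 0.0]]))  # "expert"

    def J(theta):                             # mismatch to the expert's behavior
        diff = feature_expectations(boltzmann_policy(theta)) - mu_E
        return diff @ diff

    theta, eps, lr = np.array([0.1, -0.1]), 1e-3, 0.5
    for _ in range(50):                       # finite-difference gradient descent
        g = np.array([(J(theta + eps * e) - J(theta - eps * e)) / (2 * eps)
                      for e in np.eye(2)])
        theta -= lr * g
    print(theta, J(theta))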
Analysis of Kernel Mean Matching under Covariate Shift
Yu, Yaoliang, Szepesvari, Csaba
In real-world supervised learning scenarios, it is not uncommon for the training and test samples to follow different probability distributions, making it necessary to correct the resulting sampling bias. Focusing on a particular covariate shift problem, we derive high-probability confidence bounds for the kernel mean matching (KMM) estimator, whose convergence rate turns out to depend on some regularity measure of the regression function and also on some capacity measure of the kernel. By comparing KMM with the natural plug-in estimator, we establish the superiority of the former and hence provide concrete evidence for the effectiveness of KMM under covariate shift.
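The optimization behind KMM is compact enough to sketch: choose weights $\beta$ on the training points so that the $\beta$-weighted kernel mean of the training sample matches the kernel mean of the test sample. Below is a hedged numpy illustration in which projected gradient descent on a box constraint replaces the usual QP solver, and the constraint keeping the mean of $\beta$ near 1 is omitted for brevity; the kernel, bandwidth, and step sizes are illustrative choices.

    import numpy as np

    def rbf(A, B, sigma=1.0):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))

    def kmm_weights(X_tr, X_te, B=10.0, steps=2000, lr=5.0):
        n = len(X_tr)
        K = rbf(X_tr, X_tr)                     # (n, n) training Gram matrix
        kappa = rbf(X_tr, X_te).mean(axis=1)    # (n,) cross kernel means
        beta = np.ones(n)
        for _ in range(steps):
            # gradient of 0.5 * (1/n^2) beta' K beta - (1/n) kappa' beta
            g = (K @ beta) / n**2 - kappa / n
            beta = np.clip(beta - lr * g, 0.0, B)
        return beta

    rng = np.random.default_rng(3)
    X_tr = rng.normal(0.0, 1.0, size=(200, 1))   # training sample: N(0, 1)
    X_te = rng.normal(0.5, 1.0, size=(200, 1))   # test sample: shifted mean
    beta = kmm_weights(X_tr, X_te)
    # Weights should upweight training points that look like test points.
    print(beta[X_tr[:, 0] > 0.5].mean(), beta[X_tr[:, 0] < -0.5].mean())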
Speeding Up Planning in Markov Decision Processes via Automatically Constructed Abstractions
Isaza, Alejandro, Szepesvari, Csaba, Bulitko, Vadim, Greiner, Russell
In this paper, we consider planning in stochastic shortest path (SSP) problems, a subclass of Markov Decision Problems (MDPs). We focus on medium-size problems whose state space can be fully enumerated. This problem has numerous important applications, such as navigation and planning under uncertainty. We propose a new approach for constructing a multi-level hierarchy of progressively simpler abstractions of the original problem. Once computed, the hierarchy can be used to speed up planning by first finding a policy for the most abstract level and then recursively refining it into a solution to the original problem. This approach is fully automated and delivers a speed-up of two orders of magnitude over a state-of-the-art MDP solver on sample problems while returning near-optimal solutions. We also prove theoretical bounds on the loss of solution optimality resulting from the use of abstractions.
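A structural sketch of the top-down use of such a hierarchy follows; solve and refine are assumed callables, and the paper's actual construction of the abstractions and of the refinement step is more involved.

    def solve_with_hierarchy(levels, solve, refine):
        """levels[0] is the most abstract problem, levels[-1] the original SSP;
        solve(problem, guide) plans (fast when guided), and
        refine(policy, finer) projects an abstract policy down one level."""
        policy = solve(levels[0], guide=None)      # cheap: tiny abstract problem
        for finer in levels[1:]:
            guide = refine(policy, finer)          # restrict the search below
            policy = solve(finer, guide=guide)     # guided, hence faster
        return policy                              # policy for the original MDP

    # Tiny stand-in demo of the control flow:
    levels = ["L2-abstract", "L1-abstract", "original"]
    solve = lambda p, guide: f"policy({p}|{guide})"
    refine = lambda pol, finer: f"guide-from({pol})"
    print(solve_with_hierarchy(levels, solve, refine))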
PAC-Bayesian Policy Evaluation for Reinforcement Learning
Fard, Mahdi Milani, Pineau, Joelle, Szepesvari, Csaba
Bayesian priors offer a compact yet general means of incorporating domain knowledge into many learning tasks. The correctness of Bayesian analysis and inference, however, largely depends on the accuracy and correctness of these priors. PAC-Bayesian methods overcome this problem by providing bounds that hold regardless of the correctness of the prior distribution. This paper introduces the first PAC-Bayesian bound for the batch reinforcement learning problem with function approximation. We show how this bound can be used to perform model selection in a transfer learning scenario. Our empirical results confirm that PAC-Bayesian policy evaluation is able to leverage prior distributions when they are informative and, unlike standard Bayesian RL approaches, to ignore them when they are misleading.
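For orientation, bounds in this family typically take the classic PAC-Bayesian form (stated here in its standard supervised-learning variant, not the paper's RL bound): with probability at least $1-\delta$ over a sample of size $n$, simultaneously for every posterior $\rho$, $\mathbb{E}_{h\sim\rho}[L(h)] \le \mathbb{E}_{h\sim\rho}[\hat{L}(h)] + \sqrt{(\mathrm{KL}(\rho\|\pi) + \ln(2\sqrt{n}/\delta))/(2n)}$, where $\pi$ is the prior. The KL term is exactly what lets an informative prior tighten the bound while a misleading one merely loosens it, matching the behavior reported above.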