Szepesvari, Csaba
Compressed Conditional Mean Embeddings for Model-Based Reinforcement Learning
Lever, Guy (University College London) | Shawe-Taylor, John (University College London) | Stafford, Ronnie (University College London) | Szepesvari, Csaba (University of Alberta)
We present a model-based approach to solving Markov decision processes (MDPs) in which the system dynamics are learned using conditional mean embeddings (CMEs). This class of methods comes with strong performance guarantees and enables planning to be performed in an induced finite (pseudo-)MDP, which approximates the MDP but can be solved exactly using dynamic programming. Existing methods have two drawbacks: firstly, the size of the induced finite (pseudo-)MDP scales quadratically with the amount of data used to learn the model, which makes planning with the learned model expensive in both memory and time; secondly, learning the CME itself using powerful kernel least-squares is costly, creating a second computational bottleneck. We present an algorithm which maintains a rich kernelized CME model class but solves both problems: firstly, we demonstrate that the loss function for the CME model suggests a principled approach to compressing the induced (pseudo-)MDP, leading to faster planning while maintaining guarantees; secondly, we propose to learn the CME model itself using fast sparse-greedy kernel regression well suited to the RL context. We demonstrate superior performance to existing methods in this class of model-based approaches on a range of MDPs.
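As a concrete illustration of the construction (a minimal sketch, not the paper's algorithm, assuming a Gaussian kernel, a scalar action appended to the state features, and a user-supplied reward function), the CME is fit by kernel ridge regression over the sampled transitions, each query is mapped to a weight vector over the sampled successor states, and value iteration is run on the induced finite pseudo-MDP; the per-action n x n weight matrices make the quadratic scaling discussed above explicit.

```python
# A minimal sketch, assuming a Gaussian kernel, an illustrative regularisation
# constant and a user-supplied reward function; this is not the paper's
# algorithm, only the generic CME / pseudo-MDP construction it builds on.
import numpy as np

def rbf(X, Y, bw=1.0):
    # Gaussian kernel matrix between row-stacked inputs X and Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bw ** 2))

def cme_pseudo_mdp(SA, S_next, actions, reward, gamma=0.95, lam=1e-3, iters=200):
    """SA: n x (p+1) rows of state features with a scalar action appended;
    S_next: n x p next-state features; actions: list of scalar actions."""
    n = len(SA)
    K = rbf(SA, SA)
    W = np.linalg.solve(K + n * lam * np.eye(n), np.eye(n))  # (K + n*lam*I)^{-1}

    # One n x n weight matrix per action: column i holds the CME weights of
    # query (s'_i, a) over the n sampled successors. This per-action n x n
    # object is the quadratic blow-up that the paper compresses.
    alphas = {}
    for a in actions:
        SA_query = np.hstack([S_next, np.full((n, 1), a)])
        alphas[a] = W @ rbf(SA, SA_query)

    R = np.array([[reward(s, a) for a in actions] for s in S_next])  # n x |A|
    V = np.zeros(n)
    for _ in range(iters):  # exact dynamic programming in the finite pseudo-MDP
        Q = np.stack([R[:, j] + gamma * alphas[a].T @ V
                      for j, a in enumerate(actions)], axis=1)
        V = Q.max(axis=1)
    return V
```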
Online Learning with Gaussian Payoffs and Side Observations
Wu, Yifan, György, András, Szepesvari, Csaba
We consider a sequential learning problem with Gaussian payoffs and side information: after selecting an action $i$, the learner receives information about the payoff of every action $j$ in the form of Gaussian observations whose mean is the mean payoff of action $j$, but whose variance depends on the pair $(i,j)$ (and may be infinite). The setup allows a more refined information transfer from one action to another than previous partial monitoring setups, including the recently introduced graph-structured feedback case. For the first time in the literature, we provide non-asymptotic problem-dependent lower bounds on the regret of any algorithm, which recover the existing asymptotic problem-dependent and finite-time minimax lower bounds. We also provide algorithms that achieve the problem-dependent lower bound (up to a universal constant factor) or the minimax lower bounds (up to logarithmic factors).
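The following sketch simulates one round of the observation model (an illustration; the unit payoff variance of the chosen arm and the example numbers are assumptions): playing action $i$ yields its payoff and, for every $j$ with finite sigma[i, j], an unbiased Gaussian observation of action $j$'s mean payoff.

```python
# A minimal sketch of the feedback model only; the unit payoff variance of the
# chosen arm and the example numbers below are illustrative assumptions.
import numpy as np

def play(i, mu, sigma, rng):
    payoff = rng.normal(mu[i], 1.0)  # realised payoff of the chosen action
    obs = {}
    for j in range(len(mu)):
        if np.isfinite(sigma[i, j]):
            # Side observation of action j: unbiased, with (i, j)-dependent noise.
            obs[j] = rng.normal(mu[j], sigma[i, j])
    return payoff, obs

# Example: playing action 0 also gives a (noisier) view of action 1.
mu = np.array([0.5, 0.7, 0.2])
sigma = np.array([[0.1, 1.0, np.inf],
                  [np.inf, 0.1, np.inf],
                  [np.inf, np.inf, 0.1]])
payoff, observations = play(0, mu, sigma, np.random.default_rng(0))
```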
Mixing Time Estimation in Reversible Markov Chains from a Single Sample Path
Hsu, Daniel J., Kontorovich, Aryeh, Szepesvari, Csaba
This article provides the first procedure for computing a fully data-dependent interval that traps the mixing time $t_{mix}$ of a finite reversible ergodic Markov chain at a prescribed confidence level. The interval is computed from a single finite-length sample path from the Markov chain and does not require knowledge of any parameters of the chain. This stands in contrast to previous approaches, which either provide only point estimates, require a reset mechanism, or rely on additional prior knowledge. The interval is constructed around the relaxation time $t_{relax}$, which is strongly related to the mixing time, and the width of the interval converges to zero roughly at a $\sqrt{n}$ rate, where $n$ is the length of the sample path. Upper and lower bounds are given on the number of samples required to achieve constant-factor multiplicative accuracy. The lower bounds indicate that, unless further restrictions are placed on the chain, no procedure can achieve this accuracy level before seeing each state at least $\Omega(t_{relax})$ times on average. Finally, future directions of research are identified.
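A plug-in point estimate of $t_{relax}$ can be sketched as follows (an illustration only, not the paper's confidence interval): build the empirical transition matrix from the single path, symmetrise it using the empirical state frequencies as reversibility suggests, and invert the resulting absolute spectral gap.

```python
# A minimal sketch of a plug-in point estimate (not the paper's confidence
# interval), assuming every state is visited in the path.
import numpy as np

def relaxation_time_estimate(path, n_states):
    counts = np.zeros((n_states, n_states))
    for s, t in zip(path[:-1], path[1:]):
        counts[s, t] += 1
    P = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)  # empirical transitions
    pi = counts.sum(axis=1) / counts.sum()                          # empirical state frequencies
    D = np.diag(np.sqrt(np.maximum(pi, 1e-12)))
    Dinv = np.diag(1.0 / np.sqrt(np.maximum(pi, 1e-12)))
    S = D @ P @ Dinv
    L = 0.5 * (S + S.T)                      # symmetrise; exact if P is reversible w.r.t. pi
    eig = np.sort(np.linalg.eigvalsh(L))
    gap = 1.0 - max(abs(eig[0]), eig[-2])    # absolute spectral gap
    return 1.0 / gap                         # t_relax = 1 / gap
```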
Linear Multi-Resource Allocation with Semi-Bandit Feedback
Lattimore, Tor, Crammer, Koby, Szepesvari, Csaba
We study an idealised sequential resource allocation problem. In each time step the learner chooses an allocation of several resource types between a number of tasks. Assigning more resources to a task increases the probability that it is completed. The problem is challenging because the alignment of the tasks to the resource types is unknown and the feedback is noisy. Our main contribution is the new setting and an algorithm with a nearly optimal regret analysis. Along the way we draw connections to the problem of minimising regret for stochastic linear bandits with heteroscedastic noise. We also present new results for stochastic linear bandits on the hypercube that significantly outperform existing work, especially in the sparse case.
Combinatorial Cascading Bandits
Kveton, Branislav, Wen, Zheng, Ashkan, Azin, Szepesvari, Csaba
We propose combinatorial cascading bandits, a class of partial monitoring problems where at each step a learning agent chooses a tuple of ground items subject to constraints and receives a reward if and only if the weights of all chosen items are one. The weights of the items are binary, stochastic, and drawn independently of each other. The agent observes the index of the first chosen item whose weight is zero. This observation model arises in network routing, for instance, where the learning agent may only observe the first link in the routing path that is down and blocks the path. We propose a UCB-like algorithm for solving our problems, CombCascade, and prove gap-dependent and gap-free upper bounds on its $n$-step regret. Our proofs build on recent work in stochastic combinatorial semi-bandits but also address two novel challenges of our setting, a non-linear reward function and partial observability. We evaluate CombCascade on two real-world problems and show that it performs well even when our modeling assumptions are violated. We also demonstrate that our setting requires a new learning algorithm.
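The observation model is compact enough to state in code; the sketch below is illustrative, with independent Bernoulli link states as the assumed weight distribution.

```python
# A minimal sketch of the observation model; the Bernoulli link probabilities
# below are illustrative.
import numpy as np

def cascade_feedback(chosen, w):
    # Reward 1 iff every chosen item has weight 1; otherwise the agent only
    # learns the position of the first chosen item whose weight is 0.
    for pos, item in enumerate(chosen):
        if w[item] == 0:
            return 0, pos
    return 1, None

# Example: a routing path over 5 links, each up independently.
rng = np.random.default_rng(0)
p = np.array([0.9, 0.8, 0.95, 0.7, 0.85])
w = rng.binomial(1, p)                       # realised link states
reward, first_down = cascade_feedback((0, 2, 3), w)
```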
Cascading Bandits: Learning to Rank in the Cascade Model
Kveton, Branislav, Szepesvari, Csaba, Wen, Zheng, Ashkan, Azin
A search engine usually outputs a list of $K$ web pages. The user examines this list, from the first web page to the last, and chooses the first attractive page. This model of user behavior is known as the cascade model. In this paper, we propose cascading bandits, a learning variant of the cascade model where the objective is to identify the $K$ most attractive items. We formulate our problem as a stochastic combinatorial partial monitoring problem. We propose two algorithms for solving it, CascadeUCB1 and CascadeKL-UCB. We also prove gap-dependent upper bounds on the regret of these algorithms and derive a lower bound on the regret in cascading bandits. The lower bound matches the upper bound of CascadeKL-UCB up to a logarithmic factor. We experiment with our algorithms on several problems. The algorithms perform surprisingly well even when our modeling assumptions are violated.
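The sketch below conveys the flavour of a UCB1-style cascading bandit loop; it is an illustration rather than a verified reproduction of CascadeUCB1, and the confidence radius, the simulated user, and the Bernoulli attraction model are assumptions of the example.

```python
# A minimal sketch of a UCB1-style cascading bandit loop, not a verified
# reproduction of CascadeUCB1; the confidence radius, the simulated user and
# the Bernoulli attraction model are assumptions of the example.
import numpy as np

def cascade_ucb(attraction_probs, K, horizon, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    L = len(attraction_probs)
    pulls, means = np.zeros(L), np.zeros(L)
    for t in range(1, horizon + 1):
        ucb = means + np.sqrt(1.5 * np.log(t) / np.maximum(pulls, 1))
        ucb[pulls == 0] = np.inf                      # force initial exploration
        ranked = np.argsort(-ucb)[:K]                 # recommended list
        clicks = rng.random(L) < attraction_probs     # realised attractions
        click_pos = next((i for i, it in enumerate(ranked) if clicks[it]), None)
        observed = ranked if click_pos is None else ranked[:click_pos + 1]
        for i, item in enumerate(observed):
            w = 1.0 if i == click_pos else 0.0        # items above the click were skipped
            pulls[item] += 1
            means[item] += (w - means[item]) / pulls[item]
    return means, pulls
```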
Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits
Kveton, Branislav, Wen, Zheng, Ashkan, Azin, Szepesvari, Csaba
A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to constraints, and then observes stochastic weights of these items and receives their sum as a payoff. In this paper, we close the problem of computationally and sample efficient learning in stochastic combinatorial semi-bandits. In particular, we analyze a UCB-like algorithm for solving the problem, which is known to be computationally efficient; and prove $O(K L (1 / \Delta) \log n)$ and $O(\sqrt{K L n \log n})$ upper bounds on its $n$-step regret, where $L$ is the number of ground items, $K$ is the maximum number of chosen items, and $\Delta$ is the gap between the expected returns of the optimal and best suboptimal solutions. The gap-dependent bound is tight up to a constant factor and the gap-free bound is tight up to a polylogarithmic factor.
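A compact illustration of the semi-bandit loop (not the paper's exact algorithm): per-item upper confidence bounds are fed to a problem-specific oracle, here a simple top-$K$ selection standing in for the constrained combinatorial solver, and the realised weight of every chosen item is observed and used to update its estimate.

```python
# A minimal sketch of a UCB-style semi-bandit loop, not the paper's exact
# algorithm; Bernoulli item weights and a top-K "oracle" standing in for the
# constrained combinatorial solver are assumptions of the example.
import numpy as np

def semi_bandit_ucb(item_means, K, horizon, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    L = len(item_means)
    pulls, means = np.zeros(L), np.zeros(L)
    oracle = lambda u: np.argsort(-u)[:K]              # feasible set maximising the index sum
    for t in range(1, horizon + 1):
        ucb = means + np.sqrt(1.5 * np.log(t) / np.maximum(pulls, 1))
        ucb[pulls == 0] = np.inf
        chosen = oracle(ucb)
        weights = rng.binomial(1, item_means[chosen])  # semi-bandit feedback: every chosen item
        for item, w in zip(chosen, weights):
            pulls[item] += 1
            means[item] += (w - means[item]) / pulls[item]
    return means, pulls
```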
Universal Option Models
Yao, Hengshuai, Szepesvari, Csaba, Sutton, Richard S., Modayil, Joseph, Bhatnagar, Shalabh
We consider the problem of learning models of options for real-time abstract planning, in the setting where reward functions can be specified at any time and their expected returns must be efficiently computed. We introduce a new model for an option that is independent of any reward function, called the {\it universal option model (UOM)}. We prove that the UOM of an option can be used to construct a traditional option model given a reward function, and that the option-conditional return can be computed directly by a single dot product of the UOM with the reward function. We extend the UOM to linear function approximation, and show that it gives the TD solution of option returns and value functions of policies over options. We provide a stochastic approximation algorithm for incrementally learning UOMs from data and prove its consistency. We demonstrate our method in two domains. The first domain is document recommendation, where each user query defines a new reward function and a document's relevance is the expected return of a simulated random walk through the document's references. The second domain is a real-time strategy game, where the controller must select the best game unit to accomplish dynamically-specified tasks. Our experiments show that UOMs are substantially more efficient in evaluating option returns and policies than previously known methods.
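A tabular sketch of the dot-product property (illustrative notation, not the paper's construction): the UOM row for a start state holds the expected discounted occupancy of each state while the option runs, so any newly specified reward vector is evaluated with one inner product.

```python
# A minimal tabular sketch of the dot-product property (illustrative notation,
# not the paper's construction); the Monte-Carlo estimate from rollouts and the
# discount factor are assumptions of the example.
import numpy as np

def uom_from_rollouts(rollouts, n_states, gamma=0.9):
    """rollouts: dict mapping a start state to a list of state trajectories
    generated while following the option until it terminates."""
    U = np.zeros((n_states, n_states))
    for s, trajs in rollouts.items():
        for traj in trajs:
            occ = np.zeros(n_states)
            for t, s_t in enumerate(traj):
                occ[s_t] += gamma ** t       # discounted visit to s_t
            U[s] += occ / len(trajs)         # average over the option's rollouts
    return U

def option_return(U, s, r):
    # A newly specified reward vector r is evaluated with a single dot product.
    return U[s] @ r
```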
On Minimax Optimal Offline Policy Evaluation
Li, Lihong, Munos, Remi, Szepesvari, Csaba
This paper studies the off-policy evaluation problem, where one aims to estimate the value of a target policy based on a sample of observations collected by another policy. We first consider the multi-armed bandit case, establish a minimax risk lower bound, and analyze the risk of two standard estimators. It is shown, and verified in simulation, that one is minimax optimal up to a constant, while the other can be arbitrarily worse, despite its empirical success and popularity. The results are applied to related problems in contextual bandits and fixed-horizon Markov decision processes, and are also related to semi-supervised learning.
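In the bandit case, the two standard estimators can be sketched in a few lines (an illustration; which of the two is the minimax-optimal one is established in the paper, not here): an importance-sampling estimator that reweights logged rewards by the likelihood ratio $\pi(a)/\mu(a)$, and a regression estimator that first fits each arm's mean and then averages under the target policy.

```python
# A minimal sketch of the two standard bandit estimators (the paper's formal
# definitions and risk analysis are not reproduced); `actions`, `rewards`,
# `pi` and `mu` are assumed NumPy arrays of logged actions, logged rewards and
# the target / behaviour action probabilities.
import numpy as np

def importance_sampling_estimate(actions, rewards, pi, mu):
    # Reweight each logged reward by the likelihood ratio pi(a) / mu(a).
    w = pi[actions] / mu[actions]
    return np.mean(w * rewards)

def regression_estimate(actions, rewards, pi, n_arms):
    # Fit each arm's mean from its own samples, then average under the target policy.
    mu_hat = np.array([rewards[actions == a].mean() if (actions == a).any() else 0.0
                       for a in range(n_arms)])
    return pi @ mu_hat
```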