Szepesvari, Csaba
TopRank: A practical algorithm for online stochastic ranking
Lattimore, Tor, Kveton, Branislav, Li, Shuai, Szepesvari, Csaba
Online learning to rank is a sequential decision-making problem where in each round the learning agent chooses a list of items and receives feedback in the form of clicks from the user. Many sample-efficient algorithms have been proposed for this problem that assume a specific click model connecting rankings and user behavior. We propose a generalized click model that encompasses many existing models, including the position-based and cascade models. Our generalization motivates a novel online learning algorithm based on topological sort, which we call TopRank. TopRank is (a) more natural than existing algorithms, (b) has stronger regret guarantees than existing algorithms with comparable generality, (c) has a more insightful proof that leaves the door open to many generalizations, and (d) outperforms existing algorithms empirically.
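The role of the topological sort can be seen in a minimal sketch (hypothetical class and method names; the pairwise statistics and confidence test are simplified stand-ins for the ones in the paper, not the authors' exact algorithm):

import math
import random
from collections import defaultdict

# Minimal sketch of the topological-sort idea (a simplification, not the
# paper's exact statistics, confidence radius, or block structure).
class TopologicalRanker:
    def __init__(self, items, delta=0.01):
        self.items = list(items)
        self.delta = delta
        self.s = defaultdict(float)  # s[(i, j)]: summed click differences
        self.n = defaultdict(int)    # n[(i, j)]: number of comparisons

    def record(self, i, j, click_i, click_j):
        # Record one round in which items i and j were both displayed.
        self.s[(i, j)] += click_i - click_j
        self.s[(j, i)] += click_j - click_i
        self.n[(i, j)] += 1
        self.n[(j, i)] += 1

    def beats(self, i, j):
        # Hoeffding-style test: is i confidently more attractive than j?
        n = self.n[(i, j)]
        return n > 0 and self.s[(i, j)] >= math.sqrt(2.0 * n * math.log(1.0 / self.delta))

    def rank(self):
        # Topologically sort the learned partial order; shuffle within each
        # block of still-unresolved items so exploration continues.
        remaining, ranking = set(self.items), []
        while remaining:
            block = [i for i in remaining
                     if not any(self.beats(j, i) for j in remaining if j != i)]
            if not block:  # inconsistent evidence; emit the rest unordered
                block = list(remaining)
            random.shuffle(block)
            ranking.extend(block)
            remaining -= set(block)
        return ranking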
PAC-Bayes bounds for stable algorithms with instance-dependent priors
Rivasplata, Omar, Szepesvari, Csaba, Shawe-Taylor, John S., Parrado-Hernandez, Emilio, Sun, Shiliang
PAC-Bayes bounds have been proposed to get risk estimates based on a training sample. In this paper the PAC-Bayes approach is combined with stability of the hypothesis learned by a Hilbert space valued algorithm. The PAC-Bayes setting is used with a Gaussian prior centered at the expected output. Thus a novelty of our paper is using priors defined in terms of the data-generating distribution. Our main result estimates the risk of the randomized algorithm in terms of the hypothesis stability coefficients. We also provide a new bound for the SVM classifier, which is compared to other known bounds experimentally. Ours appears to be the first uniform hypothesis stability-based bound that evaluates to non-trivial values.
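For context, a standard PAC-Bayes bound of the kind this result refines (the Maurer/Langford–Seeger form; the notation here is generic, not necessarily the paper's) states that for a prior $P$ chosen independently of the sample, with probability at least $1-\delta$ over an i.i.d. sample of size $n$, simultaneously for all posteriors $Q$,
$$\mathrm{kl}\big(\hat{L}_n(Q)\,\big\|\,L(Q)\big) \;\le\; \frac{\mathrm{KL}(Q\,\|\,P) + \ln(2\sqrt{n}/\delta)}{n},$$
where $\hat{L}_n(Q)$ and $L(Q)$ are the empirical and true risks of the randomized (Gibbs) classifier and $\mathrm{kl}$ is the binary relative entropy. A prior centered at the expected output of the algorithm depends on the data-generating distribution but not on the sample itself, so it remains admissible in such bounds; this is what allows the PAC-Bayes machinery to be combined with hypothesis stability.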
Rigorous Agent Evaluation: An Adversarial Approach to Uncover Catastrophic Failures
Uesato, Jonathan, Kumar, Ananya, Szepesvari, Csaba, Erez, Tom, Ruderman, Avraham, Anderson, Keith, Dvijotham, Krishnamurthy, Heess, Nicolas, Kohli, Pushmeet
This paper addresses the problem of evaluating learning systems in safety critical domains such as autonomous driving, where failures can have catastrophic consequences. We focus on two problems: searching for scenarios when learned agents fail and assessing their probability of failure. The standard method for agent evaluation in reinforcement learning, Vanilla Monte Carlo, can miss failures entirely, leading to the deployment of unsafe agents. We demonstrate this is an issue for current agents, where even matching the compute used for training is sometimes insufficient for evaluation. To address this shortcoming, we draw upon the rare event probability estimation literature and propose an adversarial evaluation approach. Our approach focuses evaluation on adversarially chosen situations, while still providing unbiased estimates of failure probabilities. The key difficulty is in identifying these adversarial situations -- since failures are rare there is little signal to drive optimization. To solve this we propose a continuation approach that learns failure modes in related but less robust agents. Our approach also allows reuse of data already collected for training the agent. We demonstrate the efficacy of adversarial evaluation on two standard domains: humanoid control and simulated driving. Experimental results show that our methods can find catastrophic failures and estimate failures rates of agents multiple orders of magnitude faster than standard evaluation schemes, in minutes to hours rather than days.
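The unbiasedness claim rests on the standard importance-sampling identity from the rare-event literature; a minimal sketch with hypothetical function names (the continuation approach that actually learns the adversarial proposal is omitted):

# Minimal importance-sampling sketch of adversarial evaluation:
# p_fail = E_{x ~ p}[ fail(x) ] = E_{x ~ q}[ fail(x) * p(x) / q(x) ].
def estimate_failure_probability(sample_adversarial, density_ratio, run_agent, n=10000):
    """Unbiased failure-rate estimate under an adversarial proposal q.

    sample_adversarial(): draws a situation x from q, which over-samples
        situations predicted to cause failure.
    density_ratio(x): p(x) / q(x), the importance weight that removes the bias.
    run_agent(x): returns True iff the agent fails on situation x.
    """
    total = 0.0
    for _ in range(n):
        x = sample_adversarial()
        if run_agent(x):
            total += density_ratio(x)
    # Unbiased whenever q covers every situation where p can produce a
    # failure; over-sampling likely failures only reduces the variance.
    return total / n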
Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
Kveton, Branislav, Szepesvari, Csaba, Wen, Zheng, Ghavamzadeh, Mohammad, Lattimore, Tor
We propose a multi-armed bandit algorithm that explores based on randomizing its history. The key idea is to estimate the value of an arm from a bootstrap sample of its history, where we add pseudo observations after each pull of the arm. Although the pseudo observations may seem harmful, they in fact guarantee that the bootstrap sample is optimistic with high probability. Because of this, we call our algorithm Giro, which is an abbreviation for garbage in, reward out. We analyze Giro in a $K$-armed Bernoulli bandit and prove an $O(K \Delta^{-1} \log n)$ bound on its $n$-round regret, where $\Delta$ denotes the difference in the expected rewards of the optimal and best suboptimal arms. The main advantage of our exploration strategy is that it can be combined with any reward generalization model, such as a neural network. We evaluate Giro and its contextual variant on multiple synthetic and real-world problems, and observe that Giro is comparable to or better than state-of-the-art algorithms.
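A minimal sketch of the bootstrap-plus-pseudo-observations idea in the Bernoulli case (a simplification with a hypothetical class name; here one pseudo 0 and one pseudo 1 are added per pull, and the contextual variant is not shown):

import random

# Giro-style exploration for a K-armed Bernoulli bandit (sketch).
class Giro:
    def __init__(self, num_arms):
        self.histories = [[] for _ in range(num_arms)]

    def select_arm(self):
        best_arm, best_value = None, -1.0
        for arm, history in enumerate(self.histories):
            if not history:
                return arm  # pull every arm at least once
            # Bootstrap: resample the augmented history with replacement.
            sample = random.choices(history, k=len(history))
            value = sum(sample) / len(sample)
            if value > best_value:
                best_arm, best_value = arm, value
        return best_arm

    def update(self, arm, reward):
        # The pseudo observations (one 0 and one 1 per pull) are what keep
        # the bootstrap estimate optimistic with high probability.
        self.histories[arm].extend([reward, 0, 1])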
BubbleRank: Safe Online Learning to Rerank
Kveton, Branislav, Li, Chang, Lattimore, Tor, Markov, Ilya, de Rijke, Maarten, Szepesvari, Csaba, Zoghi, Masrour
We study the problem of online learning to re-rank, where users provide feedback to improve the quality of displayed lists. Learning to rank has been traditionally studied in two settings. In the offline setting, rankers are typically learned from relevance labels of judges. These approaches have become the industry standard. However, they lack exploration, and thus are limited by the information content of offline data. In the online setting, an algorithm can propose a list and learn from the feedback on it in a sequential fashion. Bandit algorithms developed for this setting actively experiment, and in this way overcome the biases of offline data. But they also tend to ignore offline data, which results in a high initial cost of exploration. We propose BubbleRank, a bandit algorithm for re-ranking that combines the strengths of both settings. The algorithm starts with an initial base list and improves it gradually by swapping higher-ranked less attractive items for lower-ranked more attractive items. We prove an upper bound on the $n$-step regret of BubbleRank that degrades gracefully with the quality of the initial base list. Our theoretical findings are supported by extensive numerical experiments on a large real-world click dataset.
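A minimal sketch of the randomized-adjacent-swap idea (hypothetical names; the paper's exact exploration schedule, click model, and confidence test differ):

import math
import random

# Bubble-sort-style safe reranking (sketch, not the paper's algorithm).
class BubbleReranker:
    def __init__(self, base_list, delta=0.01):
        self.ranking = list(base_list)  # start from the safe base list
        self.wins = {}  # wins[(i, j)]: rounds where i was clicked and j was not
        self.delta = delta

    def propose(self):
        # Randomly swap disjoint adjacent pairs, so every displayed list
        # stays close to the current (safe) ranking.
        displayed = list(self.ranking)
        start = random.choice([0, 1])  # explore even or odd adjacent pairs
        for k in range(start, len(displayed) - 1, 2):
            if random.random() < 0.5:
                displayed[k], displayed[k + 1] = displayed[k + 1], displayed[k]
        return displayed

    def update(self, displayed, clicks):
        # clicks: set of clicked items. Credit adjacent comparisons where
        # exactly one item of the pair was clicked.
        for k in range(len(displayed) - 1):
            hi, lo = displayed[k], displayed[k + 1]
            if (hi in clicks) != (lo in clicks):
                winner, loser = (hi, lo) if hi in clicks else (lo, hi)
                self.wins[(winner, loser)] = self.wins.get((winner, loser), 0) + 1
        # One bubble pass: promote a lower item once evidence is conclusive.
        for k in range(len(self.ranking) - 1):
            if self._confidently_better(self.ranking[k + 1], self.ranking[k]):
                self.ranking[k], self.ranking[k + 1] = self.ranking[k + 1], self.ranking[k]

    def _confidently_better(self, i, j):
        w_ij = self.wins.get((i, j), 0)
        w_ji = self.wins.get((j, i), 0)
        n = w_ij + w_ji
        return n > 0 and w_ij - w_ji > math.sqrt(2 * n * math.log(1 / self.delta))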
Cleaning up the neighborhood: A full classification for adversarial partial monitoring
Lattimore, Tor, Szepesvari, Csaba
Partial monitoring is a generalization of the well-known multi-armed bandit framework where the loss is not directly observed by the learner. We complete the classification of finite adversarial partial monitoring to include all games, solving an open problem posed by Bartok et al. [2014]. Along the way we simplify and improve existing algorithms and correct errors in previous analyses. Our second contribution is a new algorithm for the class of games studied by Bartok [2013] where we prove upper and lower regret bounds that shed more light on the dependence of the regret on the game structure.
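For reference, the classification in question sorts every finite adversarial partial monitoring game into one of four regret regimes: trivial games with minimax regret $0$, easy games with $\tilde\Theta(\sqrt{n})$, hard games with $\Theta(n^{2/3})$, and hopeless games with $\Theta(n)$, where $n$ is the number of rounds; which regime a game falls into is determined by combinatorial observability properties of its loss and feedback matrices.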
Regret Bounds for Model-Free Linear Quadratic Control
Abbasi-Yadkori, Yasin, Lazic, Nevena, Szepesvari, Csaba
Model-free approaches for reinforcement learning (RL) and continuous control find policies based only on past states and rewards, without fitting a model of the system dynamics. They are appealing as they are general purpose and easy to implement; however, they also come with fewer theoretical guarantees than model-based approaches. In this work, we present a model-free algorithm for controlling linear quadratic (LQ) systems, which is the simplest setting for continuous control and is widely used in practice. Our approach is based on a reduction of the control of Markov decision processes to an expert prediction problem. We show that the regret of our algorithm scales as $O(T^{3/4})$, where $T$ is the number of rounds.
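For concreteness, the LQ setting referenced here is the standard one (generic notation, not necessarily the paper's): the state evolves as
$$x_{t+1} = A x_t + B u_t + w_t, \qquad c_t = x_t^\top Q x_t + u_t^\top R u_t,$$
where $u_t$ is the control input, $w_t$ is noise, and $Q, R$ are positive (semi)definite cost matrices; regret measures the excess cumulative cost of the algorithm relative to the best linear feedback controller $u_t = K x_t$, which is optimal in this setting.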