
Collaborating Authors: Szepesvári, Csaba


Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting

arXiv.org Machine Learning

We consider off-policy evaluation in the contextual bandit setting for the purpose of obtaining a robust off-policy selection strategy, where the selection strategy is evaluated by the value of the policy it chooses from a set of proposal (target) policies. We propose a new method to compute, at a desired coverage level, a lower bound on the value of an arbitrary target policy given logged contextual-bandit data. The lower bound is built around the so-called Self-normalized Importance Weighting (SN) estimator, combining a semi-empirical Efron-Stein tail inequality to control the concentration with Harris' inequality to control the bias. The new approach is evaluated on a number of synthetic and real datasets and is found to be superior to its main competitors, both in the tightness of its confidence intervals and in the quality of the policies chosen.
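
The SN estimator itself is a one-liner: reweight logged rewards by target-to-behavior probability ratios and normalize by the sum of the weights. A minimal sketch in Python (the variable names and the toy data are illustrative; the paper's actual contribution, the high-confidence lower bound built around this estimate, is not reproduced here):

```python
import numpy as np

def sn_estimate(target_probs, behavior_probs, rewards):
    """Self-normalized importance weighting (SN) estimate of a target
    policy's value from logged contextual-bandit data.

    target_probs:   pi(a_i | x_i) under the target policy
    behavior_probs: mu(a_i | x_i) under the logging policy
    rewards:        observed rewards r_i
    """
    w = target_probs / behavior_probs          # importance weights
    return np.sum(w * rewards) / np.sum(w)     # normalize by the weight sum

# Toy usage: a uniform logging policy and made-up target propensities.
rng = np.random.default_rng(0)
n = 1000
behavior_probs = np.full(n, 0.5)
target_probs = rng.choice([0.9, 0.1], size=n)
rewards = rng.binomial(1, 0.6, size=n)
print(sn_estimate(target_probs, behavior_probs, rewards))
```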


On Optimality of Meta-Learning in Fixed-Design Regression with Weighted Biased Regularization

arXiv.org Machine Learning

We consider fixed-design linear regression in the meta-learning model of Baxter (2000) and establish a problem-dependent finite-sample lower bound on the transfer risk (the risk on a newly observed task) that is valid for all estimators. Moreover, we prove that a weighted form of biased regularization - a popular technique in transfer and meta-learning - is optimal, i.e. it enjoys a problem-dependent upper bound on the risk matching our lower bound up to a constant. Thus, our bounds characterize meta-learning linear regression problems and reveal a fine-grained dependency on the task structure. Our characterization suggests that, in the non-asymptotic regime, for a sufficiently large number of tasks, meta-learning can be considerably superior to single-task learning. Finally, we propose a practical adaptation of the optimal estimator through an Expectation-Maximization procedure and show its effectiveness in a series of experiments.
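
Weighted biased regularization has a simple closed form in the linear case: shrink toward a meta-learned reference vector instead of toward zero. A minimal sketch with a single scalar regularization weight (the paper's weighting is more general; names and toy data are illustrative):

```python
import numpy as np

def biased_ridge(X, y, w0, lam):
    """Solve min_w ||X w - y||^2 + lam * ||w - w0||^2 in closed form:
    (X^T X + lam I) w = X^T y + lam w0. Shrinks toward w0, not zero."""
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    return np.linalg.solve(A, X.T @ y + lam * w0)

# Toy usage: tasks share a common mean parameter; w0 plays the role of
# a reference vector estimated from previously seen tasks.
rng = np.random.default_rng(1)
d, n = 5, 20
w_mean = rng.normal(size=d)
w_task = w_mean + 0.1 * rng.normal(size=d)   # new task close to the mean
X = rng.normal(size=(n, d))
y = X @ w_task + 0.1 * rng.normal(size=n)
w_hat = biased_ridge(X, y, w0=w_mean, lam=5.0)
print(np.linalg.norm(w_hat - w_task))        # small: the bias helps here
```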


Online Algorithm for Unsupervised Sequential Selection with Contextual Information

arXiv.org Machine Learning

In this paper, we study Contextual Unsupervised Sequential Selection (USS), a new variant of the stochastic contextual bandit problem in which the loss of an arm cannot be inferred from the observed feedback. In our setup, arms are associated with fixed costs and are ordered, forming a cascade. In each round, a context is presented, and the learner selects arms sequentially up to some depth. The total cost incurred by stopping at an arm is the sum of the fixed costs of the selected arms and the stochastic loss associated with the stopping arm. The learner's goal is to learn a decision rule that maps contexts to arms so as to minimize the total expected loss. The problem is challenging because the setting is unsupervised: the total loss cannot be estimated, so learning is feasible only if the optimal arm can be inferred (explicitly or implicitly) from the problem structure. We observe that learning is still possible when the problem instance satisfies the so-called 'Contextual Weak Dominance' (CWD) property. Under CWD, we propose an algorithm for the contextual USS problem and demonstrate that it has sub-linear regret. Experiments on synthetic and real datasets validate our algorithm.
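
To fix intuition for the cascade cost structure, the sketch below computes the cost of stopping at a given depth. The numbers are made up, and in the actual USS setting the losses are unobservable, which is exactly the difficulty that CWD addresses:

```python
import numpy as np

def total_cost(costs, losses, k):
    """Cost of stopping at arm k (1-indexed depth) in a cascade:
    the fixed costs of arms 1..k plus the loss of the stopping arm."""
    return np.sum(costs[:k]) + losses[k - 1]

# Toy cascade: later arms are costlier to use but less error-prone.
costs = np.array([0.1, 0.3, 0.6])        # fixed usage costs
losses = np.array([0.4, 0.2, 0.05])      # expected losses (hidden in USS)
best_depth = min(range(1, 4), key=lambda k: total_cost(costs, losses, k))
print(best_depth, total_cost(costs, losses, best_depth))
```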


CoinDICE: Off-Policy Confidence Interval Estimation

arXiv.org Machine Learning

We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning, where the goal is to estimate a confidence interval on a target policy's value, given only access to a static experience dataset collected by unknown behavior policies. Starting from a function space embedding of the linear program formulation of the $Q$-function, we obtain an optimization problem with generalized estimating equation constraints. By applying the generalized empirical likelihood method to the resulting Lagrangian, we propose CoinDICE, a novel and efficient algorithm for computing confidence intervals. Theoretically, we prove the obtained confidence intervals are valid, in both asymptotic and finite-sample regimes. Empirically, we show in a variety of benchmarks that the confidence interval estimates are tighter and more accurate than existing methods.
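
CoinDICE itself solves a Lagrangian with generalized estimating-equation constraints; the empirical-likelihood machinery it builds on can be illustrated in the simplest possible case, a confidence interval for a mean. The sketch below is the textbook empirical-likelihood interval (Owen's construction), not the CoinDICE algorithm:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

def el_log_ratio(x, mu):
    """-2 log empirical-likelihood ratio for H0: E[X] = mu."""
    z = x - mu
    if z.min() >= 0 or z.max() <= 0:
        return np.inf                      # mu outside the convex hull
    # The Lagrange multiplier t solves sum z_i / (1 + t z_i) = 0,
    # with t restricted so that all weights stay positive.
    lo = -1.0 / z.max() + 1e-10
    hi = -1.0 / z.min() - 1e-10
    t = brentq(lambda t: np.sum(z / (1 + t * z)), lo, hi)
    return 2.0 * np.sum(np.log1p(t * z))

rng = np.random.default_rng(2)
x = rng.normal(1.0, 1.0, size=200)
cutoff = chi2.ppf(0.95, df=1)              # asymptotic chi-square calibration
grid = np.linspace(x.min(), x.max(), 2001)
inside = [m for m in grid if el_log_ratio(x, m) <= cutoff]
print(min(inside), max(inside))            # ~95% CI for the mean
```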


Exponential Lower Bounds for Planning in MDPs With Linearly-Realizable Optimal Action-Value Functions

arXiv.org Artificial Intelligence

We consider the problem of local planning in fixed-horizon Markov Decision Processes (MDPs) with linear function approximation and a generative model, under the assumption that the optimal action-value function lies in the span of a feature map that is available to the planner. Previous work has left open the question of whether there exist sound planners that need only $\mathrm{poly}(H, d)$ queries regardless of the MDP, where $H$ is the horizon and $d$ is the dimensionality of the features. We answer this question in the negative: we show that any sound planner must query at least $\min(\exp(\Omega(d)), \Omega(2^H))$ samples. We also show that, for any $\delta > 0$, the least-squares value iteration algorithm with $O(H^5 d^{H+1}/\delta^2)$ queries can compute a $\delta$-optimal policy. We discuss implications and remaining open questions.
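
To make the positive result concrete, here is a schematic of least-squares value iteration with a generative model: fit a linear action-value function per stage by backward induction from sampled transitions. This is a hedged generic sketch; the choice of query points and its analysis in the paper are more careful than the uniform sampling assumed here:

```python
import numpy as np

def lsvi(phi, step, states, actions, H, n_samples, rng):
    """Least-squares value iteration with a generative model (sketch).
    phi(s, a) -> feature vector; step(s, a, rng) -> (reward, next_state).
    Returns weight vectors theta[h] with Q_h(s, a) ~ phi(s, a) @ theta[h]."""
    d = len(phi(states[0], actions[0]))
    theta = [np.zeros(d) for _ in range(H + 1)]   # theta[H] = 0 (terminal)
    for h in range(H - 1, -1, -1):                # backward induction
        X, y = [], []
        for _ in range(n_samples):
            s = states[rng.integers(len(states))]
            a = actions[rng.integers(len(actions))]
            r, s2 = step(s, a, rng)               # query the simulator
            v2 = max(phi(s2, b) @ theta[h + 1] for b in actions)
            X.append(phi(s, a)); y.append(r + v2)
        theta[h] = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)[0]
    return theta

# Toy usage: 2-state, 2-action MDP with one-hot state-action features.
rng = np.random.default_rng(0)
states, actions = [0, 1], [0, 1]
phi = lambda s, a: np.eye(4)[2 * s + a]
def step(s, a, rng):
    s2 = a                          # the action chooses the next state
    return float(s2 == 1), s2       # reward 1 for reaching state 1
theta = lsvi(phi, step, states, actions, H=3, n_samples=200, rng=rng)
print([phi(0, a) @ theta[0] for a in actions])
```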


Tighter risk certificates for neural networks

arXiv.org Machine Learning

This paper presents an empirical study of training probabilistic neural networks with training objectives derived from PAC-Bayes bounds. In the context of probabilistic neural networks, the output of training is a probability distribution over network weights. We present two training objectives, used here for the first time in connection with training neural networks, that are derived from tight PAC-Bayes bounds. We also re-implement a previously used training objective based on a classical PAC-Bayes bound, to compare the properties of the predictors learned using the different training objectives. We compute risk certificates for the learnt predictors that are valid on any unseen examples. We further experiment with different types of priors on the weights (both data-free and data-dependent) and different neural network architectures. Our experiments on MNIST and CIFAR-10 show that our training methods produce competitive test set errors and non-vacuous risk bounds with much tighter values than previous results in the literature, showing promise not only for guiding the learning algorithm by bounding the risk but also for model selection. These observations suggest that the methods studied here might be good candidates for self-certified learning, in the sense of certifying the risk on any unseen data without the need for data-splitting protocols.
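
For intuition on how a risk certificate is computed, here is the classical PAC-Bayes-kl bound inverted by bisection: given the empirical risk of the randomized predictor, the KL divergence between posterior and prior, the sample size, and the confidence level, it returns a bound on the true risk. This is the classical bound, not the paper's tighter training objectives; the example numbers are made up:

```python
import numpy as np

def kl_binary(q, p):
    """Binary KL divergence kl(q || p)."""
    eps = 1e-12
    q, p = min(max(q, eps), 1 - eps), min(max(p, eps), 1 - eps)
    return q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))

def pac_bayes_kl_certificate(emp_risk, kl_qp, n, delta):
    """Risk certificate from the PAC-Bayes-kl bound: the largest p with
    kl(emp_risk || p) <= (KL(Q||P) + log(2 sqrt(n) / delta)) / n,
    found by bisection (kl(q||p) is increasing in p for p >= q)."""
    rhs = (kl_qp + np.log(2 * np.sqrt(n) / delta)) / n
    lo, hi = emp_risk, 1.0
    for _ in range(60):                   # bisection, plenty of precision
        mid = (lo + hi) / 2
        if kl_binary(emp_risk, mid) <= rhs:
            lo = mid
        else:
            hi = mid
    return hi

# Hypothetical numbers: 5% empirical risk, KL = 5000 nats, 60k examples.
print(pac_bayes_kl_certificate(emp_risk=0.05, kl_qp=5000.0, n=60000,
                               delta=0.025))
```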


Efficient Planning in Large MDPs with Weak Linear Function Approximation

arXiv.org Machine Learning

Large-scale Markov decision processes (MDPs) require planning algorithms whose runtime is independent of the number of states of the MDP. We consider the planning problem in MDPs using linear value function approximation with only weak requirements: low approximation error for the optimal value function, and a small set of "core" states whose features span those of the other states. In particular, we make no assumptions about the representability of policies or of the value functions of non-optimal policies. Our algorithm produces almost-optimal actions for any state using a generative oracle (simulator) for the MDP, while its computation time scales polynomially with the number of features, core states, and actions, and with the effective horizon.
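
A rough sketch of the core-states idea, not the paper's algorithm: run value iteration on the core states only, and evaluate backups by expressing next-state features as combinations of core-state features. The function names and the least-squares interpolation below are illustrative assumptions:

```python
import numpy as np

def interpolation_weights(phi_s, phi_core):
    """Express a state's feature vector as a combination of core-state
    features: solve phi_core.T @ alpha ~ phi_s by least squares."""
    alpha, *_ = np.linalg.lstsq(phi_core.T, phi_s, rcond=None)
    return alpha

def core_value_iteration(phi_core, simulate, actions, gamma, n_iter, rng):
    """Approximate value iteration confined to the core states; values
    elsewhere come from feature interpolation (a schematic sketch)."""
    m = len(phi_core)
    v = np.zeros(m)
    for _ in range(n_iter):
        v_new = np.empty(m)
        for i in range(m):
            q = []
            for a in actions:
                r, phi_next = simulate(i, a, rng)   # generative oracle
                alpha = interpolation_weights(phi_next, phi_core)
                q.append(r + gamma * alpha @ v)
            v_new[i] = max(q)
        v = v_new
    return v

# Toy usage: 2 core states with identity features, deterministic dynamics.
rng = np.random.default_rng(4)
phi_core = np.eye(2)
def simulate(i, a, rng):
    j = (i + a) % 2
    return float(j == 0), phi_core[j]    # reward 1 for landing in state 0
print(core_value_iteration(phi_core, simulate, actions=[0, 1],
                           gamma=0.9, n_iter=50, rng=rng))
```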


(Bandit) Convex Optimization with Biased Noisy Gradient Oracles

arXiv.org Machine Learning

Algorithms for bandit convex optimization and online learning often rely on constructing noisy gradient estimates, which are then used in appropriately adjusted first-order algorithms in place of actual gradients. Depending on the properties of the function to be optimized and the nature of the "noise" in the bandit feedback, the bias and variance of gradient estimates exhibit various tradeoffs. In this paper we propose a novel framework that replaces the specific gradient estimation methods with an abstract oracle. The new framework lets us unify previous works, reproducing their results in a clean and concise fashion, and, perhaps more importantly, it also allows us to formally show that, to achieve the optimal root-$n$ rate, either the algorithms that use existing gradient estimators or the proof techniques used to analyze them must go beyond what exists today.
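
As a concrete instance of the gradient oracles the framework abstracts over, here is the classical two-point spherical estimator, whose smoothing radius trades bias against variance. This is a standard construction, not specific to this paper; the toy experiment and names are illustrative:

```python
import numpy as np

def two_point_gradient(f, x, delta, rng):
    """Two-point spherical gradient estimate of f at x with smoothing
    radius delta. For smooth f the bias is O(delta^2), while noisy
    function evaluations inflate the variance as 1/delta^2."""
    d = len(x)
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)                 # uniform direction on the sphere
    return d * (f(x + delta * u) - f(x - delta * u)) / (2 * delta) * u

# Toy illustration of the tradeoff on a noisy quartic: a small delta
# inflates the variance, a large delta inflates the smoothing bias.
rng = np.random.default_rng(3)
f = lambda x: np.sum(x**4) + 0.01 * rng.normal()  # noisy zeroth-order oracle
x0 = np.ones(4)
for delta in (0.01, 0.1, 1.0):
    g = np.mean([two_point_gradient(f, x0, delta, rng)
                 for _ in range(4000)], axis=0)
    print(delta, np.linalg.norm(g - 4 * x0**3))   # error vs. true gradient
```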


Multi-Step Dyna Planning for Policy Evaluation and Control

Neural Information Processing Systems

We extend the Dyna planning architecture for policy evaluation and control in two significant ways. First, we introduce multi-step Dyna planning, which projects the simulated state/feature many steps into the future. Our multi-step Dyna is based on a multi-step model, which we call the $\lambda$-model. The $\lambda$-model interpolates between the one-step model and an infinite-step model, and can be learned efficiently online. Second, for Dyna control we use a dynamic multi-step model that is able to predict the results of a sequence of greedy actions and to track the optimal policy in the long run.
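
For a linear one-step model given by a matrix F, the geometric mixture that the $\lambda$-model computes has a closed form, shown in the sketch below. The paper learns the model online; this sketch, which assumes the spectral radius of $\lambda F$ is below one, only illustrates what the interpolation computes:

```python
import numpy as np

def lambda_model(F, lam):
    """Closed-form lambda-model for a linear one-step model F: the
    geometric mixture sum_{k>=1} (1-lam) lam^(k-1) F^k, which equals
    (1-lam) F (I - lam F)^{-1}. lam=0 recovers the one-step model;
    lam -> 1 approaches the infinite-step model."""
    d = F.shape[0]
    return (1 - lam) * F @ np.linalg.inv(np.eye(d) - lam * F)

# lam = 0 gives back F; larger lam looks further into the future.
F = 0.9 * np.array([[0.0, 1.0], [1.0, 0.0]])   # contractive toy dynamics
for lam in (0.0, 0.5, 0.9):
    print(lam, lambda_model(F, lam))
```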


Regularized Policy Iteration

Neural Information Processing Systems

In this paper we consider approximate policy-iteration-based reinforcement learning algorithms. To implement a flexible function approximation scheme, we propose the use of non-parametric methods with regularization, which provides a convenient way to control the complexity of the function approximator. We propose two novel regularized policy iteration algorithms by adding L2-regularization to two widely used policy evaluation methods: Bellman residual minimization (BRM) and least-squares temporal difference learning (LSTD). We derive efficient implementations for our algorithms when the approximate value functions belong to a reproducing kernel Hilbert space. We also provide finite-sample performance bounds for our algorithms and show that they achieve optimal rates of convergence under the studied conditions.
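
In the finite-dimensional linear case, L2-regularized LSTD amounts to adding a ridge term to the usual LSTD system. A minimal sketch (the paper works with RKHS-valued approximators; the feature matrices and the toy chain here are illustrative):

```python
import numpy as np

def lstd_l2(phi, phi_next, rewards, gamma, lam):
    """L2-regularized LSTD for policy evaluation with linear features.
    Solves (A + lam I) theta = b with A = Phi^T (Phi - gamma Phi') and
    b = Phi^T r, so that V(s) ~ phi(s) @ theta."""
    d = phi.shape[1]
    A = phi.T @ (phi - gamma * phi_next) + lam * np.eye(d)
    b = phi.T @ rewards
    return np.linalg.solve(A, b)

# Toy usage: a 2-state chain with one-hot features, three transitions.
phi      = np.array([[1., 0.], [0., 1.], [1., 0.]])
phi_next = np.array([[0., 1.], [1., 0.], [0., 1.]])
rewards  = np.array([0., 1., 0.])
print(lstd_l2(phi, phi_next, rewards, gamma=0.9, lam=0.1))
```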