Goto

Collaborating Authors

 regret


Regret in Online Recommendation Systems

Neural Information Processing Systems

This paper proposes a theoretical analysis of recommendation systems in an online setting, where items are sequentially recommended to users over time. In each round, a user, randomly picked from a population of $m$ users, arrives. The decision-maker observes the user and selects an item from a catalogue of $n$ items. Importantly, an item cannot be recommended twice to the same user. The probabilities that a user likes each item are unknown, and the performance of the recommendation algorithm is captured through its regret, considering as a reference an Oracle algorithm aware of these probabilities. We investigate various structural assumptions on these probabilities: we derive for each of them regret lower bounds, and devise algorithms achieving these limits. Interestingly, our analysis reveals the relative weights of the different components of regret: the component due to the constraint of not presenting the same item twice to the same user, that due to learning the chances users like items, and finally that arising when learning the underlying structure.


On Regret with Multiple Best Arms

Neural Information Processing Systems

We study a regret minimization problem with the existence of multiple best/near-optimal arms in the multi-armed bandit setting. We consider the case when the number of arms/actions is comparable or much larger than the time horizon, and make no assumptions about the structure of the bandit instance. Our goal is to design algorithms that can automatically adapt to the unknown hardness of the problem, i.e., the number of best arms. Our setting captures many modern applications of bandit algorithms where the action space is enormous and the information about the underlying instance/structure is unavailable. We first propose an adaptive algorithm that is agnostic to the hardness level and theoretically derive its regret bound. We then prove a lower bound for our problem setting, which indicates: (1) no algorithm can be minimax optimal simultaneously over all hardness levels; and (2) our algorithm achieves a rate function that is Pareto optimal. With additional knowledge of the expected reward of the best arm, we propose another adaptive algorithm that is minimax optimal, up to polylog factors, over all hardness levels. Experimental results confirm our theoretical guarantees and show advantages of our algorithms over the previous state-of-the-art.


Multitask Learning with No Regret: from Improved Confidence Bounds to Active Learning

Neural Information Processing Systems

Multitask learning is a powerful framework that enables one to simultaneously learn multiple related tasks by sharing information between them. Quantifying uncertainty in the estimated tasks is of pivotal importance for many downstream applications, such as online or active learning. In this work, we provide novel confidence intervals for multitask regression in the challenging agnostic setting, i.e., when neither the similarity between tasks nor the tasks' features are available to the learner. The obtained intervals do not require i.i.d.


Exponential Bellman Equation and Improved Regret Bounds for Risk-Sensitive Reinforcement Learning

Neural Information Processing Systems

We study risk-sensitive reinforcement learning (RL) based on the entropic risk measure. Although existing works have established non-asymptotic regret guarantees for this problem, they leave open an exponential gap between the upper and lower bounds. We identify the deficiencies in existing algorithms and their analysis that result in such a gap.


Learn to Match with No Regret: Reinforcement Learning in Markov Matching Markets

Neural Information Processing Systems

We study a Markov matching market involving a planner and a set of strategic agents on the two sides of the market.At each step, the agents are presented with a dynamical context, where the contexts determine the utilities. The planner controls the transition of the contexts to maximize the cumulative social welfare, while the agents aim to find a myopic stable matching at each step. The proposed algorithm addresses the coupled challenges of sequential exploration, matching stability, and function approximation. We prove that the algorithm achieves sublinear regret.


How Does Variance Shape the Regret in Contextual Bandits?

Neural Information Processing Systems

We consider realizable contextual bandits with general function approximation, investigating how small reward variance can lead to better-than-minimax regret bounds. Unlike in minimax regret bounds, we show that the eluder dimension d_{\text{elu}} - a measure of the complexity of the function class - plays a crucial role in variance-dependent bounds. We consider two types of adversary: (1) Weak adversary: The adversary sets the reward variance before observing the learner's action. In this setting, we prove that a regret of \Omega( \sqrt{ \min (A, d_{\text{elu}}) \Lambda } d_{\text{elu}}) is unavoidable when d_{\text{elu}} \leq \sqrt{A T}, where A is the number of actions, T is the total number of rounds, and \Lambda is the total variance over T rounds. For the A\leq d_{\text{elu}} regime, we derive a nearly matching upper bound \tilde{O}( \sqrt{ A\Lambda } d_{\text{elu} }) for the special case where the variance is revealed at the beginning of each round.


Less is More: Parameter-Efficient Selection of Intermediate Tasks for Transfer Learning

arXiv.org Artificial Intelligence

Intermediate task transfer learning can greatly improve model performance. If, for example, one has little training data for emotion detection, first fine-tuning a language model on a sentiment classification dataset may improve performance strongly. But which task to choose for transfer learning? Prior methods producing useful task rankings are infeasible for large source pools, as they require forward passes through all source language models. We overcome this by introducing Embedding Space Maps (ESMs), light-weight neural networks that approximate the effect of fine-tuning a language model. We conduct the largest study on NLP task transferability and task selection with 12k source-target pairs. We find that applying ESMs on a prior method reduces execution time and disk space usage by factors of 10 and 278, respectively, while retaining high selection performance (avg. regret@5 score of 2.95).


Elicitation of Factored Utilities

AI Magazine

We provide a brief overview of recent direct preference elicitation methods: these methods ask users to answer (ideally, a small number of) queries regarding their preferences and use this information to recommend a feasible decision that would be (approximately) optimal given those preferences. We argue for the importance of assessing numerical utilities rather than qualitative preferences and survey several utility elicitation techniques from artificial intelligence, operations research, and conjoint analysis. Specifically, since the ability to make reasonable decisions on behalf of a user depends on that user's preferences over outcomes in the domain in question, AI systems must assess or estimate these preferences before making decisions. Designing effective preference assessment techniques to incorporate such user-specific considerations (that is, breaking the preference bottleneck) is one of the most important problems facing AI. In this brief survey, we focus on explicit elicitation techniques where a system actively queries a user to glean relevant preferences. Preference elicitation is difficult for two main reasons. First, many decision problems have exponentially sized outcome spaces, defined by the possible values of outcome attributes. As an illustrative example, consider sophisticated flight selection: possible outcomes are defined by attributes such as trip cost, departure time, return time, airline, number of connections, flight length, baggage weight limit, flight class, (the possibility of) lost luggage, flight delays, and other stochastic outcomes. An ideal decision support system should be able to use, for example, precise flight delay statistics and incorporate a user's relative tolerance for delays in making recommendations. Representing and eliciting preferences for all outcomes in a case like this is infeasible given the size of the outcome space. A second difficulty arises due to the fact that quantitative strength of preferences, or utility, is needed to trade off, for instance, the odds of flight delays with other attributes. Unfortunately, people are notoriously inept at quantifying their preferences with any degree of precision, adding to the challenges facing automated utility elicitation.