Collaborating Authors

 Bartók, Gábor


Importance weighting without importance weights: An efficient algorithm for combinatorial semi-bandits

arXiv.org Machine Learning

We propose a sample-efficient alternative to importance weighting for situations where one only has sample access to the probability distribution that generates the observations. Our new method, called Geometric Resampling (GR), is described and analyzed in the context of online combinatorial optimization under semi-bandit feedback, where a learner sequentially selects its actions from a combinatorial decision set so as to minimize its cumulative loss. In particular, we show that the well-known Follow-the-Perturbed-Leader (FPL) prediction method coupled with Geometric Resampling yields the first computationally efficient reduction from offline to online optimization in this setting. We provide a thorough theoretical analysis of the resulting algorithm, showing that its performance is on par with previous, inefficient solutions. Our main contribution is showing that, despite the relatively large variance induced by the GR procedure, our performance guarantees hold with high probability rather than only in expectation. As a side result, we also improve the best known regret bounds for FPL in online combinatorial optimization with full feedback, closing the perceived performance gap between FPL and exponential weights in this setting.
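The abstract does not spell out the procedure, so the following is only a minimal sketch of one way to read "importance weighting with sample access only": keep drawing fresh actions from the same distribution until the component of interest appears, and use the number of draws in place of the unknown weight 1/p. The names `geometric_resampling_count`, `sample_action`, and `max_draws`, and the toy sampler, are assumptions made for illustration and are not taken from the paper.

```python
import random


def geometric_resampling_count(sample_action, component, max_draws, rng):
    """Count draws until `component` appears in a sampled action, capped at max_draws.

    sample_action(rng) is a hypothetical sampler returning the set of components
    selected by one draw. If `component` is included with probability p, the
    uncapped count is geometric with mean 1/p, so it can stand in for the
    importance weight without ever computing p.
    """
    for k in range(1, max_draws + 1):
        if component in sample_action(rng):
            return k
    return max_draws  # truncation: bounded cost, at the price of a small bias


def toy_sampler(rng):
    # Assumed toy distribution: component 0 is selected with probability 0.25.
    return {0} if rng.random() < 0.25 else {1}


if __name__ == "__main__":
    rng = random.Random(0)
    counts = [geometric_resampling_count(toy_sampler, 0, 1000, rng) for _ in range(20000)]
    # The empirical mean should be close to 1 / 0.25 = 4.
    print(sum(counts) / len(counts))
```

In a semi-bandit loss estimate, the count would multiply the observed loss of the component on rounds where that component was actually played. The cap trades a small bias for a bounded number of draws per round.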


Incentivizing Users for Balancing Bike Sharing Systems

AAAI Conferences

Bike sharing systems have recently been adopted by a growing number of cities as a new means of transportation offering citizens a flexible, fast and green alternative for mobility. Users can pick up or drop off the bicycles at a station of their choice without prior notice or time planning. This increased flexibility comes with the challenge of unpredictable and fluctuating demand as well as irregular flow patterns of the bikes. As a result, these systems can incur imbalance problems such as the unavailability of bikes or parking docks at stations. In this light, operators deploy fleets of vehicles which redistribute the bikes in order to guarantee a desirable service level. Can we engage the users themselves to solve the imbalance problem in bike sharing systems? In this paper, we address this question and present a crowdsourcing mechanism that involves the users in the bike repositioning process by offering them alternative stations at which to pick up or return bikes in exchange for monetary incentives. We design the complete architecture of the incentives system, which employs optimal pricing policies using the approach of regret minimization in online learning. We investigate the incentive compatibility of our mechanism and extensively evaluate it through simulations based on data collected via a survey study. Finally, we deployed the proposed system through a smartphone app among users of a large-scale bike sharing system operated by a public transport company, and we provide results from this experimental deployment. To our knowledge, this is the first dynamic incentives system for bike redistribution ever deployed in a real-world bike sharing system.


Efficient Partial Monitoring with Prior Information

Neural Information Processing Systems

Partial monitoring is a general model for online learning with limited feedback: a learner chooses actions in a sequential manner while an opponent chooses outcomes. In every round, the learner suffers some loss and receives some feedback based on the action and the outcome. The goal of the learner is to minimize her cumulative loss. Applications range from dynamic pricing to label-efficient prediction to dueling bandits. In this paper, we assume that we are given some prior information about the distribution from which the opponent generates the outcomes. We propose BPM, a family of new efficient algorithms whose core is to track the outcome distribution with an ellipsoid centered around the estimated distribution. We show that our algorithm provably enjoys a near-optimal regret rate for locally observable partial-monitoring problems against stochastic opponents. As demonstrated with experiments on synthetic as well as real-world data, the algorithm outperforms previous approaches, even for very uninformed priors, with an order of magnitude smaller regret and lower running time.


Toward a Classification of Finite Partial-Monitoring Games

arXiv.org Machine Learning

Partial-monitoring games constitute a mathematical framework for sequential decision making problems with imperfect feedback: The learner repeatedly chooses an action, the opponent responds with an outcome, and then the learner suffers a loss and receives a feedback signal, both of which are fixed functions of the action and the outcome. The goal of the learner is to minimize his total cumulative loss. We make progress towards the classification of these games based on their minimax expected regret. Namely, we classify almost all games with two outcomes and a finite number of actions: We show that their minimax expected regret is either zero, $\widetilde{\Theta}(\sqrt{T})$, $\Theta(T^{2/3})$, or $\Theta(T)$, and we give a simple and efficiently computable classification of these four classes of games. Our hope is that the result can serve as a stepping stone toward classifying all finite partial-monitoring games.
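The protocol described above can be sketched in a few lines. This is an illustrative sketch only: the function name `play_partial_monitoring`, the tiny loss and feedback matrices, and the fixed-action learner are assumptions chosen for the example, not objects from the paper. Regret is measured here against the best fixed action in hindsight.

```python
def play_partial_monitoring(loss, feedback, choose_action, outcomes):
    """Run the game and return (total learner loss, regret vs. best fixed action).

    loss[a][o] and feedback[a][o] are fixed functions of action a and outcome o.
    The learner's choose_action(history) sees only past (action, feedback) pairs,
    never the losses or the outcomes themselves.
    """
    history = []
    total = 0.0
    for outcome in outcomes:
        action = choose_action(history)
        total += loss[action][outcome]  # suffered, but not revealed to the learner
        history.append((action, feedback[action][outcome]))
    best_fixed = min(sum(row[o] for o in outcomes) for row in loss)
    return total, total - best_fixed


if __name__ == "__main__":
    # Assumed toy game: two actions, two outcomes.
    loss = [[0, 1], [1, 0]]
    # Action 0 yields an uninformative signal; action 1 reveals the outcome.
    feedback = [["x", "x"], ["a", "b"]]
    outcomes = [0, 0, 1]

    always_zero = lambda history: 0
    always_one = lambda history: 1
    print(play_partial_monitoring(loss, feedback, always_zero, outcomes))  # (1.0, 0.0)
    print(play_partial_monitoring(loss, feedback, always_one, outcomes))   # (2.0, 1.0)
```

In this toy game the informative action is not always the low-loss one, which is the kind of tension between gathering feedback and avoiding loss that partial-monitoring games capture.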