Goto

Collaborating Authors

 Li, Yingkai


Your College Dorm and Dormmates: Fair Resource Sharing with Externalities

Journal of Artificial Intelligence Research

We study a fair resource sharing problem, where a set of resources are to be shared among a group of agents. Each agent demands one resource and each resource can serve a limited number of agents. An agent cares about what resource they get as well as the externalities imposed by their mates, who share the same resource with them. Clearly, the strong notion of envy-freeness, where no agent envies another for their resource or mates, cannot always be achieved and we show that even deciding the existence of such a strongly envy-free assignment is an intractable problem. Hence, a more interesting question is whether (and in what situations) a relaxed notion of envy-freeness, the Pareto envyfreeness, can be achieved. Under this relaxed notion, an agent envies another only when they envy both the resource and the mates of the other agent. In particular, we are interested in a dorm assignment problem, where students are to be assigned to dorms with the same capacity and they have dichotomous preference over their dormmates. We show that when the capacity of each dorm is 2, a Pareto envy-free assignment always exists and we present a polynomial-time algorithm to compute such an assignment. Nevertheless, the result breaks immediately when the capacity increases to 3, in which case even Pareto envyfreeness cannot be guaranteed. In addition to the existential results, we also investigate the utility guarantees of (Pareto) envy-free assignments in our model.


Multinomial Logit Bandit with Low Switching Cost

arXiv.org Machine Learning

We study multinomial logit bandit with limited adaptivity, where the algorithms change their exploration actions as infrequently as possible when achieving almost optimal minimax regret. We propose two measures of adaptivity: the assortment switching cost and the more fine-grained item switching cost. We present an anytime algorithm (AT-DUCB) with $O(N \log T)$ assortment switches, almost matching the lower bound $\Omega(\frac{N \log T}{ \log \log T})$. In the fixed-horizon setting, our algorithm FH-DUCB incurs $O(N \log \log T)$ assortment switches, matching the asymptotic lower bound. We also present the ESUCB algorithm with item switching cost $O(N \log^2 T)$.


Stochastic Linear Optimization with Adversarial Corruption

arXiv.org Machine Learning

The multi-armed bandit problem has been extensively studie d in computer science, operations research and economics since the seminal work of Robb ins (1952). It is a model designed for sequential decision-making in which a player c hooses at each time step amongst a finite set of available arms and receives a reward for the cho sen decision. The player's objective is to minimize the difference, called regret, betwee n the rewards she receives and the rewards accumulated by the best arm. The rewards of each arm i s drawn from a probability distribution in the stochastic multi-armed bandit problem; but in adversarial multi-armed bandit models, there is typically no assumption imposed on t he sequence of rewards received by the player. In recent work, Lykouris et al. (2018) introduce a model in wh ich an adversary could corrupt the stochastic reward generated by an arm pull. They provide an algorithm and show that the regret of this "middle ground" scenario degrad es smoothly with the amount of corruption injected by the adversary. Gupta et al. (2019) pr esent an alternative algorithm which gives a significant improvement.


Tight Regret Bounds for Infinite-armed Linear Contextual Bandits

arXiv.org Machine Learning

Linear contextual bandit is a class of sequential decision making problems with important applications in recommendation systems, online advertising, healthcare, and other machine learning related tasks. While there is much prior research, tight regret bounds of linear contextual bandit with infinite action sets remain open. In this paper, we prove regret upper bound of $O(\sqrt{d^2T\log T})\times \mathrm{poly}(\log\log T)$ where $d$ is the domain dimension and $T$ is the time horizon. Our upper bound matches the previous lower bound of $\Omega(\sqrt{d^2 T\log T})$ up to iterated logarithmic terms.


Nearly Minimax-Optimal Regret for Linearly Parameterized Bandits

arXiv.org Machine Learning

We study the linear contextual bandit problem with finite action sets. When the problem dimension is $d$, the time horizon is $T$, and there are $n \leq 2^{d/2}$ candidate actions per time period, we (1) show that the minimax expected regret is $\Omega(\sqrt{dT \log T \log n})$ for every algorithm, and (2) introduce a Variable-Confidence-Level (VCL) SupLinUCB algorithm whose regret matches the lower bound up to iterated logarithmic factors. Our algorithmic result saves two $\sqrt{\log T}$ factors from previous analysis, and our information-theoretical lower bound also improves previous results by one $\sqrt{\log T}$ factor, revealing a regret scaling quite different from classical multi-armed bandits in which no logarithmic $T$ term is present in minimax regret. Our proof techniques include variable confidence levels and a careful analysis of layer sizes of SupLinUCB on the upper bound side, and delicately constructed adversarial sequences showing the tightness of elliptical potential lemmas on the lower bound side.


Implementation of Stochastic Quasi-Newton's Method in PyTorch

arXiv.org Machine Learning

In this paper, we implement the Stochastic Damped LBFGS (SdLBFGS) for stochastic non-convex optimization. We make two important modifications to the original SdLBFGS algorithm. First, by initializing the Hessian at each step using an identity matrix, the algorithm converges better than original algorithm. Second, by performing direction normalization we could gain stable optimization procedure without line search. Experiments on minimizing a 2D non-convex function shows that our improved algorithm converges better than original algorithm, and experiments on the CIFAR10 and MNIST datasets show that our improved algorithm works stably and gives comparable or even better testing accuracies than first order optimizers SGD, Adagrad, and second order optimizers LBFGS in PyTorch.