Collaborating Authors

 Basu, Soumya


Competing Bandits in Decentralized Large Contextual Matching Markets

arXiv.org Machine Learning

Matching markets have become increasingly relevant in a variety of modern applications, including but not limited to school admissions, organ transplantation, and job matching. Traditionally, these markets were studied under the assumption that the demand-side agents (aka players or agents) and the supply-side agents (aka arms) have fixed, known preferences, allowing for stable matching via a deferred acceptance algorithm such as the Gale-Shapley algorithm introduced in Gale and Shapley (1962). However, in applications like crowdsourcing, online labor markets, and finance, the preferences are not given to the agents, who must learn them over time by interacting with the environment. Extensive work models the matching market as multi-agent, multi-armed competitive bandits and covers various aspects, including coordinated centralized matching, decentralized matching, and game-theoretic analysis (see Liu et al. (2020); Basu et al. (2021); Sankararaman et al. (2021); Etesami and Srikant (2024)). In this paper, we study a large and dynamic matching market where the number of arms K is large and often exceeds the number of agents N.
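
For readers unfamiliar with the known-preference baseline mentioned above, the following is a minimal sketch of agent-proposing deferred acceptance (Gale-Shapley). It is not taken from the paper: the function name, the dictionary-based input format, and the toy preference lists are illustrative assumptions, and it assumes complete preference lists with at least as many arms as agents.

```python
from collections import deque


def deferred_acceptance(agent_prefs, arm_prefs):
    """Agent-proposing deferred acceptance; returns a dict agent -> arm.

    agent_prefs[a] lists arms from most to least preferred,
    arm_prefs[k] lists agents from most to least preferred.
    Assumes complete lists and at least as many arms as agents.
    """
    # rank[k][a] is the position of agent a in arm k's list (lower is better).
    rank = {k: {a: i for i, a in enumerate(p)} for k, p in arm_prefs.items()}
    next_choice = {a: 0 for a in agent_prefs}  # next arm index to propose to
    holder = {}                                # arm -> agent it currently holds
    free = deque(agent_prefs)
    while free:
        a = free.popleft()
        k = agent_prefs[a][next_choice[a]]
        next_choice[a] += 1
        current = holder.get(k)
        if current is None:
            holder[k] = a                      # arm was unmatched: accept
        elif rank[k][a] < rank[k][current]:
            holder[k] = a                      # arm trades up, old agent is freed
            free.append(current)
        else:
            free.append(a)                     # proposal rejected
    return {a: k for k, a in holder.items()}


# Hypothetical toy market with N = 2 agents and K = 3 arms.
agent_prefs = {"A": ["x", "y", "z"], "B": ["x", "z", "y"]}
arm_prefs = {"x": ["B", "A"], "y": ["A", "B"], "z": ["A", "B"]}
matching = deferred_acceptance(agent_prefs, arm_prefs)
```

In the bandit setting described in the abstract, the preference lists passed in here are exactly what the agents do not know and must estimate from repeated interaction.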


Bandits with Stochastic Experts: Constant Regret, Empirical Experts and Episodes

arXiv.org Artificial Intelligence

Recommendation systems for suggesting items to users are commonplace in online services such as marketplaces, content delivery platforms and ad placement systems. Such systems, over time, learn from user feedback, and improve their recommendations. An important caveat, however, is that both the distribution of user types and their respective preferences change over time, thus inducing changes in the optimal recommendation and requiring the system to periodically "reset" its learning. We consider systems with known change-points (aka episodes) in the distribution of user-features and preferences. Examples include seasonality in product recommendations where there are marked changes in interests based on time-of-year, or ad-placements based on time-of-day. While a baseline strategy would be to re-learn the recommendation algorithm in each episode, it is often advantageous to share some learning across episodes. Specifically, one often has access to (potentially, a very) large number of pre-trained recommendation algorithms (aka experts), and the goal then is to quickly determine (in an online manner) which expert is best suited to a specific episode.


Double Auctions with Two-sided Bandit Feedback

arXiv.org Artificial Intelligence

Double Auction enables decentralized transfer of goods between multiple buyers and sellers, thus underpinning the functioning of many online marketplaces. Buyers and sellers compete in these markets through bidding, but often do not know their own valuations a priori. As allocation and pricing happen through bids, the profitability of participants, and hence the sustainability of such markets, depends crucially on learning the respective valuations through repeated interactions. We initiate the study of Double Auction markets under bandit feedback on both the buyers' and the sellers' side. We show that with confidence-bound-based bidding and 'Average Pricing' there is efficient price discovery among the participants. In particular, the regret on the combined valuation of the buyers and the sellers -- a.k.a. the social regret -- is $O(\log(T)/\Delta)$ in $T$ rounds, where $\Delta$ is the minimum price gap. Moreover, the buyers and sellers exchanging goods attain $O(\sqrt{T})$ regret individually. The buyers and sellers who do not benefit from exchange in turn only experience $O(\log{T}/ \Delta)$ regret individually in $T$ rounds. We augment our upper bound by showing that $\omega(\sqrt{T})$ individual regret, and $\omega(\log{T})$ social regret is unattainable in certain Double Auction markets. Our paper is the first to provide decentralized learning algorithms in a two-sided market where both sides have uncertain preferences that need to be learned.
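
The abstract names 'Average Pricing' without defining it. The sketch below shows one common form of an average-price double auction for a single round, under the assumption (not confirmed by the abstract) that bids are sorted, the largest index at which the buyer bid still covers the seller bid is found, and the price is the midpoint of those two bids. The function name and the bid values are hypothetical.

```python
def average_price_auction(buyer_bids, seller_bids):
    """One round of an average-price double auction (illustrative assumption).

    buyer_bids and seller_bids map participant names to bids.
    Returns (price, trading_buyers, trading_sellers); price is None if no trade.
    """
    buyers = sorted(buyer_bids.items(), key=lambda kv: kv[1], reverse=True)
    sellers = sorted(seller_bids.items(), key=lambda kv: kv[1])
    # k counts the pairs whose buyer bid is at least the seller bid.
    k = 0
    while k < min(len(buyers), len(sellers)) and buyers[k][1] >= sellers[k][1]:
        k += 1
    if k == 0:
        return None, [], []
    # Price is the average of the last matched buyer and seller bids.
    price = (buyers[k - 1][1] + sellers[k - 1][1]) / 2
    return price, [b for b, _ in buyers[:k]], [s for s, _ in sellers[:k]]


# Hypothetical bids; in the bandit setting these would come from confidence bounds.
price, trading_buyers, trading_sellers = average_price_auction(
    {"b1": 10, "b2": 8, "b3": 3},
    {"s1": 2, "s2": 6, "s3": 9},
)
```

Here the first two buyers and the first two sellers trade at a single price, while the third buyer and third seller do not participate.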


Robust Estimation of Tree Structured Markov Random Fields

arXiv.org Machine Learning

We study the problem of learning tree-structured Markov random fields (MRF) on discrete random variables with common support when the observations are corrupted by unknown noise. As the presence of noise in the observations obfuscates the original tree structure, the extent of recoverability of the tree-structured MRFs under noisy observations is brought into question. We show that in a general noise model, the underlying tree structure can be recovered only up to an equivalence class where each of the leaf nodes is indistinguishable from its parent and siblings, forming a leaf cluster. As the indistinguishability arises due to contrived noise models, we study the natural k-ary symmetric channel noise model where the value of each node is changed to a uniform value in the support with an unequal and unknown probability. Here, the answer becomes much more nuanced. We show that with a support size of 2, and the binary symmetric channel noise model, the leaf clusters remain indistinguishable. From support size 3 and up, the recoverability of a leaf cluster is dictated by the joint probability mass function of the nodes within it. We provide a precise characterization of recoverability by deriving a necessary and sufficient condition for the recoverability of a leaf cluster. We provide an algorithm that recovers the tree if this condition is satisfied, and recovers the tree up to the leaf clusters failing this condition.


On Generalization of Adaptive Methods for Over-parameterized Linear Regression

arXiv.org Machine Learning

Over-parameterization and adaptive methods have played a crucial role in the success of deep learning in the last decade. The widespread use of over-parameterization has forced us to rethink generalization by bringing forth new phenomena, such as implicit regularization of optimization algorithms and double descent with training progression. A series of recent works have started to shed light on these areas in the quest to understand -- why do neural networks generalize well? The setting of over-parameterized linear regression has provided key insights into understanding this mysterious behavior of neural networks. In this paper, we aim to characterize the performance of adaptive methods in the over-parameterized linear regression setting. First, we focus on two sub-classes of adaptive methods depending on their generalization performance. For the first class of adaptive methods, the parameter vector remains in the span of the data and converges to the minimum norm solution like gradient descent (GD). On the other hand, for the second class of adaptive methods, the gradient rotation caused by the pre-conditioner matrix results in an in-span component of the parameter vector that converges to the minimum norm solution and the out-of-span component that saturates. Our experiments on over-parameterized linear regression and deep neural networks support this theory.
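
The benchmark behavior the abstract refers to, gradient descent converging to the minimum norm solution while staying in the span of the data, can be checked on a tiny example. This is a sketch for plain GD only, not for the adaptive methods the paper analyzes; the function name, step size, and data are illustrative assumptions.

```python
def gradient_descent(X, y, steps=2000, lr=0.05):
    """Plain GD on 0.5 * sum_i (x_i . w - y_i)^2, started from w = 0."""
    d = len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        # Residual of each training example under the current parameters.
        residuals = [sum(xj * wj for xj, wj in zip(row, w)) - yi
                     for row, yi in zip(X, y)]
        # Gradient is X^T residuals, so every update lies in the span of the rows.
        grad = [sum(r * row[j] for r, row in zip(residuals, X)) for j in range(d)]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# One data point in two dimensions: infinitely many interpolating solutions.
X = [[1.0, 2.0]]
y = [5.0]
w = gradient_descent(X, y)

# Minimum norm interpolant for a single point x is x * y / ||x||^2.
norm_sq = sum(v * v for v in X[0])
w_min_norm = [v * y[0] / norm_sq for v in X[0]]
```

Because the iterate starts at zero and every gradient is a multiple of the single data row, the result has no out-of-span component, which is the property the second class of adaptive methods in the abstract loses.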


Dominate or Delete: Decentralized Competing Bandits with Uniform Valuation

arXiv.org Machine Learning

We study regret minimization problems in a two-sided matching market where uniformly valued demand-side agents (a.k.a. agents) continuously compete to be matched with supply-side agents (a.k.a. arms) with unknown and heterogeneous valuations. Such markets abstract online matching platforms (e.g. UpWork, TaskRabbit) and fall within the purview of the matching bandit models introduced by Liu et al. The uniform valuation on the demand side admits a unique stable matching equilibrium in the system. We design the first decentralized algorithm for matching bandits under uniform valuation that does not require any knowledge of reward gaps or time horizon, and thus partially resolves an open question posed by Liu et al. The algorithm works in phases of exponentially increasing length. In each phase $i$, an agent first deletes dominated arms -- the arms preferred by agents ranked higher than itself. Deletion is followed by dynamic explore-exploit using the UCB algorithm on the remaining arms for $2^i$ rounds. Finally, the preferred arm is broadcast in a decentralized fashion to other agents through pure exploitation in $(N-1)K$ rounds with $N$ agents and $K$ arms. Comparing the obtained reward with respect to the unique stable matching, we show that the algorithm achieves $O(\log(T)/\Delta^2)$ regret in $T$ rounds, where $\Delta$ is the minimum gap across all agents and arms. We provide an (order-wise) matching regret lower bound.


Blocking Bandits

arXiv.org Machine Learning

We consider a novel stochastic multi-armed bandit setting, where playing an arm makes it unavailable for a fixed number of time slots thereafter. This models situations where reusing an arm too often is undesirable (e.g. making the same product recommendation repeatedly) or infeasible (e.g. compute job scheduling on machines). We show that with prior knowledge of the rewards and delays of all the arms, the problem of optimizing cumulative reward does not admit any pseudo-polynomial time algorithm (in the number of arms) unless the randomized exponential time hypothesis is false, by mapping to the PINWHEEL scheduling problem. Subsequently, we show that a simple greedy algorithm that plays the available arm with the highest reward is asymptotically $(1-1/e)$ optimal. When the rewards are unknown, we design a UCB-based algorithm which is shown to have $c \log T + o(\log T)$ cumulative regret against the greedy algorithm, leveraging the free exploration of arms due to the unavailability. Finally, when all the delays are equal, the problem reduces to Combinatorial Semi-bandits, providing us with a lower bound of $c' \log T+ \omega(\log T)$.
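
The greedy policy described above, for the case where mean rewards and delays are known, can be sketched in a few lines. The delay convention used here is an assumption made for illustration: an arm with delay d played at slot t can next be played at slot t + d, so d = 1 means no blocking. The function name and the example values are hypothetical.

```python
def greedy_blocking(rewards, delays, horizon):
    """In each slot, play the available arm with the highest mean reward.

    Assumed convention: arm i played at slot t is available again at t + delays[i].
    Returns the list of arms played (None when every arm is blocked).
    """
    free_at = [0] * len(rewards)  # first slot at which each arm is available
    schedule = []
    for t in range(horizon):
        available = [i for i in range(len(rewards)) if free_at[i] <= t]
        if not available:
            schedule.append(None)
            continue
        best = max(available, key=lambda i: rewards[i])
        free_at[best] = t + delays[best]  # block the arm after playing it
        schedule.append(best)
    return schedule


# Three arms with known mean rewards and delays (illustrative values).
schedule = greedy_blocking(rewards=[1.0, 0.5, 0.2], delays=[3, 2, 1], horizon=6)
```

With these values the best arm is blocked for two slots after each play, so the policy is forced onto the weaker arms in between, which is the "free exploration" effect the abstract mentions for the unknown-reward case.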


Disentangling Mixtures of Epidemics on Graphs

arXiv.org Machine Learning

We consider the problem of learning the weighted edges of a mixture of two graphs from epidemic cascades. This is a natural setting in the context of social networks, where a post created by one user spreads on a different graph depending on whether it is about basketball or about politics. However, very little is known about whether this problem is solvable. To the best of our knowledge, we establish the first conditions under which this problem can be solved, and provide conditions under which the problem is provably not solvable. When the conditions are met, i.e. when the graphs are connected, with distinct edges, and have at least three edges, we give an efficient algorithm for learning the weights of both graphs with almost optimal sample complexity (up to log factors). We extend the results to the setting in which the priors of the mixture are unknown and obtain similar guarantees.