Decentralized and Uncoordinated Learning of Stable Matchings: A Game-Theoretic Approach
Etesami, S. Rasoul, Srikant, R.
We consider the problem of learning stable matchings with unknown preferences in a decentralized and uncoordinated manner, where "decentralized" means that players make decisions individually without the influence of a central platform, and "uncoordinated" means that players do not need to synchronize their decisions using pre-specified rules. First, we provide a game formulation for this problem with known preferences, where the set of pure Nash equilibria (NE) coincides with the set of stable matchings, and mixed NE can be rounded to a stable matching. Then, we show that for hierarchical markets, applying the exponential weight (EXP) learning algorithm to the stable matching game achieves logarithmic regret in a fully decentralized and uncoordinated fashion. Moreover, we show that EXP converges locally and exponentially fast to a stable matching in general markets. We also introduce another decentralized and uncoordinated learning algorithm that globally converges to a stable matching with arbitrarily high probability. Finally, we provide stronger feedback conditions under which it is possible to drive the market faster toward an approximate stable matching. Our proposed game-theoretic framework bridges the discrete problem of learning stable matchings with the problem of learning NE in continuous-action games.

Index Terms: Learning stable matchings, learning Nash equilibrium, weakly acyclic games, monotone games, decentralized learning, uncoordinated learning, online mirror descent, regret minimization.

Learning stable matchings is one of the fundamental problems in computer science, economics, and engineering that has received considerable attention over the past decades. Stable matchings provide a desirable notion of stability in two-sided matching markets where agents on each side of the market have preferences over the other side.
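The EXP dynamics referenced in the abstract are, at their core, standard exponential-weight updates run independently by each player over their own action set. A minimal single-player sketch of the update (the loss values, learning rate, and horizon below are illustrative, not taken from the paper):

```python
import math

def exp_weights_step(weights, losses, eta):
    """One exponential-weight (EXP) update: downweight actions by exp(-eta * loss)."""
    new_w = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    total = sum(new_w)
    return [w / total for w in new_w]

# Toy run: 3 candidate actions, action 0 consistently incurs the lowest loss.
weights = [1.0, 1.0, 1.0]
for _ in range(200):
    weights = exp_weights_step(weights, [0.1, 0.5, 0.9], eta=0.2)
print(max(range(3), key=lambda i: weights[i]))  # mass concentrates on action 0
```

Because losses enter the exponent cumulatively, the probability mass on any suboptimal action decays geometrically in the loss gap, which is the mechanism behind the logarithmic-regret behavior the paper studies.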
Conjectural Online Learning with First-order Beliefs in Asymmetric Information Stochastic Games
Li, Tao, Hammar, Kim, Stadler, Rolf, Zhu, Quanyan
Asymmetric information stochastic games (\textsc{aisg}s) arise in many complex socio-technical systems, such as cyber-physical systems and IT infrastructures. Existing computational methods for \textsc{aisg}s are primarily offline and cannot adapt to equilibrium deviations. Further, current methods are limited to special classes of \textsc{aisg}s to avoid belief hierarchies. To address these limitations, we propose conjectural online learning (\textsc{col}), an online method for generic \textsc{aisg}s. \textsc{col} uses a forecaster-actor-critic (\textsc{fac}) architecture where subjective forecasts are used to conjecture the opponents' strategies within a lookahead horizon, and Bayesian learning is used to calibrate the conjectures. To adapt strategies to nonstationary environments, \textsc{col} uses online rollout with cost function approximation (actor-critic). We prove that the conjectures produced by \textsc{col} are asymptotically consistent with the information feedback in the sense of a relaxed Bayesian consistency. We also prove that the empirical strategy profile induced by \textsc{col} converges to the Berk-Nash equilibrium, a solution concept characterizing rationality under subjectivity. Experimental results from an intrusion response use case demonstrate \textsc{col}'s superiority over state-of-the-art reinforcement learning methods against nonstationary attacks.
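The Bayesian calibration step in \textsc{col} can be illustrated with a toy belief update over a finite set of conjectured opponent strategies (the conjectures, the observation stream, and the two-action setup below are hypothetical, not the paper's model):

```python
def bayes_update(belief, conjectures, obs):
    """Posterior over conjectures after observing opponent action `obs`."""
    post = [b * lik[obs] for b, lik in zip(belief, conjectures)]
    z = sum(post)
    return [p / z for p in post]

# Three conjectures, each a distribution (P(a=0), P(a=1)) over two actions;
# the true opponent plays action 1 with probability 0.8.
conjectures = [(0.9, 0.1), (0.5, 0.5), (0.2, 0.8)]
belief = [1 / 3, 1 / 3, 1 / 3]
observations = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]  # a plausible sample stream
for a in observations:
    belief = bayes_update(belief, conjectures, a)
print(max(range(3), key=lambda i: belief[i]))  # belief concentrates on conjecture 2
```

This mirrors the consistency result in the abstract: the belief asymptotically concentrates on the conjecture that best explains the observed information feedback.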
Dynamic Budget Throttling in Repeated Second-Price Auctions
Chen, Zhaohua, Wang, Chang, Wang, Qian, Pan, Yuqi, Shi, Zhuming, Cai, Zheng, Ren, Yukun, Zhu, Zhihua, Deng, Xiaotie
In today's online advertising markets, a crucial requirement for an advertiser is to control her total expenditure within a time horizon under some budget. Among various budget control methods, throttling has emerged as a popular choice, managing an advertiser's total expenditure by selecting only a subset of auctions to participate in. This paper provides a theoretical panorama of a single advertiser's dynamic budget throttling process in repeated second-price auctions. We first establish a lower bound on the regret and an upper bound on the asymptotic competitive ratio for any throttling algorithm, respectively, when the advertiser's values are stochastic and adversarial. Regarding the algorithmic side, we propose the OGD-CB algorithm, which guarantees a near-optimal expected regret with stochastic values. On the other hand, when values are adversarial, we prove that this algorithm also reaches the upper bound on the asymptotic competitive ratio. We further compare throttling with pacing, another widely adopted budget control method, in repeated second-price auctions. In the stochastic case, we demonstrate that pacing is generally superior to throttling for the advertiser, supporting the well-known result that pacing is asymptotically optimal in this scenario. However, in the adversarial case, we give an exciting result indicating that throttling is also an asymptotically optimal dynamic bidding strategy. Our results bridge the gaps in theoretical research of throttling in repeated auctions and comprehensively reveal the ability of this popular budget-smoothing strategy.
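A minimal sketch of the throttling idea, assuming an OGD-updated participation threshold and a hard budget guard (the update rule, parameters, and data below are illustrative, not the paper's OGD-CB algorithm):

```python
def throttle(values, prices, budget, eta=0.05):
    T = len(values)
    rho = budget / T          # target per-round spend
    lam, spend = 0.0, 0.0
    for v, p in zip(values, prices):
        # Throttle: skip auctions whose value falls below the threshold,
        # and never participate if winning could exceed the budget.
        participate = v >= lam and spend + p <= budget
        cost = p if (participate and v > p) else 0.0  # second price: pay p on winning
        spend += cost
        lam = max(0.0, lam + eta * (cost - rho))      # OGD step on the threshold
    return spend

spent = throttle([0.9, 0.2, 0.8, 0.1] * 50, [0.5, 0.1, 0.4, 0.05] * 50, budget=30.0)
print(spent <= 30.0)  # the hard guard keeps total spend within budget
```

The gradient step raises the threshold when spend runs ahead of the per-round budget and lowers it otherwise, so participation self-regulates toward the budget pace.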
On the Re-Solving Heuristic for (Binary) Contextual Bandits with Knapsacks
Ai, Rui, Chen, Zhaohua, Deng, Xiaotie, Pan, Yuqi, Wang, Chang, Yang, Mingwei
In the problem of (binary) contextual bandits with knapsacks (CBwK), the agent receives an i.i.d. context in each of the $T$ rounds and chooses an action, resulting in a random reward and a random consumption of resources that are related to an i.i.d. external factor. The agent's goal is to maximize the accumulated reward under the initial resource constraints. In this work, we combine the re-solving heuristic, which proved successful in revenue management, with distribution estimation techniques to solve this problem. We consider two different information feedback models, with full and partial information, which vary in the difficulty of getting a sample of the external factor. Under both information feedback settings, we achieve two-way results: (1) For general problems, we show that our algorithm gets an $\widetilde O(T^{\alpha_u} + T^{\alpha_v} + T^{1/2})$ regret against the fluid benchmark. Here, $\alpha_u$ and $\alpha_v$ reflect the complexity of the context and external factor distributions, respectively. This result is comparable to existing results. (2) When the fluid problem is linear programming with a unique and non-degenerate optimal solution, our algorithm leads to an $\widetilde O(1)$ regret. To the best of our knowledge, this is the first $\widetilde O(1)$ regret result in the CBwK problem regardless of information feedback models. We further use numerical experiments to verify our results.
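The re-solving idea can be illustrated in a toy one-resource setting with unit cost per acceptance, where the fluid problem reduces to an empirical quantile (a deliberate simplification, not the paper's algorithm):

```python
def resolve_threshold(history, remaining_budget, remaining_rounds):
    """Re-solve the toy fluid problem: accept roughly a
    remaining_budget / remaining_rounds fraction of future values."""
    if not history or remaining_rounds <= 0:
        return 0.0
    accept_frac = min(1.0, remaining_budget / remaining_rounds)
    srt = sorted(history)
    k = int(len(srt) * (1.0 - accept_frac))  # empirical (1 - frac)-quantile
    return srt[min(k, len(srt) - 1)]

def run(values, budget):
    history, reward, used = [], 0.0, 0
    for t, v in enumerate(values):
        # Key step: the threshold is re-solved every round from the
        # empirical distribution and the *remaining* budget.
        thr = resolve_threshold(history, budget - used, len(values) - t)
        if used < budget and v >= thr:
            reward, used = reward + v, used + 1
        history.append(v)
    return reward, used

reward, used = run([0.1, 0.9, 0.5, 0.8, 0.3, 0.7, 0.2, 0.6] * 25, budget=50)
print(used <= 50)
```

Re-solving with the remaining budget is what corrects early estimation errors; the paper's $\widetilde O(1)$ regret result concerns the much richer contextual LP version of this loop.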
Distributed Online Optimization with Long-Term Constraints
Yuan, Deming, Proutiere, Alexandre, Shi, Guodong
We consider distributed online convex optimization problems, where the distributed system consists of various computing units connected through a time-varying communication graph. In each time step, each computing unit selects a constrained vector, experiences a loss equal to an arbitrary convex function evaluated at this vector, and may communicate to its neighbors in the graph. The objective is to minimize the system-wide loss accumulated over time. We propose a decentralized algorithm with regret and cumulative constraint violation in $\mathcal{O}(T^{\max\{c,1-c\} })$ and $\mathcal{O}(T^{1-c/2})$, respectively, for any $c\in (0,1)$, where $T$ is the time horizon. When the loss functions are strongly convex, we establish improved regret and constraint violation upper bounds in $\mathcal{O}(\log(T))$ and $\mathcal{O}(\sqrt{T\log(T)})$. These regret scalings match those obtained by state-of-the-art algorithms and fundamental limits in the corresponding centralized online optimization problem (for both convex and strongly convex loss functions). In the case of bandit feedback, the proposed algorithms achieve a regret and constraint violation in $\mathcal{O}(T^{\max\{c,1-c/3 \} })$ and $\mathcal{O}(T^{1-c/2})$ for any $c\in (0,1)$. We numerically illustrate the performance of our algorithms for the particular case of distributed online regularized linear regression problems.
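The communicate-then-descend pattern underlying such algorithms can be sketched on a three-node complete graph with scalar decisions (the constraint-violation machinery of the paper is omitted; the losses, step size, and horizon are illustrative):

```python
def step(xs, grads, eta):
    """Consensus averaging with neighbors, then a projected gradient step."""
    m = sum(xs) / len(xs)  # complete-graph averaging over 3 nodes
    return [max(-1.0, min(1.0, m - eta * g)) for g in grads]

# Local losses f_i(x) = (x - c_i)^2, so the system-wide minimizer is mean(c) = 0.5.
targets = [0.0, 0.5, 1.0]
xs = [1.0, -1.0, 0.0]
for _ in range(300):
    grads = [2 * (x - c) for x, c in zip(xs, targets)]
    xs = step(xs, grads, eta=0.05)
print(xs)  # all nodes approach the network-wide optimum 0.5
```

Each node only ever sees its own loss gradient; the averaging step is what pulls the decisions toward the minimizer of the system-wide loss rather than each node's local one.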
Distributed Online Linear Regression
Yuan, Deming, Proutiere, Alexandre, Shi, Guodong
We study online linear regression problems in a distributed setting, where the data is spread over a network. In each round, each network node proposes a linear predictor, with the objective of fitting the \emph{network-wide} data. It then updates its predictor for the next round according to the received local feedback and information received from neighboring nodes. The predictions made at a given node are assessed through the notion of regret, defined as the difference between their cumulative network-wide square errors and those of the best off-line network-wide linear predictor. Various scenarios are investigated, depending on the nature of the local feedback (full information or bandit feedback), on the set of available predictors (the decision set), and the way data is generated (by an oblivious or adaptive adversary). We propose simple and natural distributed regression algorithms, involving, at each node and in each round, a local gradient descent step and a communication and averaging step where nodes aim at aligning their predictors to those of their neighbors. We establish regret upper bounds typically in ${\cal O}(T^{3/4})$ when the decision set is unbounded and in ${\cal O}(\sqrt{T})$ in case of bounded decision set.
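A toy two-node instance of the local-gradient-plus-averaging scheme described above, assuming scalar predictors and synthetic data generated from y = 2x (the node setup and data are hypothetical):

```python
def round_update(ws, data, eta):
    # Local square-loss gradient step on each node's own sample ...
    stepped = [w - eta * 2 * (w * a - y) * a for w, (a, y) in zip(ws, data)]
    # ... then neighbor averaging (two nodes, fully connected).
    avg = sum(stepped) / len(stepped)
    return [avg, avg]

ws = [0.0, 0.0]
for _ in range(2000):
    # Node 0 only ever sees x = 1.0 and node 1 only x = 2.0, yet averaging
    # lets both fit the network-wide least-squares line y = 2x.
    data = [(1.0, 2.0), (2.0, 4.0)]
    ws = round_update(ws, data, eta=0.01)
print(ws)  # both predictors approach the network-wide fit w = 2
```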
Online Newton Step Algorithm with Estimated Gradient
Liu, Binbin, Li, Jundong, Song, Yunquan, Liang, Xijun, Jian, Ling, Liu, Huan
They have been shown to be effective in handling large-scale and high-velocity streaming data and have become popular in the big data era Hoi et al. [2018, 2014]. In recent years, a number of effective online learning algorithms have been investigated and applied in a variety of high-impact domains, ranging from game theory and information theory to machine learning and data mining Ding et al. [2017], Shalev-Shwartz [2011], Wang et al. [2003]. Most previously proposed online learning algorithms fall into the well-established framework of online convex optimization Gordon [1999], Zinkevich [2003]. In terms of the optimization algorithms, online learning algorithms can be grouped into the following categories: (i) first-order algorithms, which aim to optimize the objective function using the first-order (sub)gradient, such as the well-known OGD algorithm Zinkevich [2003]; and (ii) second-order algorithms, which aim to exploit second-order information to speed up the convergence of the optimization, such as the ONS algorithm Hazan et al. [2007]. In online convex optimization, previous approaches are mainly based on first-order optimization, i.e., optimization using the first-order derivative of the cost function. The regret bound achieved by these algorithms is proportional to a polynomial of the number of rounds $T$. For example, Zinkevich [2003] showed that with the simple OGD algorithm, one can achieve a regret bound of $O(\sqrt{T})$. Later on, Hazan et al. [2007] introduced the ONS algorithm, which exploits the second-order derivative of the cost function and can be viewed as an online analogy of the Newton-Raphson method Ypma [1995] in the offline setting. Although the time complexity O(d
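A scalar sketch of the ONS update of Hazan et al. [2007] on quadratic losses over [-1, 1]; in d dimensions the scalar accumulator below becomes the matrix $A_t = A_{t-1} + g_t g_t^\top$, typically maintained via rank-one updates (the parameters here are illustrative):

```python
def ons(costs, beta=0.5, eps=0.1):
    """Online Newton Step on losses f_t(x) = (x - c_t)^2 over [-1, 1]."""
    x, A = 0.0, eps
    for c in costs:
        g = 2 * (x - c)             # gradient of the current loss
        A += g * g                  # accumulate second-order information
        x -= g / (beta * A)         # Newton-like step, scaled by 1/A
        x = max(-1.0, min(1.0, x))  # projection onto the decision set
    return x

print(ons([0.5] * 500))  # converges toward the minimizer 0.5
```

Unlike OGD's fixed step-size schedule, the effective step here shrinks adaptively with the accumulated squared gradients, which is what yields the logarithmic regret for exp-concave losses.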
Algorithmic Chaining and the Role of Partial Feedback in Online Nonparametric Learning
Cesa-Bianchi, Nicolò, Gaillard, Pierre, Gentile, Claudio, Gerchinovitz, Sébastien
We investigate contextual online learning with nonparametric (Lipschitz) comparison classes under different assumptions on losses and feedback information. For full information feedback and Lipschitz losses, we design the first explicit algorithm achieving the minimax regret rate (up to log factors). In a partial feedback model motivated by second-price auctions, we obtain algorithms for Lipschitz and semi-Lipschitz losses with regret bounds improving on the known bounds for standard bandit feedback. Our analysis combines novel results for contextual second-price auctions with a novel algorithmic approach based on chaining. When the context space is Euclidean, our chaining approach is efficient and delivers an even better regret bound.
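The coarse discretization idea that chaining refines can be sketched as exponential weights over an eps-grid: for 1-Lipschitz losses on [0, 1], the best grid point is within eps of the best action, so the grid expert is near-optimal (an illustrative simplification, not the paper's chaining algorithm):

```python
import math

def grid_exp_weights(loss_seq, eps=0.05, eta=2.0):
    """Full-information exponential weights over an eps-grid of [0, 1]."""
    grid = [i * eps for i in range(round(1 / eps) + 1)]
    w = [1.0] * len(grid)
    total_loss = 0.0
    for loss in loss_seq:
        z = sum(w)
        # Play the weighted-average action (valid for convex losses on [0, 1]).
        play = sum(wi * a for wi, a in zip(w, grid)) / z
        total_loss += loss(play)
        # Full information: update every grid point with its own loss.
        w = [wi * math.exp(-eta * loss(a)) for wi, a in zip(w, grid)]
    return total_loss

# 1-Lipschitz losses |x - 0.3|: the learner's average loss shrinks toward 0.
losses = [lambda x: abs(x - 0.3)] * 300
avg_loss = grid_exp_weights(losses) / 300
print(avg_loss < 0.1)
```

Chaining improves on this single-scale grid by combining nested grids at multiple resolutions, which is what removes the extraneous log factors and handles the partial-feedback models in the paper.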