No-Regret Learning for Fair Multi-Agent Social Welfare Optimization
We consider the problem of online multi-agent Nash social welfare (NSW) maximization. While the previous works of Hossain et al. [2021] and Jones et al. [2023] study similar problems in stochastic multi-agent multi-armed bandits and show that $\sqrt{T}$-regret is possible after $T$ rounds, their fairness measure is the product of all agents' rewards, instead of their NSW (that is, their geometric mean). Given the fundamental role of NSW in the fairness literature, it is natural to ask whether no-regret fair learning with NSW as the objective is possible. In this work, we provide a complete answer to this question in various settings. We then consider a more challenging version of the problem with adversarial rewards.
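The difference between the two objectives can be made concrete in a few lines. The sketch below (the `nash_social_welfare` helper is illustrative, not from the paper) computes both the raw product and the geometric mean of agents' rewards:

```python
import math

def nash_social_welfare(rewards):
    """Nash social welfare: geometric mean (prod_i r_i)^(1/N) of positive rewards."""
    n = len(rewards)
    return math.prod(rewards) ** (1.0 / n)

# Both objectives prefer the balanced allocation, but the NSW, unlike the
# raw product, stays on the same scale as the individual rewards.
unfair = [0.9, 0.1]
fair = [0.5, 0.5]
print(math.prod(unfair), nash_social_welfare(unfair))  # ~0.09 vs ~0.3
print(math.prod(fair), nash_social_welfare(fair))      # 0.25 vs 0.5
```

Note that the geometric mean preserves the product's preference for balanced allocations while remaining comparable across different numbers of agents.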
Cross-Validated Causal Inference: a Modern Method to Combine Experimental and Observational Data
Yang, Xuelin, Lin, Licong, Athey, Susan, Jordan, Michael I., Imbens, Guido W.
We develop new methods to integrate experimental and observational data in causal inference. While randomized controlled trials offer strong internal validity, they are often costly and therefore limited in sample size. Observational data, though cheaper and often with larger sample sizes, are prone to biases due to unmeasured confounders. To harness their complementary strengths, we propose a systematic framework that formulates causal estimation as an empirical risk minimization (ERM) problem. A full model containing the causal parameter is obtained by minimizing a weighted combination of experimental and observational losses--capturing the causal parameter's validity and the full model's fit, respectively. The weight is chosen through cross-validation on the causal parameter across experimental folds. Our experiments on real and synthetic data show the efficacy and reliability of our method. We also provide theoretical non-asymptotic error bounds.
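A minimal sketch of the weighted-ERM idea under assumed linear models and synthetic data (the variable names, the confounding structure, and the weight grid are illustrative, not the paper's implementation): fit by minimizing a weighted combination of experimental and observational squared losses, then pick the weight by cross-validation on the experimental folds.

```python
import numpy as np

rng = np.random.default_rng(0)
tau = 2.0  # true treatment effect (the causal parameter)

# Experimental data: randomized treatment, unconfounded, small sample.
n_e = 60
t_e = rng.integers(0, 2, n_e).astype(float)
X_e = np.column_stack([t_e, np.ones(n_e)])
y_e = tau * t_e + rng.normal(0, 1.0, n_e)

# Observational data: larger sample, but treatment is driven by an
# unmeasured confounder u, which biases the naive estimate.
n_o = 600
u = rng.normal(0, 1.0, n_o)
t_o = (u + rng.normal(0, 1.0, n_o) > 0).astype(float)
X_o = np.column_stack([t_o, np.ones(n_o)])
y_o = tau * t_o + 1.5 * u + rng.normal(0, 1.0, n_o)

def weighted_fit(lam, Xe, ye):
    """Minimize lam * experimental MSE + (1 - lam) * observational MSE
    (closed form for linear least squares)."""
    A = lam * Xe.T @ Xe / len(ye) + (1 - lam) * X_o.T @ X_o / n_o
    b = lam * Xe.T @ ye / len(ye) + (1 - lam) * X_o.T @ y_o / n_o
    return np.linalg.solve(A, b)

def cv_error(lam, k=5):
    """Cross-validate the weight on the experimental folds only."""
    folds = np.array_split(np.arange(n_e), k)
    errs = []
    for held in folds:
        keep = np.setdiff1d(np.arange(n_e), held)
        beta = weighted_fit(lam, X_e[keep], y_e[keep])
        errs.append(np.mean((y_e[held] - X_e[held] @ beta) ** 2))
    return np.mean(errs)

lam_hat = min(np.linspace(0.05, 0.95, 10), key=cv_error)
beta_hat = weighted_fit(lam_hat, X_e, y_e)
print("chosen weight:", lam_hat, "estimated effect:", beta_hat[0])
```

The held-out experimental error penalizes weights that lean too heavily on the biased observational loss, while still allowing the large observational sample to reduce variance.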
- Asia > Middle East > Jordan (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > Strength High (1.00)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Education (0.68)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.45)
- North America > United States > California (0.14)
- North America > United States > Iowa (0.04)
Appendix A Lower Bound
In this section, we establish a lower bound on the expected regret of any algorithm for our multi-agent bandit (MA-MAB) problem. To avoid excessive repetition of notation and proof arguments, we purposefully leave this section not self-contained and only outline the adjustments needed, referring the interested reader to the work of Auer et al. We leave optimizing the dependence on $N$ and $K$ to future work. In our MA-MAB problem, an algorithm is allowed to "pull" a distribution over the arms rather than a single arm. The proof of their Lemma A.1 (in the explanation of their Equation (30)) and the proof of their Theorem A.2 both condition on the rewards observed in the first rounds of play, and these arguments carry over once single-arm pulls are replaced by distributions over arms. Thus, we have the following lower bound.
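Under an assumed formalization consistent with the setup above (the notation $p_t$, $\mu_i$, and $\Delta_K$ is ours, not quoted from the paper), the regret being lower-bounded can be written as:

```latex
R_T \;=\; T \cdot \max_{p \in \Delta_K} \Big( \prod_{i=1}^{N} p^\top \mu_i \Big)^{1/N}
\;-\; \sum_{t=1}^{T} \Big( \prod_{i=1}^{N} p_t^\top \mu_i \Big)^{1/N}
```

where $\Delta_K$ is the simplex over the $K$ arms, $\mu_i \in [0,1]^K$ is agent $i$'s mean-reward vector, and $p_t$ is the distribution played at round $t$.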
- North America > Canada > Ontario > Toronto (0.28)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Africa > South Sudan > Equatoria > Central Equatoria > Juba (0.04)
EBaReT: Expert-guided Bag Reward Transformer for Auto Bidding
Li, Kaiyuan, Wang, Pengyu, Peng, Yunshan, Yuan, Pengjia, Zeng, Yanxiang, Xiang, Rui, Cheng, Yanhua, Liu, Xialong, Jiang, Peng
Reinforcement learning has been widely applied in automated bidding. Traditional approaches model bidding as a Markov Decision Process (MDP). Recently, some studies have explored generative reinforcement learning methods to address long-term dependency issues in bidding environments. Although effective, these methods typically rely on supervised learning and are therefore vulnerable to low data quality, caused by the large number of sub-optimal bids and by low-probability rewards resulting from low click and conversion rates. Unfortunately, few studies have addressed these challenges. In this paper, we formalize automated bidding as a sequential decision-making problem and propose a novel Expert-guided Bag Reward Transformer (EBaReT) to address concerns related to data quality and reward uncertainty. Specifically, to tackle data quality issues, we generate a set of expert trajectories to serve as supplementary data during training and employ a Positive-Unlabeled (PU) learning-based discriminator to identify expert transitions. To ensure that decisions also reach the expert level, we further design a novel expert-guided inference strategy. Moreover, to mitigate reward uncertainty, we treat the transitions within a certain period as a "bag" and carefully design a reward function that leads to a smoother acquisition of rewards. Extensive experiments demonstrate that our model achieves superior performance compared to state-of-the-art bidding methods.
Optimizing Electric Vehicle Charging Station Locations: A Data-driven System with Multi-source Fusion
Li, Lihuan, Yin, Du, Xue, Hao, Lillo-Trynes, David, Salim, Flora
With the growing electric vehicle (EV) charging demand, urban planners face the challenge of providing charging infrastructure at optimal locations. For example, range anxiety during long-distance travel and the inadequate distribution of residential charging stations are major issues many cities face. To estimate and deploy charging capacity where it is needed, we develop a data-driven system based on existing EV trips in the state of New South Wales (NSW), Australia, incorporating multiple factors that enhance the geographical feasibility of recommended charging stations. Our system integrates data sources including EV trip data, geographical data such as route data and Local Government Area (LGA) boundaries, and features such as fire and flood risks and Points of Interest (POIs). We visualize our results to intuitively demonstrate the findings of our data-driven, multi-source fusion system and evaluate them through case studies. This work can serve as a platform for discussion and for developing new insights to guide the placement of future EV charging stations.
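A multi-source fusion of this kind can be sketched as a weighted score per candidate site. The function and weights below are purely illustrative assumptions (not the paper's model): demand signals add to the score, hazard risks subtract from it.

```python
def score_location(ev_trip_density, poi_count, fire_risk, flood_risk,
                   weights=(0.5, 0.3, 0.1, 0.1)):
    """Toy multi-source score for a candidate charging site. All inputs are
    assumed pre-scaled to [0, 1]; the weights are illustrative only."""
    w_trip, w_poi, w_fire, w_flood = weights
    return (w_trip * ev_trip_density + w_poi * poi_count
            - w_fire * fire_risk - w_flood * flood_risk)

# Two hypothetical candidate sites: high demand / low risk vs. the reverse.
candidates = {
    "site_a": score_location(0.9, 0.8, 0.1, 0.0),
    "site_b": score_location(0.4, 0.9, 0.7, 0.5),
}
best = max(candidates, key=candidates.get)
```

In practice each input would itself be derived from a data source (trip logs, POI databases, hazard maps), and the weights would be tuned or validated through case studies such as those described above.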
- Transportation > Infrastructure & Services (1.00)
- Transportation > Ground > Road (1.00)
- Transportation > Electric Vehicle (1.00)
- Government > Regional Government > Oceania Government > Australia Government (0.49)