Statistical Learning
Demographic Parity Constrained Minimax Optimal Regression under Linear Model
We explore the minimax optimal error associated with a demographic parityconstrained regression problem within the context of a linear model. Our proposed model encompasses a broader range of discriminatory bias sources compared to the model presented by Chzhen and Schreuder [6]. Our analysis reveals that the minimax optimal error for the demographic parity-constrained regression problem under our model is characterized by ฮ(dM/n), where ndenotes the sample size, d represents the dimensionality, and M signifies the number of demographic groups arising from sensitive attributes. Moreover, we demonstrate that the minimax error increases in conjunction with a larger bias present in the model.
Demographic Parity Constrained Minimax Optimal Regression under Linear Model
We explore the minimax optimal error associated with a demographic parityconstrained regression problem within the context of a linear model. Our proposed model encompasses a broader range of discriminatory bias sources compared to the model presented by Chzhen and Schreuder [6]. Our analysis reveals that the minimax optimal error for the demographic parity-constrained regression problem under our model is characterized by ฮ(dM/n), where ndenotes the sample size, d represents the dimensionality, and M signifies the number of demographic groups arising from sensitive attributes. Moreover, we demonstrate that the minimax error increases in conjunction with a larger bias present in the model.
Fast Multi-Resolution Transformer Fine-tuning for Extreme Multi-label Text Classification
Extreme multi-label text classification (XMC) seeks to find relevant labels from an extreme large label collection for a given text input. Many real-world applications can be formulated as XMC problems, such as recommendation systems, document tagging and semantic search. Recently, transformer based XMC methods, such as XTransformer and LightXML, have shown significant improvement over other XMC methods. Despite leveraging pre-trained transformer models for text representation, the fine-tuning procedure of transformer models on large label space still has lengthy computational time even with powerful GPUs. In this paper, we propose a novel recursive approach, XR-Transformer to accelerate the procedure through recursively fine-tuning transformer models on a series of multi-resolution objectives related to the original XMC objective function. Empirical results show that XRTransformer takes significantly less training time compared to other transformerbased XMC models while yielding better state-of-the-art results. In particular, on the public Amazon-3M dataset with 3 million labels, XR-Transformer is not only 20x faster than X-Transformer but also improves the Precision@1 from 51% to 54%. Our code is publicly available at https://github.com/amzn/pecos.
lower bound
While there remains a small gap between our main lower bound of Theorem 3 and the deterministic quantised gradient descent of Section 6, we can show that the gap cannot be closed by improved deterministic algorithms where the coordinator learns value of objective function F(x) in addition to the minimiser x. That is, our quantised gradient descent is the communication-optimal deterministic algorithm for variant (1) for objectives with constant condition number. Recall that in the N-player equality over universe of size d, denoted by EQd,N, each player i is given an input bi 2{ 0,1}d, and the task is to decide if all players have the same input. It is known [33] that the deterministic communication complexity of EQd,N is CC(EQd,N)= ( Nd). Theorem 8. Given parameters N, d, ", 0 and = 0N satisfying d /" = (1), any deterministic protocol solving (1) for quadratic input functions x 7! 0kx x0k22 has communication complexity Nd log( d/"), if the coordinator is also required to output estimate r 2 R for the minimum function value such that Assume is a deterministic protocol solving (1) with communication complexity C .We show that can then solve N-party equality over a universe of size D = ( dlog( d/")), implying C = ( ND)= Nd log( d/") . More specifically, let S be the set given by Lemma 2 with =(2 "/)1/2, and let D = dlog|S|e = (dlog( d/")). Note that since we assume d /" = (1), the set S has at least two elements and D 1.
Calibrated Data-Dependent Constraints with Exact Satisfaction Guarantees
We consider the task of training machine learning models with data-dependent constraints. Such constraints often arise as empirical versions of expected value constraints that enforce fairness or stability goals. We reformulate data-dependent constraints so that they are calibrated: enforcing the reformulated constraints guarantees that their expected value counterparts are satisfied with a user-prescribed probability. The resulting optimization problem is amendable to standard stochastic optimization algorithms, and we demonstrate the efficacy of our method on a fairness-sensitive classification task where we wish to guarantee the classifier's fairness (at test time).