Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL
The divergence of Q-value estimation has been a prominent issue in offline reinforcement learning (offline RL), where the agent has no access to the real dynamics. Traditional beliefs attribute this instability to querying out-of-distribution actions when bootstrapping value targets. Though this issue can be alleviated with policy constraints or conservative Q estimation, a theoretical understanding of the underlying mechanism causing the divergence has been absent. In this work, we aim to thoroughly comprehend this mechanism and attain an improved solution. We first identify a fundamental pattern, self-excitation, as the primary cause of Q-value estimation divergence in offline RL.
Counterfactual Fairness by Combining Factual and Counterfactual Predictions
In high-stakes domains such as healthcare and hiring, the role of machine learning (ML) in decision-making raises significant fairness concerns. This work focuses on Counterfactual Fairness (CF), which posits that an ML model's outcome on any individual should remain unchanged if they had belonged to a different demographic group. Previous works have proposed methods that guarantee CF. Nevertheless, their effects on the model's predictive performance remain largely unclear. To fill this gap, we provide a theoretical study on the inherent trade-off between CF and predictive performance in a model-agnostic manner. We first propose a simple but effective method to cast an optimal but potentially unfair predictor into a fair one with a minimal loss of performance. By analyzing the excess risk incurred by perfect CF, we quantify this inherent trade-off. We further analyze our method's performance when only incomplete causal knowledge is available. Building upon this, we propose a practical algorithm that can be applied in such scenarios. Experiments on both synthetic and semi-synthetic datasets demonstrate the validity of our analysis and methods.
Pruning Neural Networks via Coresets and Convex Geometry: Towards No Assumptions
Pruning is one of the predominant approaches for compressing deep neural networks (DNNs). Lately, coresets (provable data summarizations) were leveraged for pruning DNNs, adding the advantage of theoretical guarantees on the trade-off between the compression rate and the approximation error. However, coresets in this domain were either data-dependent or generated under restrictive assumptions on both the model's weights and inputs. In real-world scenarios, such assumptions are rarely satisfied, limiting the applicability of coresets. To this end, we suggest a novel and robust framework for computing such coresets under mild assumptions on the model's weights and without any assumption on the training data. The idea is to compute the importance of each neuron in each layer with respect to the output of the following layer. This is achieved by an elegant combination of the Löwner ellipsoid and Carathéodory's theorem. Our method is simultaneously data-independent, applicable to various networks and datasets (due to the simplified assumptions), and theoretically supported. Experimental results show that our method outperforms existing coreset-based neural pruning approaches across a wide range of networks and datasets. For example, our method achieved a 62% compression rate on ResNet50 on ImageNet with a 1.09% drop in accuracy.
MaNo: Exploiting Matrix Norm for Unsupervised Accuracy Estimation Under Distribution Shifts
Leveraging the model's outputs, specifically the logits, is a common approach to estimating the test accuracy of a pre-trained neural network on out-of-distribution (OOD) samples without requiring access to the corresponding ground-truth labels. Despite their ease of implementation and computational efficiency, current logit-based methods are vulnerable to overconfidence issues, leading to prediction bias, especially under natural shift.
Training Neural Networks with Fixed Sparse Masks
During typical gradient-based training of deep neural networks, all of the model's parameters are updated at each iteration. Recent work has shown that it is possible to update only a small subset of the model's parameters during training, which can alleviate storage and communication requirements. In this paper, we show that it is possible to induce a fixed sparse mask on the model's parameters that selects a subset to update over many iterations. Our method constructs the mask out of the $k$ parameters with the largest Fisher information as a simple approximation as to which parameters are most important for the task at hand. In experiments on parameter-efficient transfer learning and distributed training, we show that our approach matches or exceeds the performance of other methods for training with sparse updates while being more efficient in terms of memory usage and communication costs. We release our code publicly to promote further applications of our approach.
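The mask construction described in this abstract can be sketched in a few lines. The following is only an illustrative toy, not the authors' released code: it assumes a logistic-regression model, approximates the diagonal Fisher information by the mean of squared per-example gradients (the empirical Fisher, an assumption made here for simplicity), and uses hypothetical helper names (`fisher_topk_mask`, `masked_sgd`).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def per_example_grads(w, X, y):
    # Gradient of the logistic loss for each example: (p - y) * x
    p = sigmoid(X @ w)
    return (p - y)[:, None] * X

def fisher_topk_mask(w, X, y, k):
    # Assumption: diagonal empirical Fisher = mean of squared per-example gradients
    fisher = np.mean(per_example_grads(w, X, y) ** 2, axis=0)
    mask = np.zeros(w.shape, dtype=bool)
    # Indices of the k largest Fisher values (order among them is irrelevant)
    mask[np.argpartition(fisher, -k)[-k:]] = True
    return mask

def masked_sgd(w, X, y, mask, lr=0.1, steps=100):
    # The mask is computed once and stays fixed; only masked parameters move
    w = w.copy()
    for _ in range(steps):
        grad = per_example_grads(w, X, y).mean(axis=0)
        w[mask] -= lr * grad[mask]
    return w

# Toy data: 200 examples, 10 features, labels depend on the first two features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
w0 = rng.normal(scale=0.1, size=10)

mask = fisher_topk_mask(w0, X, y, k=3)
w1 = masked_sgd(w0, X, y, mask)
```

Because the mask is fixed before training, only the `k` selected entries of `w1` differ from `w0`, so a sparse update of size `k` is all that would need to be stored or communicated.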
Understanding the detrimental class-level effects of data augmentation
Data augmentation (DA) encodes invariance and provides implicit regularization critical to a model's performance in image classification tasks. However, while DA improves average accuracy, recent studies have shown that its impact can be highly class dependent: achieving optimal average accuracy comes at the cost of significantly hurting individual class accuracy by as much as 20% on ImageNet. There has been little progress in resolving class-level accuracy drops due to a limited understanding of these effects. In this work, we present a framework for understanding how DA interacts with class-level learning dynamics. Using higher-quality multi-label annotations on ImageNet, we systematically categorize the affected classes and find that the majority are inherently ambiguous, co-occur, or involve fine-grained distinctions, while DA controls the model's bias towards one of the closely related classes. While many of the previously reported performance drops are explained by multi-label annotations, we identify other sources of accuracy degradations by analyzing class confusions. We show that simple class-conditional augmentation strategies informed by our framework improve performance on the negatively affected classes.
Towards Multi-Grained Explainability for Graph Neural Networks
When a graph neural network (GNN) makes a prediction, one raises a question about explainability: "Which fraction of the input graph is most influential to the model's decision?" Producing an answer requires understanding the model's inner workings in general and emphasizing the insights on the decision for the instance at hand. Nonetheless, most current approaches focus on only one aspect: (1) local explainability, which explains each instance independently and thus hardly exhibits the class-wise patterns; or (2) global explainability, which systematizes the globally important patterns but might be trivial in the local context. This dichotomy greatly limits the flexibility and effectiveness of explainers. A performant paradigm towards multi-grained explainability is still lacking and is thus the focus of our work. In this work, we exploit the pre-training and fine-tuning idea to develop our explainer and generate multi-grained explanations. Specifically, the pre-training phase accounts for the contrastivity among different classes, so as to highlight the class-wise characteristics from a global view; afterwards, the fine-tuning phase adapts the explanations in the local context. Experiments on both synthetic and real-world datasets show the superiority of our explainer over the leading baselines in terms of AUC on explaining graph classification. Our code and datasets are available at https://github.com/Wuyxin/ReFine.
Supplementary Material: Einsum Benchmark Mark Blacher
For what purpose was the dataset created? The dataset was created with two primary purposes. First, it serves as a benchmark for einsum libraries, enabling the assessment of both the efficiency in determining contraction paths and the performance in executing einsum expressions. The dataset instances were created by the authors. Who funded the creation of the dataset?