 Accuracy


OpenAUC: Towards AUC-Oriented Open-Set Recognition

arXiv.org Artificial Intelligence

Traditional machine learning follows a close-set assumption that the training and test sets share the same label space. In many practical scenarios, however, it is inevitable that some test samples belong to unknown classes (open-set). To address this issue, Open-Set Recognition (OSR), whose goal is to make correct predictions on both close-set and open-set samples, has attracted rising attention. In this direction, the vast majority of the literature focuses on the pattern of open-set samples. However, how to evaluate model performance in this challenging task remains unsolved. In this paper, a systematic analysis reveals that most existing metrics are essentially inconsistent with the aforementioned goal of OSR: (1) For metrics extended from close-set classification, such as Open-set F-score, Youden's index, and Normalized Accuracy, a poor open-set prediction can escape a low performance score thanks to a superior close-set prediction. (2) Novelty detection AUC, which measures the ranking performance between close-set and open-set samples, ignores the close-set performance. To fix these issues, we propose a novel metric named OpenAUC. Compared with existing metrics, OpenAUC enjoys a concise pairwise formulation that evaluates open-set performance and close-set performance in a coupled manner. Further analysis shows that OpenAUC is free from the aforementioned inconsistencies. Finally, an end-to-end learning method is proposed to minimize the OpenAUC risk, and the experimental results on popular benchmark datasets speak to its effectiveness. Project Page: https://github.com/wang22ti/OpenAUC.
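
The pairwise formulation mentioned in this abstract can be illustrated with a small sketch. It assumes one plausible reading of the metric: a (close-set, open-set) pair counts as correct only when the close-set sample is classified correctly and the open-set sample receives the higher open-set score. The function name `open_auc`, its arguments, and the strict handling of ties are assumptions made for illustration and are not taken from the project page.

```python
def open_auc(close_preds, close_labels, close_scores, open_scores):
    """Fraction of (close-set, open-set) pairs that are handled correctly.

    A pair is counted only if the close-set sample is classified correctly
    AND its open-set score is strictly lower than that of the open-set sample.
    Ties are counted as errors here (an assumption of this sketch).
    """
    if not close_preds or not open_scores:
        raise ValueError("need at least one close-set and one open-set sample")
    hits = 0
    for pred, label, s_close in zip(close_preds, close_labels, close_scores):
        if pred != label:
            continue  # a misclassified close-set sample contributes no correct pairs
        hits += sum(1 for s_open in open_scores if s_open > s_close)
    return hits / (len(close_preds) * len(open_scores))


# Three close-set samples (the third is misclassified) and two open-set samples.
value = open_auc(
    close_preds=[0, 1, 1],
    close_labels=[0, 1, 0],
    close_scores=[0.1, 0.6, 0.2],
    open_scores=[0.5, 0.9],
)
print(value)  # 0.5: three of the six pairs are handled correctly
```

Under this reading, a perfect ranking cannot compensate for close-set misclassifications, and a perfect close-set classifier cannot compensate for a poor ranking, which is the coupling the abstract describes.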


Drop Edges and Adapt: a Fairness Enforcing Fine-tuning for Graph Neural Networks

arXiv.org Artificial Intelligence

The rise of graph representation learning as the primary solution for many different network science tasks led to a surge of interest in the fairness of this family of methods. Link prediction, in particular, has a substantial social impact. However, link prediction algorithms tend to increase the segregation in social networks by disfavoring the links between individuals in specific demographic groups. This paper proposes a novel way to enforce fairness on graph neural networks with a fine-tuning strategy. We Drop the unfair Edges and, simultaneously, we Adapt the model's parameters to those modifications, DEA in short. We introduce two covariance-based constraints designed explicitly for the link prediction task. We use these constraints to guide the optimization process responsible for learning the new "fair" adjacency matrix. One novelty of DEA is that we can use a discrete yet learnable adjacency matrix in our fine-tuning. We demonstrate the effectiveness of our approach on five real-world datasets and show that we can improve both the accuracy and the fairness of the link prediction tasks. In addition, we present an in-depth ablation study demonstrating that our training algorithm for the adjacency matrix can be used to improve link prediction performances during training. Finally, we compute the relevance of each component of our framework to show that the combination of both the constraints and the training of the adjacency matrix leads to optimal performances.


ScaTE: A Scalable Framework for Self-Supervised Traversability Estimation in Unstructured Environments

arXiv.org Artificial Intelligence

For the safe and successful navigation of autonomous vehicles in unstructured environments, the traversability of terrain should vary based on the driving capabilities of the vehicles. Actual driving experience can be utilized in a self-supervised fashion to learn vehicle-specific traversability. However, existing methods for learning self-supervised traversability are not highly scalable for learning the traversability of various vehicles. In this work, we introduce a scalable framework for learning self-supervised traversability, which can learn the traversability directly from vehicle-terrain interaction without any human supervision. We train a neural network that predicts the proprioceptive experience that a vehicle would undergo from 3D point clouds. Using a novel PU learning method, the network simultaneously identifies non-traversable regions where estimations can be overconfident. With driving data of various vehicles gathered from simulation and the real world, we show that our framework is capable of learning the self-supervised traversability of various vehicles. By integrating our framework with a model predictive controller, we demonstrate that estimated traversability results in effective navigation that enables distinct maneuvers based on the driving characteristics of the vehicles. In addition, experimental results validate the ability of our method to identify and avoid non-traversable regions.


Stochastic Methods for AUC Optimization subject to AUC-based Fairness Constraints

arXiv.org Artificial Intelligence

As machine learning is increasingly used in making high-stakes decisions, an arising challenge is to avoid unfair AI systems that lead to discriminatory decisions for protected populations. A direct approach for obtaining a fair predictive model is to train the model by optimizing its prediction performance subject to fairness constraints, which achieves Pareto efficiency when trading off performance against fairness. Among various fairness metrics, those based on the area under the ROC curve (AUC) have emerged recently because they are threshold-agnostic and effective for unbalanced data. In this work, we formulate the training problem of a fairness-aware machine learning model as an AUC optimization problem subject to a class of AUC-based fairness constraints. This problem can be reformulated as a min-max optimization problem with min-max constraints, which we solve by stochastic first-order methods based on a new Bregman divergence designed for the special structure of the problem. We numerically demonstrate the effectiveness of our approach on real-world data under different fairness metrics.


STD: Stable Triangle Descriptor for 3D place recognition

arXiv.org Artificial Intelligence

In this work, we present a novel global descriptor termed stable triangle descriptor (STD) for 3D place recognition. For a triangle, its shape is uniquely determined by the length of the sides or included angles. Moreover, the shape of triangles is completely invariant to rigid transformations. Based on this property, we first design an algorithm to efficiently extract local key points from the 3D point cloud and encode these key points into triangular descriptors. Then, place recognition is achieved by matching the side lengths (and some other information) of the descriptors between point clouds. The point correspondence obtained from the descriptor matching pair can be further used in geometric verification, which greatly improves the accuracy of place recognition. In our experiments, we extensively compare our proposed system against other state-of-the-art systems (i.e., M2DP, Scan Context) on public datasets (i.e., KITTI, NCLT, and Complex-Urban) and our self-collected dataset (with a non-repetitive scanning solid-state LiDAR). All the quantitative results show that STD has stronger adaptability and a great improvement in precision over its counterparts. To share our findings and make contributions to the community, we open source our code on our GitHub: https://github.com/hku-mars/STD.
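
The invariance property this abstract relies on can be checked numerically. The sketch below only demonstrates that the sorted side lengths of a triangle are unchanged by a rigid transformation. It does not reproduce the key point extraction or descriptor matching of STD, and the helper names `side_lengths` and `rigid_transform` are hypothetical.

```python
import math


def side_lengths(p, q, r):
    """Sorted side lengths of the triangle with 3D vertices p, q, r."""
    return sorted([math.dist(p, q), math.dist(q, r), math.dist(r, p)])


def rigid_transform(point, yaw, translation):
    """Rotate a 3D point about the z axis by `yaw` radians, then translate it."""
    x, y, z = point
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    return (c * x - s * y + tx, s * x + c * y + ty, z + tz)


triangle = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 1.0)]
moved = [rigid_transform(v, yaw=0.7, translation=(10.0, -2.0, 5.0)) for v in triangle]

before = side_lengths(*triangle)
after = side_lengths(*moved)
# The two lists agree up to floating-point rounding.
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(before, after)))
```

Sorting the side lengths makes the comparison independent of vertex order, which is one simple way to compare triangles observed from different viewpoints.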


SimFair: A Unified Framework for Fairness-Aware Multi-Label Classification

arXiv.org Artificial Intelligence

Recent years have witnessed increasing concerns about unfair decisions made by machine learning algorithms. To improve fairness in model decisions, various fairness notions have been proposed and many fairness-aware methods have been developed. However, most existing definitions and methods focus only on single-label classification. Fairness for multi-label classification, where each instance is associated with more than one label, is yet to be established. To fill this gap, we study fairness-aware multi-label classification in this paper. We start by extending Demographic Parity (DP) and Equalized Opportunity (EOp), two popular fairness notions, to multi-label classification scenarios. Through a systematic study, we show that on multi-label data, because of unevenly distributed labels, EOp usually fails to construct a reliable estimate on labels with few instances. We then propose a new framework named Similarity $s$-induced Fairness ($s_\gamma$-SimFair). This new framework utilizes data that have similar labels when estimating fairness on a particular label group for better stability, and can unify DP and EOp. Theoretical analysis and experimental results on real-world datasets together demonstrate the advantage of $s_\gamma$-SimFair over existing methods on multi-label classification tasks.
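
To make the two fairness notions in this abstract concrete, the sketch below computes a DP gap and an EOp gap for a single label column. It assumes a binary sensitive attribute and the standard single-label definitions applied to one label at a time. These are illustrative assumptions and not necessarily the exact multi-label extensions proposed in the paper. The function names are hypothetical.

```python
def positive_rate(preds, mask):
    """Mean prediction over the samples selected by `mask`, or None if none are selected."""
    selected = [p for p, m in zip(preds, mask) if m]
    return sum(selected) / len(selected) if selected else None


def dp_gap(preds, groups):
    """Demographic Parity gap: |P(pred=1 | group=0) - P(pred=1 | group=1)|."""
    r0 = positive_rate(preds, [g == 0 for g in groups])
    r1 = positive_rate(preds, [g == 1 for g in groups])
    return None if r0 is None or r1 is None else abs(r0 - r1)


def eop_gap(preds, labels, groups):
    """Equalized Opportunity gap: the same difference, restricted to true positives."""
    r0 = positive_rate(preds, [g == 0 and y == 1 for g, y in zip(groups, labels)])
    r1 = positive_rate(preds, [g == 1 and y == 1 for g, y in zip(groups, labels)])
    return None if r0 is None or r1 is None else abs(r0 - r1)


preds = [1, 0, 1, 1]
groups = [0, 0, 1, 1]
print(dp_gap(preds, groups))                 # 0.5
print(eop_gap(preds, [1, 0, 0, 1], groups))  # 0.0
print(eop_gap(preds, [0, 0, 0, 1], groups))  # None: group 0 has no positive instance
```

The last call shows the instability the abstract points out: EOp conditions on the positive instances of a label, so for a rare label one group may have very few of them, or none at all.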


A Survey of Recommender System Techniques and the Ecommerce Domain

arXiv.org Artificial Intelligence

In this big data era, it is hard for the current generation to find the right data from the huge amount of data contained within online platforms. In such a situation, there is a need for an information filtering system that might help them find the information they are looking for. In recent years, a research field has emerged known as recommender systems. Recommenders have become important as they have many real-life applications. This paper reviews the different techniques and developments of recommender systems in e-commerce, e-tourism, e-resources, e-government, e-learning, and e-library. By analyzing recent work on this topic, we will be able to provide a detailed overview of current developments and identify existing difficulties in recommendation systems. The final results give practitioners and researchers the necessary guidance and insights into the recommendation system and its application.


On the Importance of Application-Grounded Experimental Design for Evaluating Explainable ML Methods

arXiv.org Artificial Intelligence

Evaluation studies frequently rely on simplified experimental settings with non-expert users (e.g., workers on Amazon Mechanical Turk), use proxy tasks (e.g., forward simulation), or use subjective, user-reported measures as metrics of explanation quality [9, 16, 18, 19, 25, 26, 31]. Such settings are not equipped to evaluate the real-world utility of explainable ML methods since proxy task performance does not reflect real-task performance [3], users' perception of explanation usefulness is not reflective of utility in a task [3, 17], and proxy users do not reflect how expert users would use explanations [1]. A few studies evaluate explainable ML methods on their intended deployment settings where domain expert users perform the intended task [10, 20] (dubbed application-grounded evaluation studies in [6]). However, even in those, we argue that experimental design flaws (e.g., not isolating the incremental impact of explanations in [20]) and seemingly trivial design choices that deviate experimental settings from the deployment context (e.g., using metrics that do not reflect the task objectives in [10]), limit the applicability of drawn conclusions. We elaborate on these limitations in Section 2. In this work, we seek to bridge this critical gap by conducting a study that evaluates explainable ML methods in a setting consistent with the intended deployment context. Our study builds on the e-commerce fraud detection setting used in a previous evaluation study [10] consisting of professional fraud analysts tasked with reviewing e-commerce transactions to detect fraud when the ML model is uncertain about the outcome. We identify several simplifying assumptions made by the previous study that deviated from the deployment context and modify the setup to relax those assumptions (summarized in Table 1 and discussed in detail in Section 3.2). 
These modifications make the experimental setup faithful to the deployment setting and equipped to evaluate the utility of the explainable ML methods considered. Our setup results in dramatically different conclusions of the relative utility of ML model scores and explanations compared to the earlier work [10].


A Log-linear Gradient Descent Algorithm for Unbalanced Binary Classification using the All Pairs Squared Hinge Loss

arXiv.org Artificial Intelligence

Binary classification is an important problem in many areas such as computer vision, natural language processing, and bioinformatics. Binary classification learning algorithms result in a function that outputs a real-valued predicted score (larger for more likely to be in the positive class). The prediction accuracy of learned binary classification models can be quantified using the zero-one loss, which corresponds to thresholding the predicted score at zero. Because it only considers one prediction threshold (the default), this evaluation metric can be problematic and/or misleading in some cases (data sets with extreme class imbalance, models with different false positive rates). A more comprehensive and fair evaluation method involves the Receiver Operating Characteristic (ROC) Curve, which involves plotting True Positive Rate versus False Positive Rate, for all thresholds of the predicted score [Egan and Egan, 1975]. The Area Under the ROC Curve (AUC) takes values between zero and one; constant/random/un-informed predictions yield AUC=0.5 and a set of perfect predictions would achieve AUC=1. It is therefore desirable to create learning algorithms that maximize AUC, and that criterion is often used for hyper-parameter selection. However, for gradient descent learning it is impossible to directly use the AUC since it is a piecewise constant function of the predicted values (the gradient is zero almost everywhere). Various authors have proposed to work around this issue by using convex relaxations of the Mann-Whitney statistic [Bamber, 1975], which involves a double sum over all pairs of positive and negative examples.
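
The double sum over all pairs described above can be written out directly. The sketch below computes the AUC as the Mann-Whitney statistic and a squared hinge relaxation of it using a naive loop whose cost is quadratic in the number of examples. It is not the log-linear algorithm named in the title. The `margin` parameter and the averaging over pairs are assumptions of this sketch.

```python
def pairwise_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def all_pairs_squared_hinge(pos_scores, neg_scores, margin=1.0):
    """Mean squared hinge loss over all (positive, negative) pairs.

    A pair incurs loss when the positive score does not exceed the
    negative score by at least `margin`.
    """
    total = 0.0
    for p in pos_scores:
        for n in neg_scores:
            total += max(0.0, margin - (p - n)) ** 2
    return total / (len(pos_scores) * len(neg_scores))


pos = [2.0, 0.5]
neg = [1.0, -1.0]
print(pairwise_auc(pos, neg))                # 0.75
print(all_pairs_squared_hinge(pos, neg))     # 0.5625
print(pairwise_auc([0.0, 0.0], [0.0, 0.0]))  # 0.5 for constant predictions
```

Unlike the AUC, the squared hinge loss changes smoothly with the predicted scores, so its gradient is informative for gradient descent.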


Valid Inference for Machine Learning Model Parameters

arXiv.org Artificial Intelligence

The parameters of a machine learning model are typically learned by minimizing a loss function on a set of training data. However, this can come with the risk of overtraining; in order for the model to generalize well, it is of great importance that we are able to find the optimal parameter for the model on the entire population -- not only on the given training sample. In this paper, we construct valid confidence sets for this optimal parameter of a machine learning model, which can be generated using only the training data without any knowledge of the population. We then show that studying the distribution of this confidence set allows us to assign a notion of confidence to arbitrary regions of the parameter space, and we demonstrate that this distribution can be well-approximated using bootstrapping techniques.
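
As a rough illustration of the bootstrapping idea mentioned in this abstract, the sketch below resamples a training set with replacement and refits a one-parameter model each time. The model is the constant that minimizes squared loss, which is the sample mean. This is a generic percentile bootstrap under that simplifying assumption and not the confidence set construction of the paper. The function names, the resample count, and the interval level are hypothetical choices.

```python
import random
import statistics


def bootstrap_estimates(sample, n_resamples=1000, seed=0):
    """Refit the squared-loss minimizer (the mean) on resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)]


def percentile_interval(estimates, level=0.9):
    """Central interval covering roughly `level` of the bootstrap estimates."""
    ordered = sorted(estimates)
    lo_idx = int((1 - level) / 2 * (len(ordered) - 1))
    hi_idx = int((1 + level) / 2 * (len(ordered) - 1))
    return ordered[lo_idx], ordered[hi_idx]


sample = [1.0, 2.0, 3.0, 4.0, 5.0]
estimates = bootstrap_estimates(sample)
low, high = percentile_interval(estimates)
print(low, high)
```

The spread of the refitted parameters gives an empirical picture of how much the fitted value depends on the particular training sample.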