Performance Analysis


Improving Data Quality with Training Dynamics of Gradient Boosting Decision Trees

arXiv.org Artificial Intelligence

Real-world datasets contain incorrectly labeled instances that hamper the performance of the model and, in particular, its ability to generalize out of distribution. Also, each example might contribute differently to learning. This motivates studies that seek to better understand the role of data instances with respect to their contribution to good model metrics. In this paper we propose a method based on metrics computed from training dynamics of Gradient Boosting Decision Trees (GBDTs) to assess the behavior of each training example. We focus on datasets containing mostly tabular or structured data, for which the use of Decision Tree ensembles is still the state of the art in terms of performance. We show results on detecting noisy labels in order to remove them, improving models' metrics in synthetic and real datasets, as well as a productive dataset. Our methods achieved the best results overall when compared with confident learning and heuristics.
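As a rough illustration of what per-example training-dynamics metrics could look like, the sketch below summarizes the probability a boosted ensemble assigns to each example's given label after every boosting iteration. The abstract does not name the metrics it uses, so `confidence` (mean), `variability` (standard deviation), the helper names, and the 0.5 threshold are all assumptions made for this example.

```python
from statistics import mean, pstdev

def dynamics_metrics(prob_history):
    """Summarize one example's training dynamics.

    prob_history: probabilities assigned to the example's *given* label
    after each boosting iteration (hypothetical input format).
    """
    return {
        "confidence": mean(prob_history),     # average belief in the given label
        "variability": pstdev(prob_history),  # how much that belief fluctuates
    }

def flag_suspects(histories, threshold=0.5):
    """Return indices of examples whose mean confidence is below the threshold."""
    return [
        i for i, history in enumerate(histories)
        if dynamics_metrics(history)["confidence"] < threshold
    ]

# Two toy examples: the first is learned steadily, the second is never
# fit well, which is the pattern one might expect from a mislabeled row.
histories = [
    [0.6, 0.8, 0.9],
    [0.4, 0.3, 0.2],
]
suspects = flag_suspects(histories)
```

With a real model, the per-iteration probabilities would have to come from the ensemble's staged predictions; how those are obtained depends on the GBDT library in use.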


Inferring Implicit Relations in Complex Questions with Language Models

arXiv.org Artificial Intelligence

A prominent challenge for modern language understanding systems is the ability to answer implicit reasoning questions, where the required reasoning steps for answering the question are not mentioned in the text explicitly. In this work, we investigate why current models struggle with implicit reasoning question answering (QA) tasks, by decoupling inference of reasoning steps from their execution. We define a new task of implicit relation inference and construct a benchmark, IMPLICITRELATIONS, where given a question, a model should output a list of concept-relation pairs, where the relations describe the implicit reasoning steps required for answering the question. Using IMPLICITRELATIONS, we evaluate models from the GPT-3 family and find that, while these models struggle on the implicit reasoning QA task, they often succeed at inferring implicit relations. This suggests that the challenge in implicit reasoning questions does not stem from the need to plan a reasoning strategy alone, but to do it while also retrieving and reasoning over relevant information.


Representative Teacher Keys for Knowledge Distillation Model Compression Based on Attention Mechanism for Image Classification

arXiv.org Artificial Intelligence

With the improvement of AI chips (e.g., GPU, TPU, and NPU) and the fast development of the Internet of Things (IoT), some robust deep neural networks (DNNs) are usually composed of millions or even hundreds of millions of parameters. Such a large model may not be suitable for directly deploying on low computation and low capacity units (e.g., edge devices). Knowledge distillation (KD) has recently been recognized as a powerful model compression method to decrease the model parameters effectively. The central concept of KD is to extract useful information from the feature maps of a large model (i.e., teacher model) as a reference to successfully train a small model (i.e., student model) in which the model size is much smaller than the teacher one. Although many KD methods have been proposed to utilize the information from the feature maps of intermediate layers in the teacher model, most did not consider the similarity of feature maps between the teacher model and the student model. As a result, it may make the student model learn useless information. Inspired by the attention mechanism, we propose a novel KD method called representative teacher key (RTK) that not only considers the similarity of feature maps but also filters out the useless information to improve the performance of the target student model. In the experiments, we validate our proposed method with several backbone networks (e.g., ResNet and WideResNet) and datasets (e.g., CIFAR10, CIFAR100, SVHN, and CINIC10). The results show that our proposed RTK can effectively improve the classification accuracy of the state-of-the-art attention-based KD method.


XC: Exploring Quantitative Use Cases for Explanations in 3D Object Detection

arXiv.org Artificial Intelligence

Explainable AI (XAI) methods are frequently applied to obtain qualitative insights about deep models' predictions. However, such insights need to be interpreted by a human observer to be useful. In this paper, we aim to use explanations directly to make decisions without human observers. We adopt two gradient-based explanation methods, Integrated Gradients (IG) and backprop, for the task of 3D object detection. Then, we propose a set of quantitative measures, named Explanation Concentration (XC) scores, that can be used for downstream tasks. These scores quantify the concentration of attributions within the boundaries of detected objects. We evaluate the effectiveness of XC scores via the task of distinguishing true positive (TP) and false positive (FP) detected objects in the KITTI and Waymo datasets. The results demonstrate an improvement of more than 100% on both datasets compared to other heuristics such as random guesses and the number of LiDAR points in the bounding box, raising confidence in XC's potential for application in more use cases. Our results also indicate that computationally expensive XAI methods like IG may not be more valuable when used quantitatively compared to simpler methods.
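One simple way to read "concentration of attributions within the boundaries of detected objects" is as the share of attribution magnitude that falls inside the predicted box. The sketch below computes that ratio on a 2D grid. The abstract does not give the exact XC definitions, so the function name `xc_score`, the grid representation, and the use of absolute values are assumptions for illustration only.

```python
def xc_score(attr, box):
    """Fraction of total absolute attribution lying inside an axis-aligned box.

    attr: 2D list of floats (hypothetical bird's-eye-view attribution grid).
    box:  (row_min, row_max, col_min, col_max), inclusive cell bounds.
    """
    r0, r1, c0, c1 = box
    total = 0.0
    inside = 0.0
    for r, row in enumerate(attr):
        for c, value in enumerate(row):
            magnitude = abs(value)
            total += magnitude
            if r0 <= r <= r1 and c0 <= c <= c1:
                inside += magnitude
    # An all-zero attribution map carries no evidence either way.
    return inside / total if total > 0 else 0.0

# Toy attribution map where most of the mass sits in row 1, columns 1-2.
attr = [
    [0.0, 0.1, 0.0, 0.0],
    [0.0, 0.6, 0.2, 0.0],
    [0.0, 0.1, 0.0, 0.0],
]
score = xc_score(attr, (1, 1, 1, 2))
```

A score near 1 means the explanation is focused on the detected object. Under this reading, a detection whose attributions are spread outside its box would be a candidate false positive.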


Enhancing Out-of-Distribution Detection in Natural Language Understanding via Implicit Layer Ensemble

arXiv.org Artificial Intelligence

Out-of-distribution (OOD) detection aims to discern outliers from the intended data distribution, which is crucial to maintaining high reliability and a good user experience. Most recent studies in OOD detection utilize the information from a single representation that resides in the penultimate layer to determine whether the input is anomalous or not. Although such a method is straightforward, the potential of diverse information in the intermediate layers is overlooked. In this paper, we propose a novel framework based on contrastive learning that encourages intermediate features to learn layer-specialized representations and assembles them implicitly into a single representation to absorb rich information in the pre-trained language model. Extensive experiments in various intent classification and OOD datasets demonstrate that our approach is significantly more effective than other works.


FairEGM: Fair Link Prediction and Recommendation via Emulated Graph Modification

arXiv.org Artificial Intelligence

As machine learning becomes more widely adopted across domains, it is critical that researchers and ML engineers think about the inherent biases in the data that may be perpetuated by the model. Recently, many studies have shown that such biases are also imbibed in Graph Neural Network (GNN) models if the input graph is biased, potentially to the disadvantage of underserved and underrepresented communities. In this work, we aim to mitigate the bias learned by GNNs by jointly optimizing two different loss functions: one for the task of link prediction and one for the task of demographic parity. We further implement three different techniques inspired by graph modification approaches: the Global Fairness Optimization (GFO), Constrained Fairness Optimization (CFO), and Fair Edge Weighting (FEW) models. These techniques mimic the effects of changing underlying graph structures within the GNN and offer a greater degree of interpretability over more integrated neural network methods. Our proposed models emulate microscopic or macroscopic edits to the input graph while training GNNs and learn node embeddings that are both accurate and fair under the context of link recommendations. We demonstrate the effectiveness of our approach on four real world datasets and show that we can improve the recommendation fairness by several factors at negligible cost to link prediction accuracy.
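The joint objective described above can be sketched as a link-prediction loss plus a weighted demographic-parity penalty. The version below uses binary cross-entropy for link prediction and, as the parity term, the gap between the mean predicted link probability for same-group and cross-group node pairs. That specific parity formulation, the weight `lam`, and all function names are assumptions; the abstract does not specify them.

```python
import math

def bce(probs, labels):
    """Mean binary cross-entropy over candidate links."""
    eps = 1e-12  # guards against log(0)
    return -sum(
        y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        for p, y in zip(probs, labels)
    ) / len(probs)

def parity_gap(probs, same_group):
    """Absolute gap in mean predicted probability: intra- vs inter-group pairs."""
    intra = [p for p, s in zip(probs, same_group) if s]
    inter = [p for p, s in zip(probs, same_group) if not s]
    if not intra or not inter:
        return 0.0
    return abs(sum(intra) / len(intra) - sum(inter) / len(inter))

def joint_loss(probs, labels, same_group, lam=1.0):
    """Link-prediction loss plus a weighted demographic-parity penalty."""
    return bce(probs, labels) + lam * parity_gap(probs, same_group)

# Toy batch: the model strongly favors same-group links.
probs = [0.9, 0.8, 0.2, 0.1]
labels = [1, 1, 0, 0]
same_group = [True, True, False, False]
gap = parity_gap(probs, same_group)
loss = joint_loss(probs, labels, same_group, lam=1.0)
```

Raising `lam` trades link-prediction accuracy for a smaller gap between the two groups of pairs.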


Deep Scattering Spectrum germaneness to Fault Detection and Diagnosis for Component-level Prognostics and Health Management (PHM)

arXiv.org Artificial Intelligence

In fault detection and diagnosis of prognostics and health management (PHM) systems, most of the methodologies utilize machine learning (ML) or deep learning (DL) through which either some features are extracted beforehand (in the case of ML) or filters are used to extract features autonomously (in the case of DL) to perform the critical classification task. Particularly in the fault detection and diagnosis of industrial robots where electric current, vibration or acoustic emissions signals are the primary sources of information, a feature domain that can map the signals into their constituent components with compressed information at different levels can reduce the complexities and size of typical ML and DL-based frameworks. The Deep Scattering Spectrum (DSS) is one of the strategies that use the Wavelet Transform (WT) analogy to separate and extract the information encoded in a signal's various temporal and frequency domains. As a result, the focus of this work is on the study of the DSS's relevance to fault detection and diagnosis for mechanical components of industrial robots. We used multiple industrial robots and distinct mechanical faults to build an approach for classifying the faults using low-variance features extracted from the input signals. The presented approach was implemented on the practical test benches and demonstrated satisfactory performance in fault detection and diagnosis for simple and complex classification problems with a classification accuracy of 99.7% and 88.1%, respectively.


A Robust Pedestrian Detection Approach for Autonomous Vehicles

arXiv.org Artificial Intelligence

Advanced Driver-Assistance Systems (ADAS) have attracted great interest as a potential solution for reducing road traffic issues. Despite recent technological advances in such systems, many challenges remain to be overcome. For instance, ADAS requires accurate and real-time detection of pedestrians in various driving scenarios. To address this problem, this paper fine-tunes the YOLOv5s framework for handling pedestrian detection challenges on the real-world instances of the Caltech pedestrian dataset. We also introduce a toolbox we developed for converting the training and test data and annotations of the Caltech pedestrian dataset into the format recognizable by YOLOv5. Experimental results show that the mean Average Precision (mAP) of our fine-tuned model for the pedestrian detection task is more than 91 percent when performing at the highest rate of 70 FPS. Moreover, the experiments on the Caltech pedestrian dataset samples have verified that our proposed approach is an effective and accurate method for pedestrian detection and can outperform other existing methodologies.


On the Perils of Cascading Robust Classifiers

arXiv.org Artificial Intelligence

Ensembling certifiably robust neural networks is a promising approach for improving the certified robust accuracy of neural models. Black-box ensembles that assume only query-access to the constituent models (and their robustness certifiers) during prediction are particularly attractive due to their modular structure. Cascading ensembles are a popular instance of black-box ensembles that appear to improve certified robust accuracies in practice. However, we show that the robustness certifier used by a cascading ensemble is unsound. That is, when a cascading ensemble is certified as locally robust at an input $x$ (with respect to $\epsilon$), there can be inputs $x'$ in the $\epsilon$-ball centered at $x$ such that the cascade's prediction at $x'$ differs from its prediction at $x$, and thus the ensemble is not locally robust. Our theoretical findings are accompanied by empirical results that further demonstrate this unsoundness. We present cascade attack (CasA), an adversarial attack against cascading ensembles, and show that: (1) there exists an adversarial input for up to 88% of the samples where the ensemble claims to be certifiably robust and accurate; and (2) the accuracy of a cascading ensemble under our attack is as low as 11% when it claims to be certifiably robust and accurate on 97% of the test set. Our work reveals a critical pitfall of cascading certifiably robust models by showing that the seemingly beneficial strategy of cascading can actually hurt the robustness of the resulting ensemble. Our code is available at https://github.com/TristaChi/ensembleKW.
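The unsoundness can be seen in a toy one-dimensional setting. The sketch below is an illustrative construction, not the paper's CasA attack: it assumes threshold classifiers whose individual certifiers are sound, and a cascade that returns the prediction of the first model that certifies the input. The names `make_threshold_model` and `cascade`, the thresholds, and the value of `EPS` are hypothetical.

```python
EPS = 1.0  # perturbation radius used by every certifier in this toy example

def make_threshold_model(threshold):
    """1D classifier with a sound certifier: robust iff farther than EPS from the boundary."""
    def predict(x):
        return 1 if x >= threshold else 0
    def certify(x):
        return abs(x - threshold) > EPS
    return predict, certify

def cascade(models, x):
    """Return (label, certified) from the first model that certifies x."""
    for predict, certify in models:
        if certify(x):
            return predict(x), True
    # No model certifies: fall back to the last model, uncertified.
    predict, _ = models[-1]
    return predict(x), False

models = [make_threshold_model(0.0), make_threshold_model(10.0)]

x = 0.5        # model 1 cannot certify here, so model 2 answers
x_adv = 1.4    # within EPS of x, but now model 1 certifies and answers

label_x, certified_x = cascade(models, x)
label_adv, certified_adv = cascade(models, x_adv)
```

Each model's certificate is correct on its own, yet the cascade reports a certified label at `x` that changes at `x_adv`, because a different constituent model takes over inside the ball.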


Graph Regularized Probabilistic Matrix Factorization for Drug-Drug Interactions Prediction

arXiv.org Artificial Intelligence

Co-administration of two or more drugs can result in adverse drug reactions. Identifying drug-drug interactions (DDIs) is necessary, especially for drug development and for repurposing old drugs. DDI prediction can be viewed as a matrix completion task, for which matrix factorization (MF) appears as a suitable solution. This paper presents a novel Graph Regularized Probabilistic Matrix Factorization (GRPMF) method, which incorporates expert knowledge through a novel graph-based regularization strategy within an MF framework. An efficient and sound optimization algorithm is proposed to solve the resulting non-convex problem in an alternating fashion. The performance of the proposed method is evaluated on the DrugBank dataset, and comparisons are provided against state-of-the-art techniques. The results demonstrate the superior performance of GRPMF when compared to its counterparts.