Goto

Collaborating Authors

 Accuracy


Certifiable Robustness to Graph Perturbations

arXiv.org Machine Learning

Despite the exploding interest in graph neural networks there has been little effort to verify and improve their robustness. This is even more alarming given recent findings showing that they are extremely vulnerable to adversarial attacks on both the graph structure and the node attributes. We propose the first method for verifying certifiable (non-)robustness to graph perturbations for a general class of models that includes graph neural networks and label/feature propagation. By exploiting connections to PageRank and Markov decision processes our certificates can be efficiently (and under many threat models exactly) computed. Furthermore, we investigate robust training procedures that increase the number of certifiably robust nodes while maintaining or improving the clean predictive accuracy.


Transforming AML with AI - Feedzai

#artificialintelligence

Current anti-money laundering solutions rely on techniques that generate excessive false positive rates which require burdensome manual reviews. Legacy money laundering solutions cannot keep pace with the increasingly sophisticated layering schemes, as well as growing compliance requirements from regulators. Can your bank/financial institution successfully navigate the growing regulatory demands? Is your bank equipped with the newest digital and AI based tools to stay ahead of the new global trends in money laundering?


A Classifiers Voting Model for Exit Prediction of Privately Held Companies

arXiv.org Machine Learning

The difficulty of the problem stems from the lack of reliable, quantitative and publicly available data. In this paper, we contribute to this endeavour by constructing an exit predictor model based on qualitative data, which blends the outcomes of three classifiers, namely, a Logistic Regression model, a Random Forest model, and a Support V ector Machine model. The output of the combined model is selected on the basis of the majority of the output classes of the component models. The models are trained using data extracted from the Thomson Reuters Eikon repository of 54697 US and European companies over the 1996-2011 time span. Experiments have been conducted for predicting whether the company eventually either gets acquired or goes public (IPO), against the complementary event that it remains private or goes bankrupt, in the considered time window. Our model achieves a 63% predictive accuracy, which is quite a valuable figure for Private Equity investors, who typically expect very high returns from successful investments.


Sobolev Independence Criterion

arXiv.org Machine Learning

We propose the Sobolev Independence Criterion (SIC), an interpretable dependency measure between a high dimensional random variable X and a response variable Y . SIC decomposes to the sum of feature importance scores and hence can be used for nonlinear feature selection. SIC can be seen as a gradient regularized Integral Probability Metric (IPM) between the joint distribution of the two random variables and the product of their marginals. We use sparsity inducing gradient penalties to promote input sparsity of the critic of the IPM. In the kernel version we show that SIC can be cast as a convex optimization problem by introducing auxiliary variables that play an important role in feature selection as they are normalized feature importance scores. We then present a neural version of SIC where the critic is parameterized as a homogeneous neural network, improving its representation power as well as its interpretability. We conduct experiments validating SIC for feature selection in synthetic and real-world experiments. We show that SIC enables reliable and interpretable discoveries, when used in conjunction with the holdout randomization test and knockoffs to control the False Discovery Rate. Code is available at http://github.com/ibm/sic.


What is Fair? Exploring Pareto-Efficiency for Fairness Constrained Classifiers

arXiv.org Machine Learning

The potential for learned models to amplify existing societal biases has been broadly recognized. Fairness-aware classifier constraints, which apply equality metrics of performance across subgroups defined on sensitive attributes such as race and gender, seek to rectify inequity but can yield non-uniform degradation in performance for skewed datasets. In certain domains, imbalanced degradation of performance can yield another form of unintentional bias. In the spirit of constructing fairness-aware algorithms as societal imperative, we explore an alternative: Pareto-Efficient Fairness (PEF). Theoretically, we prove that PEF identifies the operating point on the Pareto curve of subgroup performances closest to the fairness hyperplane, maximizing multiple subgroup accuracy. Empirically we demonstrate that PEF outperforms by achieving Pareto levels in accuracy for all subgroups compared to strict fairness constraints in several UCI datasets.


Unsupervised inference approach to facial attractiveness

arXiv.org Machine Learning

The perception of facial beauty is a complex phenomenon depending on many, detailed and global facial features influencing each other. In the machine learning community this problem is typically tackled as a problem of supervised inference. However, it has been conjectured that this approach does not capture the complexity of the phenomenon. A recent original experiment (Ib\'a\~nez-Berganza et al., Scientific Reports 9, 8364, 2019) allowed different human subjects to navigate the face-space and "sculpt" their preferred modification of a reference facial portrait. Here we present an unsupervised inference study of the set of sculpted facial vectors in that experiment. We first infer minimal, interpretable, and faithful probabilistic models (through Maximum Entropy and artificial neural networks) of the preferred facial variations, that capture the origin of the observed inter-subject diversity in the sculpted faces. The application of such generative models to the supervised classification of the gender of the sculpting subjects, reveals an astonishingly high prediction accuracy. This result suggests that much relevant information regarding the subjects may influence (and be elicited from) her/his facial preference criteria, in agreement with the multiple motive theory of attractiveness proposed in previous works.


DADI: Dynamic Discovery of Fair Information with Adversarial Reinforcement Learning

arXiv.org Machine Learning

We introduce a framework for dynamic adversarial discovery of information (DADI), motivated by a scenario where information (a feature set) is used by third parties with unknown objectives. We train a reinforcement learning agent to sequentially acquire a subset of the information while balancing accuracy and fairness of predictors downstream. Based on the set of already acquired features, the agent decides dynamically to either collect more information from the set of available features or to stop and predict using the information that is currently available. Building on previous work exploring adversarial representation learning, we attain group fairness (demographic parity) by rewarding the agent with the adversary's loss, computed over the final feature set. Importantly, however, the framework provides a more general starting point for fair or private dynamic information discovery. Finally, we demonstrate empirically, using two real-world datasets, that we can trade-off fairness and predictive performance


Distilling Black-Box Travel Mode Choice Model for Behavioral Interpretation

arXiv.org Machine Learning

Machine learning has proved to be very successful for making predictions in travel behavior modeling. However, most machine-learning models have complex model structures and offer little or no explanation as to how they arrive at these predictions. Interpretations about travel behavior models are essential for decision makers to understand travelers' preferences and plan policy interventions accordingly. Therefore, this paper proposes to apply and extend the model distillation approach, a model-agnostic machine-learning interpretation method, to explain how a black-box travel mode choice model makes predictions for the entire population and subpopulations of interest. Model distillation aims at compressing knowledge from a complex model (teacher) into an understandable and interpretable model (student). In particular, the paper integrates model distillation with market segmentation to generate more insights by accounting for heterogeneity. Furthermore, the paper provides a comprehensive comparison of student models with the benchmark model (decision tree) and the teacher model (gradient boosting trees) to quantify the fidelity and accuracy of the students' interpretations.


Learning Without Loss

arXiv.org Machine Learning

We explore a new approach for training neural networks where all lo ss functions are replaced by hard constraints. The same approach is very successfu l in phase retrieval, where signals are reconstructed from magnitude constraints and gener al characteristics (sparsity, support, etc.). Instead of taking gradient steps, the optimizer in the constraint based approach, called relaxed-reflect-reflect (RRR), derives its step s from projections to local constraints. In neural networks one such projection makes the minimal modification to the inputs x, the associated weights w, and the pre-activation value y at each neuron, to satisfy the equation x · w y . These projections, along with a host of other local projections (constraining pre-and post-activations, etc.) can be partitioned into two sets such that all the projections in each set can be applied concurrently -- across th e network and across all data in the training batch. This partitioning into two sets is analogous to the situation in phase retrieval and the setting for which the general purpose RR R optimizer was designed. Owing to the novelty of the method, this paper also serves as a self-contained tutorial. Starting with a single-layer network that performs nonnegative m atrix factorization, and concluding with a generative model comprising an autoencoder and c lassifier, all applications and their implementations by projections are described in comp lete detail. Although the new approach has the potential to extend the scope of neura l networks (e.g. by defining activation not through functions but constraint sets), most o f the featured models are standard to allow comparison with stochastic gradient descent.


Weakly-supervised Deep Anomaly Detection with Pairwise Relation Learning

arXiv.org Machine Learning

This paper studies a rarely explored but critical anomaly detection problem: weakly-supervised anomaly detection with limited labeled anomalies and a large unlabeled data set. This problem is very important because it (i) enables anomaly-informed modeling which helps identify anomalies of interests and address the notorious high false positives in unsupervised anomaly detection, and (ii) eliminates the reliance on large-scale and complete labeled anomaly data in fully-supervised settings. However, the problem is especially challenging since we have only limited labeled data for a single class, and moreover, the seen anomalies often cannot cover all types of anomalies (i.e., unseen anomalies). We address this problem by formulating the problem as a pairwise relation learning task. Particularly, our approach defines a two-stream ordinal regression network to learn the relation of randomly selected instance pairs, i.e., whether the instance pair contains labeled anomalies or just unlabeled data instances. The resulting model leverages both the labeled and unlabeled data to effectively augment the data and learn generalized representations of both normality and abnormality. Extensive empirical results show that our approach (i) significantly outperforms state-of-the-art competing methods in detecting both seen and unseen anomalies and (ii) is substantially more data-efficient. Introduction Anomaly detection aims at identifying exceptional data instances that have a significant deviation from the majority of data instances, which can offer important insights into broad applications, such as identifying fraudulent transactions or insider trading, detecting network intrusions, and early detection of diseases.