A Graph-Based Approach for Active Learning in Regression
Zhang, Hongjing, Ravi, S. S., Davidson, Ian
Active learning aims to reduce labeling efforts by selectively asking humans to annotate the most important data points from an unlabeled pool and is an example of human-machine interaction. Though active learning has been extensively researched for classification and ranking problems, it is relatively understudied for regression problems. Most existing active learning methods for regression use the regression function learned at each active learning iteration to select the next informative point to query. This introduces several challenges, such as handling noisy labels and parameter uncertainty and overcoming initially biased training data. Instead, we propose a feature-focused approach that formulates both sequential and batch-mode active regression as a novel bipartite graph optimization problem. We conduct experiments in both noise-free and noisy settings. Our experimental results on benchmark data sets demonstrate the effectiveness of our proposed approach.
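The abstract contrasts model-driven query selection with a feature-focused one. The bipartite-graph formulation itself is not specified here, so the sketch below swaps in a much simpler, model-free feature-space heuristic (greedy farthest-point, or k-center, selection) purely to illustrate choosing a batch by feature coverage rather than by a fitted regressor. It is not the authors' method; `greedy_k_center` is a hypothetical helper and Euclidean distance is an assumption.

```python
import numpy as np

def greedy_k_center(X_pool, X_labeled, batch_size):
    """Hypothetical helper: pick `batch_size` pool indices, each time taking
    the pool point farthest (Euclidean) from everything already labeled or
    selected. No labels or regression model are consulted."""
    # Distance from every pool point to its nearest labeled point.
    diffs = X_pool[:, None, :] - X_labeled[None, :, :]
    min_dist = np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)
    selected = []
    for _ in range(batch_size):
        idx = int(np.argmax(min_dist))
        selected.append(idx)
        # The new pick becomes a center: update nearest-center distances.
        new_dist = np.sqrt(((X_pool - X_pool[idx]) ** 2).sum(axis=1))
        min_dist = np.minimum(min_dist, new_dist)
    return selected
```

For a one-dimensional pool `[[0], [1], [5], [10]]` with `[[0]]` already labeled, a batch of two picks index 3 and then index 2. Because no labels are used, the choice cannot be distorted by noisy labels or a biased initial fit; the trade-off is that it ignores the target variable entirely.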
Adversarial Attacks on Convolutional Neural Networks in Facial Recognition Domain
Alparslan, Yigit, Keim-Shenk, Jeremy, Khade, Shweta, Greenstadt, Rachel
Numerous recent studies have demonstrated how Deep Neural Network (DNN) classifiers can be fooled by adversarial examples, in which an attacker adds perturbations to an original sample, causing the classifier to misclassify the sample. Adversarial attacks that render DNNs vulnerable in real life represent a serious threat, given the consequences of improperly functioning autonomous vehicles, malware filters, or biometric authentication systems. In this paper, we apply the Fast Gradient Sign Method (FGSM) to introduce perturbations to a facial image dataset and then test the output on a different classifier that we trained ourselves, to analyze the transferability of this method. Next, we craft a variety of attack algorithms on a facial image dataset, with the intention of developing untargeted black-box approaches that assume minimal adversarial knowledge, to further assess the robustness of DNNs in the facial recognition realm. We explore modifying single optimal pixels by a large amount, modifying all pixels by a smaller amount, or combining these two approaches. While our single-pixel attacks achieved about a 15% average decrease in classifier confidence for the actual class, the all-pixel attacks were more successful and achieved up to an 84% average decrease in confidence, along with an 81.6% misclassification rate, in the case of the attack with the highest level of perturbation that we tested. Even with these high levels of perturbation, the face images remained fairly clearly identifiable to a human. We hope our research may help to advance the study of adversarial attacks on DNNs and of defensive mechanisms to counteract them, particularly in the facial recognition domain.
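FGSM perturbs an input x in the direction of the sign of the loss gradient, x' = clip(x + eps * sign(grad_x J(x, y))). The sketch below applies this one-step update to a toy binary logistic-regression model, an assumption made so the input gradient has the closed form (p - y) * w; the facial-recognition CNNs discussed above would instead need automatic differentiation. The function name `fgsm_logistic` and the [0, 1] pixel range are illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_logistic(x, y, w, b, eps):
    """One-step FGSM against a toy binary logistic-regression model.

    With p = sigmoid(w.x + b) and binary cross-entropy loss, the gradient
    of the loss with respect to the input x is (p - y) * w.
    """
    p = sigmoid(np.dot(w, x) + b)
    grad_x = (p - y) * w
    # Move every input feature by eps in the loss-increasing direction.
    x_adv = x + eps * np.sign(grad_x)
    # Keep pixel intensities in the valid [0, 1] range.
    return np.clip(x_adv, 0.0, 1.0)
```

Every feature moves by the same small amount eps, which makes FGSM an instance of the "all pixels by a smaller amount" family of perturbations described in the abstract.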
Multi-Marginal Optimal Transport Defines a Generalized Metric
We prove that the multi-marginal optimal transport (MMOT) problem defines a generalized metric. In addition, we prove that the distance induced by MMOT satisfies a generalized triangle inequality that, to leading order, cannot be improved. The Optimal Transport (OT) problem dates back to 1781, when Monge [1] raised the problem of finding a way to transport one distribution of points (formally a probability distribution) into another one at minimal cost. OT theory was greatly developed in the past century, especially through the work of Kantorovich [2] in 1941 and Brenier [3] in 1991, and, in part thanks to contemporary fast OT solvers, e.g. … In this case, the minimum cost induced by (1) is called the Wasserstein distance (WD), and it is a metric on the space of probability measures. The WD has gained increasing popularity in the past two decades thanks to its superiority over other metrics and divergences in many applications, e.g. …
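Equation (1) is not reproduced here, but the two-marginal Wasserstein distance it leads to has a simple closed form in one special case: two uniform empirical measures on the real line with the same number of atoms. There, for cost |u - v|^p with p >= 1, matching the sorted samples is an optimal coupling. The sketch below covers only that case (it is not MMOT), and `wasserstein_1d` is a hypothetical helper name.

```python
import numpy as np

def wasserstein_1d(x, y, p=1):
    """W_p between two uniform empirical measures on the real line with the
    same number of atoms. In 1D the sorted (monotone) matching is an optimal
    coupling for the cost |u - v|**p with p >= 1."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(x - y) ** p) ** (1.0 / p))
```

For example, `wasserstein_1d([0, 1, 2], [1, 2, 3])` is 1.0, since each atom moves one unit. Because W_p is a metric, the ordinary triangle inequality W(a, c) <= W(a, b) + W(b, c) holds; the abstract's result is a generalized analogue of this inequality for the multi-marginal case.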
GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values
Zhang, Shangtong, Liu, Bo, Whiteson, Shimon
We present GradientDICE for estimating the density ratio between the state distribution of the target policy and the sampling distribution in off-policy reinforcement learning. GradientDICE fixes several problems with GenDICE (Zhang et al., 2020), the current state-of-the-art for estimating such density ratios. Namely, the optimization problem in GenDICE is not a convex-concave saddle-point problem once nonlinearity in optimization variable parameterization is introduced, so primal-dual algorithms are not guaranteed to find the desired solution. However, such nonlinearity is essential to ensure the consistency of GenDICE even with a tabular representation. This is a fundamental contradiction, resulting from GenDICE's original formulation of the optimization problem. In GradientDICE, we optimize a different objective from GenDICE by using the Perron-Frobenius theorem and eliminating GenDICE's use of divergence. Consequently, nonlinearity in parameterization is not necessary for GradientDICE, which is provably convergent under linear function approximation.
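To make the estimated quantity concrete, the sketch below computes it exactly in a tiny tabular, undiscounted case where the target policy's transition matrix P_pi is known: the target state distribution is the stationary (Perron-Frobenius) vector d with d = d P_pi, and the density ratio is tau(s) = d(s) / mu(s) for a sampling distribution mu. This is not the GradientDICE algorithm, which estimates tau from off-policy samples without access to P_pi; the matrix, mu, and helper names are hypothetical.

```python
import numpy as np

def stationary_distribution(P, n_iter=10_000, tol=1e-12):
    """Power iteration for the stationary distribution d of a row-stochastic
    transition matrix P (d = d P). For an irreducible, aperiodic chain the
    Perron-Frobenius theorem guarantees this vector is unique."""
    n = P.shape[0]
    d = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        d_next = d @ P
        if np.abs(d_next - d).sum() < tol:
            return d_next
        d = d_next
    return d

def density_ratio(P_pi, mu):
    """Hypothetical helper: exact tau(s) = d_pi(s) / mu(s) in the tabular case."""
    return stationary_distribution(P_pi) / mu
```

With P_pi = [[0.9, 0.1], [0.5, 0.5]] the stationary distribution is (5/6, 1/6), so against a uniform mu the ratio is (5/3, 1/3). Any valid ratio satisfies E_mu[tau] = 1, a normalization constraint that estimators of such ratios typically have to respect.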
Simulation of electron-proton scattering events by a Feature-Augmented and Transformed Generative Adversarial Network (FAT-GAN)
Alanazi, Yasir, Sato, N., Liu, Tianbo, Melnitchouk, W., Kuchera, Michelle P., Pritchard, Evan, Robertson, Michael, Strauss, Ryan, Velasco, Luisa, Li, Yaohang
We apply generative adversarial network (GAN) technology to build an event generator that simulates particle production in electron-proton scattering that is free of theoretical assumptions about underlying particle dynamics. The difficulty of efficiently training a GAN event simulator lies in learning the complicated patterns of the distributions of the particles' physical properties. We develop a GAN that selects a set of transformed features from particle momenta that can be generated easily by the generator, and uses these to produce a set of augmented features that improve the sensitivity of the discriminator. The new Feature-Augmented and Transformed GAN (FAT-GAN) is able to faithfully reproduce the distribution of final state electron momenta in inclusive electron scattering, without the need for input derived from domain-based theoretical assumptions. The developed technology can play a significant role in boosting the science of the Jefferson Lab 12 GeV program and the future Electron-Ion Collider.
Urban2Vec: Incorporating Street View Imagery and POIs for Multi-Modal Urban Neighborhood Embedding
Wang, Zhecheng, Li, Haoyuan, Rajagopal, Ram
Understanding intrinsic patterns and predicting spatiotemporal characteristics of cities require a comprehensive representation of urban neighborhoods. Existing works rely on either inter- or intra-region connectivities to generate neighborhood representations but fail to fully utilize the informative yet heterogeneous data within neighborhoods. In this work, we propose Urban2Vec, an unsupervised multi-modal framework that incorporates both street view imagery and point-of-interest (POI) data to learn neighborhood embeddings. Specifically, we use a convolutional neural network to extract visual features from street view images while preserving geospatial similarity. Furthermore, we model each POI as a bag of words containing its category, rating, and review information. Analogous to document embedding in natural language processing, we establish the semantic similarity between each neighborhood ("document") and the words from its surrounding POIs in the vector space. By jointly encoding visual, textual, and geospatial information into the neighborhood representation, Urban2Vec achieves performance better than baseline models and comparable to fully supervised methods in downstream prediction tasks. Extensive experiments on three U.S. metropolitan areas also demonstrate the model's interpretability, its generalization capability, and its value in neighborhood similarity analysis.
stream-learn -- open-source Python library for difficult data stream batch analysis
Ksieniewicz, Paweł, Zyblewski, Paweł
stream-learn is a Python package compatible with scikit-learn and developed for the analysis of drifting and imbalanced data streams. Its main component is a stream generator, which allows one to produce a synthetic data stream that may incorporate each of the three main concept drift types (i.e. …). The package allows conducting experiments following established evaluation methodologies (i.e. …). In addition, estimators adapted for data stream classification have been implemented, including both simple classifiers and state-of-the-art chunk-based and online classifier ensembles. To improve computational efficiency, the package utilises its own implementations of prediction metrics for imbalanced binary classification tasks. Keywords: Data stream, Concept drift, Imbalanced data, Dynamic class imbalance. 1. Motivation and significance. Pattern recognition research increasingly goes beyond the usual pattern of building classification models on stationary data sets and focuses on data stream processing, where class distributions, and hence also decision boundaries, may change over time [1].
Safe Predictors for Enforcing Input-Output Specifications
Mell, Stephen, Brown, Olivia, Goodwin, Justin, Son, Sung-Hyun
We present an approach for designing correct-by-construction neural networks (and other machine learning models) that are guaranteed to be consistent with a collection of input-output specifications before, during, and after algorithm training. Our method involves designing a constrained predictor for each set of compatible constraints, and combining them safely via a convex combination of their predictions. We demonstrate our approach on synthetic datasets and an aircraft collision avoidance problem.
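A convex combination preserves any convex output constraint: if each combined prediction lies in a convex feasible set, so does any blend with non-negative weights summing to one. The sketch below illustrates this with a hypothetical one-dimensional specification ("for inputs in [0, 1], the output must be at least 0.5"): a constrained predictor projects an unconstrained model's output onto the feasible set, and an input-dependent weight equal to 1 on the constrained region blends the two. The specification, the linear weight ramp, and all function names are illustrative assumptions, not the authors' construction.

```python
import numpy as np

LOWER = 0.5  # hypothetical spec: for x in [0, 1], output must be >= LOWER

def f_free(x):
    """Unconstrained predictor standing in for a trained model."""
    return np.sin(3 * x)

def f_safe(x):
    """Constrained predictor: projects onto the feasible set [LOWER, inf)."""
    return np.maximum(f_free(x), LOWER)

def weight(x, margin=0.5):
    """Weight on the constrained predictor: 1 inside [0, 1], decaying
    linearly to 0 at distance `margin` outside the region."""
    dist = np.maximum(np.maximum(0.0 - x, x - 1.0), 0.0)
    return np.clip(1.0 - dist / margin, 0.0, 1.0)

def safe_predictor(x):
    w = weight(x)
    # Convex combination: w and 1 - w are non-negative and sum to 1.
    return w * f_safe(x) + (1.0 - w) * f_free(x)
```

Inside the constrained region the weight is exactly 1, so the specification holds by construction regardless of how `f_free` was trained; far from the region the output reverts to the unconstrained model, and the ramp in between keeps the blend continuous.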
Functional Sequential Treatment Allocation with Covariates
Kock, Anders Bredahl, Preinerstorfer, David, Veliyev, Bezirgen
The classical multi-armed bandit literature considers a sequential decision problem in which a policy maker attempts to assign subjects to the treatment with the highest expected outcome. Two practically relevant generalizations of this setting have attracted much attention: (i) a problem where the decision maker can incorporate a vector of covariates in the assignment of each subject, cf. Woodroofe (1979), Yang et al. (2002), Rigollet and Zeevi (2010) and Perchet and Rigollet (2013); (ii) problems where, instead of targeting the outcome distribution with the highest expectation, the decision maker is interested in targeting another functional, such as a quantile, a risk measure, or other characteristics of the distribution, cf. …
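Generalization (ii) changes what "best arm" means. The sketch below is a simple explore-then-commit policy that targets an empirical q-quantile instead of the mean; it ignores covariates and is not the policy studied in the paper. The arm interface (a callable taking a NumPy random generator and returning one outcome) is a hypothetical choice.

```python
import numpy as np

def explore_then_commit_quantile(arms, n_explore, horizon, q=0.5, seed=0):
    """Sample each arm n_explore times, then commit to the arm with the
    largest empirical q-quantile for the remaining rounds.
    `arms` is a list of callables rng -> outcome (hypothetical interface)."""
    rng = np.random.default_rng(seed)
    samples = [[arm(rng) for _ in range(n_explore)] for arm in arms]
    best = int(np.argmax([np.quantile(s, q) for s in samples]))
    n_left = max(horizon - n_explore * len(arms), 0)
    outcomes = [x for s in samples for x in s]
    outcomes += [arms[best](rng) for _ in range(n_left)]
    return best, outcomes
```

The target functional matters: an arm that always pays 1 has median 1, while an arm paying 100 with probability 0.1 and 0 otherwise has mean 10 but median 0. A mean-targeting policy would favor the second arm; with q = 0.5 this policy commits to the first.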
The Case for Bayesian Deep Learning
The key distinguishing property of a Bayesian approach is marginalization instead of optimization, not the prior or Bayes' rule. Bayesian inference is especially compelling for deep neural networks. (1) Neural networks are typically underspecified by the data and can represent many different but high-performing models corresponding to different settings of parameters, which is exactly when marginalization will make the biggest difference for both calibration and accuracy. (2) Deep ensembles have been mistaken for competing approaches to Bayesian methods, but can be seen as approximate Bayesian marginalization. (3) The structure of neural networks gives rise to a structured prior in function space, which reflects the inductive biases of neural networks that help them generalize. (4) The observed correlation between parameters in flat regions of the loss and a diversity of solutions that provide good generalization is further conducive to Bayesian marginalization, as flat regions occupy a large volume in a high-dimensional space, and each different solution will make a good contribution to a Bayesian model average. (5) Recent practical advances for Bayesian deep learning provide improvements in accuracy and calibration compared to standard training, while retaining scalability.
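Marginalization means predicting with p(y | x, D) = ∫ p(y | x, w) p(w | D) dw rather than with a single optimized w. Given J parameter settings w_1, ..., w_J treated as samples from the posterior, the equal-weight Monte Carlo estimate is (1/J) Σ_j p(y | x, w_j). The sketch below implements that average over the members' predictive distributions; treating independently trained ensemble members as posterior samples is the assumption behind point (2), and the function names are illustrative.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def bayesian_model_average(member_logits):
    """Equal-weight Monte Carlo estimate of p(y | x, D): average the
    predictive distributions (not the weights or logits) of J models.
    member_logits has shape (J, n_inputs, n_classes)."""
    return softmax(np.asarray(member_logits, dtype=float)).mean(axis=0)
```

With two members whose logits are [2, 0] and [0, 2] for the same input, each alone assigns about 0.88 probability to its preferred class, but the average is [0.5, 0.5]. When equally good solutions disagree, the model average expresses that disagreement as uncertainty, which is where point (1) locates the calibration benefit.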