Taylor, Gavin
Dynamic Contrastive Learning for Time Series Representation
Shamba, Abdul-Kazeem, Bach, Kerstin, Taylor, Gavin
Understanding events in time series is an important task in a variety of contexts. However, human analysis and labeling are expensive and time-consuming. It is therefore advantageous to learn embeddings for moments in time series in an unsupervised way, enabling good performance in classification or detection tasks with only minimal subsequent human labeling. In this paper, we propose dynamic contrastive learning (DynaCL), an unsupervised contrastive representation learning framework for time series that uses temporally adjacent steps to define positive pairs. DynaCL adopts the N-pair loss to dynamically treat all samples in a batch as positive or negative pairs, enabling efficient training and addressing the challenge of complicated positive-pair sampling. We demonstrate that DynaCL embeds instances from time series into semantically meaningful clusters, which allows superior performance on downstream tasks across a variety of public time series datasets. Our findings also reveal that high scores on unsupervised clustering metrics do not guarantee that the representations are useful in downstream tasks.

A common task in time series (TS) analysis is to split the series into many small windows and identify or label the event taking place in each window. Learning a good representation for these moments reduces the time and domain expertise needed for this data annotation. Self-supervised learning, which produces descriptive and intelligible representations in natural language processing (NLP) and computer vision (CV), has emerged as a promising path for learning TS representations.
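The temporal-adjacency idea lends itself to a compact implementation. Below is a minimal sketch (not the authors' code) of a contrastive loss in which each window's embedding uses its immediate temporal successor as the positive and every other window in the batch as negatives; the `encoder` and the batch layout are assumptions.

```python
import torch
import torch.nn.functional as F

def temporal_npair_loss(z, temperature=0.1):
    """Contrastive loss whose positives are temporal neighbors.

    z: (B, D) embeddings of B consecutive time-series windows, in temporal
    order. Window i is paired with window i+1 as its positive; every other
    window in the batch acts as a negative.
    """
    b = z.size(0)
    z = F.normalize(z, dim=1)                       # cosine-similarity space
    sim = (z @ z.t()) / temperature                 # (B, B) similarity matrix
    eye = torch.eye(b, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(eye, float("-inf"))       # a window is not its own positive
    targets = torch.arange(1, b, device=z.device)   # positive of window i is window i+1
    return F.cross_entropy(sim[:-1], targets)       # softmax over the whole batch

# usage with a hypothetical encoder:
# z = encoder(batch_of_windows)   # (B, D)
# loss = temporal_npair_loss(z)
```

Treating the whole batch this way avoids mining or hand-designed augmentations for positives, which is the efficiency argument made in the abstract.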
Execute Order 66: Targeted Data Poisoning for Reinforcement Learning
Foley, Harrison, Fowl, Liam, Goldstein, Tom, Taylor, Gavin
Reinforcement Learning (RL) has quickly achieved impressive results in a wide variety of control problems, from video games to more real-world applications like autonomous driving and cyberdefense [Vinyals et al., 2019, Galias et al., 2019, Nguyen and Reddi, 2019]. However, as RL becomes integrated into higher-risk application areas, security vulnerabilities become more pressing. One such security risk is data poisoning, wherein an attacker maliciously modifies training data to achieve certain adversarial goals. In this work, we carry out a novel data poisoning attack on RL agents that involves imperceptibly altering a small amount of training data. The effect is that the trained agent performs its task normally until it encounters a particular state chosen by the attacker, where it misbehaves catastrophically. Although the complex mechanics of RL have historically made data poisoning for RL challenging, we successfully apply gradient alignment, an approach from supervised learning, to RL [Geiping et al., 2020]. Specifically, we attack RL agents playing Atari games, and demonstrate that we can produce agents that effectively play the game until shown a particular cue. We demonstrate that effective cues include a specific target state of the attacker's choosing or, more subtly, a translucent watermark appearing on a portion of any state.
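To make the gradient-alignment idea concrete, here is a heavily simplified sketch of the crafting objective as usually stated for supervised learning; the paper's RL-specific formulation, threat model, and hyperparameters are not reproduced. The poison perturbation is optimized so that the gradient the victim computes on the poisoned data points in the same direction as the gradient of the attacker's objective.

```python
import torch

def gradient_alignment_loss(model, poison_x, poison_y, target_x, target_y, loss_fn):
    """Cosine-alignment objective in the style of gradient-matching poisoning.

    A simplified supervised-learning sketch, not the paper's exact RL attack:
    minimizing 1 - cos(g_train, g_adv) over the poison perturbation steers the
    gradient the victim computes on the poisoned batch toward the gradient of
    the attacker's objective on the chosen target state.
    """
    params = [p for p in model.parameters() if p.requires_grad]

    # Gradient of the attacker's objective (misbehave on the target state).
    adv_loss = loss_fn(model(target_x), target_y)
    g_adv = torch.autograd.grad(adv_loss, params)

    # Gradient of the ordinary training loss on the perturbed poison batch;
    # create_graph=True so the alignment loss is differentiable w.r.t. poison_x.
    train_loss = loss_fn(model(poison_x), poison_y)
    g_train = torch.autograd.grad(train_loss, params, create_graph=True)

    dot = sum((a.detach() * b).sum() for a, b in zip(g_adv, g_train))
    norm = (sum((a.detach() ** 2).sum() for a in g_adv).sqrt() *
            sum((b ** 2).sum() for b in g_train).sqrt())
    return 1.0 - dot / norm    # minimize w.r.t. the poison perturbation
```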
Probabilistic Deep Learning to Quantify Uncertainty in Air Quality Forecasting
Murad, Abdulmajid, Kraemer, Frank Alexander, Bach, Kerstin, Taylor, Gavin
Data-driven forecasts of air quality have recently achieved more accurate short-term predictions. Despite their success, most of the current data-driven solutions lack proper quantification of model uncertainty that communicates how much to trust the forecasts. Recently, several practical tools to estimate uncertainty have been developed in probabilistic deep learning. However, there have not been empirical applications and extensive comparisons of these tools in the domain of air quality forecasting. Therefore, this work applies state-of-the-art techniques of uncertainty quantification in a real-world setting of air quality forecasts. Through extensive experiments, we describe training probabilistic models and evaluate their predictive uncertainties based on empirical performance, reliability of confidence estimates, and practical applicability. We also propose improving these models using "free" adversarial training and exploiting the temporal and spatial correlation inherent in air quality data. Our experiments demonstrate that the proposed models perform better than previous works in quantifying uncertainty in data-driven air quality forecasts. Overall, Bayesian neural networks provide a more reliable uncertainty estimate but can be challenging to implement and scale. Other scalable methods, such as deep ensembles, Monte Carlo (MC) dropout, and stochastic weight averaging-Gaussian (SWAG), can perform well if applied correctly, but with different tradeoffs and slight variations in performance metrics. Finally, our results show the practical impact of uncertainty estimation and demonstrate that, indeed, probabilistic models are more suitable for making informed decisions. Code and dataset are available at https://github.com/Abdulmajid-Murad/deep_probabilistic_forecast.
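Of the scalable methods compared above, MC dropout is the simplest to sketch. The snippet below is a generic illustration, not the linked repository's implementation: dropout layers are kept stochastic at inference time and the spread of repeated forward passes serves as the forecast's uncertainty; the model and input names are placeholders.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def mc_dropout_forecast(model, x, n_samples=50):
    """Monte Carlo dropout: predictive mean and spread from stochastic passes."""
    model.eval()
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.train()                    # keep only dropout stochastic at test time
    preds = torch.stack([model(x) for _ in range(n_samples)])   # (S, B, ...)
    return preds.mean(dim=0), preds.std(dim=0)

# usage with placeholder names:
# mean_forecast, uncertainty = mc_dropout_forecast(air_quality_model, inputs)
```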
FLAG: Adversarial Data Augmentation for Graph Neural Networks
Kong, Kezhi, Li, Guohao, Ding, Mucong, Wu, Zuxuan, Zhu, Chen, Ghanem, Bernard, Taylor, Gavin, Goldstein, Tom
Data augmentation helps neural networks generalize better, but it remains an open question how to effectively augment graph data to enhance the performance of Graph Neural Networks (GNNs). While most existing graph regularizers focus on augmenting graph topological structures by adding or removing edges, we offer a novel direction: augmenting the input node feature space for better performance. We propose a simple but effective solution, FLAG (Free Large-scale Adversarial Augmentation on Graphs), which iteratively augments node features with gradient-based adversarial perturbations during training and boosts performance at test time. Empirically, FLAG can be easily implemented with a dozen lines of code and is flexible enough to work with any GNN backbone, on a wide variety of large-scale datasets, and in both transductive and inductive settings. Without modifying a model's architecture or training setup, FLAG yields a consistent and salient performance boost across both node and graph classification tasks. Using FLAG, we reach state-of-the-art performance on the large-scale ogbg-molpcba, ogbg-ppa, and ogbg-code datasets.

Graph Neural Networks (GNNs) have emerged as powerful architectures for learning and analyzing graph representations. The Graph Convolutional Network (GCN) (Kipf & Welling, 2016) and its variants have been applied to a wide range of tasks, including visual recognition (Zhao et al., 2019; Shen et al., 2018), meta-learning (Garcia & Bruna, 2017), social analysis (Qiu et al., 2018; Li & Goldwasser, 2019), and recommender systems (Ying et al., 2018).
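Consistent with the "dozen lines of code" claim, the training step below sketches the augmentation described above: node features are perturbed for a few gradient-ascent steps while the parameter gradients from each step accumulate, so a single optimizer update absorbs the augmentation. It follows the paper's description rather than its released code, and the `model(x, edge_index)` call signature is an assumption.

```python
import torch

def flag_step(model, optimizer, x, edge_index, y, loss_fn, step_size=1e-3, m=3):
    """One training step with FLAG-style adversarial feature augmentation."""
    model.train()
    optimizer.zero_grad()

    # random initial perturbation on the node features
    perturb = torch.empty_like(x).uniform_(-step_size, step_size).requires_grad_()
    loss = loss_fn(model(x + perturb, edge_index), y) / m

    for _ in range(m - 1):
        loss.backward()                    # accumulate parameter gradients
        # gradient-ascent step on the feature perturbation
        perturb = (perturb.detach() + step_size * perturb.grad.sign()).requires_grad_()
        loss = loss_fn(model(x + perturb, edge_index), y) / m

    loss.backward()
    optimizer.step()                       # one update absorbs all m ascent steps
```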
Information-Driven Adaptive Sensing Based on Deep Reinforcement Learning
Murad, Abdulmajid, Kraemer, Frank Alexander, Bach, Kerstin, Taylor, Gavin
To make better use of deep reinforcement learning in the creation of sensing policies for resource-constrained IoT devices, we present and study a novel reward function based on the Fisher information value. This reward function enables IoT sensor devices to learn to spend their available energy on measurements at otherwise unpredictable moments, while conserving energy at times when measurements would provide little new information. The approach is highly general, accommodating a wide range of use cases without significant human design effort or hyper-parameter tuning. We illustrate the approach in a scenario of workplace noise monitoring, where results show that the learned behavior outperforms a uniform sampling strategy and comes close to a near-optimal oracle solution.
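As a rough illustration of how an information-based reward can be shaped (the paper's exact Fisher-information formulation is not reproduced here), the sketch below values a measurement by the estimator-variance reduction it buys: a Gaussian measurement with noise variance R contributes Fisher information 1/R, shrinking the estimate's variance from P to 1/(1/P + 1/R), and the reward trades that gain against an energy penalty.

```python
def sensing_reward(sense, prior_var, noise_var, energy_cost, lam=0.05):
    """Hypothetical information-driven reward for a single decision step.

    sense:      whether the agent chose to take a measurement
    prior_var:  variance of the current estimate of the monitored quantity
    noise_var:  measurement noise variance (Fisher information = 1 / noise_var)
    """
    if not sense:
        return 0.0
    posterior_var = 1.0 / (1.0 / prior_var + 1.0 / noise_var)
    information_gain = prior_var - posterior_var     # value of the measurement
    return information_gain - lam * energy_cost      # spend energy only when it pays
```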
Transferable Clean-Label Poisoning Attacks on Deep Neural Nets
Zhu, Chen, Huang, W. Ronny, Shafahi, Ali, Li, Hengduo, Taylor, Gavin, Studer, Christoph, Goldstein, Tom
Clean-label poisoning attacks inject innocuous-looking (and "correctly" labeled) poison images into training data, causing a model to misclassify a targeted image after being trained on this data. We consider transferable poisoning attacks that succeed without access to the victim network's outputs, architecture, or (in some cases) training data. To achieve this, we propose a new "polytope attack" in which poison images are designed to surround the targeted image in feature space. We also demonstrate that using Dropout during poison creation enhances the transferability of this attack. We achieve transferable attack success rates of over 50% while poisoning only 1% of the training set.
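The geometric idea of "surrounding the target in feature space" can be written as a single objective. The sketch below is a simplified single-network version; the full attack alternately optimizes the combination weights and the poison perturbations, and averages over an ensemble of substitute networks (with Dropout active) for transferability.

```python
import torch

def polytope_loss(target_feat, poison_feats, coeffs):
    """Simplified convex-polytope objective for one feature extractor.

    target_feat:  (D,)   feature vector of the targeted image
    poison_feats: (k, D) feature vectors of the k poison images
    coeffs:       (k,)   non-negative weights summing to 1

    Driving this toward zero places the target inside the convex hull of the
    poisons in feature space, so a classifier fine-tuned on the poisons tends
    to put the target on the poisons' (wrong) side of its decision boundary.
    """
    combo = (coeffs.unsqueeze(1) * poison_feats).sum(dim=0)
    return ((combo - target_feat) ** 2).sum() / (target_feat ** 2).sum()
```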
Autonomous Management of Energy-Harvesting IoT Nodes Using Deep Reinforcement Learning
Murad, Abdulmajid, Kraemer, Frank Alexander, Bach, Kerstin, Taylor, Gavin
Reinforcement learning (RL) can manage wireless, energy-harvesting IoT nodes by solving the problem of autonomous management in non-stationary, resource-constrained settings. We show that state-of-the-art policy-gradient approaches to RL are appropriate for the IoT domain and that they outperform previous approaches. Because they can model continuous observation and action spaces and offer improved function approximation, the new approaches are able to solve harder problems, permitting reward functions that are better aligned with the actual application goals. We present such a reward function and use policy-gradient approaches to learn capable policies, leading to behavior more appropriate for IoT nodes with less manual design effort and increasing the level of autonomy in IoT.
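As an illustration of why continuous action spaces matter here, the sketch below defines a minimal Gaussian policy over a node's duty cycle, trainable with a REINFORCE-style update. The observation features, network size, and the squashing of the sampled action are all assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class DutyCyclePolicy(nn.Module):
    """Minimal Gaussian policy over a continuous duty-cycle action."""

    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 1))
        self.log_std = nn.Parameter(torch.zeros(1))

    def forward(self, obs):
        # obs might hold battery level, harvest forecast, time of day, ...
        mean = self.net(obs)
        dist = torch.distributions.Normal(mean, self.log_std.exp())
        action = dist.sample()
        duty_cycle = torch.sigmoid(action)          # squash into (0, 1)
        return duty_cycle, dist.log_prob(action)

# REINFORCE-style update after collecting an episode:
# loss = -(returns * torch.stack(log_probs)).mean()
# loss.backward(); optimizer.step()
```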
Adversarial Training for Free!
Shafahi, Ali, Najibi, Mahyar, Ghiasi, Amin, Xu, Zheng, Dickerson, John, Studer, Christoph, Davis, Larry S., Taylor, Gavin, Goldstein, Tom
Adversarial training, in which a network is trained on adversarial examples, is one of the few defenses that withstands strong adversarial attacks. Unfortunately, the high cost of generating strong adversarial examples makes standard adversarial training impractical on large-scale problems like ImageNet. We present an algorithm that eliminates the overhead cost of generating adversarial examples by recycling the gradient information computed when updating model parameters. Our "free" adversarial training algorithm achieves state-of-the-art robustness on the CIFAR-10 and CIFAR-100 datasets at negligible additional cost compared to natural training, and can be 7 to 30 times faster than other strong adversarial training methods. Using a single workstation with 4 P100 GPUs and 2 days of runtime, we can train a robust model for the large-scale ImageNet classification task that maintains 40% accuracy against PGD attacks.
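The gradient-recycling idea is compact enough to sketch. Below, each minibatch is replayed m times; one backward pass per replay supplies both the parameter gradients for the model update and the input gradients for an ascent step on the perturbation. This follows the published algorithm's structure, but the loader, epsilon, and clipping details are simplified assumptions.

```python
import torch

def free_adversarial_epoch(model, loader, optimizer, loss_fn, epsilon=8/255, m=4):
    """One epoch of "free" adversarial training (a structural sketch)."""
    model.train()
    delta = None
    for x, y in loader:
        if delta is None or delta.shape != x.shape:
            delta = torch.zeros_like(x)          # perturbation persists across batches
        for _ in range(m):                       # replay the minibatch m times
            delta.requires_grad_(True)
            loss = loss_fn(model(x + delta), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                     # ordinary parameter update
            # recycle the input gradient: ascent step on the perturbation
            delta = (delta + epsilon * delta.grad.sign()).clamp(-epsilon, epsilon).detach()
```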
Visualizing the Loss Landscape of Neural Nets
Li, Hao, Xu, Zheng, Taylor, Gavin, Studer, Christoph, Goldstein, Tom
Neural network training relies on our ability to find "good" minimizers of highly non-convex loss functions. It is well known that certain network architecture designs (e.g., skip connections) produce loss functions that train more easily, and that well-chosen training parameters (batch size, learning rate, optimizer) produce minimizers that generalize better. However, the reasons for these differences, and their effect on the underlying loss landscape, are not well understood. In this paper, we explore the structure of neural loss functions, and the effect of loss landscapes on generalization, using a range of visualization methods. First, we introduce a simple "filter normalization" method that helps us visualize loss function curvature and make meaningful side-by-side comparisons between loss functions. Then, using a variety of visualizations, we explore how network architecture affects the loss landscape and how training parameters affect the shape of minimizers.
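The "filter normalization" step admits a short sketch: a random direction in parameter space is rescaled filter-by-filter to match the norms of the trained weights before being used to slice the loss surface. The treatment of biases and normalization parameters below (leaving them unperturbed) is a common choice, not necessarily the paper's exact one.

```python
import torch

def filter_normalized_direction(model):
    """Random direction rescaled so each filter matches the trained filter's norm.

    This removes the scale invariance of ReLU networks that would otherwise
    make side-by-side sharpness comparisons between loss plots meaningless.
    """
    direction = []
    for p in model.parameters():
        d = torch.randn_like(p)
        if p.dim() > 1:                    # conv / linear weights: per-filter rescale
            for d_f, p_f in zip(d, p):
                d_f.mul_(p_f.norm() / (d_f.norm() + 1e-10))
        else:
            d.zero_()                      # biases / norm params left unperturbed
        direction.append(d)
    return direction

# usage: perturb a copy of the trained weights by alpha * direction for a range
# of alpha values and record the training loss to obtain a 1-D loss slice.
```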