Banff
Value Function Based Performance Optimization of Deep Learning Workloads
Steiner, Benoit, Cummins, Chris, He, Horace, Leather, Hugh
As machine learning techniques become ubiquitous, the efficiency of neural network implementations is becoming correspondingly paramount. Frameworks, such as Halide and TVM, separate out the algorithmic representation of the network from the schedule that determines its implementation. Finding good schedules, however, remains extremely challenging. We model this scheduling problem as a sequence of optimization choices, and present a new technique to accurately predict the expected performance of a partial schedule. By leveraging these predictions we can make these optimization decisions greedily and rapidly identify an efficient schedule. This enables us to find schedules that improve the throughput of deep neural networks by 2.6x over Halide and 1.5x over TVM. Moreover, our technique is two to three orders of magnitude faster than that of these tools, and completes in seconds instead of hours.
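The greedy use of performance predictions described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: `greedy_schedule`, `toy_predict_cost`, and the three choice names are hypothetical, and the toy cost function stands in for the learned predictor of a partial schedule's expected performance.

```python
from typing import Callable, Dict, List, Sequence


def greedy_schedule(
    num_decisions: int,
    choices: Sequence[str],
    predict_cost: Callable[[List[str]], float],
) -> List[str]:
    """Build a schedule one decision at a time.

    At each step, every candidate choice is appended to the current partial
    schedule, the predictor scores the result, and the cheapest one is kept.
    """
    schedule: List[str] = []
    for _ in range(num_decisions):
        best = min(choices, key=lambda c: predict_cost(schedule + [c]))
        schedule.append(best)
    return schedule


# Hypothetical stand-in for a learned value function: a fixed cost per
# choice, plus a penalty when the same choice is made twice in a row.
TOY_COST: Dict[str, float] = {"tile": 1.0, "vectorize": 0.5, "unroll": 2.0}


def toy_predict_cost(partial: List[str]) -> float:
    cost = sum(TOY_COST[c] for c in partial)
    cost += sum(3.0 for a, b in zip(partial, partial[1:]) if a == b)
    return cost


result = greedy_schedule(3, list(TOY_COST), toy_predict_cost)
print(result)
```

Each step costs one predictor call per candidate, so the search is linear in the number of decisions, in contrast to a search that measures complete schedules.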
A Study on the Uncertainty of Convolutional Layers in Deep Neural Networks
Shen, Haojing, Chen, Sihong, Wang, Ran
This paper shows a Min-Max property existing in the connection weights of the convolutional layers in a neural network structure, i.e., the LeNet. Specifically, the Min-Max property means that, during back propagation-based training of LeNet, the weights of the convolutional layers move far away from the centers of their intervals, i.e., they decrease to their minimum or increase to their maximum. From the perspective of uncertainty, we demonstrate that the Min-Max property corresponds to minimizing the fuzziness of the model parameters through a simplified formulation of convolution. It is experimentally confirmed that the model with the Min-Max property has stronger adversarial robustness, so this property can be incorporated into the design of the loss function. This paper points out a changing tendency of uncertainty in the convolutional layers of the LeNet structure, and gives some insights into the interpretability of convolution.
Unsupervised Object Keypoint Learning using Local Spatial Predictability
Gopalakrishnan, Anand, van Steenkiste, Sjoerd, Schmidhuber, Jürgen
Hence, which layer(s) we choose as our feature embedding will have an effect on the outcome of the local spatial prediction problem. While more abstract high-level features are expected to better capture the internal predictive structure of an object, it will be more difficult to attribute the error of the prediction network to the exact image location. On the other hand, while more low-level features can be localized more accurately, they may lack the expressiveness to capture high-level properties of objects. Nonetheless, in practice we find that a spatial feature embedding based on earlier layers of the encoder works well (see also Section 5.3 for an ablation). Local Spatial Prediction Task. Using the learned spatial feature embedding, we seek out salient regions of the input image that correspond to object parts. Our approach is based on the idea that objects correspond to local regions in feature space that have high internal predictive structure, which allows us to formulate the following local spatial prediction (LSP) task. For each location in the learned spatial feature embedding, we seek to predict the value of the features (across the feature maps) from its neighbouring feature values. When neighbouring areas correspond to the same object (part), i.e. they regularly appear together, we expect that this prediction problem is easy (green arrow in Figure 3).
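A minimal sketch of the LSP task described above, under loud assumptions: the excerpt uses a learned prediction network, whereas this sketch swaps in the plain mean of the four adjacent feature vectors as a stand-in predictor. The function name `lsp_error_map` and the nested-list feature layout (height x width x channels) are hypothetical.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # height x width x channels


def lsp_error_map(features: FeatureMap) -> List[List[float]]:
    """Predict each location's features from its 4-neighbours and return
    the squared prediction error at every location.

    Assumption: the mean of the neighbours replaces the learned predictor.
    The map must contain at least two locations so each has a neighbour.
    """
    height, width = len(features), len(features[0])
    channels = len(features[0][0])
    errors = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            neighbours = [
                features[i + di][j + dj]
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
                if 0 <= i + di < height and 0 <= j + dj < width
            ]
            predicted = [
                sum(n[c] for n in neighbours) / len(neighbours)
                for c in range(channels)
            ]
            errors[i][j] = sum(
                (predicted[c] - features[i][j][c]) ** 2 for c in range(channels)
            )
    return errors


# A 3x3 single-channel map that is zero everywhere except the centre.
toy = [[[0.0] for _ in range(3)] for _ in range(3)]
toy[1][1] = [1.0]
errors = lsp_error_map(toy)
```

In this toy map the error is low wherever a location agrees with its neighbours, which mirrors the "easy" prediction case in the text.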
Adversarially Robust Classification based on GLRT
Puranik, Bhagyashree, Madhow, Upamanyu, Pedarsani, Ramtin
Machine learning models are vulnerable to adversarial attacks that can often cause misclassification by introducing small but well-designed perturbations. In this paper, we explore, in the setting of classical composite hypothesis testing, a defense strategy based on the generalized likelihood ratio test (GLRT), which jointly estimates the class of interest and the adversarial perturbation. We evaluate the GLRT approach for the special case of binary hypothesis testing in white Gaussian noise under $\ell_{\infty}$ norm-bounded adversarial perturbations, a setting for which a minimax strategy optimizing for the worst-case attack is known. We show that the GLRT approach yields performance competitive with that of the minimax approach under the worst-case attack, and observe that it yields a better robustness-accuracy trade-off under weaker attacks, depending on the values of signal components relative to the attack budget. We also observe that the GLRT defense generalizes naturally to more complex models for which optimal minimax classifiers are not known.
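The joint estimation of class and perturbation can be sketched for the Gaussian case as follows. This is an illustration under stated assumptions rather than the paper's code: the names `glrt_residual` and `glrt_classify`, the class means, and the budget `eps` are hypothetical. With white Gaussian noise, maximizing the likelihood over a perturbation $e$ with $\|e\|_{\infty} \le \epsilon$ amounts to minimizing $\|y - \mu_k - e\|^2$, which separates per coordinate and leaves a residual of $\max(|y_i - \mu_{k,i}| - \epsilon, 0)$.

```python
from typing import List, Sequence


def glrt_residual(y: Sequence[float], mean: Sequence[float], eps: float) -> float:
    """Squared residual after removing the best l-infinity bounded perturbation.

    The optimal perturbation per coordinate is (y_i - mean_i) clipped to
    [-eps, eps], so only the part of the gap exceeding eps remains.
    """
    return sum(max(abs(yi - mi) - eps, 0.0) ** 2 for yi, mi in zip(y, mean))


def glrt_classify(
    y: Sequence[float], means: Sequence[Sequence[float]], eps: float
) -> int:
    """Return the index of the class whose mean best explains y once the
    perturbation has been estimated jointly with the class."""
    residuals: List[float] = [glrt_residual(y, m, eps) for m in means]
    return min(range(len(means)), key=residuals.__getitem__)


# Hypothetical binary problem with means +mu and -mu.
means = [[1.0, 1.0], [-1.0, -1.0]]
label_a = glrt_classify([0.8, 0.3], means, eps=0.5)
label_b = glrt_classify([-0.9, -0.4], means, eps=0.5)
```

Setting `eps` to zero reduces the rule to an ordinary nearest-mean classifier, which is one way to sanity-check the sketch.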
On the Transferability of VAE Embeddings using Relational Knowledge with Semi-Supervision
Strömfelt, Harald, Dickens, Luke, Garcez, Artur d'Avila, Russo, Alessandra
When dealing with complex data, the effectiveness of a classifier/predictor is limited by its ability to extract useful information. As such, representations that clearly expose the semantics of the data should then be most amenable to downstream learning [1, 2]. This is often referred to as a challenge of acquiring a disentangled representation over the factors of the data [3]. A popular recent trend that has had significant success in this regard uses semi-supervised Variational AutoEncoders (VAE) [4, 5, 6, 7, 8, 9]. Whilst fully unsupervised VAE methods have been shown to require strong inductive bias [10], semi-supervised methods achieve disentanglement by training additional auxiliary tasks that are defined on the factors, alongside the standard VAE objective (see Appendix Eqn. 3).
CAN: Revisiting Feature Co-Action for Click-Through Rate Prediction
Zhou, Guorui, Bian, Weijie, Wu, Kailun, Ren, Lejian, Pi, Qi, Zhang, Yujing, Xiao, Can, Sheng, Xiang-Rong, Mou, Na, Luo, Xinchen, Zhang, Chi, Qiao, Xianjie, Xiang, Shiming, Gai, Kun, Zhu, Xiaoqiang, Xu, Jian
Inspired by the success of deep learning, recent industrial Click-Through Rate (CTR) prediction models have made the transition from traditional shallow approaches to deep approaches. Deep Neural Networks (DNNs) are known for their ability to learn non-linear interactions from raw features automatically; however, the non-linear feature interaction is learned in an implicit manner. The non-linear interaction may be hard to capture, and explicitly modeling the \textit{co-action} of raw features is beneficial for CTR prediction. \textit{Co-action} refers to the collective effects of features toward the final prediction. In this paper, we argue that current CTR models do not fully explore the potential of feature co-action. We conduct experiments and show that the effect of feature co-action is seriously underestimated. Motivated by our observation, we propose the feature Co-Action Network (CAN) to explore the potential of feature co-action. The proposed model can efficiently and effectively capture the feature co-action, which improves model performance while reducing storage and computation consumption. Experimental results on public and industrial datasets show that CAN outperforms state-of-the-art CTR models by a large margin. Up to now, CAN has been deployed in the Alibaba display advertisement system, obtaining an average improvement of 12\% on CTR and 8\% on RPM.
LADA: Look-Ahead Data Acquisition via Augmentation for Active Learning
Kim, Yoon-Yeong, Song, Kyungwoo, Jang, JoonHo, Moon, Il-Chul
Active learning effectively collects data instances for training deep learning models when the labeled dataset is limited and the annotation cost is high. Besides active learning, data augmentation is also an effective technique to enlarge the limited amount of labeled instances. However, the potential gain from virtual instances generated by data augmentation has not yet been considered in the acquisition process of active learning. Looking ahead at the effect of data augmentation during acquisition would select and generate the data instances that are informative for training the model. Hence, this paper proposes Look-Ahead Data Acquisition via augmentation, or LADA, to integrate data acquisition and data augmentation. LADA considers both 1) the unlabeled data instance to be selected and 2) the virtual data instance to be generated by data augmentation, in advance of the acquisition process. Moreover, to enhance the informativeness of the virtual data instances, LADA optimizes the data augmentation policy to maximize the predictive acquisition score, resulting in the proposal of InfoMixup and InfoSTN. As LADA is a generalizable framework, we experiment with various combinations of acquisition and augmentation methods. The performance of LADA shows a significant improvement over the recent augmentation and acquisition baselines which were independently applied to the benchmark datasets.
Predictive Analysis of Diabetic Retinopathy with Transfer Learning
Labhsetwar, Shreyas Rajesh, Salvi, Raj Sunil, Kolte, Piyush Arvind, Venkatesh, Veerasai Subramaniam, Baretto, Alistair Michael
With the prevalence of diabetes, Diabetes Mellitus Retinopathy (DR) is becoming a major health problem across the world. The long-term medical complications arising from DR have a significant impact on the patient as well as on society, as the disease mostly affects individuals in their most productive years. Early detection and treatment can help reduce the extent of damage to patients. The rise of Convolutional Neural Networks for predictive analysis in the medical field paves the way for a robust solution to DR detection. This paper studies the performance of several highly efficient and scalable CNN architectures for Diabetic Retinopathy classification with the help of Transfer Learning. The research focuses on the VGG16, ResNet50 V2 and EfficientNet B0 models. The classification performance is analyzed using several performance metrics including True Positive Rate, False Positive Rate, Accuracy, etc. Also, several performance graphs are plotted for visualizing the architecture performance including Confusion Matrix, ROC Curve, etc. The results indicate that Transfer Learning with ImageNet weights using the VGG16 model demonstrates the best classification performance, with a best Accuracy of 95%. It is closely followed by the ResNet50 V2 architecture, with a best Accuracy of 93%. This paper shows that predictive analysis of DR from retinal images can be achieved with Transfer Learning on Convolutional Neural Networks.
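For reference, the metrics named above can be computed from confusion-matrix counts as in the sketch below. It assumes a binary (disease versus no disease) setting, which the abstract does not specify, and the function name `binary_metrics` and the example counts are purely illustrative, not results from the paper.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute True Positive Rate, False Positive Rate and Accuracy from
    the four cells of a binary confusion matrix."""
    return {
        "tpr": tp / (tp + fn),        # share of positives that are detected
        "fpr": fp / (fp + tn),        # share of negatives flagged as positive
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Illustrative counts only (assumed, not taken from the paper).
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
print(metrics)
```

An ROC curve is then the set of (FPR, TPR) pairs obtained by sweeping the classifier's decision threshold and recomputing these counts at each setting.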
Overcoming Negative Transfer: A Survey
Zhang, Wen, Deng, Lingfei, Zhang, Lei, Wu, Dongrui
Transfer learning (TL) tries to utilize data or knowledge from one or more source domains to facilitate learning in a target domain. It is particularly useful when the target domain has few or no labeled data, due to annotation expense, privacy concerns, etc. Unfortunately, the effectiveness of TL is not always guaranteed. Negative transfer (NT), i.e., the source domain data/knowledge causing reduced learning performance in the target domain, has been a long-standing and challenging problem in TL. Various approaches to overcome NT have been proposed in the literature. However, there has not been a systematic survey on overcoming NT. This paper fills the gap by categorizing and reviewing nearly 100 approaches for combating NT, from four perspectives: source data quality, target data quality, domain divergence, and integrated algorithms. NT in related fields, e.g., multi-task learning, multilingual models, and lifelong learning, is also discussed.
Belief-Grounded Networks for Accelerated Robot Learning under Partial Observability
Nguyen, Hai, Daley, Brett, Song, Xinchao, Amato, Christopher, Platt, Robert
Many important robotics problems are partially observable in the sense that a single visual or force-feedback measurement is insufficient to reconstruct the state. Standard approaches involve learning a policy over beliefs or observation-action histories. However, both of these have drawbacks; it is expensive to track the belief online, and it is hard to learn policies directly over histories. We propose a method for policy learning under partial observability called the Belief-Grounded Network (BGN) in which an auxiliary belief-reconstruction loss incentivizes a neural network to concisely summarize its input history. Since the resulting policy is a function of the history rather than the belief, it can be executed easily at runtime. We compare BGN against several baselines on classic benchmark tasks as well as three novel robotic touch-sensing tasks. BGN outperforms all other tested methods and its learned policies work well when transferred onto a physical robot.