" Training your image restoration network better with random weight network as optimization function " Supplementary Material

Neural Information Processing Systems

Section 0.1 provides the quantitative results for pan-sharpening. Section 0.2 provides the qualitative experimental results. Section 0.3 provides more quantitative experimental results for the ablation studies. In our work, the default initialization strategy is Kaiming initialization. All loss networks are implemented as convolutional networks by default.
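
To make the setup concrete, here is a minimal sketch, not the authors' code, of how a fixed, randomly initialized convolutional loss network with Kaiming initialization can serve as an auxiliary training objective for an image restoration network; the channel widths, the loss weight, and the function names are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RandomWeightLossNet(nn.Module):
    """A fixed, randomly initialized convolutional 'loss network'."""
    def __init__(self, channels=(3, 64, 64, 128)):
        super().__init__()
        layers = []
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            conv = nn.Conv2d(c_in, c_out, kernel_size=3, padding=1)
            # Kaiming initialization, the default strategy mentioned above
            nn.init.kaiming_normal_(conv.weight, nonlinearity='relu')
            layers += [conv, nn.ReLU(inplace=True)]
        self.features = nn.Sequential(*layers)
        for p in self.parameters():
            p.requires_grad_(False)  # the loss network itself is never trained

    def forward(self, x):
        return self.features(x)

def restoration_loss(restored, target, loss_net, weight=0.1):
    """Pixel-space L1 loss plus feature-space distance under the random-weight network."""
    pixel = F.l1_loss(restored, target)
    feature = F.l1_loss(loss_net(restored), loss_net(target))
    return pixel + weight * feature
```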


A Systematic Performance Analysis of Deep Perceptual Loss Networks: Breaking Transfer Learning Conventions

Pihlgren, Gustav Grund, Nikolaidou, Konstantina, Chhipa, Prakash Chandra, Abid, Nosheen, Saini, Rajkumar, Sandin, Fredrik, Liwicki, Marcus

arXiv.org Artificial Intelligence

Abstract--Deep perceptual loss is a type of loss function in computer vision that aims to mimic human perception by using the deep features extracted from neural networks. In recent years, the method has been applied to great effect on a host of interesting computer vision tasks, especially for tasks with image or image-like outputs, such as image synthesis, segmentation, depth prediction, and more. Despite the increased interest and broader use, more effort is needed toward exploring which networks to use for calculating deep perceptual loss and from which layers to extract the features. This work aims to rectify this by systematically evaluating a host of commonly used and readily available, pretrained networks for a number of different feature extraction points on four existing use cases of deep perceptual loss. The use cases of perceptual similarity, super-resolution, image segmentation, and dimensionality reduction are evaluated through benchmarks. In the last decade, machine learning for computer vision has evolved significantly. This evolution is mainly due to the developments in artificial neural networks. An essential focus of these developments has been the calculation of the loss used to train models. One group of loss functions builds on using a neural network to calculate the loss for another machine learning model. Among these methods are milestone achievements such as adversarial examples [1] and generative models. One of the methods that makes use of loss networks and has proven effective for a range of computer vision applications is deep perceptual loss. Deep perceptual loss aims to create loss functions for machine learning models that mimic human perception of similarity. This is typically done by calculating the similarity of the output image and the ground truth by how similar the deep features (activations) of the loss network are when either image is used as input. Deep perceptual loss is well suited for image synthesis, where it is utilized to improve performance on various tasks.
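
The core computation is simple to state in code. Below is a minimal, hedged sketch of deep perceptual loss in PyTorch; the VGG16 backbone and the relu2_2 extraction point are illustrative assumptions (requiring a recent torchvision), and the paper's point is precisely that both choices deserve systematic evaluation.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

# Frozen, pretrained loss network; features[:9] ends at relu2_2 (an assumed choice).
extractor = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features[:9].eval()
for p in extractor.parameters():
    p.requires_grad_(False)

def deep_perceptual_loss(prediction, ground_truth):
    # Distance between the deep features (activations) of either image
    return F.mse_loss(extractor(prediction), extractor(ground_truth))
```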


Perceptual Loss for Robust Unsupervised Homography Estimation

Koguciuk, Daniel, Arani, Elahe, Zonooz, Bahram

arXiv.org Artificial Intelligence

Homography estimation is often an indispensable step in many computer vision tasks. The existing approaches, however, are not robust to illumination and/or larger viewpoint changes. In this paper, we propose a bidirectional implicit Homography Estimation (biHomE) loss for unsupervised homography estimation. biHomE minimizes the distance in the feature space between the warped image from the source viewpoint and the corresponding image from the target viewpoint. Since we use a fixed pre-trained feature extractor and the only learnable component of our framework is the homography network, we effectively decouple homography estimation from representation learning. We use an additional photometric distortion step in the synthetic COCO dataset generation to better represent the illumination variation of real-world scenarios. We show that biHomE achieves state-of-the-art performance on the synthetic COCO dataset, which is also comparable to or better than supervised approaches. Furthermore, the empirical results demonstrate the robustness of our approach to illumination variation compared to existing methods.
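
A minimal sketch of the central idea, not the authors' implementation: warp the source image with the predicted homography and compare it to the target image in the feature space of a frozen, pretrained extractor, so only the homography network receives gradients. The ResNet stem, the kornia warp, and the L1 feature distance are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from kornia.geometry.transform import warp_perspective
from torchvision.models import resnet34, ResNet34_Weights

# Fixed pretrained feature extractor (here: the stem of a ResNet-34)
backbone = resnet34(weights=ResNet34_Weights.IMAGENET1K_V1)
extractor = torch.nn.Sequential(backbone.conv1, backbone.bn1,
                                backbone.relu, backbone.layer1).eval()
for p in extractor.parameters():
    p.requires_grad_(False)

def feature_space_loss(source, target, H):
    """H: (B, 3, 3) homographies predicted by the trainable homography network."""
    warped = warp_perspective(source, H, dsize=source.shape[-2:])
    # Gradients flow only into the homography network, decoupling
    # homography estimation from representation learning.
    return F.l1_loss(extractor(warped), extractor(target))
```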


Towards adversarial robustness with 01 loss neural networks

Xue, Yunzhe, Xie, Meiyan, Roshan, Usman

arXiv.org Machine Learning

Motivated by the general robustness properties of the 01 loss, we propose a single hidden layer 01 loss neural network trained with stochastic coordinate descent as a defense against adversarial attacks in machine learning. One measure of a model's robustness is the minimum distortion required to make the input adversarial. This can be approximated with the Boundary Attack (Brendel et al., 2018) and HopSkipJump (Chen et al., 2019) methods. We compare the minimum distortion of the 01 loss network to the binarized neural network and the standard sigmoid activation network with cross-entropy loss, all trained with and without Gaussian noise, on binary classification between classes 0 and 1 of the CIFAR10 benchmark. Both with and without noise training, we find our 01 loss network to have the largest adversarial distortion of the three models by non-trivial margins. To further validate these results, we subject all models to substitute model black box attacks under different distortion thresholds and find that the 01 loss network is the hardest to attack across all distortions. At a distortion of 0.125, both sigmoid activated cross-entropy loss and binarized networks have almost 0% accuracy on adversarial examples, whereas the 01 loss network is at 40%. Even though both the 01 loss and the binarized network use sign activations, their training algorithms are different, which in turn gives different solutions for robustness. Finally, we compare our network to simple convolutional models under substitute model black box attacks and find their accuracies to be comparable. Our work shows that the 01 loss network has the potential to defend against black box adversarial attacks better than convex loss and binarized networks.
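
For concreteness, here is a minimal sketch of the objective and architecture (shapes, sizes, and names are illustrative): a single hidden layer with sign activations, scored by the 01 loss, i.e. the fraction of misclassified samples. Because this loss is piecewise constant and non-differentiable, the paper trains it with stochastic coordinate descent rather than gradient descent (not shown here).

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((20, 784)), rng.standard_normal(20)  # hidden layer
w2, b2 = rng.standard_normal(20), rng.standard_normal()           # output unit

def predict(X):
    hidden = np.sign(X @ W1.T + b1)   # sign activations in the hidden layer
    return np.sign(hidden @ w2 + b2)  # binary prediction in {-1, +1}

def zero_one_loss(X, y):
    """Fraction of misclassified samples: the quantity coordinate descent minimizes."""
    return np.mean(predict(X) != y)
```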


How AI learned to be creative

#artificialintelligence

With the success of deep learning, algorithms have pushed into another domain that humans thought was safe from automation: the creation of compelling art. AI-generated art has improved dramatically over the past several years, and the results can be seen in competitions like RobotArt and NVIDIA's DeepArt. But while these models are certainly an impressive technical accomplishment, a contentious point of discussion is whether AI and machine learning models are truly creative in the way humans are. Some have argued that it isn't really creative to build mathematical models of pixels in an image or to identify sequential dependencies in the structure of songs. AI, they claim, lacks the human touch. But it's also not clear that human brains are doing anything more impressive. How do we know that the artistic spark of a painter or musician isn't actually a mathematical model, trained -- like a neural network -- through constant practice?


VIABLE: Fast Adaptation via Backpropagating Learned Loss

Feng, Leo, Zintgraf, Luisa, Peng, Bei, Whiteson, Shimon

arXiv.org Machine Learning

In few-shot learning, typically, the loss function which is applied at test time is the one we are ultimately interested in minimising, such as the mean-squared-error loss for a regression problem. However, given that we have few samples at test time, we argue that the loss function that we are interested in minimising is not necessarily the loss function most suitable for computing gradients in a few-shot setting. We propose VIABLE, a generic meta-learning extension that builds on existing meta-gradient-based methods by learning a differentiable loss function, replacing the pre-defined inner-loop loss function in performing task-specific updates. We show that learning a loss function capable of leveraging relational information between samples reduces underfitting, and significantly improves performance and sample efficiency on a simple regression task. Furthermore, we show VIABLE is scalable by evaluating on the Mini-Imagenet dataset.
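
A minimal sketch of the mechanism under stated assumptions (a linear task model, a two-layer learned loss over prediction-target pairs, a single inner step): the task-specific update descends the learned loss, while the outer objective remains the true MSE, so meta-gradients flow through the inner step into the loss network. For simplicity this sketch scores each sample independently, whereas the paper's learned loss also leverages relational information between samples.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(1, 1)  # task model for a toy regression problem
loss_net = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))

def inner_update(x_support, y_support, lr=0.01):
    preds = model(x_support)
    # Learned inner-loop loss replaces the pre-defined (e.g. MSE) loss
    learned_loss = loss_net(torch.cat([preds, y_support], dim=1)).mean()
    grads = torch.autograd.grad(learned_loss, list(model.parameters()),
                                create_graph=True)  # keep graph for meta-gradients
    return [p - lr * g for p, g in zip(model.parameters(), grads)]

def outer_loss(x_query, y_query, fast_weights):
    w, b = fast_weights  # functional forward pass with the updated weights
    return F.mse_loss(x_query @ w.t() + b, y_query)  # the true test-time objective
```

Optimizing the outer loss with respect to loss_net's parameters is what teaches the learned loss to produce useful inner-loop gradients.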


RadGrad: Active learning with loss gradients

Budnarain, Paul, Junior, Renato Ferreira Pinto, Kogan, Ilan

arXiv.org Artificial Intelligence

Solving sequential decision prediction problems, including those in imitation learning settings, requires mitigating the problem of covariate shift. The standard approach, DAgger, relies on capturing expert behaviour in all states that the agent reaches. In real-world settings, querying an expert is costly. We propose a new active learning algorithm that selectively queries the expert, based on both a prediction of agent error and a proxy for agent risk, that maintains the performance of unrestrained expert querying systems while substantially reducing the number of expert queries made. We show that our approach, RadGrad, has the potential to improve upon existing safety-aware algorithms, and matches or exceeds the performance of DAgger and variants (i.e., SafeDAgger) in one simulated environment. However, we also find that a more complex environment poses challenges not only to our proposed method, but also to existing safety-aware algorithms, which do not match the performance of DAgger in our experiments.
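
As a rough sketch of the selective-querying logic (the thresholds and the way the two signals are combined are assumptions, not the paper's exact rule): the expert is consulted only when a loss-prediction head flags likely agent error and a risk proxy flags a costly state.

```python
def should_query_expert(predicted_error, risk_proxy,
                        error_threshold=0.5, risk_threshold=0.5):
    """predicted_error: output of a head trained to predict the agent's loss;
    risk_proxy: estimate of how costly a mistake would be in this state."""
    return predicted_error > error_threshold and risk_proxy > risk_threshold
```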


Autoencoder Based Architecture For Fast & Real Time Audio Style Transfer

Ramani, Dhruv, Karmakar, Samarjit, Panda, Anirban, Ahmed, Asad, Tangri, Pratham

arXiv.org Machine Learning

Recently, there has been great interest in the field of audio style transfer, where a stylized audio is generated by imposing the style of a reference audio on the content of a target audio. We improve on the current approaches, which use neural networks to extract the content and the style of the audio signal, and propose a new autoencoder-based architecture for the task. This network generates a stylized audio for a content audio in a single forward pass. The proposed network architecture proves advantageous in terms of both the quality of the audio produced and the time taken to train the network. The network is evaluated on speech signals to confirm the validity of our proposal.


Multimodal Transfer: A Hierarchical Deep Convolutional Neural Network for Fast Artistic Style Transfer

Wang, Xin, Oxholm, Geoffrey, Zhang, Da, Wang, Yuan-Fang

arXiv.org Artificial Intelligence

Transferring artistic styles onto everyday photographs has become an extremely popular task in both academia and industry. Recently, offline training has replaced online iterative optimization, enabling nearly real-time stylization. When those stylization networks are applied directly to high-resolution images, however, the style of localized regions often appears less similar to the desired artistic style. This is because the transfer process fails to capture small, intricate textures and maintain correct texture scales of the artworks. Here we propose a multimodal convolutional neural network that takes into consideration faithful representations of both color and luminance channels, and performs stylization hierarchically with multiple losses of increasing scales. Compared to state-of-the-art networks, our network can also perform style transfer in nearly real-time by conducting much more sophisticated training offline. By properly handling style and texture cues at multiple scales using several modalities, we can transfer not just large-scale, obvious style cues but also subtle, exquisite ones. That is, our scheme can generate results that are visually pleasing and more similar to multiple desired artistic styles with color and texture cues at multiple scales.
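
One way to read "multiple losses of increasing scales" is a style objective applied to progressively downsampled copies of the images; the Gram-matrix formulation and the scale set below are assumptions for illustration, not the paper's exact hierarchy.

```python
import torch
import torch.nn.functional as F

def gram(features):
    """Channel-correlation (Gram) matrix, a standard style representation."""
    b, c, h, w = features.shape
    f = features.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def multiscale_style_loss(extractor, stylized, style, scales=(1.0, 0.5, 0.25)):
    total = 0.0
    for s in scales:
        a, b = stylized, style
        if s != 1.0:  # coarser scales capture the large, obvious style cues
            a = F.interpolate(a, scale_factor=s, mode='bilinear', align_corners=False)
            b = F.interpolate(b, scale_factor=s, mode='bilinear', align_corners=False)
        total = total + F.mse_loss(gram(extractor(a)), gram(extractor(b)))
    return total
```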


ZM-Net: Real-time Zero-shot Image Manipulation Network

Wang, Hao, Liang, Xiaodan, Zhang, Hao, Yeung, Dit-Yan, Xing, Eric P.

arXiv.org Machine Learning

Many problems in image processing and computer vision (e.g. colorization, style transfer) can be posed as 'manipulating' an input image into a corresponding output image given a user-specified guiding signal. A holy-grail solution towards generic image manipulation should be able to efficiently alter an input image with any personalized signals (even signals unseen during training), such as diverse paintings and arbitrary descriptive attributes. However, existing methods are either inefficient to simultaneously process multiple signals (let alone generalize to unseen signals), or unable to handle signals from other modalities. In this paper, we make the first attempt to address the zero-shot image manipulation task. We cast this problem as manipulating an input image according to a parametric model whose key parameters can be conditionally generated from any guiding signal (even unseen ones). To this end, we propose the Zero-shot Manipulation Net (ZM-Net), a fully-differentiable architecture that jointly optimizes an image-transformation network (TNet) and a parameter network (PNet). The PNet learns to generate key transformation parameters for the TNet given any guiding signal while the TNet performs fast zero-shot image manipulation according to both signal-dependent parameters from the PNet and signal-invariant parameters from the TNet itself. Extensive experiments show that our ZM-Net can perform high-quality image manipulation conditioned on different forms of guiding signals (e.g. style images and attributes) in real-time (tens of milliseconds per image) even for unseen signals. Moreover, a large-scale style dataset with over 20,000 style images is also constructed to promote further research.
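
A minimal sketch of the PNet/TNet split under stated assumptions (layer sizes are illustrative, and the modulation here is a conditional instance-norm-style scale and shift, one plausible reading of "signal-dependent parameters"): the PNet maps a guiding signal to per-channel parameters that modulate the TNet's otherwise signal-invariant convolutions, so an unseen signal needs only a PNet forward pass.

```python
import torch
import torch.nn as nn

class PNet(nn.Module):
    """Maps a guiding-signal embedding to per-channel scale/shift parameters."""
    def __init__(self, signal_dim=512, channels=64):
        super().__init__()
        self.to_params = nn.Linear(signal_dim, 2 * channels)

    def forward(self, signal):
        gamma, beta = self.to_params(signal).chunk(2, dim=1)
        return gamma, beta

class TNetBlock(nn.Module):
    """One block of the transformation network with signal-dependent modulation."""
    def __init__(self, channels=64):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.norm = nn.InstanceNorm2d(channels, affine=False)

    def forward(self, x, gamma, beta):
        h = self.norm(self.conv(x))  # conv weights are signal-invariant
        return torch.relu(h * (1 + gamma[:, :, None, None]) + beta[:, :, None, None])
```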