 tnet



Neural Tangent Kernel Maximum Mean Discrepancy

Neural Information Processing Systems

In this regime, the training time is sufficiently short, and networks are sufficiently parametrized, such that network parameters stay close to the randomized initial values over the training process.



Hard-Attention for Scalable Image Classification

Neural Information Processing Systems

Can we leverage high-resolution information without the unsustainable quadratic complexity to input scale? We propose Traversal Network (TNet), a novel multi-scale hard-attention architecture, which traverses image scale-space in a top-down fashion, visiting only the most informative image regions along the way. TNet offers an adjustable trade-off between accuracy and complexity, by changing the number of attended image locations. We compare our model against hard-attention baselines on ImageNet, achieving higher accuracy with fewer resources (FLOPs, processing time and memory). We further test our model on the fMoW dataset, where we process satellite images of size up to $896 \times 896$ px, getting up to $2.5$x faster processing compared to baselines operating on the same resolution, while achieving higher accuracy as well. TNet is modular, meaning that most classification models could be adopted as its backbone for feature extraction, making the reported performance gains orthogonal to benefits offered by existing optimized deep models. Finally, hard-attention guarantees a degree of interpretability to our model's predictions, without any extra cost beyond inference.
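The top-down traversal described in this abstract can be illustrated with a small sketch. This is not the authors' implementation: the quadrant split, the mean-intensity `region_score`, and the names `traverse`, `num_attended`, and `num_levels` are all assumptions chosen for illustration, and a learned attention module would take the place of the hand-written score.

```python
def split_quadrants(region):
    """Split a square (top, left, size) region into its four quadrants."""
    top, left, size = region
    half = size // 2
    return [
        (top, left, half),
        (top, left + half, half),
        (top + half, left, half),
        (top + half, left + half, half),
    ]


def region_score(image, region):
    """Hypothetical informativeness score: mean pixel value of the region."""
    top, left, size = region
    total = sum(
        image[r][c]
        for r in range(top, top + size)
        for c in range(left, left + size)
    )
    return total / (size * size)


def traverse(image, num_attended, num_levels):
    """Visit regions coarse-to-fine, keeping the top-scoring ones per level.

    Assumes a square image whose side is a power of two and
    num_levels <= log2(side), so regions never shrink below one pixel.
    """
    frontier = [(0, 0, len(image))]
    visited = list(frontier)
    for _ in range(num_levels):
        # Only children of attended regions are considered (hard attention).
        candidates = [q for region in frontier for q in split_quadrants(region)]
        candidates.sort(key=lambda reg: region_score(image, reg), reverse=True)
        frontier = candidates[:num_attended]
        visited.extend(frontier)
    return visited
```

Raising `num_attended` makes the sketch visit more regions per level, which mirrors the adjustable trade-off between accuracy and complexity mentioned in the abstract.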


Learning Symmetric Rules with SATNet

Neural Information Processing Systems

SATNet is a differentiable constraint solver with a custom backpropagation algorithm, which can be used as a layer in a deep-learning system. It is a promising proposal for bridging deep learning and logical reasoning. In fact, SATNet has been successfully applied to learn, among others, the rules of a complex logical puzzle, such as Sudoku, just from input and output pairs where inputs are given as images. In this paper, we show how to improve the learning of SATNet by exploiting symmetries in the target rules of a given but unknown logical puzzle or more generally a logical formula. We present SymSATNet, a variant of SATNet that translates the given symmetries of the target rules to a condition on the parameters of SATNet and requires that the parameters should have a particular parametric form that guarantees the condition. The requirement dramatically reduces the number of parameters to learn for the rules with enough symmetries, and makes the parameter learning of SymSATNet much easier than that of SATNet.



ZM-Net: Real-time Zero-shot Image Manipulation Network

Wang, Hao, Liang, Xiaodan, Zhang, Hao, Yeung, Dit-Yan, Xing, Eric P.

arXiv.org Machine Learning

Many problems in image processing and computer vision (e.g. colorization, style transfer) can be posed as 'manipulating' an input image into a corresponding output image given a user-specified guiding signal. A holy-grail solution towards generic image manipulation should be able to efficiently alter an input image with any personalized signals (even signals unseen during training), such as diverse paintings and arbitrary descriptive attributes. However, existing methods are either inefficient to simultaneously process multiple signals (let alone generalize to unseen signals), or unable to handle signals from other modalities. In this paper, we make the first attempt to address the zero-shot image manipulation task. We cast this problem as manipulating an input image according to a parametric model whose key parameters can be conditionally generated from any guiding signal (even unseen ones). To this end, we propose the Zero-shot Manipulation Net (ZM-Net), a fully-differentiable architecture that jointly optimizes an image-transformation network (TNet) and a parameter network (PNet). The PNet learns to generate key transformation parameters for the TNet given any guiding signal while the TNet performs fast zero-shot image manipulation according to both signal-dependent parameters from the PNet and signal-invariant parameters from the TNet itself. Extensive experiments show that our ZM-Net can perform high-quality image manipulation conditioned on different forms of guiding signals (e.g. style images and attributes) in real-time (tens of milliseconds per image) even for unseen signals. Moreover, a large-scale style dataset with over 20,000 style images is also constructed to promote further research.
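The split between signal-dependent and signal-invariant parameters can be sketched in a few lines. This is a toy illustration under stated assumptions, not the ZM-Net architecture: the linear `pnet`, the scalar `shared_gain`, and the scale-and-shift form of the modulation are all hypothetical choices, since the abstract does not specify how the generated parameters enter the TNet.

```python
def pnet(signal, weights):
    """Hypothetical parameter network: linear map from a guiding signal
    to a (scale, shift) pair of transformation parameters."""
    scale = sum(w * s for w, s in zip(weights["scale"], signal))
    shift = sum(w * s for w, s in zip(weights["shift"], signal))
    return scale, shift


def tnet(pixels, shared_gain, scale, shift):
    """Hypothetical transformation network: apply the signal-invariant
    gain first, then the signal-dependent scale and shift."""
    return [scale * (shared_gain * p) + shift for p in pixels]


# Assumed toy weights; in a trained system these would be learned jointly.
weights = {"scale": [2.0, 0.5], "shift": [1.0, -3.0]}
pixels = [0.0, 0.5, 1.0]

# Any guiding signal, including one not used to fit the weights, yields
# parameters in a single forward pass with no per-signal optimization.
scale, shift = pnet([1.0, 0.0], weights)
output = tnet(pixels, shared_gain=1.0, scale=scale, shift=shift)
```

Because `pnet` is just a function of the signal, swapping in a different signal changes only `scale` and `shift`, while `shared_gain` stays fixed across all signals.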