One-shot neural architecture search features fast training of a supernet in a single run. A pivotal issue for this weight-sharing approach is its lack of scalability. A simple adjustment with an identity block renders the supernet scalable, but it causes unstable training, which makes the subsequent model ranking unreliable. In this paper, we introduce a linearly equivalent transformation to stabilize training, and we prove that the transformed path has the same representational power as the original one. The overall method is named SCARLET (SCAlable supeRnet with Linearly Equivalent Transformation). We show through experiments that linearly equivalent transformations indeed stabilize supernet training. With an EfficientNet-like search space and a multi-objective reinforced evolutionary backend, it generates a series of competitive models: Scarlet-A achieves 76.9% Top-1 accuracy on ImageNet, outperforming EfficientNet-B0 by a large margin; the shallower Scarlet-B exemplifies the proposed scalability, attaining the same 76.3% accuracy as EfficientNet-B0 with far fewer FLOPs; Scarlet-C scores a competitive 75.6% at a comparable size. The models and evaluation code are released online at https://github.com/xiaomi-automl/ScarletNAS.
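The equivalence claim above can be illustrated with a small NumPy sketch. It rests on an assumption the abstract does not state: that the transformation is a linear channel map with no activation (for example a 1x1 convolution) applied after a layer that is itself linear. Under that assumption the two maps fold into one, so the transformed path represents nothing the original could not. The names `fold_linear`, `w_conv`, and `w_elt` are hypothetical and are not taken from the SCARLET code.

```python
import numpy as np

def fold_linear(w_conv, w_elt):
    """Fold a following linear channel map into the preceding layer's weights."""
    return w_elt @ w_conv

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 5))        # 8 input channels, 5 spatial positions
w_conv = rng.normal(size=(16, 8))  # original layer, written as a 1x1 conv: 8 -> 16 channels
w_elt = rng.normal(size=(16, 16))  # assumed linear transformation, no activation

two_step = w_elt @ (w_conv @ x)            # transformed path: layer, then linear map
one_step = fold_linear(w_conv, w_elt) @ x  # single layer with folded weights
max_gap = float(np.abs(two_step - one_step).max())
```

The two results agree up to floating-point rounding because matrix multiplication is associative; inserting a nonlinearity between the two maps would break the equivalence.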
Recently, Neural Architecture Search has achieved great success in large-scale image classification. In contrast, few works have focused on architecture search for object detection, mainly because detectors always require costly ImageNet pretraining. Training from scratch, as a substitute, demands more epochs to converge and brings no computation savings. To overcome this obstacle, we introduce a practical neural architecture transformation search (NATS) algorithm for object detection in this paper. Instead of searching and constructing an entire network, NATS explores the architecture space on the basis of an existing network and reuses its weights.
Cai, Han (Shanghai Jiao Tong University) | Chen, Tianyao (Shanghai Jiao Tong University) | Zhang, Weinan (Shanghai Jiao Tong University) | Yu, Yong (Shanghai Jiao Tong University) | Wang, Jun (University College London)
Techniques for automatically designing deep neural network architectures, such as reinforcement learning based approaches, have recently shown promising results. However, their success relies on vast computational resources (e.g. hundreds of GPUs), making them difficult to use widely. A noticeable limitation is that they still design and train each network from scratch during the exploration of the architecture space, which is highly inefficient. In this paper, we propose a new framework toward efficient architecture search by exploring the architecture space based on the current network and reusing its weights. We employ a reinforcement learning agent as the meta-controller, whose action is to grow the network depth or layer width with function-preserving transformations. As such, the previously validated networks can be reused for further exploration, thus saving a large amount of computational cost. We apply our method to explore the architecture space of plain convolutional neural networks (no skip-connections, branching, etc.) on image benchmark datasets (CIFAR-10, SVHN) with restricted computational resources (5 GPUs). Our method can design highly competitive networks that outperform existing networks using the same design scheme. On CIFAR-10, our model without skip-connections achieves a 4.23% test error rate, exceeding a vast majority of modern architectures and approaching DenseNet. Furthermore, by applying our method to explore the DenseNet architecture space, we are able to achieve more accurate networks with fewer parameters.
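The function-preserving growth described above can be sketched for a small fully connected ReLU network. This is an illustration only: the abstract does not spell out the transformations, so the sketch assumes the usual Net2Net-style constructions (an identity-initialized layer for depth, unit duplication with halved outgoing weights for width). The helpers `deepen` and `widen` are hypothetical names.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(x, weights):
    """Plain MLP: ReLU after every layer except the last."""
    h = x
    for w in weights[:-1]:
        h = relu(h @ w)
    return h @ weights[-1]

def deepen(weights, index):
    """Insert an identity-initialized layer after hidden layer `index`."""
    width = weights[index].shape[1]
    return weights[:index + 1] + [np.eye(width)] + weights[index + 1:]

def widen(weights, index, rng):
    """Add one unit to hidden layer `index` by duplicating a random unit."""
    w_in, w_out = weights[index].copy(), weights[index + 1].copy()
    j = rng.integers(w_in.shape[1])
    w_in = np.concatenate([w_in, w_in[:, j:j + 1]], axis=1)  # copy incoming weights
    w_out[j] /= 2.0                                           # halve outgoing weights
    w_out = np.concatenate([w_out, w_out[j:j + 1]], axis=0)  # give the copy the other half
    return weights[:index] + [w_in, w_out] + weights[index + 2:]

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 4))
net = [rng.normal(size=(4, 6)), rng.normal(size=(6, 3))]
deeper = deepen(net, 0)    # 4 -> 6 -> 6 -> 3
wider = widen(net, 0, rng)  # 4 -> 7 -> 3
```

Both grown networks compute the same outputs as `net` on `x`, so training can continue from the parent's weights instead of restarting. The identity layer preserves the function here because ReLU applied to an already non-negative activation leaves it unchanged.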
One of the fundamental problems in supervised classification, and in machine learning in general, is the modelling of non-parametric invariances that exist in data. Most prior art has focused on enforcing priors in the form of invariances to parametric nuisance transformations that are expected to be present in data. Learning non-parametric invariances directly from data remains an important open problem. In this paper, we introduce a new architectural layer for convolutional networks which is capable of learning general invariances from the data itself. This layer can learn invariance to non-parametric transformations and, interestingly, motivates and incorporates permanent random connectomes, hence the name Permanent Random Connectome Non-Parametric Transformation Networks (PRC-NPTN). PRC-NPTN networks are initialized with random connections (not just weights) which are a small subset of the connections in a fully connected convolution layer. Importantly, these connections in PRC-NPTNs, once initialized, remain permanent throughout training and testing. Permanent random connectomes make these architectures loosely more biologically plausible than many other mainstream network architectures which require highly ordered structures. We motivate randomly initialized connections as a simple method to learn invariance from the data itself while invoking invariance towards multiple nuisance transformations simultaneously. We find that these randomly initialized permanent connections have positive effects on generalization and outperform much larger ConvNet baselines and the recently proposed Non-Parametric Transformation Network (NPTN) on benchmarks that enforce learning invariances from the data itself.
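The idea of a connectome that is random at initialization and then permanent can be illustrated with a fixed connectivity mask. This is a deliberately simplified sketch and not the PRC-NPTN layer itself: it assumes a channel-mixing layer in which each output channel keeps a small random subset of input connections, and the class name `SparseRandomMix` and its parameters are hypothetical.

```python
import numpy as np

class SparseRandomMix:
    """Channel-mixing layer whose random connectivity is fixed at construction."""

    def __init__(self, in_ch, out_ch, keep, seed=0):
        rng = np.random.default_rng(seed)
        self.weight = rng.normal(size=(out_ch, in_ch))
        # Each output channel keeps `keep` randomly chosen input connections.
        self.mask = np.zeros((out_ch, in_ch))
        for o in range(out_ch):
            self.mask[o, rng.choice(in_ch, size=keep, replace=False)] = 1.0

    def forward(self, x):
        # x has shape (in_ch, positions); masked-out weights never contribute.
        return (self.weight * self.mask) @ x

    def sgd_step(self, grad, lr=0.1):
        # Weights change during training; the mask (the connectome) does not.
        self.weight -= lr * grad

layer = SparseRandomMix(in_ch=8, out_ch=4, keep=2)
mask_before = layer.mask.copy()
layer.sgd_step(np.ones((4, 8)))
out = layer.forward(np.ones((8, 5)))
```

Because the mask is created once and never updated, the same subset of connections is used during training and testing, which mirrors the permanence property stated in the abstract.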
Multivariate time series prediction has applications in a wide variety of domains and is considered to be a very challenging task, especially when the variables have correlations and exhibit complex temporal patterns, such as seasonality and trend. Many existing methods suffer from strong statistical assumptions, numerical issues with high dimensionality, manual feature engineering efforts, and limited scalability. In this work, we present a novel deep learning architecture, known as Temporal Tensor Transformation Network, which transforms the original multivariate time series into a higher-order tensor through the proposed Temporal-Slicing Stack Transformation. This yields a new representation of the original multivariate time series, which enables the convolution kernel to extract complex and nonlinear features as well as variable interaction signals from a relatively large temporal region. Experimental results show that Temporal Tensor Transformation Network outperforms several state-of-the-art methods on window-based predictions across various tasks. The proposed architecture also demonstrates robust prediction performance through an extensive sensitivity analysis.
Index Terms: multivariate time series, prediction, convolution, deep learning, tensor transformation
I. INTRODUCTION
Multivariate time series analysis has gained widespread application in many fields, e.g., financial market prediction, weather forecasting, and energy consumption prediction. It is used to model and explain the underlying temporal patterns among a group of time series variables in dynamical systems. Various methods have been proposed to predict multivariate time series based on statistical modeling and deep neural networks. Classical statistical models assume that the time series is stationary, i.e., the summary statistics of data points are consistent over time.
Preprocessing procedures are usually needed to remove trend, seasonality, and other time-dependent structures from the raw series in order to make the data stationary. In addition, these models also assume the independence condition in the underlying linear regression problem, i.e., the random errors in the model are not correlated over time.
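As an illustration of such preprocessing, differencing is one common way to remove trend and seasonality. The text above does not name a specific procedure, so this is only an example of the general idea, applied to a synthetic series; the helper `difference` is a hypothetical name.

```python
import numpy as np

def difference(series, lag=1):
    """Return series[t] - series[t - lag]; the result is `lag` points shorter."""
    series = np.asarray(series, dtype=float)
    return series[lag:] - series[:-lag]

t = np.arange(48)
season = np.tile([0.0, 2.0, 0.0, -2.0], 12)  # period-4 seasonal pattern
raw = 0.5 * t + season                       # linear trend plus seasonality

deseasoned = difference(raw, lag=4)          # seasonal differencing at the period
detrended = difference(deseasoned, lag=1)    # first differencing of the remainder
```

On this noise-free series, seasonal differencing leaves a constant (the trend slope times the period), and a further first difference leaves zeros. Real data would retain a noise term whose stationarity still has to be checked.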