Designing the architecture for an artificial neural network is a cumbersome task because of the numerous parameters to configure, including activation functions, layer types, and hyper-parameters. With the large number of parameters for most networks nowadays, it is intractable to find a good configuration for a given task by hand. In this paper an Efficient Global Optimization (EGO) algorithm is adapted to automatically optimize and configure convolutional neural network architectures. A configurable neural network architecture based solely on convolutional layers is proposed for the optimization. Without using any knowledge on the target problem and not using any data augmentation techniques, it is shown that on several image classification tasks this approach is able to find competitive network architectures in terms of prediction accuracy, compared to the best hand-crafted ones in literature. In addition, a very small training budget (200 evaluations and 10 epochs in training) is spent on each optimized architectures in contrast to the usual long training time of hand-crafted networks. Moreover, instead of the standard sequential evaluation in EGO, several candidate architectures are proposed and evaluated in parallel, which saves the execution overheads significantly and leads to an efficient automation for deep neural network design.
In science and engineering, intelligent processing of complex signals such as images, sound or language is often performed by a parameterized hierarchy of nonlinear processing layers, sometimes biologically inspired. Hierarchical systems (or, more generally, nested systems) offer a way to generate complex mappings using simple stages. Each layer performs a different operation and achieves an ever more sophisticated representation of the input, as, for example, in an deep artificial neural network, an object recognition cascade in computer vision or a speech front-end processing. Joint estimation of the parameters of all the layers and selection of an optimal architecture is widely considered to be a difficult numerical nonconvex optimization problem, difficult to parallelize for execution in a distributed computation environment, and requiring significant human expert effort, which leads to suboptimal systems in practice. We describe a general mathematical strategy to learn the parameters and, to some extent, the architecture of nested systems, called the method of auxiliary coordinates (MAC). This replaces the original problem involving a deeply nested function with a constrained problem involving a different function in an augmented space without nesting. The constrained problem may be solved with penalty-based methods using alternating optimization over the parameters and the auxiliary coordinates. MAC has provable convergence, is easy to implement reusing existing algorithms for single layers, can be parallelized trivially and massively, applies even when parameter derivatives are not available or not desirable, and is competitive with state-of-the-art nonlinear optimizers even in the serial computation setting, often providing reasonable models within a few iterations.
Neural Architecture Search (NAS) has shown great potentials in finding a better neural network design than human design. Sample-based NAS is the most fundamental method aiming at exploring the search space and evaluating the most promising architecture. However, few works have focused on improving the sampling efficiency for a multi-objective NAS. Inspired by the nature of the graph structure of a neural network, we propose BOGCN-NAS, a NAS algorithm using Bayesian Optimization with Graph Convolutional Network (GCN) predictor. Specifically, we apply GCN as a surrogate model to adaptively discover and incorporate nodes structure to approximate the performance of the architecture. Our method further considers an efficient multi-objective search which can be flexibly injected into any sample-based NAS pipelines to efficiently find the best speed/accuracy tradeoff. Extensive experiments are conducted to verify the effectiveness of our method over many competing methods, e.g. Recently Neural Architecture Search (NAS) has aroused a surge of interest by its potentials of freeing the researchers from tedious and time-consuming architecture tuning for each new task and dataset. Specifically, NAS has already shown some competitive results comparing with handcrafted architectures in computer vision: classification (Real et al., 2019b), detection, segmentation (Ghiasi et al., 2019; Chen et al., 2019; Liu et al., 2019a) and super-resolution (Chu et al., 2019). Meanwhile, NAS has also achieved remarkable results in natural language processing tasks (Luong et al., 2018; So et al., 2019). A variety of search strategies have been proposed, which may be categorized into two groups: one-shot NAS algorithms (Liu et al., 2019b; Pham et al., 2018; Luo et al., 2018), and sample-based algorithms (Zoph & Le, 2017; Liu et al., 2018a; Real et al., 2019b).
The problem of predicting the motion of a vehicle has been investigated by several researchers. Many have used Kalman filter techniques based on the equations of vehicle motion; these techniques most accurately predict shortterm motion. In contrast, my dissertation (Krozel 1992)1 presents a methodology for intelligent path prediction, where predicting the motion of an observed vehicle is performed by reasoning about the decision-making strategy of the vehicle's operator.
Differentiable Architecture Search (DARTS) has attracted a lot of attention due to its simplicity and small search costs achieved by a continuous relaxation and an approximation of the resulting bi-level optimization problem. However, DARTS does not work robustly for new problems: we identify a wide range of search spaces for which DARTS yields degenerate architectures with very poor test performance. We study this failure mode and show that, while DARTS successfully minimizes validation loss, the found solutions generalize poorly when they coincide with high validation loss curvature in the space of architectures. We show that by adding one of various types of regularization we can robustify DARTS to find solutions with smaller Hessian spectrum and with better generalization properties. Based on these observations we propose several simple variations of DARTS that perform substantially more robustly in practice. Our observations are robust across five search spaces on three image classification tasks and also hold for the very different domains of disparity estimation (a dense regression task) and language modelling. We provide our implementation and scripts to facilitate reproducibility.