Elsken, Thomas
Neural Architecture Search: Insights from 1000 Papers
White, Colin, Safari, Mahmoud, Sukthanker, Rhea, Ru, Binxin, Elsken, Thomas, Zela, Arber, Dey, Debadeepta, Hutter, Frank
In the past decade, advances in deep learning have resulted in breakthroughs in a variety of areas, including computer vision, natural language understanding, speech recognition, and reinforcement learning. Specialized, high-performing neural architectures are crucial to the success of deep learning in these areas. Neural architecture search (NAS), the process of automating the design of neural architectures for a given task, is an inevitable next step in automating machine learning and has already outpaced the best human-designed architectures on many tasks. In the past few years, research in NAS has been progressing rapidly, with over 1000 papers released since 2020 (Deng and Lindauer, 2021). In this survey, we provide an organized and comprehensive guide to neural architecture search. We give a taxonomy of search spaces, algorithms, and speedup techniques, and we discuss resources such as benchmarks, best practices, other surveys, and open-source libraries.
Bag of Tricks for Neural Architecture Search
Elsken, Thomas, Staffler, Benedikt, Zela, Arber, Metzen, Jan Hendrik, Hutter, Frank
While neural architecture search methods have been successful in previous years and have led to new state-of-the-art performance on various problems, they have also been criticized for being unstable and highly sensitive with respect to their hyperparameters. Continuously relaxing the architecture space allows searching for architectures by alternating stochastic gradient descent, which (in each batch) alternates between updates of the network parameters and the real-valued weights parameterizing the architecture. However, directly using this alternating optimization has been reported to lead to premature convergence in the architectural space [26].
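A minimal sketch of such an alternating scheme, in the spirit of first-order DARTS rather than this paper's exact procedure (the model interface and the optimizer/parameter names below are assumptions):

```python
def alternating_search_step(model, alpha, w_optimizer, alpha_optimizer,
                            train_batch, val_batch, loss_fn):
    """One first-order alternating-optimization step (DARTS-style sketch).

    `model(x, alpha)` is assumed to mix candidate operations according to
    the real-valued architecture parameters `alpha` (e.g., via a softmax).
    `w_optimizer` updates the network weights, `alpha_optimizer` updates alpha.
    """
    # 1) Update the architecture parameters on a validation batch.
    x_val, y_val = val_batch
    alpha_optimizer.zero_grad()
    loss_fn(model(x_val, alpha), y_val).backward()
    alpha_optimizer.step()

    # 2) Update the network weights on a training batch.
    x_train, y_train = train_batch
    w_optimizer.zero_grad()
    loss_fn(model(x_train, alpha), y_train).backward()
    w_optimizer.step()
```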
Bag of Baselines for Multi-objective Joint Neural Architecture Search and Hyperparameter Optimization
Guerrero-Viu, Julia, Hauns, Sven, Izquierdo, Sergio, Miotto, Guilherme, Schrodi, Simon, Biedenkapp, Andre, Elsken, Thomas, Deng, Difan, Lindauer, Marius, Hutter, Frank
Neural architecture search (NAS) and hyperparameter optimization (HPO) make deep learning accessible to non-experts by automatically finding the architecture of the deep neural network to use and tuning the hyperparameters of the training pipeline. While both NAS and HPO have been studied extensively in recent years, NAS methods typically assume fixed hyperparameters and vice versa; there is little work on joint NAS + HPO. Furthermore, NAS has recently often been framed as a multi-objective optimization problem, in order to take, e.g., resource requirements into account. In this paper, we propose a set of methods that extend current approaches to jointly optimize neural architectures and hyperparameters with respect to multiple objectives. We hope that these methods will serve as simple baselines for future research on multi-objective joint NAS + HPO. To facilitate this, all our code is available at https://github.com/automl/multi-obj-baselines.
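To make "joint NAS + HPO" concrete, here is a hedged sketch of a single configuration space covering both architecture choices and training hyperparameters, as a random-search-style baseline might sample it; all names, ranges, and objectives below are illustrative assumptions, not the repository's API:

```python
import random

def sample_config(rng: random.Random) -> dict:
    """Draw one joint configuration: architecture choices and training
    hyperparameters live in a single search space."""
    return {
        # architectural choices
        "num_layers": rng.randint(2, 8),
        "width": rng.choice([16, 32, 64, 128]),
        "activation": rng.choice(["relu", "swish"]),
        # training hyperparameters
        "learning_rate": 10 ** rng.uniform(-4, -1),
        "weight_decay": 10 ** rng.uniform(-5, -2),
    }

def evaluate(config: dict) -> tuple[float, float]:
    """Placeholder: train with `config`, return (validation error, latency)."""
    raise NotImplementedError

# A random-search baseline: sample joint configurations, record all
# objective vectors, and later extract the non-dominated (Pareto) set.
rng = random.Random(0)
candidates = [sample_config(rng) for _ in range(100)]
```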
Neural Ensemble Search for Performant and Calibrated Predictions
Zaidi, Sheheryar, Zela, Arber, Elsken, Thomas, Holmes, Chris, Hutter, Frank, Teh, Yee Whye
Ensembles of neural networks achieve superior performance compared to stand-alone networks, not only in terms of accuracy on in-distribution data but also on data with distributional shift, alongside improved uncertainty calibration. Diversity among networks in an ensemble is believed to be key for building strong ensembles, but typical approaches only ensemble different weight vectors of a fixed architecture. Instead, we investigate neural architecture search (NAS) for explicitly constructing ensembles that exploit diversity among networks of varying architectures and achieve robustness against distributional shift. By directly optimizing ensemble performance, our methods implicitly encourage diversity among networks, without the need to define diversity explicitly. We find that the resulting ensembles are more diverse than ensembles composed of a fixed architecture and are therefore also more powerful. We show significant improvements in ensemble performance on image classification tasks, both for in-distribution data and during distributional shift, with better uncertainty calibration.
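One generic way to directly optimize ensemble performance is forward stepwise selection from a pool of networks trained with varying architectures; the sketch below is an assumed, textbook version of such a selection loop (not necessarily the paper's exact algorithm), where `val_probs[i]` holds candidate i's predicted class probabilities on a fixed validation set:

```python
import numpy as np

def nll(probs: np.ndarray, labels: np.ndarray) -> float:
    """Negative log-likelihood of class probabilities (shape [N, C])."""
    eps = 1e-12
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + eps)))

def forward_select(val_probs: list[np.ndarray], labels: np.ndarray,
                   size: int) -> list[int]:
    """Greedily pick `size` members (with replacement) minimizing ensemble NLL."""
    chosen: list[int] = []
    while len(chosen) < size:
        best_i, best_loss = 0, float("inf")
        for i, p in enumerate(val_probs):
            # Ensemble prediction = average of member probabilities.
            stacked = [val_probs[j] for j in chosen] + [p]
            loss = nll(np.mean(stacked, axis=0), labels)
            if loss < best_loss:
                best_i, best_loss = i, loss
        chosen.append(best_i)
    return chosen
```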
Automated design of error-resilient and hardware-efficient deep neural networks
Schorn, Christoph, Elsken, Thomas, Vogel, Sebastian, Runge, Armin, Guntoro, Andre, Ascheid, Gerd
Applying deep neural networks (DNNs) in mobile and safety-critical systems, such as autonomous vehicles, demands a reliable and efficient execution on hardware. Optimized dedicated hardware accelerators are being developed to achieve this. However, the design of efficient and reliable hardware has become increasingly difficult, due to the increased complexity of modern integrated circuit technology and its sensitivity to hardware faults, such as random bit-flips. It is thus desirable to exploit the optimization potential for error resilience and efficiency on the algorithmic side as well, e.g., by optimizing the architecture of the DNN. Since there are numerous design choices for the architecture of DNNs, with partially opposing effects on the preferred characteristics (such as small error rates at low latency), multi-objective optimization strategies are necessary. In this paper, we develop an evolutionary optimization technique for the automated design of hardware-optimized DNN architectures. For this purpose, we derive a set of easily computable objective functions, which enable the fast evaluation of DNN architectures with respect to their hardware efficiency and error resilience solely based on the network topology. We observe a strong correlation between predicted error resilience and actual measurements obtained from fault injection simulations.
Keywords: Neural Network Hardware · Error Resilience · Hardware Faults · Neural Architecture Search · Multi-Objective Optimization · AutoML
1 Introduction
The application of deep neural networks (DNNs) in safety-critical perception systems, for example autonomous vehicles (AVs), poses challenges for the design of the underlying hardware platforms. On the one hand, efficient and fast accelerators are needed, since DNNs for computer vision exhibit massive computational requirements [55]. On the other hand, resilience against random hardware faults has to be ensured. In many driving scenarios, entering a fail-safe state is not sufficient; fail-operational behavior and fault tolerance are required [48]. However, fault tolerance techniques at the hardware level often entail large redundancy overheads in silicon area, latency, and power consumption. These overheads stand in contrast to the low-power and low-latency requirements of embedded real-time DNN accelerators. Reliability concerns in nanoscale integrated circuits, for instance soft errors in memory and logic, represent an additional challenge for the realization of fault tolerance mechanisms at the hardware level [2, 33, 36, 68, 83]. Moreover, techniques such as near-threshold computing [26] and approximate computing [65] are desirable to meet power constraints, but can further increase error rates.
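The key ingredient above is a set of objectives computable from the network topology alone. As an illustrative sketch (the proxies below are assumptions, not the paper's actual metrics), one can score a convolutional topology on efficiency and fault exposure without any training or fault injection:

```python
from dataclasses import dataclass

@dataclass
class ConvLayer:
    in_ch: int
    out_ch: int
    kernel: int   # square kernel size
    out_hw: int   # spatial size of the output feature map

def param_count(layers: list[ConvLayer]) -> int:
    """Efficiency proxy: weights (+ biases) implied by the topology alone."""
    return sum(l.in_ch * l.out_ch * l.kernel**2 + l.out_ch for l in layers)

def activation_volume(layers: list[ConvLayer]) -> int:
    """Crude proxy for fault exposure: total number of activation values
    held in accelerator memory (an illustrative assumption, not the
    resilience metric used in the paper)."""
    return sum(l.out_ch * l.out_hw**2 for l in layers)

# Both objectives need only the topology, so thousands of candidate
# architectures can be scored before any expensive simulation.
topology = [ConvLayer(3, 32, 3, 32), ConvLayer(32, 64, 3, 16)]
print(param_count(topology), activation_volume(topology))
```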
Understanding and Robustifying Differentiable Architecture Search
Zela, Arber, Elsken, Thomas, Saikia, Tonmoy, Marrakchi, Yassine, Brox, Thomas, Hutter, Frank
Differentiable Architecture Search (DARTS) has attracted a lot of attention due to its simplicity and small search costs, achieved by a continuous relaxation and an approximation of the resulting bi-level optimization problem. However, DARTS does not work robustly for new problems: we identify a wide range of search spaces for which DARTS yields degenerate architectures with very poor test performance. We study this failure mode and show that, while DARTS successfully minimizes validation loss, the found solutions generalize poorly when they coincide with high validation loss curvature in the space of architectures. We show that by adding one of various types of regularization we can robustify DARTS to find solutions with a smaller Hessian spectrum and better generalization properties. Based on these observations, we propose several simple variations of DARTS that perform substantially more robustly in practice. Our observations are robust across five search spaces on three image classification tasks and also hold for the very different domains of disparity estimation (a dense regression task) and language modelling. We provide our implementation and scripts to facilitate reproducibility.
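The curvature quantity discussed above is typically summarized by the dominant eigenvalue of the validation-loss Hessian with respect to the architecture parameters. A standard way to estimate it without materializing the Hessian is power iteration on Hessian-vector products; below is a generic PyTorch sketch (the loss and parameters are assumed to come from a DARTS-like setup):

```python
import torch

def dominant_hessian_eigenvalue(loss: torch.Tensor, alpha: torch.Tensor,
                                iters: int = 20) -> float:
    """Estimate the dominant Hessian eigenvalue of `loss` w.r.t. `alpha`
    via power iteration on Hessian-vector products (autograd only).
    `alpha` must be a leaf tensor with requires_grad=True, and `loss`
    must still carry its computation graph."""
    grad = torch.autograd.grad(loss, alpha, create_graph=True)[0]
    v = torch.randn_like(alpha)
    v /= v.norm()
    eig = 0.0
    for _ in range(iters):
        # Hessian-vector product: d/d(alpha) of (grad . v)
        hv = torch.autograd.grad((grad * v).sum(), alpha, retain_graph=True)[0]
        eig = float((v * hv).sum())      # Rayleigh-quotient estimate
        v = hv / (hv.norm() + 1e-12)
    return eig
```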
Neural Architecture Search: A Survey
Elsken, Thomas, Metzen, Jan Hendrik, Hutter, Frank
Deep Learning has enabled remarkable progress over the last years on a variety of tasks, such as image recognition, speech recognition, and machine translation. One crucial aspect of this progress is the advent of novel neural architectures. Currently employed architectures have mostly been developed manually by human experts, which is a time-consuming and error-prone process. Because of this, there is growing interest in automated neural architecture search methods. We provide an overview of existing work in this field of research and categorize these methods according to three dimensions: search space, search strategy, and performance estimation strategy.
Multi-objective Architecture Search for CNNs
Elsken, Thomas, Metzen, Jan Hendrik, Hutter, Frank
Architecture search aims at automatically finding neural architectures that are competitive with architectures designed by human experts. While recent approaches have come close to matching the predictive performance of manually designed architectures for image recognition, these approaches are problematic under constrained resources for two reasons: first, the architecture search itself requires vast computational resources for most proposed methods; second, the found neural architectures are solely optimized for high predictive performance, without penalizing excessive resource consumption. We address the first shortcoming by proposing NASH, an architecture search method which considerably reduces the computational resources required for training novel architectures by applying network morphisms and aggressive learning rate schedules. On CIFAR-10, NASH finds architectures with errors below 4% in only 3 days. We address the second shortcoming by proposing Pareto-NASH, a method for multi-objective architecture search that allows approximating the Pareto front of architectures under multiple objectives, such as predictive performance and number of parameters, in a single run of the method. Within 56 GPU days of architecture search, Pareto-NASH finds a model with 4M parameters and a test error of 3.5%, as well as a model with less than 1M parameters and a test error of 4.6%.
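Approximating the Pareto front amounts to keeping the non-dominated set of evaluated architectures. The short generic sketch below (not Pareto-NASH's internals) filters (test error, parameter count) pairs, lower being better for both, down to that set:

```python
def pareto_front(points: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Return the non-dominated points, minimizing both objectives."""
    front = []
    for p in points:
        if not any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points):
            front.append(p)
    return front

# E.g., (test error %, params in millions) for candidate models:
models = [(3.5, 4.0), (4.6, 0.9), (4.0, 4.5), (5.0, 0.8)]
print(pareto_front(models))  # (4.0, 4.5) is dominated by (3.5, 4.0)
```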
Simple And Efficient Architecture Search for Convolutional Neural Networks
Elsken, Thomas, Metzen, Jan Hendrik, Hutter, Frank
Neural networks have recently achieved great success on many tasks. However, neural network architectures that perform well are still typically designed manually by experts in a cumbersome trial-and-error process. We propose a new method to automatically search for well-performing CNN architectures based on a simple hill-climbing procedure whose operators apply network morphisms, followed by short optimization runs with cosine annealing. Surprisingly, this simple method yields competitive results, despite only requiring resources of the same order of magnitude as training a single network. For example, on CIFAR-10, our method designs and trains networks with an error rate below 6% in only 12 hours on a single GPU; training for one day reduces this error further, to almost 5%.
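The procedure is essentially a loop: apply network morphisms to the current best model, briefly train each child with a cosine-annealed learning rate, and keep the best. Below is a hedged skeleton; `apply_random_morphism` and `short_train` are hypothetical placeholders, and the schedule follows the standard formula eta_t = eta_min + (eta_max - eta_min)(1 + cos(pi*t/T))/2:

```python
import copy
import math

def cosine_lr(t: int, T: int, lr_max: float, lr_min: float = 0.0) -> float:
    """Cosine-annealed learning rate at step t of T."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / T))

def hill_climb(model, val_error, apply_random_morphism, short_train,
               steps: int = 10, n_children: int = 8):
    """Keep the best of `n_children` morphed and briefly trained variants.

    Network morphisms preserve the parent's function, so each child
    starts from the parent's accuracy rather than from scratch.
    """
    best, best_err = model, val_error(model)
    for _ in range(steps):
        for _ in range(n_children):
            child = apply_random_morphism(copy.deepcopy(best))
            short_train(child, lr_schedule=cosine_lr)  # a few epochs only
            err = val_error(child)
            if err < best_err:
                best, best_err = child, err
    return best
```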