metaparameter
- Leisure & Entertainment > Games (0.68)
- Leisure & Entertainment > Sports (0.67)
Optimizing ML Training with Metagradient Descent
Engstrom, Logan; Ilyas, Andrew; Chen, Benjamin; Feldmann, Axel; Moses, William; Madry, Aleksander
A major challenge in training large-scale machine learning models is configuring the training process to maximize model performance, i.e., finding the best training setup from a vast design space. In this work, we unlock a gradient-based approach to this problem. We first introduce an algorithm for efficiently calculating metagradients -- gradients through model training -- at scale. We then introduce a "smooth model training" framework that enables effective optimization using metagradients. With metagradient descent (MGD), we greatly improve on existing dataset selection methods, outperform accuracy-degrading data poisoning attacks by an order of magnitude, and automatically find competitive learning rate schedules.
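A minimal sketch of the metagradient idea, assuming a toy linear model, a fully unrolled inner training loop, and per-example data weights as the metaparameter; the paper's actual algorithm for computing metagradients at scale is not shown here.

import jax
import jax.numpy as jnp

def train(data_weights, X, y, X_val, y_val, steps=50, lr=0.1):
    """Train a linear model under a weighted squared loss; return validation loss."""
    w = jnp.zeros(X.shape[1])

    def inner_loss(w):
        resid = X @ w - y
        return jnp.mean(data_weights * resid ** 2)

    for _ in range(steps):                      # unrolled inner training loop
        w = w - lr * jax.grad(inner_loss)(w)
    return jnp.mean((X_val @ w - y_val) ** 2)   # meta-objective: validation loss

# Metagradient: gradient of the final validation loss w.r.t. the data weights.
metagrad_fn = jax.jit(jax.grad(train, argnums=0))

key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (64, 5)); y = X @ jnp.arange(5.0) + 0.1
X_val = jax.random.normal(key, (32, 5)); y_val = X_val @ jnp.arange(5.0)

weights = jnp.ones(64)
for _ in range(20):                             # outer (meta) descent loop
    weights = weights - 1.0 * metagrad_fn(weights, X, y, X_val, y_val)

The same pattern applies to other metaparameters (e.g., a learning rate schedule) by making them the differentiated argument instead of the data weights.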
- Asia > Middle East > Jordan (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > Spain > Andalusia > Granada Province > Granada (0.04)
- Transportation (0.46)
- Energy (0.46)
- Information Technology (0.45)
- Health & Medicine > Diagnostic Medicine (0.45)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Accelerated training of deep learning surrogate models for surface displacement and flow, with application to MCMC-based history matching of CO2 storage operations
Han, Yifu; Hamon, Francois P.; Durlofsky, Louis J.
Deep learning surrogate modeling shows great promise for subsurface flow applications, but the training demands can be substantial. Here we introduce a new surrogate modeling framework to predict CO2 saturation, pressure and surface displacement for use in the history matching of carbon storage operations. Rather than train using a large number of expensive coupled flow-geomechanics simulation runs, training here involves a large number of inexpensive flow-only simulations combined with a much smaller number of coupled runs. The flow-only runs use an effective rock compressibility, which is shown to provide accurate predictions for saturation and pressure for our system. A recurrent residual U-Net architecture is applied for the saturation and pressure surrogate models, while a new residual U-Net model is introduced to predict surface displacement. The surface displacement surrogate accepts, as inputs, geomodel quantities along with saturation and pressure surrogate predictions. Median relative error for a diverse test set is less than 4% for all variables. The surrogate models are incorporated into a hierarchical Markov chain Monte Carlo history matching workflow. Surrogate error is included using a new treatment involving the full model error covariance matrix. A high degree of prior uncertainty, with geomodels characterized by uncertain geological scenario parameters (metaparameters) and associated realizations, is considered. History matching results for a synthetic true model are generated using in-situ monitoring-well data only, surface displacement data only, and both data types. The enhanced uncertainty reduction achieved with both data types is quantified. Posterior saturation and surface displacement fields are shown to correspond well with the true solution.
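As an illustration of how surrogate error might enter such a workflow, the sketch below combines measurement noise with a model-error covariance estimated from surrogate-versus-simulator residuals inside a random-walk Metropolis step. The function names and the simple (non-hierarchical) sampler are assumptions for illustration, not the paper's implementation.

import numpy as np

def model_error_covariance(residuals):
    """residuals: (n_runs, n_obs) surrogate-minus-simulator mismatches on test runs."""
    return np.cov(residuals, rowvar=False)

def log_likelihood(d_obs, d_pred, C_noise, C_err):
    C = C_noise + C_err                          # full error covariance
    misfit = d_obs - d_pred
    return -0.5 * misfit @ np.linalg.solve(C, misfit)

def metropolis_step(theta, log_post, rng, step=0.05):
    """One random-walk Metropolis update over geomodel parameters theta."""
    prop = theta + step * rng.standard_normal(theta.shape)
    if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
        return prop
    return theta

# Usage sketch, assuming `surrogate` maps parameters to predicted observations:
# log_post = lambda th: log_prior(th) + log_likelihood(d_obs, surrogate(th), C_noise, C_err)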
- Research Report (0.64)
- Instructional Material (0.50)
Additive regularization schedule for neural architecture search
Potanin, Mark; Vayser, Kirill; Strijov, Vadim
Neural network structures have a critical impact on the accuracy and stability of forecasting. Neural architecture search procedures help design an optimal neural network according to some loss function, which represents a set of quality criteria. This paper investigates the problem of neural network structure optimization. It proposes a way to construct a loss function that contains a set of additive elements. Each element is called a regularizer; it corresponds to some part of the neural network structure and represents a criterion to optimize. The optimization procedure changes the structure iteratively. To optimize various parts of the structure, the procedure changes the set of regularizers according to some schedule. The authors propose a way to construct the additive regularization schedule. By comparing regularized models with non-regularized ones on a collection of datasets, the computational experiments show that the proposed method finds efficient neural network structures and delivers accurate networks of low complexity.
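A minimal sketch of the additive, scheduled loss described above, with hypothetical sparsity regularizers standing in for the paper's structure-specific criteria; the schedule and coefficients are illustrative assumptions.

import numpy as np

def l1_sparsity(params):
    """Prunes individual weights. params: dict of layer name -> 2-D weight matrix."""
    return sum(np.abs(w).sum() for w in params.values())

def group_sparsity(params):
    """Prunes whole neurons (rows of each weight matrix)."""
    return sum(np.sqrt((w ** 2).sum(axis=1)).sum() for w in params.values())

REGULARIZERS = {"l1": l1_sparsity, "group": group_sparsity}

def schedule(iteration):
    """Switch which regularizer is active as optimization proceeds."""
    if iteration % 2 == 0:
        return {"l1": 1e-4, "group": 0.0}
    return {"l1": 0.0, "group": 1e-3}

def total_loss(data_loss, params, iteration):
    """Data loss plus the scheduled sum of additive regularizers."""
    coeffs = schedule(iteration)
    return data_loss + sum(coeffs[name] * reg(params) for name, reg in REGULARIZERS.items())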
Smooth Mathematical Function from Compact Neural Networks
This paper addresses smooth function approximation by neural networks (NN). Mathematical or physical functions can be replaced by NN models through regression. In this study, we obtain NNs that generate highly accurate and highly smooth functions while comprising only a few weight parameters, by examining several aspects of regression. First, we reinterpret the inner workings of NNs for regression; consequently, we propose a new activation function, the integrated sigmoid linear unit (ISLU). Then, the special characteristics of metadata for regression, which differ from those of other data such as images or sound, are discussed with a view to improving neural network performance. Finally, a simple hierarchical NN that generates models substituting for mathematical functions is presented, and a new batch concept, the "meta-batch", which improves NN performance several times over, is introduced. The new activation function, the meta-batch method, the features of numerical data, meta-augmentation with metaparameters, and an NN structure that generates a compact multi-layer perceptron (MLP) are essential to this study.
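The sketch below illustrates only the general setting of approximating a mathematical function with a compact MLP whose smooth activation yields a smooth model. The abstract does not give the ISLU formula, so the standard SiLU activation stands in purely for illustration, and the layer sizes are arbitrary.

import numpy as np

def silu(x):
    """Smooth activation (stand-in for ISLU): a smooth net gives a smooth fit."""
    return x / (1.0 + np.exp(-x))

def mlp(params, x):
    """Forward pass of a small MLP given a list of (W, b) pairs."""
    h = x
    for W, b in params[:-1]:
        h = silu(h @ W + b)
    W, b = params[-1]
    return h @ W + b

rng = np.random.default_rng(0)
sizes = [1, 16, 16, 1]                      # "compact": only a few hundred weights
params = [(rng.normal(scale=0.5, size=(m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

x = np.linspace(-3, 3, 200)[:, None]
y_true = np.sin(x)                          # target mathematical function
y_pred = mlp(params, x)                     # would be fitted by gradient descent in practice
print("initial MSE:", float(((y_pred - y_true) ** 2).mean()))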
- North America > United States > California > San Diego County > San Diego (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
Bounding generalization error with input compression: An empirical study with infinite-width networks
Galloway, Angus; Golubeva, Anna; Salem, Mahmoud; Nica, Mihai; Ioannou, Yani; Taylor, Graham W.
Estimating the Generalization Error (GE) of Deep Neural Networks (DNNs) is an important task that often relies on availability of held-out data. The ability to better predict GE based on a single training set may yield overarching DNN design principles to reduce a reliance on trial-and-error, along with other performance assessment advantages. In search of a quantity relevant to GE, we investigate the Mutual Information (MI) between the input and final layer representations, using the infinite-width DNN limit to bound MI. An existing input compression-based GE bound is used to link MI and GE. To the best of our knowledge, this represents the first empirical study of this bound. In our attempt to empirically falsify the theoretical bound, we find that it is often tight for best-performing models. Furthermore, it detects randomization of training labels in many cases, reflects test-time perturbation robustness, and works well given only a few training samples. These results are promising given that input compression is broadly applicable where MI can be estimated with confidence.
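For intuition, the snippet below evaluates an input-compression-style bound in which the generalization gap scales roughly like sqrt(2^I(X;T) / (2m)), so lower input-to-representation MI and more training samples both tighten the bound. The exact constants and confidence term here are assumptions for illustration, not the precise bound used in the paper.

import numpy as np

def ge_bound(mi_bits, n_samples, delta=0.05):
    """Assumed form: sqrt((2^I(X;T) + log(1/delta)) / (2 m))."""
    return np.sqrt((2.0 ** mi_bits + np.log(1.0 / delta)) / (2.0 * n_samples))

for mi in (2.0, 6.0, 10.0):
    print(f"I(X;T) = {mi:>4.1f} bits -> bound ~ {ge_bound(mi, n_samples=50_000):.4f}")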
- North America > Canada > Ontario > Wellington County > Guelph (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > Canada > Alberta > Census Division No. 6 > Calgary Metropolitan Region > Calgary (0.04)
A deep understanding of deep learning (with Python intro)
Deep learning is increasingly dominating technology and has major implications for society. From self-driving cars to medical diagnoses, from face recognition to deep fakes, and from language translation to music generation, deep learning is spreading like wildfire throughout all areas of modern technology. But deep learning is not only about super-fancy, cutting-edge, highly sophisticated applications. Deep learning is increasingly becoming a standard tool in machine learning, data science, and statistics. Deep learning is used by small startups for data mining and dimension reduction, by governments for detecting tax evasion, and by scientists for detecting patterns in their research data.