Reviews: MetaInit: Initializing learning by learning to initialize

Neural Information Processing Systems

Update: The authors have addressed my questions. I hope the camera-ready includes a clear discussion of Taylor expansion vs. finite difference (at least in the appendix). I also second the other reviewer on the importance of a comparison in the batch-norm case, since the method is meant to serve as a general-purpose initialization scheme. Longer Summary: The authors introduce the GradientDeviation criterion, which characterizes how much the gradient changes after a gradient step; it is simple to compute and avoids forming the full Hessian. They then use meta-learning to learn the scale of the initialization such that GradientDeviation is minimized.
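The review's Taylor-expansion vs. finite-difference point can be illustrated on a toy quadratic loss, where the finite-difference gradient change after one step coincides exactly with what a first-order Taylor expansion of the gradient (a Hessian-vector product) predicts; for general losses the two differ at second order in the step size. Everything below is an illustrative sketch, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy quadratic loss L(theta) = 0.5 * theta^T A theta, so grad(theta) = A @ theta
A = rng.standard_normal((4, 4))
A = A @ A.T + np.eye(4)           # symmetric positive-definite Hessian
theta = rng.standard_normal(4)
eta = 0.1                         # step size

g = A @ theta                     # gradient at theta
g_after = A @ (theta - eta * g)   # gradient after one step (finite difference)

# First-order Taylor expansion of the gradient: g' ~= g - eta * H @ g
taylor = g - eta * A @ g

# For a quadratic loss the two agree exactly
print(np.max(np.abs(g_after - taylor)))
```

For a non-quadratic loss the finite-difference version is the cheaper diagnostic, since it needs only two gradient evaluations and no Hessian-vector product.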


CNN-LSTM and Transfer Learning Models for Malware Classification based on Opcodes and API Calls

Bensaoud, Ahmed, Kalita, Jugal

arXiv.org Artificial Intelligence

In this paper, we propose a novel model for a malware classification system based on Application Programming Interface (API) calls and opcodes, to improve classification accuracy. This system uses a novel design combining a Convolutional Neural Network and Long Short-Term Memory. We extract opcode sequences and API calls from Windows malware samples for classification, and transform these features into N-gram sequences (N = 2, 3, and 10). Our experiments on a dataset of 9,749,57 samples produce a high accuracy of 99.91% using the 8-gram sequences. Our method significantly improves malware classification performance compared with a wide range of recent deep learning architectures, leading to state-of-the-art performance. In particular, we experiment with ConvNeXt-T, ConvNeXt-S, RegNetY-4GF, RegNetY-8GF, RegNetY-12GF, EfficientNetV2, Sequencer2D-L, Swin-T, ViT-G/14, ViT-Ti, ViT-S, ViT-B, ViT-L, and MaxViT-B. Among these, the Swin-T and Sequencer2D-L architectures achieved high accuracies of 99.82% and 99.70%, respectively, comparable to our CNN-LSTM architecture although not surpassing it.
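The n-gram feature construction the abstract describes can be sketched in a few lines: slide a window of length n over a token sequence (opcodes or API-call names) and collect the windows. The function name and the toy opcode sequence below are illustrative, not from the paper:

```python
def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) over a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Toy opcode sequence; real inputs would come from disassembled malware samples
opcodes = ["push", "mov", "call", "mov", "ret"]
print(ngrams(opcodes, 2))
# [('push', 'mov'), ('mov', 'call'), ('call', 'mov'), ('mov', 'ret')]
```

The resulting n-gram sequences would then be encoded (e.g. via a vocabulary index) before being fed to the CNN-LSTM or transfer-learning models.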


Artificial Neural Network for Regression

#artificialintelligence

Udemy Coupon: Artificial Neural Network for Regression. Build an ANN Regression model to predict the electrical energy output of a Combined Cycle Power Plant (free course). What You'll Learn: how to implement an Artificial Neural Network in Python; how to do Regression; how to use Google Colab. Description: Are you ready to flex your Deep Learning skills by learning how to build and implement an Artificial Neural Network using Python from scratch? Testing your skills with practical courses is one of the best and most enjoyable ways to learn data science…and now we're giving you that chance for FREE. In this free course, AI expert Hadelin de Ponteves guides you through a case study that shows you how to build an ANN Regression model to predict the electrical energy output of a Combined Cycle Power Plant. The objective is to create a data model that predicts the net hourly electrical energy output (EP) of the plant using available hourly average ambient variables.
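The core idea of the course (a small feed-forward network trained by gradient descent to regress a continuous target from a few ambient variables) can be sketched in plain numpy. The data here is a synthetic stand-in for the power-plant dataset, and the network shape and hyperparameters are illustrative choices, not the course's:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the power-plant data: 4 ambient features -> energy output
X = rng.standard_normal((200, 4))
true_w = np.array([1.5, -2.0, 0.7, 3.0])
y = X @ true_w + 0.1 * rng.standard_normal(200)

# One-hidden-layer ANN for regression (tanh hidden layer, linear output)
W1 = 0.1 * rng.standard_normal((4, 16)); b1 = np.zeros(16)
W2 = 0.1 * rng.standard_normal(16);      b2 = 0.0

def forward(X):
    h = np.tanh(X @ W1 + b1)
    return h, h @ W2 + b2

lr = 0.05
losses = []
for _ in range(500):
    h, pred = forward(X)
    err = pred - y
    losses.append(float(np.mean(err ** 2)))
    # Backpropagation for the mean-squared-error loss
    gpred = 2 * err / len(y)
    gW2 = h.T @ gpred; gb2 = gpred.sum()
    gh = np.outer(gpred, W2) * (1 - h ** 2)   # tanh'(z) = 1 - tanh(z)^2
    gW1 = X.T @ gh; gb1 = gh.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

print(losses[0], losses[-1])   # training loss should drop substantially
```

In practice one would use a framework such as Keras (as a course like this typically does) rather than hand-written backpropagation, but the sketch shows what such a framework computes under the hood.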


MetaInit: Initializing learning by learning to initialize

Dauphin, Yann N., Schoenholz, Samuel

Neural Information Processing Systems

Deep learning models frequently trade handcrafted features for deep features learned with much less human intervention using gradient descent. While this paradigm has been enormously successful, deep networks are often difficult to train and performance can depend crucially on the initial choice of parameters. In this work, we introduce an algorithm called MetaInit as a step towards automating the search for good initializations using meta-learning. Our approach is based on a hypothesis that good initializations make gradient descent easier by starting in regions that look locally linear with minimal second order effects. We formalize this notion via a quantity that we call the gradient quotient, which can be computed with any architecture or dataset.
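The gradient quotient the abstract introduces can be read as: compare the gradient before and after a tentative gradient step, element-wise; a value near zero means the loss looks locally linear around the current parameters, which is what the hypothesized "good initialization" criterion rewards. The sketch below is a hedged reading of the abstract on a toy least-squares problem; the paper's exact normalization and safeguards may differ:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy least-squares problem: loss(theta) = 0.5 * ||X @ theta - y||^2
X = rng.standard_normal((50, 5))
y = rng.standard_normal(50)

def grad(theta):
    return X.T @ (X @ theta - y)

def gradient_quotient(theta, eta=1e-2, eps=1e-8):
    """Mean element-wise relative change of the gradient after one step.

    Near 0 => the loss looks locally linear (small second-order effects).
    Illustrative reading of the abstract, not the paper's exact formula.
    """
    g1 = grad(theta)
    g2 = grad(theta - eta * g1)
    return float(np.mean(np.abs(g2 * np.sign(g1) / (np.abs(g1) + eps) - 1)))

theta = rng.standard_normal(5)
print(gradient_quotient(theta))
```

On a quadratic loss the quotient shrinks linearly with the step size, since the gradient change is exactly a Hessian-vector product scaled by the step; MetaInit's meta-learning step would then tune the initialization scales to drive this quantity down.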