Reviews: MetaInit: Initializing learning by learning to initialize

Neural Information Processing Systems

Update: The authors have addressed my questions. I hope the camera-ready includes a clear discussion of Taylor expansion vs. finite difference (at least in the appendix). I also second the other reviewer on the importance of a comparison in the batch-norm case, since the method is meant to serve as a general-purpose initialization scheme. Longer Summary: The authors introduce the GradientDeviation criterion, which characterizes how much the gradient changes after a gradient step; it is simple to compute and avoids forming the full Hessian. They then use meta-learning to learn the scale of the initialization such that GradientDeviation is minimized.
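The review's Taylor-expansion vs. finite-difference point can be illustrated on a toy quadratic loss, where the finite-difference gradient change after one step coincides exactly with what a first-order Taylor expansion of the gradient (a Hessian-vector product) predicts; for general losses the two differ at second order in the step size. Everything below is an illustrative sketch, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy quadratic loss L(theta) = 0.5 * theta^T A theta, so grad(theta) = A @ theta
A = rng.standard_normal((4, 4))
A = A @ A.T + np.eye(4)           # symmetric positive-definite Hessian
theta = rng.standard_normal(4)
eta = 0.1                         # step size

g = A @ theta                     # gradient at theta
g_after = A @ (theta - eta * g)   # gradient after one step (finite difference)

# First-order Taylor expansion of the gradient: g' ~= g - eta * H @ g
taylor = g - eta * A @ g

# For a quadratic loss the two agree exactly
print(np.max(np.abs(g_after - taylor)))
```

For a non-quadratic loss the finite-difference version is the cheaper diagnostic, since it needs only two gradient evaluations and no Hessian-vector product.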


CNN-LSTM and Transfer Learning Models for Malware Classification based on Opcodes and API Calls

Bensaoud, Ahmed, Kalita, Jugal

arXiv.org Artificial Intelligence

In this paper, we propose a novel model for a malware classification system based on Application Programming Interface (API) calls and opcodes, to improve classification accuracy. This system uses a novel design combining a Convolutional Neural Network and Long Short-Term Memory. We extract opcode sequences and API calls from Windows malware samples for classification, and transform these features into N-gram sequences (N = 2, 3, and 10). Our experiments on a dataset of 9,749,57 samples produce a high accuracy of 99.91% using the 8-gram sequences. Our method significantly improves malware classification performance compared with a wide range of recent deep learning architectures, leading to state-of-the-art performance. In particular, we experiment with ConvNeXt-T, ConvNeXt-S, RegNetY-4GF, RegNetY-8GF, RegNetY-12GF, EfficientNetV2, Sequencer2D-L, Swin-T, ViT-G/14, ViT-Ti, ViT-S, ViT-B, ViT-L, and MaxViT-B. Among these, the Swin-T and Sequencer2D-L architectures achieved high accuracies of 99.82% and 99.70%, respectively, comparable to our CNN-LSTM architecture although not surpassing it.
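The n-gram feature construction the abstract describes can be sketched in a few lines: slide a window of length n over a token sequence (opcodes or API-call names) and collect the windows. The function name and the toy opcode sequence below are illustrative, not from the paper:

```python
def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) over a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Toy opcode sequence; real inputs would come from disassembled malware samples
opcodes = ["push", "mov", "call", "mov", "ret"]
print(ngrams(opcodes, 2))
# [('push', 'mov'), ('mov', 'call'), ('call', 'mov'), ('mov', 'ret')]
```

The resulting n-gram sequences would then be encoded (e.g. via a vocabulary index) before being fed to the CNN-LSTM or transfer-learning models.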


Artificial Neural Network for Regression

#artificialintelligence

Udemy Coupon: Artificial Neural Network for Regression. Build an ANN Regression model to predict the electrical energy output of a Combined Cycle Power Plant (free course). What You'll Learn: how to implement an Artificial Neural Network in Python; how to do Regression; how to use Google Colab. Description: Are you ready to flex your Deep Learning skills by learning how to build and implement an Artificial Neural Network using Python from scratch? Testing your skills with practical courses is one of the best and most enjoyable ways to learn data science…and now we're giving you that chance for FREE. In this free course, AI expert Hadelin de Ponteves guides you through a case study that shows you how to build an ANN Regression model to predict the electrical energy output of a Combined Cycle Power Plant. The objective is to create a data model that predicts the net hourly electrical energy output (EP) of the plant using available hourly average ambient variables.
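The core idea of the course (a small feed-forward network trained by gradient descent to regress a continuous target from a few ambient variables) can be sketched in plain numpy. The data here is a synthetic stand-in for the power-plant dataset, and the network shape and hyperparameters are illustrative choices, not the course's:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the power-plant data: 4 ambient features -> energy output
X = rng.standard_normal((200, 4))
true_w = np.array([1.5, -2.0, 0.7, 3.0])
y = X @ true_w + 0.1 * rng.standard_normal(200)

# One-hidden-layer ANN for regression (tanh hidden layer, linear output)
W1 = 0.1 * rng.standard_normal((4, 16)); b1 = np.zeros(16)
W2 = 0.1 * rng.standard_normal(16);      b2 = 0.0

def forward(X):
    h = np.tanh(X @ W1 + b1)
    return h, h @ W2 + b2

lr = 0.05
losses = []
for _ in range(500):
    h, pred = forward(X)
    err = pred - y
    losses.append(float(np.mean(err ** 2)))
    # Backpropagation for the mean-squared-error loss
    gpred = 2 * err / len(y)
    gW2 = h.T @ gpred; gb2 = gpred.sum()
    gh = np.outer(gpred, W2) * (1 - h ** 2)   # tanh'(z) = 1 - tanh(z)^2
    gW1 = X.T @ gh; gb1 = gh.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

print(losses[0], losses[-1])   # training loss should drop substantially
```

In practice one would use a framework such as Keras (as a course like this typically does) rather than hand-written backpropagation, but the sketch shows what such a framework computes under the hood.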


MetaInit: Initializing learning by learning to initialize

Dauphin, Yann N., Schoenholz, Samuel

Neural Information Processing Systems

Deep learning models frequently trade handcrafted features for deep features learned with much less human intervention using gradient descent. While this paradigm has been enormously successful, deep networks are often difficult to train and performance can depend crucially on the initial choice of parameters. In this work, we introduce an algorithm called MetaInit as a step towards automating the search for good initializations using meta-learning. Our approach is based on a hypothesis that good initializations make gradient descent easier by starting in regions that look locally linear with minimal second order effects. We formalize this notion via a quantity that we call the gradient quotient, which can be computed with any architecture or dataset.
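The gradient quotient the abstract introduces can be read as: compare the gradient before and after a tentative gradient step, element-wise; a value near zero means the loss looks locally linear around the current parameters, which is what the hypothesized "good initialization" criterion rewards. The sketch below is a hedged reading of the abstract on a toy least-squares problem; the paper's exact normalization and safeguards may differ:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy least-squares problem: loss(theta) = 0.5 * ||X @ theta - y||^2
X = rng.standard_normal((50, 5))
y = rng.standard_normal(50)

def grad(theta):
    return X.T @ (X @ theta - y)

def gradient_quotient(theta, eta=1e-2, eps=1e-8):
    """Mean element-wise relative change of the gradient after one step.

    Near 0 => the loss looks locally linear (small second-order effects).
    Illustrative reading of the abstract, not the paper's exact formula.
    """
    g1 = grad(theta)
    g2 = grad(theta - eta * g1)
    return float(np.mean(np.abs(g2 * np.sign(g1) / (np.abs(g1) + eps) - 1)))

theta = rng.standard_normal(5)
print(gradient_quotient(theta))
```

On a quadratic loss the quotient shrinks linearly with the step size, since the gradient change is exactly a Hessian-vector product scaled by the step; MetaInit's meta-learning step would then tune the initialization scales to drive this quantity down.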