Training deep learning models
Training Deep Learning Models with Norm-Constrained LMOs
Thomas Pethick, Wanyun Xie, Kimon Antonakopoulos, Zhenyu Zhu, Antonio Silveti-Falls, Volkan Cevher
In this work, we study optimization methods that leverage the linear minimization oracle (LMO) over a norm-ball. We propose a new stochastic family of algorithms that uses the LMO to adapt to the geometry of the problem and, perhaps surprisingly, show that they can be applied to unconstrained problems. The resulting update rule unifies several existing optimization methods under a single framework. Furthermore, we propose an explicit choice of norm for deep architectures, which, as a side benefit, leads to the transferability of hyperparameters across model sizes. Experimentally, we demonstrate significant speedups on nanoGPT training without any reliance on Adam. The proposed method is memory-efficient, requiring only one set of model weights and one set of gradients, which can be stored in half-precision.
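The abstract builds on the linear minimization oracle (LMO) over a norm-ball. As a hedged illustration (not the paper's actual algorithm), here is the standard closed-form LMO for the Euclidean ball: minimizing the linear function ⟨g, s⟩ over ‖s‖₂ ≤ r yields the rescaled negative gradient direction s = −r·g/‖g‖₂.

```python
# Toy sketch of an LMO over an L2 norm-ball of radius r:
#   lmo(g) = argmin_{||s||_2 <= r} <g, s> = -r * g / ||g||_2
# This is only an illustrative building block, not the paper's update rule.
import math

def lmo_l2(g, r=1.0):
    norm = math.sqrt(sum(x * x for x in g))
    if norm == 0.0:
        return [0.0] * len(g)  # any feasible point is optimal at g = 0
    return [-r * x / norm for x in g]

s = lmo_l2([3.0, 4.0], r=1.0)
print(s)  # [-0.6, -0.8]: the unit vector opposing g
```

LMO-based methods exploit the fact that this argmin is often far cheaper than a full projection, and the choice of norm determines the geometry the method adapts to.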
How to estimate carbon footprint when training deep learning models? A guide and review
Lucia Bouza Heguerte, Aurélie Bugeau, Loïc Lannelongue
Machine learning and deep learning models have become essential in the recent fast development of artificial intelligence across many sectors of society. It is now widely acknowledged that the development of these models has an environmental cost, which has been analyzed in many studies. Several online and software tools have been developed to track energy consumption while training machine learning models. In this paper, we propose a comprehensive introduction to and comparison of these tools for AI practitioners wishing to start estimating the environmental impact of their work. We review the specific vocabulary and the technical requirements of each tool. We compare the energy consumption estimated by each tool on two deep neural networks for image processing and on different types of servers. From these experiments, we provide advice for choosing the right tool and infrastructure.
Understanding the Keras Library
Keras is a high-level neural networks API that provides an easy-to-use interface for building and training deep learning models. It is built on top of other popular deep learning frameworks, such as TensorFlow, Theano, and Microsoft Cognitive Toolkit (CNTK), and exposes a simpler, more user-friendly interface than those frameworks offer directly. The main problem Keras solves is the complexity and verbosity of building and training deep learning models in low-level frameworks such as TensorFlow and Theano. Keras provides a consistent, simple API that abstracts the low-level details of the underlying framework, which makes it a great choice for beginners who want to get started with deep learning quickly and easily.
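To make the "simple, consistent API" claim concrete, here is a minimal sketch of defining and compiling a small classifier with the Keras `Sequential` API, assuming TensorFlow with Keras is installed; the layer sizes and input shape are illustrative only.

```python
# Minimal Keras sketch: a two-layer classifier for flattened 28x28 inputs.
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(784,)),                     # e.g. flattened 28x28 images
    keras.layers.Dense(64, activation="relu"),     # hidden layer
    keras.layers.Dense(10, activation="softmax"),  # 10-class output
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()  # graph construction is handled by the backend
```

Training is then a single `model.fit(x, y, epochs=...)` call, with the backend's session and graph management hidden behind the API.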
Fundamentals of Deep Learning for Multi-GPUs (Day 2)
Note: By registering for Day 1 you will automatically be registered for Day 2; you cannot register for Day 2 separately, and this page is a placeholder. This workshop teaches you techniques for training deep neural networks on multi-GPU technology to shorten the training time required for data-intensive applications. Working with deep learning tools, frameworks, and workflows to perform neural network training, you'll learn how to implement multi-GPU training in PyTorch to reduce the complexity of writing efficient distributed software and to maintain accuracy when training a model across many GPUs. Workshop format: Interactive presentation with hands-on exercises. Target audience: This workshop is intended for researchers who would like to use multiple GPUs to train deep learning models in PyTorch. Knowledge prerequisites: Participants should be comfortable with training deep learning models on a single GPU.
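The core idea behind the multi-GPU training the workshop covers is data parallelism: each device computes a gradient on its shard of the batch, and the gradients are averaged (an all-reduce) before an identical weight update on every device. The toy sketch below is framework-agnostic pseudotraining on a one-parameter model, not PyTorch's `DistributedDataParallel`; all names are illustrative.

```python
# Toy data-parallel step: shard the batch, compute per-"device" gradients,
# average them, and apply one shared update. Model: y = w * x, target y = 2x.

def local_gradient(weight, shard):
    # mean gradient of squared error (w*x - 2x)^2 over this device's shard
    return sum(2 * (weight * x - 2 * x) * x for x in shard) / len(shard)

def allreduce_mean(grads):
    # stand-in for the collective that averages gradients across devices
    return sum(grads) / len(grads)

def step(weight, batch, num_devices=4, lr=0.01):
    shards = [batch[i::num_devices] for i in range(num_devices)]
    grads = [local_gradient(weight, s) for s in shards]  # one per device
    return weight - lr * allreduce_mean(grads)           # identical everywhere

w = 0.0
for _ in range(50):
    w = step(w, list(range(1, 9)))
print(round(w, 3))  # converges toward the optimum w = 2
```

Because every device applies the same averaged gradient, the replicas stay in sync, which is how accuracy is maintained while the per-device batch shrinks.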
AWS Deep Learning Challenge sees innovative and impactful use of Amazon EC2 DL1 instances
In the AWS Deep Learning Challenge held from January 5, 2022, to March 1, 2022, participants from academia, startups, and enterprise organizations joined to test their skills and train a deep learning model of their choice using Amazon Elastic Compute Cloud (Amazon EC2) DL1 instances and Habana's SynapseAI SDK. The EC2 DL1 instances powered by Gaudi accelerators from Habana Labs, an Intel company, are designed specifically for training deep learning models. Participants were able to realize the significant price/performance benefits that DL1 offers over GPU-based instances. We are excited to announce the winners and showcase some of the machine learning (ML) models that were trained in this hackathon. You will learn about some of the deep learning use cases that are supported by EC2 DL1 instances, including computer vision, natural language processing, and acoustic modeling.
Text-To-Image Generation
DALL-E is an artificial intelligence program, revealed by OpenAI on January 5, 2021, that generates images from textual descriptions. It uses a 12-billion-parameter version of the GPT-3 transformer model to interpret natural language inputs and produce corresponding images. And it's pretty darn cool: give the algorithm a textual description of an image, and it will generate a matching picture. For example, if you were to describe a "dog" to DALL-E, it would generate an image of a dog.
Transfer Learning: A Shortcut for Training Deep Learning Models
When should you use transfer learning? In this approach, the last few fully connected layers of the pre-trained model are removed and replaced with a shallow neural network. The layers of the pre-trained model are frozen, and only the shallow network is trained on the available target dataset. The features extracted by the pre-trained model help the shallow network learn and perform well on the target task. The benefit of this approach is a low chance of overfitting, since only the last few layers of the model are trained while the initial layers stay fixed.
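The freeze-and-replace step above can be sketched without any particular framework: a "model" is a stack of layers, freezing a layer means excluding its parameters from the optimizer's updates, and the new shallow head is the only trainable part. Everything here (layer names, the `trainable` flag) is an illustrative stand-in, not a real library's API.

```python
# Framework-agnostic sketch of transfer learning's freezing step.

class Layer:
    def __init__(self, name):
        self.name = name
        self.trainable = True  # by default, the optimizer would update it

def build_pretrained():
    # stand-in for a backbone pre-trained on a large source dataset
    return [Layer(f"conv{i}") for i in range(1, 5)]

def transfer(pretrained, head):
    for layer in pretrained:
        layer.trainable = False  # keep learned features fixed
    return pretrained + head     # append the new shallow network

model = transfer(build_pretrained(), [Layer("fc_new")])
trainable = [layer.name for layer in model if layer.trainable]
print(trainable)  # only the new head is updated during training
```

Because only the small head's parameters are updated, far fewer weights can overfit the (typically small) target dataset, which is the benefit the paragraph describes.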
Carbontracker: Tracking and Predicting the Carbon Footprint of Training Deep Learning Models
The power and energy monitoring in carbontracker is limited to a few main components of computational systems. Additional power consumed by the supporting infrastructure, such as that used for cooling or power delivery, is accounted for by multiplying the measured power by the PUE (power usage effectiveness) of the data center hosting the compute, as suggested by Strubell2019. Previous research has examined PUE and its shortcomings (Yuventi2013); these shortcomings could largely be resolved by data centers reporting an average PUE instead of a minimum observed value. In our work, we use a PUE of 1.58, the global average for data centers in 2018 as reported by Ascierto2018.
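The PUE adjustment described above is a single multiplication: facility-level energy is measured energy times PUE, and a carbon estimate then scales that by a grid carbon intensity. The sketch below is a hedged illustration of that arithmetic, not carbontracker's actual code; the carbon-intensity value is an assumed grid average and varies by region.

```python
# Illustrative PUE scaling, as described in the text (not carbontracker's code).

def carbon_footprint_g(measured_kwh, pue=1.58, intensity_g_per_kwh=475.0):
    # pue=1.58: the 2018 global data-center average cited in the text.
    # intensity_g_per_kwh: assumed grid average in gCO2e/kWh (region-dependent).
    facility_kwh = measured_kwh * pue  # add cooling / power-delivery overhead
    return facility_kwh * intensity_g_per_kwh

print(carbon_footprint_g(10.0))  # 10 kWh measured -> roughly 7.5 kg CO2e
```

The example also shows why using a data center's minimum observed PUE would bias the estimate low: the multiplier applies to every measured kWh.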
AWS re:Invent 2019 - Predictions And A Wishlist
With less than a week to go, the excitement and anticipation are building up for the industry's largest cloud computing conference, AWS re:Invent. As an analyst, I have attempted to predict the announcements from re:Invent (2018, 2017) with decent accuracy. But with each passing year, it's becoming increasingly tough to predict the year-end news from Vegas. Amazon is venturing into new areas that are least expected by analysts, customers, and its competitors. AWS Ground Station is an example of how creative the teams at Amazon can be in conceiving new products and services.
The Democratization of Artificial Intelligence and Deep Learning
Deep learning offers companies a new set of techniques to solve complex analytical problems and drive rapid innovation in artificial intelligence. By feeding a deep learning algorithm massive volumes of data, models can be trained to perform complex tasks like speech and image analysis. Every company with a large volume of data can take advantage of deep learning, which enables image classification, sentiment analysis, anomaly detection, and other advanced analysis techniques.