AITopics | smaller network

Network-to-Network Regularization: Enforcing Occam's Razor to Improve Generalization

Neural Information Processing Systems | Apr-25-2026, 09:37:53 GMT

What gives a classifier the ability to generalize? There have been many important attempts to address this question, but a clear answer is still elusive. Proponents of complexity theory find that the complexity of the classifier's function space is key to deciding generalization, whereas other recent work reveals that classifiers which extract invariant feature representations are likely to generalize better. Recent theoretical and empirical studies, however, have shown that even within a classifier's function space, there can be significant differences in the ability to generalize. Specifically, empirical studies have shown that among functions which fit the training data well, functions with lower Kolmogorov complexity (KC) are likely to generalize better, while the opposite is true for functions of higher KC.

artificial intelligence, machine learning, regularization, (18 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

51200d29d1fc15f5a71c1dab4bb54f7c-AuthorFeedback.pdf

Neural Information Processing Systems | Oct-2-2025, 22:16:06 GMT

artificial intelligence, dataset, policy selection, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.30)

A multilevel approach to accelerate the training of Transformers

Lauga, Guillaume, Chaumette, Maël, Desainte-Maréville, Edgar, Lasalle, Étienne, Lebeurrier, Arthur

arXiv.org Artificial Intelligence | Apr-29-2025

In this article, we investigate the potential of multilevel approaches to accelerate the training of transformer architectures. Using an ordinary differential equation (ODE) interpretation of these architectures, we propose an appropriate way of varying the discretization of these ODE Transformers in order to accelerate the training. We validate our approach experimentally by a comparison with the standard training procedure.
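
The ODE interpretation mentioned in this abstract can be illustrated with a toy example. The sketch below is not the authors' method: it assumes the common reading in which each residual layer performs one forward-Euler step x <- x + h * f(x), and it uses a hypothetical scalar "layer" (`toy_layer`, with dx/dt = -x) in place of a real attention and MLP block. The names `residual_stack`, `toy_layer`, and `horizon` are illustrative assumptions. It only shows what "varying the discretization" means: the same time horizon covered by fewer, larger steps (a coarse level) or by more, smaller steps (a fine level).

```python
import math


def residual_stack(x, layer_fn, num_layers, horizon=1.0):
    """Apply num_layers residual updates x <- x + h * f(x), with h = horizon / num_layers.

    Each update is one forward-Euler step of dx/dt = f(x) over [0, horizon].
    """
    h = horizon / num_layers
    for _ in range(num_layers):
        x = x + h * layer_fn(x)
    return x


def toy_layer(x):
    # Hypothetical stand-in for a Transformer block: the ODE dx/dt = -x.
    return -x


# Coarse level: few layers, large step. Fine level: many layers, small step.
coarse = residual_stack(1.0, toy_layer, num_layers=4)
fine = residual_stack(1.0, toy_layer, num_layers=64)
exact = math.exp(-1.0)  # closed-form solution of dx/dt = -x at t = 1

print(f"coarse={coarse:.4f} fine={fine:.4f} exact={exact:.4f}")
```

In this toy setting the coarse stack is cheaper to evaluate but less accurate, while the fine stack approaches the exact solution. A multilevel scheme would move between such levels during training; how that is done is specific to the paper and is not reproduced here.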

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2504.1859

Country: Europe > France (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.91)

Learning to grow machine-learning models

#artificialintelligence | Mar-24-2023, 11:24:49 GMT

It's no secret that OpenAI's ChatGPT has some incredible capabilities -- for instance, the chatbot can write poetry that resembles Shakespearean sonnets or debug code for a computer program. These abilities are made possible by the massive machine-learning model that ChatGPT is built upon. Researchers have found that when these types of models become large enough, extraordinary capabilities emerge. But bigger models also require more time and money to train. The training process involves showing hundreds of billions of examples to a model.

larger model, machine-learning model, smaller model, (15 more...)

#artificialintelligence

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.40)
North America > United States > Texas > Travis County > Austin (0.05)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Efficiently Learning Small Policies for Locomotion and Manipulation

Hegde, Shashank, Sukhatme, Gaurav S.

arXiv.org Artificial Intelligence | Sep-30-2022

Neural control of memory-constrained, agile robots requires small, yet highly performant models. We leverage graph hyper networks to learn graph hyper policies trained with off-policy reinforcement learning resulting in networks that are two orders of magnitude smaller than commonly used networks yet encode policies comparable to those encoded by much larger networks trained on the same task. We show that our method can be appended to any off-policy reinforcement learning algorithm, without any change in hyperparameters, by showing results across locomotion and manipulation tasks. Further, we obtain an array of working policies, with differing numbers of parameters, allowing us to pick an optimal network for the memory constraints of a system. Training multiple policies with our method is as sample efficient as training a single policy. Finally, we provide a method to select the best architecture, given a constraint on the number of parameters. Project website: https://sites.google.com/usc.edu/graphhyperpolicy

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2210.0014

Country:

North America > United States > California (0.14)
Europe > France (0.04)

Genre: Research Report (0.50)

Industry: Energy (0.31)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Photos to 3D Scenes in Milliseconds

#artificialintelligence | Apr-9-2022, 08:00:20 GMT

As if taking a picture weren't already a challenging enough technological feat, we are now doing the opposite: modeling the world from pictures. I've covered amazing AI-based models that could take images and turn them into high-quality scenes. The challenge is to take a few two-dimensional images and reconstruct how the object or person would look in the real world. You can easily see how useful this technology is for many industries, such as video games, animated movies, or advertising. Take a few pictures and instantly have a realistic model to insert into your product.

nerf, representation, smaller network, (8 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.31)

How knowledge distillation compresses neural networks

#artificialintelligence | Oct-28-2020, 02:15:05 GMT

If you've ever used a neural network to solve a complex problem, you know they can be enormous in size, containing millions of parameters. For instance, the famous BERT model has about 110 million. To illustrate the point, this is the number of parameters for the most common architectures in natural language processing (NLP), as summarized in the recent State of AI Report 2020 by Nathan Benaich and Ian Hogarth. In Kaggle competitions, the winning models are often ensembles, composed of several predictors. Although they can beat simple models by a large margin in terms of accuracy, their enormous computational costs make them utterly unusable in practice. Is there any way to leverage these powerful but massive models to train state-of-the-art models without scaling the hardware?

artificial intelligence, machine learning, natural language, (15 more...)

#artificialintelligence

Genre: Research Report > Promising Solution (0.36)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)

Better Together: Resnet-50 accuracy with $13 \times$ fewer parameters and at $3\times$ speed

Nath, Utkarsh, Kushagra, Shrinu

arXiv.org Machine Learning | Oct-9-2020

Recent research on compressing deep neural networks has focused on reducing the number of parameters. Smaller networks are easier to export and deploy on edge devices. We introduce Adjoined networks as a training approach that can regularize and compress any CNN-based neural architecture. Our one-shot learning paradigm trains both the original and the smaller networks together. The parameters of the smaller network are shared across both architectures. We prove strong theoretical guarantees on the regularization behavior of the adjoint training paradigm. We complement our theoretical analysis with an extensive empirical evaluation of both the compression and regularization behavior of adjoint networks. For ResNet-50 trained adjointly on ImageNet, we achieve a $13.7\times$ reduction in the number of parameters and a $3\times$ improvement in inference time without any significant drop in accuracy. (For size comparison, we ignore the parameters in the last linear layer, as they vary by dataset and are typically dropped during fine-tuning; otherwise, the reductions are $11.5\times$ and $95\times$ for ImageNet and CIFAR-100, respectively.) For the same architecture on CIFAR-100, we achieve a $99.7\times$ reduction in the number of parameters and a $5\times$ improvement in inference time. On both datasets, the original network trained in the adjoint fashion gains about $3\%$ in top-1 accuracy compared to the same network trained in the standard fashion.

artificial intelligence, machine learning, training paradigm, (17 more...)

arXiv.org Machine Learning

2006.05624

Country: North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

IBM's AI classifies seizures with 98.4% accuracy using EEG data

#artificialintelligence | Apr-7-2020, 00:51:06 GMT

In a paper published on the preprint server arXiv.org this week, IBM researchers describe SeizureNet, a machine learning framework that learns the features of seizures to classify various types. They say that it achieves state-of-the-art classification accuracy on a popular data set, and that it helps to improve the classification accuracy of smaller networks for applications that need low memory use and fast inference. If the claims stand up to academic scrutiny, the framework could, for instance, help the over 3.4 million people with epilepsy better understand the factors that trigger their seizures. The World Health Organization estimates that up to 70% of people living with epilepsy could live seizure-free if properly diagnosed and treated. SeizureNet is a machine learning framework consisting of individual classifiers (specifically convolutional neural networks) that learn the features of electroencephalograms (EEGs) -- i.e., tests that evaluate the electrical activity in the brain -- to predict seizure types.

accuracy, seizure, seizurenet, (10 more...)

#artificialintelligence

Country:

Oceania > Australia (0.06)
Asia > Bangladesh (0.06)

Genre: Research Report (0.74)

Industry: Health & Medicine > Therapeutic Area > Neurology > Epilepsy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.58)

Research Guide: Model Distillation Techniques for Deep Learning

#artificialintelligence | Dec-20-2019, 20:04:54 GMT

Knowledge distillation is a model compression technique whereby a small network (student) is taught by a larger trained neural network (teacher). The smaller network is trained to behave like the large neural network. This enables the deployment of such models on small devices such as mobile phones or other edge devices. In this guide, we'll look at a couple of papers that attempt to tackle this challenge. In this paper, a small model is trained to generalize in the same way as the larger teacher model.
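
The teacher-student setup described above is often expressed as a loss that mixes two terms: how closely the student matches the teacher's softened output distribution, and how well it predicts the true label. The following is a minimal sketch under stated assumptions, not the formulation of any particular paper in the guide. The function names, the `temperature` and `alpha` parameters, and the example logits are all illustrative; scaling the soft term by the squared temperature is a common convention, assumed here.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term (illustrative)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Cross-entropy between the teacher's softened distribution and the student's.
    soft_term = -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))
    # Ordinary cross-entropy against the ground-truth label at temperature 1.
    hard_term = -math.log(softmax(student_logits)[true_label])
    # Assumed convention: scale the soft term by temperature squared.
    return alpha * temperature ** 2 * soft_term + (1 - alpha) * hard_term


# Hypothetical logits for a 3-class problem whose true class is 0.
teacher = [4.0, 1.0, 0.0]
loss_agree = distillation_loss([4.0, 1.0, 0.0], teacher, true_label=0)
loss_disagree = distillation_loss([0.0, 1.0, 4.0], teacher, true_label=0)
print(loss_agree < loss_disagree)
```

A student whose logits agree with the teacher's receives a lower loss than one that ranks the classes in the opposite order, which is the behavior the training objective is meant to encourage.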

model distillation technique, research guide, smaller network, (3 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)
