AITopics | distilling

distilling

Moonshine: Distilling with Cheap Convolutions

Neural Information Processing Systems | Nov-20-2025, 22:09:02 GMT

Many engineers wish to deploy modern neural networks in memory-limited settings; but the development of flexible methods for reducing memory use is in its infancy, and there is little knowledge of the resulting cost-benefit. We propose structural model distillation for memory reduction using a strategy that produces a student architecture that is a simple transformation of the teacher architecture: no redesign is needed, and the same hyperparameters can be used. Using attention transfer, we provide Pareto curves/tables for distillation of residual networks with four benchmark datasets, indicating the memory versus accuracy payoff. We show that substantial memory savings are possible with very little loss of accuracy, and confirm that distillation provides student network performance that is better than training that student architecture directly on data.
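A rough sense of why a cheaper convolution saves memory can be had by counting weights. The sketch below is illustrative only: it assumes that a "cheap convolution" means a grouped k x k convolution followed by a 1 x 1 pointwise convolution, which the abstract above does not spell out. The function names and channel sizes are hypothetical.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution with the given number of groups (bias ignored)."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel only sees c_in / groups input channels.
    return k * k * (c_in // groups) * c_out


def cheap_block_params(c_in, c_out, k, groups):
    """Assumed substitute block: grouped k x k convolution, then 1 x 1 pointwise convolution."""
    grouped = conv_params(c_in, c_in, k, groups)
    pointwise = conv_params(c_in, c_out, 1)
    return grouped + pointwise


# Hypothetical layer: 64 -> 64 channels with a 3 x 3 kernel.
full = conv_params(64, 64, 3)
cheap = cheap_block_params(64, 64, 3, groups=8)

print(full)   # 36864
print(cheap)  # 8704
```

Under these assumptions the substitute block keeps the layer's input and output shapes, so it could be dropped into the teacher architecture without redesign, while storing about a quarter of the weights.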

distilling, moonshine, name change, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.42)

Distilling Named Entity Recognition Models for Endangered Species from Large Language Models

Atuhurra, Jesse, Dujohn, Seiveright Cargill, Kamigaito, Hidetaka, Shindo, Hiroyuki, Watanabe, Taro

arXiv.org Artificial Intelligence | Mar-13-2024

Natural language processing (NLP) practitioners are leveraging large language models (LLMs) to create structured datasets from semi-structured and unstructured data sources such as patents, papers, and theses, without having domain-specific knowledge. At the same time, ecological experts are searching for a variety of means to preserve biodiversity. To contribute to these efforts, we focused on endangered species and, through in-context learning, distilled knowledge from GPT-4. In effect, we created datasets for both named entity recognition (NER) and relation extraction (RE) via a two-stage process: 1) we generated synthetic data from GPT-4 for four classes of endangered species, 2) humans verified the factual accuracy of the synthetic data, resulting in gold data. In total, our novel dataset contains 3.6K sentences, evenly divided between 1.8K NER and 1.8K RE sentences. The constructed dataset was then used to fine-tune both general BERT and domain-specific BERT variants; because GPT-4 is resource intensive, this completes the knowledge distillation process from GPT-4 to BERT. Experiments show that our knowledge transfer approach is effective at creating an NER model suitable for detecting endangered species from texts.

evaluation, gpt-4, information, (14 more...)

arXiv.org Artificial Intelligence

2403.1543

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Italy > Tuscany > Florence (0.04)

Genre: Research Report (0.51)

Industry: Law > Environmental Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Distilling the Knowledge in Data Pruning

Ben-Baruch, Emanuel, Botach, Adam, Kviatkovsky, Igor, Aggarwal, Manoj, Medioni, Gérard

arXiv.org Artificial Intelligence | Mar-12-2024

With the increasing size of datasets used for training neural networks, data pruning becomes an attractive field of research. However, most current data pruning algorithms are limited in their ability to preserve accuracy compared to models trained on the full data, especially in high pruning regimes. In this paper we explore the application of data pruning while incorporating knowledge distillation (KD) when training on a pruned subset. That is, rather than relying solely on ground-truth labels, we also use the soft predictions from a teacher network pre-trained on the complete data. By integrating KD into training, we demonstrate significant improvement across datasets, pruning methods, and on all pruning fractions. We first establish a theoretical motivation for employing self-distillation to improve training on pruned data. Then, we empirically make a compelling and highly practical observation: using KD, simple random pruning is comparable or superior to sophisticated pruning methods across all pruning regimes. On ImageNet for example, we achieve superior accuracy despite training on a random subset of only 50% of the data. Additionally, we demonstrate a crucial connection between the pruning factor and the optimal knowledge distillation weight. This helps mitigate the impact of samples with noisy labels and low-quality images retained by typical pruning algorithms. Finally, we make an intriguing observation: when using lower pruning fractions, larger teachers lead to accuracy degradation, while surprisingly, employing teachers with a smaller capacity than the student's may improve results. Our code will be made available.
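The training recipe described in this abstract (random pruning plus a loss that mixes ground-truth labels with a teacher's soft predictions) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the cross-entropy form of the distillation term, the `temperature` parameter, and all function names are hypothetical choices, and `kd_weight` stands in for the "knowledge distillation weight" the abstract mentions.

```python
import math
import random


def random_prune(indices, keep_fraction, seed=0):
    """Keep a uniformly random subset of the dataset indices."""
    indices = list(indices)
    keep = int(len(indices) * keep_fraction)
    return random.Random(seed).sample(indices, keep)


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, pred_probs, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted one."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, pred_probs))


def kd_loss(student_logits, teacher_logits, label, kd_weight=0.5, temperature=1.0):
    """Weighted sum of a hard-label term and a teacher soft-prediction term (assumed form)."""
    one_hot = [1.0 if i == label else 0.0 for i in range(len(student_logits))]
    hard = cross_entropy(one_hot, softmax(student_logits))
    soft = cross_entropy(
        softmax(teacher_logits, temperature),
        softmax(student_logits, temperature),
    )
    return (1.0 - kd_weight) * hard + kd_weight * soft


# Hypothetical usage: keep 50% of 1,000 samples, then score one example.
subset = random_prune(range(1000), keep_fraction=0.5)
loss = kd_loss([2.0, 0.5, -1.0], [1.5, 1.0, -0.5], label=0, kd_weight=0.7)
print(len(subset), round(loss, 4))
```

With `kd_weight=0.0` the loss reduces to ordinary training on ground-truth labels; raising it shifts the training signal toward the teacher's soft predictions, and the abstract reports that the best setting depends on the pruning factor.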

dataset, distillation, semanticscholar, (15 more...)

arXiv.org Artificial Intelligence

2403.07854

Country:

North America > Canada > Ontario > Toronto (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Education (0.47)
Information Technology (0.34)

Technology:

Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Distilling What We Know

Communications of the ACM | Aug-24-2023, 13:30:20 GMT

The sheer size and complexity of today's generative pretrained transformer (GPT) models is nothing less than astounding. OpenAI's GPT-3, for example, possesses somewhere in the neighborhood of 175 billion parameters, and there is speculation GPT-4 could have as many as 10 trillion parameters. All of this introduces enormous overhead in terms of required cloud resources, including compute cycles and energy consumption. At the moment, the computer power required to train state-of-the-art artificial intelligence (AI) models is rising at a rate of 15x every two years. The cost of training a large GPT model can run into the millions of dollars.

accuracy, gpt model, quantization, (16 more...)

Communications of the ACM

Country:

North America > United States > Oregon > Clackamas County > West Linn (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > Austria (0.04)

Genre: Research Report (0.30)

Industry: Information Technology (0.95)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.35)

How knowledge distillation compresses neural networks

#artificialintelligence | Oct-28-2020, 02:15:05 GMT

If you've ever used a neural network to solve a complex problem, you know they can be enormous, containing millions of parameters. For instance, the famous BERT model has about 110 million. To illustrate the point, this is the number of parameters for the most common architectures in natural language processing (NLP), as summarized in the recent State of AI Report 2020 by Nathan Benaich and Ian Hogarth. In Kaggle competitions, the winning models are often ensembles composed of several predictors. Although they can beat simple models by a large margin in terms of accuracy, their enormous computational costs make them utterly unusable in practice. Is there any way to leverage these powerful but massive models to train state-of-the-art models without scaling the hardware?

artificial intelligence, machine learning, natural language, (15 more...)

#artificialintelligence

Genre: Research Report > Promising Solution (0.36)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)

Moonshine: Distilling with Cheap Convolutions

Crowley, Elliot J., Gray, Gavin, Storkey, Amos J.

Neural Information Processing Systems | Feb-14-2020, 11:43:42 GMT

cheap convolution, distilling, moonshine, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.31)

Distilling a Neural Network into a soft decision tree

#artificialintelligence | Feb-5-2019, 01:45:02 GMT

As part of the commitment to continuous (and cutting-edge) research at Razorthink Inc, we are publishing a series of review papers that survey the best research in deep learning, machine learning, data science, and artificial intelligence in general from across the globe. Each week, we will pick one research paper, break it down to make it easier to understand, and take you through the research approach, the major takeaways, and finally its applicability to real use cases. Our first pick in the series is "Distilling a Neural Network into a soft decision tree" (download link at the bottom), originally written by Nicholas Frosst & Geoffrey Hinton (Google Brain Team). Deep neural networks have proven very effective at classification and prediction tasks on complex data. Most importantly, they are highly useful when the input data has a complex relationship with the target variable and the dimensionality of the input data is very high.

artificial intelligence, decision tree, machine learning, (17 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Distilling a Neural Network Into a Soft Decision Tree

Frosst, Nicholas, Hinton, Geoffrey

arXiv.org Machine Learning | Nov-27-2017

Deep neural networks have proved to be a very effective way to perform classification tasks. They excel when the input data is high dimensional, the relationship between the input and the output is complicated, and the number of labeled training examples is large [Szegedy et al., 2015, Wu et al., 2016, Jozefowicz et al., 2016, Graves et al., 2013]. But it is hard to explain why a learned network makes a particular classification decision on a particular test case. This is due to their reliance on distributed hierarchical representations. If we could take the knowledge acquired by the neural net and express the same knowledge in a model that relies on hierarchical decisions instead, explaining a particular decision would be much easier. We describe a way of using a trained neural net to create a type of soft decision tree that generalizes better than one learned directly from the training data.
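To make "a model that relies on hierarchical decisions" concrete, here is a small sketch of inference in a soft decision tree. It is an illustration under assumptions the abstract does not state: each inner node is taken to route an input to its right child with probability sigmoid(w . x + b), each leaf holds a fixed class distribution, and the prediction comes from the leaf with the highest path probability. All names and numbers are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def leaf_path_probabilities(x, inner_nodes, depth):
    """Probability of reaching each leaf of a complete binary tree.

    inner_nodes is a list of (weights, bias) pairs in breadth-first order;
    a tree of the given depth needs 2**depth - 1 of them.
    """
    probs = [1.0]  # probability of reaching the root
    node = 0
    for _ in range(depth):
        next_probs = []
        for p in probs:
            weights, bias = inner_nodes[node]
            go_right = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            next_probs.extend([p * (1.0 - go_right), p * go_right])
            node += 1
        probs = next_probs
    return probs


def predict(x, inner_nodes, leaf_distributions, depth):
    """Return the most probable leaf, its class distribution, and all path probabilities."""
    probs = leaf_path_probabilities(x, inner_nodes, depth)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, leaf_distributions[best], probs


# Hypothetical depth-2 tree over 2-dimensional inputs: 3 inner nodes, 4 leaves.
inner_nodes = [([4.0, 0.0], 0.0), ([0.0, 4.0], 0.0), ([0.0, 4.0], 0.0)]
leaf_distributions = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.1, 0.9]]

leaf, distribution, probs = predict([1.0, -1.0], inner_nodes, leaf_distributions, depth=2)
print(leaf, distribution)
```

Because the prediction is tied to one root-to-leaf path, explaining it amounts to reading off the sequence of gating decisions along that path, which is the interpretability benefit the abstract argues for.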

artificial intelligence, decision tree, machine learning, (17 more...)

arXiv.org Machine Learning

1711.09784

Country: North America (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
