AITopics | Menghani, Gaurav

Collaborating Authors

Menghani, Gaurav

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

LAuReL: Learned Augmented Residual Layer

Menghani, Gaurav, Kumar, Ravi, Kumar, Sanjiv

arXiv.org Artificial IntelligenceNov-13-2024

One of the core pillars of efficient deep learning methods is architectural improvements such as the residual/skip connection, which has led to significantly better model convergence and quality. Since then the residual connection has become ubiquitous in not just convolutional neural networks but also transformer-based architectures, the backbone of LLMs. In this paper we introduce \emph{Learned Augmented Residual Layer} (LAuReL) -- a novel generalization of the canonical residual connection -- with the goal to be an in-situ replacement of the latter while outperforming on both model quality and footprint metrics. Our experiments show that using \laurel can help boost performance for both vision and language models. For example, on the ResNet-50, ImageNet 1K task, it achieves $60\%$ of the gains from adding an extra layer, while only adding $0.003\%$ more parameters, and matches it while adding $2.6\times$ fewer parameters.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2411.07501

Country: North America > United States (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

SLaM: Student-Label Mixing for Distillation with Unlabeled Examples

Kontonis, Vasilis, Iliopoulos, Fotis, Trinh, Khoa, Baykal, Cenk, Menghani, Gaurav, Vee, Erik

arXiv.org Artificial IntelligenceJun-8-2023

Knowledge distillation with unlabeled examples is a powerful training paradigm for generating compact and lightweight student models in applications where the amount of labeled data is limited but one has access to a large pool of unlabeled data. In this setting, a large teacher model generates ``soft'' pseudo-labels for the unlabeled dataset which are then used for training the student model. Despite its success in a wide variety of applications, a shortcoming of this approach is that the teacher's pseudo-labels are often noisy, leading to impaired student performance. In this paper, we present a principled method for knowledge distillation with unlabeled examples that we call Student-Label Mixing (SLaM) and we show that it consistently improves over prior approaches by evaluating it on several standard benchmarks. Finally, we show that SLaM comes with theoretical guarantees; along the way we give an algorithm improving the best-known sample complexity for learning halfspaces with margin under random classification noise, and provide the first convergence analysis for so-called ``forward loss-adjustment" methods.

artificial intelligence, dataset, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2302.03806

Country: North America > United States > Oregon (0.14)

Genre: Research Report (0.82)

Industry: Education (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Robust Active Distillation

Baykal, Cenk, Trinh, Khoa, Iliopoulos, Fotis, Menghani, Gaurav, Vee, Erik

arXiv.org Artificial IntelligenceFeb-4-2023

Distilling knowledge from a large teacher model to a lightweight one is a widely successful approach for generating compact, powerful models in the semi-supervised learning setting where a limited amount of labeled data is available. In large-scale applications, however, the teacher tends to provide a large number of incorrect soft-labels that impairs student performance. The sheer size of the teacher additionally constrains the number of soft-labels that can be queried due to prohibitive computational and/or financial costs. The difficulty in achieving simultaneous efficiency (i.e., minimizing soft-label queries) and robustness (i.e., avoiding student inaccuracies due to incorrect labels) hurts the widespread application of knowledge distillation to many modern tasks. In this paper, we present a parameter-free approach with provable guarantees to query the soft-labels of points that are simultaneously informative and correctly labeled by the teacher. At the core of our work lies a game-theoretic formulation that explicitly considers the inherent trade-off between the informativeness and correctness of input instances. We establish bounds on the expected performance of our approach that hold even in worst-case distillation instances. We present empirical evaluations on popular benchmarks that demonstrate the improved distillation performance enabled by our work relative to that of state-of-the-art active learning and active distillation methods. Deep neural network models have been unprecedentedly successful in many high-impact application areas such as Natural Language Processing (Ramesh et al., 2021; Brown et al., 2020) and Computer Vision (Ramesh et al., 2021; Niemeyer & Geiger, 2021). However, this has come at the cost of using increasingly large labeled data sets and high-capacity network models that tend to contain billions of parameters (Devlin et al., 2018).

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2210.01213

Country: North America > United States > Oregon (0.14)

Genre: Research Report (1.00)

Industry: Education (0.89)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better

Menghani, Gaurav

arXiv.org Artificial IntelligenceJun-21-2021

Deep Learning has revolutionized the fields of computer vision, natural language understanding, speech recognition, information retrieval and more. However, with the progressive improvements in deep learning models, their number of parameters, latency, resources required to train, etc. have all have increased significantly. Consequently, it has become important to pay attention to these footprint metrics of a model as well, not just its quality. We present and motivate the problem of efficiency in deep learning, followed by a thorough survey of the five core areas of model efficiency (spanning modeling techniques, infrastructure, and hardware) and the seminal work there. We also present an experiment-based guide along with code, for practitioners to optimize their model training and deployment. We believe this is the first comprehensive survey in the efficient deep learning space that covers the landscape of model efficiency from modeling techniques to hardware support. Our hope is that this survey would provide the reader with the mental model and the necessary understanding of the field to apply generic efficiency techniques to immediately get significant improvements, and also equip them with ideas for further research and experimentation to achieve additional gains.

artificial intelligence, arxiv preprint arxiv, machine learning, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3578938

2106.08962

Country:

North America > United States > Massachusetts > Middlesex County (0.14)
North America > United States > California > Santa Clara County (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre:

Overview (0.68)
Research Report (0.50)

Industry:

Education (0.93)
Information Technology > Services (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback