AITopics | glue benchmark

Collaborating Authors

glue benchmark

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

6f5216f8d89b086c18298e043bfe48ed-Supplemental.pdf

Neural Information Processing SystemsFeb-8-2026, 20:42:31 GMT

dynabert, glue benchmark, multiplier, (14 more...)

Neural Information Processing Systems

Country: North America > Canada (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

6f5216f8d89b086c18298e043bfe48ed-Supplemental.pdf

Neural Information Processing SystemsOct-3-2025, 05:02:08 GMT

accuracy, machine learning, natural language, (14 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

A Experimental Details

Neural Information Processing SystemsAug-17-2025, 16:26:37 GMT

The accuracy drop varies a lot under different benchmarks and scenarios.

large language model, machine learning, quantization, (20 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)

Add feedback

Catastrophic Forgetting in LLMs: A Comparative Analysis Across Language Tasks

Haque, Naimul

arXiv.org Artificial IntelligenceApr-1-2025

Large Language Models (LLMs) have significantly advanced Natural Language Processing (NLP), particularly in Natural Language Understanding (NLU) tasks. As we progress toward an agentic world where LLM-based agents autonomously handle specialized tasks, it becomes crucial for these models to adapt to new tasks without forgetting previously learned information - a challenge known as catastrophic forgetting. This study evaluates the continual fine-tuning of various open-source LLMs with different parameter sizes (specifically models under 10 billion parameters) on key NLU tasks from the GLUE benchmark, including SST-2, MRPC, CoLA, and MNLI. By employing prompt engineering and task-specific adjustments, we assess and compare the models' abilities to retain prior knowledge while learning new tasks. Our results indicate that models such as Phi-3.5-mini exhibit minimal forgetting while maintaining strong learning capabilities, making them well-suited for continual learning environments. Additionally, models like Orca-2-7b and Qwen2.5-7B demonstrate impressive learning abilities and overall performance after fine-tuning. This work contributes to understanding catastrophic forgetting in LLMs and highlights prompting engineering to optimize model performance for continual learning scenarios.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2504.01241

Country:

North America > United States (0.14)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)

Genre: Research Report > New Finding (0.35)

Industry: Education (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Add feedback

VectorFit : Adaptive Singular & Bias Vector Fine-Tuning of Pre-trained Foundation Models

Hegde, Suhas G, Kaur, Shilpy, Tiwari, Aruna

arXiv.org Artificial IntelligenceMar-25-2025

Popular PEFT methods achieve parameter efficiency by assuming that incremental weight updates are inherently low-rank, which often leads to a performance gap compared to full fine-tuning. While recent methods have attempted to address this limitation, they typically lack sufficient parameter and memory efficiency. We propose VectorFit, an effective and easily deployable approach that adaptively trains the singular vectors and biases of pre-trained weight matrices. We demonstrate that the utilization of structural and transformational characteristics of pre-trained weights enables high-rank updates comparable to those of full fine-tuning. As a result, VectorFit achieves superior performance with 9X less trainable parameters compared to state-of-the-art PEFT methods. Through extensive experiments over 17 datasets spanning diverse language and vision tasks such as natural language understanding and generation, question answering, image classification, and image generation, we exhibit that VectorFit consistently outperforms baselines, even in extremely low-budget scenarios.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2503.1953

Country:

North America > United States (0.04)
North America > Dominican Republic (0.04)
Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
(2 more...)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Adapting Decoder-Based Language Models for Diverse Encoder Downstream Tasks

Suganthan, Paul, Moiseev, Fedor, Yan, Le, Wu, Junru, Ni, Jianmo, Han, Jay, Zitouni, Imed, Alfonseca, Enrique, Wang, Xuanhui, Dong, Zhe

arXiv.org Artificial IntelligenceMar-4-2025

Decoder-based transformers, while revolutionizing language modeling and scaling to immense sizes, have not completely overtaken encoder-heavy architectures in natural language processing. Specifically, encoder-only models remain dominant in tasks like classification, regression, and ranking. This is primarily due to the inherent structure of decoder-based models, which limits their direct applicability to these tasks. In this paper, we introduce Gemma Encoder, adapting the powerful Gemma decoder model to an encoder architecture, thereby unlocking its potential for a wider range of non-generative applications. To optimize the adaptation from decoder to encoder, we systematically analyze various pooling strategies, attention mechanisms, and hyperparameters (e.g., dropout rate). Furthermore, we benchmark Gemma Encoder against established approaches on the GLUE benchmarks, and MS MARCO ranking benchmark, demonstrating its effectiveness and versatility.

benchmark, decoder-based language model, gemma encoder, (12 more...)

arXiv.org Artificial Intelligence

2503.02656

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

EmbBERT-Q: Breaking Memory Barriers in Embedded NLP

Bravin, Riccardo, Pavan, Massimo, Shalby, Hazem Hesham Yousef, Pittorino, Fabrizio, Roveri, Manuel

arXiv.org Artificial IntelligenceFeb-14-2025

Large Language Models (LLMs) have revolutionized natural language processing, setting new standards across a wide range of applications. However, their relevant memory and computational demands make them impractical for deployment on technologically-constrained tiny devices such as wearable devices and Internet-of-Things units. To address this limitation, we introduce EmbBERT-Q, a novel tiny language model specifically designed for tiny devices with stringent memory constraints. EmbBERT-Q achieves state-of-the-art (SotA) accuracy in Natural Language Processing tasks in this scenario, with a total memory footprint (weights and activations) of just 781 kB, representing a 25x reduction in size with respect to SotA models. By combining architectural innovations with hardware-compatible 8-bit quantization, EmbBERT-Q consistently outperforms several baseline models scaled down to a 2 MB memory budget (i.e., the maximum memory typically available in tiny devices), including heavily compressed versions of BERT and MAMBA. Extensive experimental evaluations on both a selected benchmark dataset, TinyNLP, specifically curated to evaluate Tiny Language Models in NLP tasks and real-world scenarios, and the GLUE benchmark, demonstrate EmbBERT-Q ability to deliver competitive accuracy with respect to existing approaches, achieving an unmatched balance between memory and performance. To ensure the complete and immediate reproducibility of all our results, we release all code, scripts, and model checkpoints at https://github.com/RiccardoBravin/tiny-LLM.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2502.10001

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Smart Houses & Appliances (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Improving the Language Understanding Capabilities of Large Language Models Using Reinforcement Learning

Hu, Bokai, Somayajula, Sai Ashish, Pan, Xin, Huang, Zihan, Xie, Pengtao

arXiv.org Artificial IntelligenceOct-21-2024

Large language models (LLMs), built on decoder-only transformers, excel in natural language generation and adapt to diverse tasks using zero-shot and few-shot prompting. However, these prompting methods often struggle on natural language understanding (NLU) tasks, where encoder-only models like BERT-base outperform LLMs on benchmarks like GLUE and SuperGLUE. This paper explores two approaches-supervised fine-tuning (SFT) and proximal policy optimization (PPO)-to enhance LLMs' NLU abilities. To reduce the cost of full-model fine-tuning, we integrate low-rank adaptation (LoRA) layers, limiting updates to these layers during both SFT and PPO. In SFT, task-specific prompts are concatenated with input queries and ground-truth labels, optimizing with next-token prediction. Despite this, LLMs still underperform compared to models like BERT-base on several NLU tasks. To close this gap, we apply PPO, a reinforcement learning technique that treats each token generation as an action and uses a reward function based on alignment with ground-truth answers. PPO then updates the model to maximize these rewards, aligning outputs with correct labels. Our experiments with LLAMA2-7B show that PPO improves performance, with a 6.3-point gain over SFT on GLUE. PPO exceeds zero-shot by 38.7 points and few-shot by 26.1 points on GLUE, while surpassing these by 28.8 and 28.5 points on SuperGLUE. Additionally, PPO outperforms BERT-large by 2.7 points on GLUE and 9.3 points on SuperGLUE. The improvements are consistent across models like Qwen2.5-7B and MPT-7B, highlighting PPO's robustness in enhancing LLMs' NLU capabilities.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2410.1102

Country: North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment (0.68)

Technology: