Optimising Language Models for Downstream Tasks: A Post-Training Perspective
arXiv.org Artificial Intelligence
Language models (LMs) have demonstrated remarkable capabilities in NLP, yet adapting them efficiently and robustly to specific tasks remains challenging. As their scale and complexity grow, fine-tuning LMs on labelled data often underutilises available unlabelled data, leads to overfitting on small task-specific sets, and imposes significant computational costs. These limitations hamper their application to the open-ended landscape of real-world language tasks. This thesis proposes a series of methods to better adapt LMs to downstream applications. First, we explore strategies for extracting task-relevant knowledge from unlabelled data, introducing a novel continued pre-training technique that outperforms state-of-the-art semi-supervised approaches. Next, we present a parameter-efficient fine-tuning method that substantially reduces memory and compute costs while maintaining competitive performance. We also introduce improved supervised fine-tuning methods that enable LMs to better follow instructions, especially when labelled data is scarce, enhancing their performance across a range of NLP tasks, including open-ended generation. Finally, we develop new evaluation methods and benchmarks, such as multi-hop spatial reasoning tasks, to assess LM capabilities and adaptation more comprehensively. Through extensive empirical studies across diverse NLP tasks, our results demonstrate that these approaches substantially improve LM robustness, efficiency, and generalisation, making them more adaptable to a broad range of applications. These advances mark a significant step towards more robust and efficient LMs, bringing us closer to the goal of artificial general intelligence.
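The abstract does not specify which parameter-efficient fine-tuning technique the thesis uses, but the general idea such methods build on can be sketched with a low-rank adapter in the style of LoRA: the pretrained weight stays frozen and only two small factor matrices are trained. Everything below (names, dimensions, initial values) is a hypothetical illustration, not the thesis's actual method.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

d_in, d_out, rank = 8, 8, 2

# Frozen pretrained weight W: never updated during fine-tuning.
W = [[float((i + j) % 3) for j in range(d_in)] for i in range(d_out)]

# Trainable low-rank factors: A (rank x d_in) gets a small init,
# B (d_out x rank) starts at zero so training begins from the base model.
A = [[0.1] * d_in for _ in range(rank)]
B = [[0.0] * rank for _ in range(d_out)]

def adapted_forward(x):
    """Forward pass computing (W + B A) x without materialising B A."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + d for b, d in zip(base, delta)]

x = [1.0] * d_in
# With B initialised to zero, the adapter is an exact no-op at the start:
assert adapted_forward(x) == matvec(W, x)

# Trainable parameters drop from d_out*d_in to rank*(d_in + d_out):
full_params = d_out * d_in           # 64 weights in the full layer
peft_params = rank * (d_in + d_out)  # 32 weights in the adapter
```

The memory saving the abstract refers to comes from this asymmetry: gradients and optimiser state are kept only for the small factors, while the large frozen weight needs neither.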
Jun-27-2025