Optimising Language Models for Downstream Tasks: A Post-Training Perspective
arXiv.org Artificial Intelligence
Language models (LMs) have demonstrated remarkable capabilities in NLP, yet adapting them efficiently and robustly to specific tasks remains challenging. As their scale and complexity grow, fine-tuning LMs on labelled data often underutilises available unlabelled data, leads to overfitting on small task-specific sets, and imposes significant computational costs. These limitations hamper their application to the open-ended landscape of real-world language tasks. This thesis proposes a series of methods to better adapt LMs to downstream applications. First, we explore strategies for extracting task-relevant knowledge from unlabelled data, introducing a novel continued pre-training technique that outperforms state-of-the-art semi-supervised approaches. Next, we present a parameter-efficient fine-tuning method that substantially reduces memory and compute costs while maintaining competitive performance. We also introduce improved supervised fine-tuning methods that enable LMs to better follow instructions, especially when labelled data is scarce, enhancing their performance across a range of NLP tasks, including open-ended generation. Finally, we develop new evaluation methods and benchmarks, such as multi-hop spatial reasoning tasks, to assess LM capabilities and adaptation more comprehensively. Through extensive empirical studies across diverse NLP tasks, our results demonstrate that these approaches substantially improve LM robustness, efficiency, and generalisation, making them more adaptable to a broad range of applications. These advances mark a significant step towards more robust and efficient LMs, bringing us closer to the goal of artificial general intelligence.
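The abstract does not specify which parameter-efficient fine-tuning technique the thesis uses, but the general idea such methods build on can be sketched with a low-rank adapter in the style of LoRA: the pretrained weight stays frozen and only two small factor matrices are trained. Everything below (names, dimensions, initial values) is a hypothetical illustration, not the thesis's actual method.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

d_in, d_out, rank = 8, 8, 2

# Frozen pretrained weight W: never updated during fine-tuning.
W = [[float((i + j) % 3) for j in range(d_in)] for i in range(d_out)]

# Trainable low-rank factors: A (rank x d_in) gets a small init,
# B (d_out x rank) starts at zero so training begins from the base model.
A = [[0.1] * d_in for _ in range(rank)]
B = [[0.0] * rank for _ in range(d_out)]

def adapted_forward(x):
    """Forward pass computing (W + B A) x without materialising B A."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + d for b, d in zip(base, delta)]

x = [1.0] * d_in
# With B initialised to zero, the adapter is an exact no-op at the start:
assert adapted_forward(x) == matvec(W, x)

# Trainable parameters drop from d_out*d_in to rank*(d_in + d_out):
full_params = d_out * d_in           # 64 weights in the full layer
peft_params = rank * (d_in + d_out)  # 32 weights in the adapter
```

The memory saving the abstract refers to comes from this asymmetry: gradients and optimiser state are kept only for the small factors, while the large frozen weight needs neither.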
Jun-27-2025