Efficient Text Encoders for Labor Market Analysis

Jens-Joris Decorte, Jeroen Van Hautte, Chris Develder, Thomas Demeester

Jul-30-2025–arXiv.org Artificial Intelligence

Labor market analysis relies on extracting insights from job advertisements, which provide valuable yet unstructured information on job titles and corresponding skill requirements. While state-of-the-art methods for skill extraction achieve strong performance, they depend on large language models (LLMs), which are computationally expensive and slow. In this paper, we propose ConTeXT-match, a novel contrastive learning approach with token-level attention that is well-suited for the extreme multi-label classification task of skill classification. ConTeXT-match significantly improves skill extraction efficiency and performance, achieving state-of-the-art results with a lightweight bi-encoder model. To support robust evaluation, we introduce Skill-XL, a new benchmark with exhaustive, sentence-level skill annotations that explicitly address the redundancy in the large label space. Finally, we present JobBERT V2, an improved job title normalization model that leverages extracted skills to produce high-quality job title representations. Experiments demonstrate that our models are efficient, accurate, and scalable, making them ideal for large-scale, real-time labor market analysis.
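
The abstract names two ingredients, a bi-encoder trained contrastively and token-level attention, without giving their exact form. The sketch below is only a generic illustration of how those two ideas can fit together, not the ConTeXT-match architecture as published. It assumes (hypothetically) that each sentence is already encoded as a list of token vectors and each skill label as a single vector, that attention weights come from a softmax over token-label dot products, and that training uses an in-batch InfoNCE-style loss. All function names and the toy vectors are invented for illustration.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_pool(token_vecs, label_vec):
    # Assumption: tokens are weighted by their similarity to the label,
    # so the pooled sentence vector is label-specific.
    weights = softmax([dot(t, label_vec) for t in token_vecs])
    dim = len(label_vec)
    return [sum(w * t[i] for w, t in zip(weights, token_vecs)) for i in range(dim)]

def match_score(token_vecs, label_vec):
    # Score a (sentence, label) pair with the label-aware pooled vector.
    return dot(attention_pool(token_vecs, label_vec), label_vec)

def info_nce_loss(sentences, labels):
    # In-batch contrastive loss: label i is the positive for sentence i,
    # and every other label in the batch serves as a negative.
    losses = []
    for i, tokens in enumerate(sentences):
        probs = softmax([match_score(tokens, lab) for lab in labels])
        losses.append(-math.log(probs[i]))
    return sum(losses) / len(losses)

# Toy 2-dimensional vectors standing in for encoder outputs (hypothetical data).
sentences = [
    [[1.0, 0.0], [0.0, 0.1]],  # sentence 0: two token vectors
    [[0.0, 1.0], [0.1, 0.0]],  # sentence 1: two token vectors
]
labels = [[1.0, 0.0], [0.0, 1.0]]  # one vector per skill label

loss = info_nce_loss(sentences, labels)
```

In a real system the token and label vectors would come from a trained text encoder, and the label vectors could be computed once and cached, which is the usual efficiency argument for bi-encoders over LLM-based pipelines.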

Country:
- Europe (1.00)
- North America > United States
  - Minnesota (0.28)
- Asia > Middle East
  - UAE (0.28)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Marketing (1.00)
- Information Technology (0.93)
- Banking & Finance > Economy (0.91)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (1.00)
    - Text Processing (0.93)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
