Modeling Text-Label Alignment for Hierarchical Text Classification

Sep-1-2024–arXiv.org Artificial Intelligence

Hierarchical Text Classification (HTC) aims to categorize text data based on a structured label hierarchy, resulting in predicted labels forming a sub-hierarchy tree. The semantics of the text should align with the semantics of the labels in this sub-hierarchy. Because the sub-hierarchy changes for each sample, the dynamic nature of text-label alignment poses challenges for existing methods, which typically process text and labels independently. To overcome this limitation, we propose a Text-Label Alignment (TLA) loss specifically designed to model the alignment between text and labels. We obtain a set of negative labels for a given text and its positive label set. By leveraging contrastive learning, the TLA loss pulls the text closer to its positive labels and pushes it away from its negative labels in the embedding space. This process aligns text representations with related labels while distancing them from unrelated ones. Building upon this framework, we introduce the Hierarchical Text-Label Alignment (HTLA) model, which leverages BERT as the text encoder and GPTrans as the graph encoder and integrates text-label embeddings to generate hierarchy-aware representations. Experimental results on benchmark datasets and comparison with existing baselines demonstrate the effectiveness of HTLA for HTC.
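
The abstract does not state the TLA loss formula, so the following is only a minimal sketch of the general idea, assuming an InfoNCE-style contrastive objective over cosine similarities. The function names, the temperature value, and the toy 2-D vectors are all hypothetical; in the actual model the vectors would come from the text encoder and the graph encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def alignment_loss(text_vec, positive_labels, negative_labels, temperature=0.1):
    """InfoNCE-style loss averaged over the positive labels.

    Assumption: this is an illustrative stand-in, not the paper's exact loss.
    The loss shrinks as the text moves toward its positive labels and away
    from its negative labels.
    """
    # Similarity mass assigned to the negative labels (shared by all positives).
    neg_sum = sum(
        math.exp(cosine(text_vec, neg) / temperature) for neg in negative_labels
    )
    losses = []
    for pos in positive_labels:
        pos_term = math.exp(cosine(text_vec, pos) / temperature)
        losses.append(-math.log(pos_term / (pos_term + neg_sum)))
    return sum(losses) / len(losses)


# Toy embeddings (hypothetical): the text points along the first axis.
text = [1.0, 0.0]
positives = [[0.9, 0.1]]                # close to the text
negatives = [[0.0, 1.0], [-1.0, 0.2]]   # orthogonal or opposed to the text

aligned = alignment_loss(text, positives, negatives)
# Swap the roles to see the loss grow when the "positive" is far from the text.
misaligned = alignment_loss(text, negatives[:1], positives)
```

In this sketch `aligned` is smaller than `misaligned`, which is the behavior the abstract describes: related labels are rewarded for being near the text, unrelated ones for being far from it.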

classification, dataset, hierarchy

Country:
- North America > United States
  - New York > New York County
    - New York City (0.05)
  - New Mexico > Santa Fe County
    - Santa Fe (0.04)
  - Minnesota > Hennepin County
    - Minneapolis (0.14)
- Europe
  - Italy > Tuscany
    - Florence (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
- Asia
  - China > Hong Kong (0.04)
  - India > Uttarakhand
    - Roorkee (0.04)

Genre:
- Research Report > Experimental Study (0.46)

Industry:
- Government (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Text Classification (0.73)
    - Text Processing (0.46)
  - Machine Learning > Neural Networks
    - Deep Learning (0.46)
