Contrastive Learning Using Graph Embeddings for Domain Adaptation of Language Models in the Process Industry
Anastasia Zhukova, Jonas Lührs, Christian E. Lobmüller, Bela Gipp
arXiv.org Artificial Intelligence
Recent trends in NLP utilize knowledge graphs (KGs) to enhance pretrained language models by incorporating additional knowledge from the graph structures to learn domain-specific terminology or relationships between documents that might otherwise be overlooked. This paper explores how SciNCL, a graph-aware neighborhood contrastive learning methodology originally designed for scientific publications, can be applied to the process industry domain, where text logs contain crucial information about daily operations and are often structured as sparse KGs. Our experiments demonstrate that language models fine-tuned with triplets derived from graph embeddings (GE) outperform a state-of-the-art mE5-large text encoder by 9.8–14.3% (5.45–7.96 percentage points) on the proprietary process industry text embedding benchmark (PITEB) while having three times fewer parameters.
October 8, 2025
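The triplet construction the abstract describes can be illustrated with a small sketch. In SciNCL-style neighborhood contrastive learning, each anchor document's positive is drawn from its nearest neighbors in graph-embedding space, and a hard negative is drawn from a band of slightly more distant neighbors; a text encoder is then fine-tuned so its embeddings respect these triplets, typically via a triplet margin loss. The code below is a minimal illustration of that sampling-and-loss pipeline on random toy data (the array sizes, neighbor-band boundaries `pos_k`/`neg_lo`/`neg_hi`, and margin are hypothetical choices, not values from the paper; a real setup would backpropagate the loss through the encoder rather than score fixed vectors).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (hypothetical sizes): rows are documents, columns embedding dims.
graph_emb = rng.normal(size=(50, 16))  # graph embeddings, e.g. from a KG of text logs
text_emb = rng.normal(size=(50, 16))   # text-encoder embeddings to be fine-tuned

def neighbor_ranks(ge):
    """For each document, rank all documents by Euclidean distance in
    graph-embedding space (self is pushed to the end via an inf distance)."""
    d = np.linalg.norm(ge[:, None, :] - ge[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)  # ranks[i] lists neighbors of i, nearest first

def sample_triplets(ranks, pos_k=5, neg_lo=20, neg_hi=30, n=100):
    """SciNCL-style sampling: positives come from the pos_k nearest neighbors,
    hard negatives from a band just outside the positive region."""
    anchors = rng.integers(0, ranks.shape[0], size=n)
    positives = ranks[anchors, rng.integers(0, pos_k, size=n)]
    negatives = ranks[anchors, rng.integers(neg_lo, neg_hi, size=n)]
    return anchors, positives, negatives

def triplet_margin_loss(te, a, p, n, margin=1.0):
    """Standard triplet margin loss: pull anchor-positive pairs together,
    push anchor-negative pairs apart by at least `margin`."""
    d_ap = np.linalg.norm(te[a] - te[p], axis=1)
    d_an = np.linalg.norm(te[a] - te[n], axis=1)
    return np.maximum(d_ap - d_an + margin, 0.0).mean()

ranks = neighbor_ranks(graph_emb)
a, p, n = sample_triplets(ranks)
loss = triplet_margin_loss(text_emb, a, p, n)
print(f"triplet margin loss: {loss:.3f}")
```

In practice the same triplet objective is applied while training the text encoder itself (e.g. with `torch.nn.TripletMarginLoss`), so that text-embedding distances come to mirror the graph-embedding neighborhood structure.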