CoLLAT: On Adding Fine-grained Audio Understanding to Language Models using Token-Level Locked-Language Tuning

Neural Information Processing Systems

Humans can easily understand various audio concepts, but conventional audio classification models fail due to their inability to predict unseen classes during training. To address this challenge, recent literature has explored contrastive language-audio pretraining to learn an audio understanding model using natural language supervision from a pretrained language model. However, despite their reasonable zero-shot performance in audio understanding, these models typically fail to achieve optimal performance while preserving the text understanding capabilities of the pretrained language model. They also perform poorly when comprehending audio clips with multiple audio concepts.
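The zero-shot inference that contrastive language-audio models perform can be sketched as ranking candidate labels by the cosine similarity between an audio embedding and a text embedding for each label prompt. The snippet below is a minimal illustrative sketch of that idea using random vectors in place of real model outputs; it is not CoLLAT's actual implementation, and the function and label names are hypothetical.

```python
import numpy as np

def zero_shot_classify(audio_emb, text_embs, labels):
    """Rank candidate labels by cosine similarity between one audio
    embedding and one text embedding per label (illustrative sketch
    of contrastive language-audio zero-shot inference)."""
    a = audio_emb / np.linalg.norm(audio_emb)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = t @ a  # cosine similarity of each label prompt to the audio clip
    order = np.argsort(-sims)  # best match first
    return [(labels[i], float(sims[i])) for i in order]

# Toy example: random embeddings stand in for real encoder outputs.
rng = np.random.default_rng(0)
audio = rng.normal(size=8)
texts = rng.normal(size=(3, 8))
ranking = zero_shot_classify(audio, texts, ["dog bark", "siren", "rain"])
```

In a real system the audio and text embeddings would come from the jointly trained audio encoder and the (locked) pretrained language model, so that matching concepts land close together in the shared space.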


On The Role of Pretrained Language Models in General-Purpose Text Embeddings: A Survey

Zhang, Meishan, Zhang, Xin, Zhao, Xinping, Huang, Shouzheng, Hu, Baotian, Zhang, Min

arXiv.org Artificial Intelligence

Text embeddings have attracted growing interest due to their effectiveness across a wide range of natural language processing (NLP) tasks, including retrieval, classification, clustering, bitext mining, and summarization. With the emergence of pretrained language models (PLMs), general-purpose text embeddings (GPTE) have gained significant traction for their ability to produce rich, transferable representations. The general architecture of GPTE typically leverages PLMs to derive dense text representations, which are then optimized through contrastive learning on large-scale pairwise datasets. In this survey, we provide a comprehensive overview of GPTE in the era of PLMs, focusing on the roles PLMs play in driving its development. We first examine the fundamental architecture and describe the basic roles of PLMs in GPTE, i.e., embedding extraction, expressivity enhancement, training strategies, learning objectives, and data construction. We then describe advanced roles enabled by PLMs, including multilingual support, multimodal integration, code understanding, and scenario-specific adaptation. Finally, we highlight potential future research directions that move beyond traditional improvement goals, including ranking integration, safety considerations, bias mitigation, structural information incorporation, and the cognitive extension of embeddings. This survey aims to serve as a valuable reference for both newcomers and established researchers seeking to understand the current state and future potential of GPTE.
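The pairwise contrastive objective the survey describes is typically an InfoNCE-style loss with in-batch negatives: each query embedding should score highest against its own positive, relative to the other positives in the batch. Below is a minimal numpy sketch of that loss under this assumption; real GPTE systems compute it over PLM encoder outputs in a framework such as PyTorch, and the function name and temperature value here are illustrative.

```python
import numpy as np

def info_nce_loss(queries, positives, temperature=0.05):
    """Symmetric InfoNCE-style contrastive loss over a batch of
    (query, positive) embedding pairs with in-batch negatives.
    Minimal numpy sketch of the pairwise contrastive objective
    commonly used to train general-purpose text embeddings."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (q @ p.T) / temperature      # (B, B) similarity matrix
    labels = np.arange(len(q))            # matching pair sits on the diagonal
    # cross-entropy in the query -> positive direction
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    loss_q = -log_probs[labels, labels].mean()
    # symmetric direction: positives as queries
    log_probs_t = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    loss_p = -log_probs_t[labels, labels].mean()
    return (loss_q + loss_p) / 2
```

Lowering the temperature sharpens the softmax, which is why aligned pairs drive the loss toward zero while mismatched pairs are penalized heavily.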


ESGBench: A Benchmark for Explainable ESG Question Answering in Corporate Sustainability Reports

George, Sherine, Saji, Nithish

arXiv.org Artificial Intelligence

We present ESGBench, a benchmark dataset and evaluation framework designed to assess explainable ESG question answering systems using corporate sustainability reports. The benchmark consists of domain-grounded questions across multiple ESG themes, paired with human-curated answers and supporting evidence to enable fine-grained evaluation of model reasoning. We analyze the performance of state-of-the-art LLMs on ESGBench, highlighting key challenges in factual consistency, traceability, and domain alignment. ESGBench aims to accelerate research in transparent and accountable ESG-focused AI systems.




A Broader impact

Neural Information Processing Systems

Attention modules have demonstrated their effectiveness in state-of-the-art neural network models. Our proposed method shows improvements on five representative tasks, indicating its efficacy and general applicability. We hope that our work will encourage the community to pay more attention to the key and query distributions in existing attention networks. The gap between training data and testing data might be large; therefore, undue trust in deep learning models through incautious usage or imprecise interpretation of model output might lead to unexpected adverse consequences.