Xiao, Chenghao
Length is a Curse and a Blessing for Document-level Semantics
Xiao, Chenghao, Li, Yizhi, Hudson, G Thomas, Lin, Chenghua, Moubayed, Noura Al
In recent years, contrastive learning (CL) has been extensively utilized to recover sentence and document-level encoding capability from pre-trained language models. In this work, we question the length generalizability of CL-based models, i.e., their vulnerability towards length-induced semantic shift. We verify not only that length vulnerability is a significant yet overlooked research gap, but we can devise unsupervised CL methods solely depending on the semantic signal provided by document length. We first derive the theoretical foundations underlying length attacks, showing that elongating a document would intensify the high intra-document similarity that is already brought by CL. Moreover, we found that isotropy promised by CL is highly dependent on the length range of text exposed in training. Inspired by these findings, we introduce a simple yet universal document representation learning framework, LA(SER)$^{3}$: length-agnostic self-reference for semantically robust sentence representation learning, achieving state-of-the-art unsupervised performance on the standard information retrieval benchmark.
Audio Contrastive based Fine-tuning
Wang, Yang, Liang, Qibin, Xiao, Chenghao, Li, Yizhi, Moubayed, Noura Al, Lin, Chenghua
Audio classification plays a crucial role in speech and sound processing tasks with a wide range of applications. There still remains a challenge of striking the right balance between fitting the model to the training data (avoiding overfitting) and enabling it to generalise well to a new domain. Leveraging the transferability of contrastive learning, we introduce Audio Contrastive-based Fine-tuning (AudioConFit), an efficient approach characterised by robust generalisability. Empirical experiments on a variety of audio classification tasks demonstrate the effectiveness and robustness of our approach, which achieves state-of-the-art results in various settings.
Effective Distillation of Table-based Reasoning Ability from LLMs
Yang, Bohao, Tang, Chen, Zhao, Kun, Xiao, Chenghao, Lin, Chenghua
Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of natural language processing tasks. However, their remarkable parameter size and their impressive high requirement of computing resources pose challenges for their practical deployment. Recent research has revealed that specific capabilities of LLMs, such as numerical reasoning, can be transferred to smaller models through distillation. Some studies explore the potential of leveraging LLMs to perform table-based reasoning. Nevertheless, prior to our work, there has been no investigation into the prospect of specialising table reasoning skills in smaller models specifically tailored for table-to-text generation tasks. In this paper, we propose a novel table-based reasoning distillation, with the aim of distilling distilling LLMs into tailored, smaller models specifically designed for table-based reasoning task. Experimental results have shown that a 0.22 billion parameter model (Flan-T5-base) fine-tuned using distilled data, not only achieves a significant improvement compared to traditionally fine-tuned baselines but also surpasses specific LLMs like gpt-3.5-turbo on the scientific table-to-text generation dataset (SciGen). The code and data are released in https://github.com/Bernard-Yang/TableDistill.
On Isotropy, Contextualization and Learning Dynamics of Contrastive-based Sentence Representation Learning
Xiao, Chenghao, Long, Yang, Moubayed, Noura Al
Incorporating contrastive learning objectives in sentence representation learning (SRL) has yielded significant improvements on many sentence-level NLP tasks. However, it is not well understood why contrastive learning works for learning sentence-level semantics. In this paper, we aim to help guide future designs of sentence representation learning methods by taking a closer look at contrastive SRL through the lens of isotropy, contextualization and learning dynamics. We interpret its successes through the geometry of the representation shifts and show that contrastive learning brings isotropy, and drives high intra-sentence similarity: when in the same sentence, tokens converge to similar positions in the semantic space. We also find that what we formalize as "spurious contextualization" is mitigated for semantically meaningful tokens, while augmented for functional ones. We find that the embedding space is directed towards the origin during training, with more areas now better defined. We ablate these findings by observing the learning dynamics with different training temperatures, batch sizes and pooling methods.