Xing, Hanwen
Continual learning via probabilistic exchangeable sequence modelling
Xing, Hanwen, Yau, Christopher
Continual learning (CL) refers to the ability to continuously learn and accumulate new knowledge while retaining useful information from past experiences. Although numerous CL methods have been proposed in recent years, their computational cost and lack of uncertainty quantification make them difficult to deploy directly in real-world decision-making problems. To address these issues, we propose CL-BRUNO, a probabilistic, Neural Process-based CL model that performs scalable and tractable Bayesian updating and prediction. Our approach uses deep generative models to build a unified probabilistic framework capable of handling different types of CL problems, such as task- and class-incremental learning, allowing users to integrate information across different CL scenarios with a single model. It prevents catastrophic forgetting through distributional and functional regularisation without retaining any previously seen samples, which makes it appealing in applications where data privacy or storage capacity is a concern. Experiments show that CL-BRUNO outperforms existing methods on both natural-image and biomedical datasets, confirming its effectiveness in real-world applications.
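The key property such models exploit, exchangeability, can be illustrated with a toy model: when the posterior depends on the data only through running sufficient statistics, each observation can be absorbed into the update and then discarded, so no replay buffer is needed. The sketch below is a minimal one-dimensional Gaussian analogy of this idea, not CL-BRUNO itself, which applies it in the latent space of a deep generative model; all prior parameters are hypothetical placeholders.

```python
# Minimal sketch (not the authors' model): a conjugate Gaussian model whose
# Bayesian update uses only running statistics, so past samples need not be
# stored -- the property that lets exchangeable sequence models avoid replay
# buffers. Prior parameters below are illustrative placeholders.
import numpy as np

class ExchangeableGaussian:
    """Normal likelihood with known noise variance and a conjugate Normal prior."""
    def __init__(self, mu0=0.0, tau0=1.0, sigma=1.0):
        self.mu, self.tau, self.sigma = mu0, tau0, sigma  # prior mean/variance, noise sd

    def update(self, x):
        # One-step Bayesian update: absorb observation x, then discard it.
        precision = 1.0 / self.tau + 1.0 / self.sigma**2
        self.mu = (self.mu / self.tau + x / self.sigma**2) / precision
        self.tau = 1.0 / precision

    def predictive(self):
        # Posterior predictive N(mu, tau + sigma^2): calibrated uncertainty.
        return self.mu, self.tau + self.sigma**2

model = ExchangeableGaussian()
for x in np.random.normal(2.0, 1.0, size=100):  # a stream of observations
    model.update(x)
print(model.predictive())  # mean near 2.0, predictive variance near 1.0
```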
GPTON: Generative Pre-trained Transformers enhanced with Ontology Narration for accurate annotation of biological data
Li, Rongbin, Chen, Wenbo, Li, Jinbo, Xing, Hanwen, Xu, Hua, Li, Zhao, Zheng, W. Jim
By leveraging GPT-4 for ontology narration, we developed GPTON to infuse structured knowledge into LLMs through verbalized ontology terms, achieving accurate text and ontology annotations for over 68% of gene sets in the top five predictions. Manual evaluations confirm GPTON's robustness, highlighting its potential to harness LLMs and structured knowledge to significantly advance biomedical research beyond gene set annotation.
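As a concrete, purely illustrative picture of what ontology narration could look like, the snippet below verbalizes a single Gene Ontology term into a sentence an LLM can condition on; the function name, sentence template, and term fields are assumptions for exposition, not GPTON's actual pipeline.

```python
# Hypothetical illustration of "ontology narration": turning a structured
# ontology term into natural language before handing it to an LLM. The
# template and interface are assumed for exposition only.
def narrate(term_id: str, name: str, definition: str, parents: list[str]) -> str:
    """Verbalize one ontology term as a sentence an LLM can condition on."""
    parent_text = ", ".join(parents) if parents else "no recorded parent terms"
    return (f"{name} ({term_id}) is defined as: {definition} "
            f"It is a kind of: {parent_text}.")

print(narrate(
    "GO:0006915",
    "apoptotic process",
    "A programmed cell death process which begins when a cell receives an internal or external signal.",
    ["programmed cell death"],
))
```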
Enhancing Retrieval-Augmented LMs with a Two-stage Consistency Learning Compressor
Xu, Chuankai, Zhao, Dongming, Wang, Bo, Xing, Hanwen
Despite the prevalence of retrieval-augmented language models (RALMs), seamlessly integrating these models with retrieval mechanisms to improve performance on document-based tasks remains challenging. While some Retrieval-Augmented Generation (RAG) methods with post-retrieval processing have achieved success, most still cannot distinguish pertinent from extraneous information, leading to potential inconsistencies and reduced precision in the generated output, which in turn affects the truthfulness of the language model's responses. To address these limitations, this work proposes a novel two-stage consistency learning approach for compressing retrieved information in retrieval-augmented language models. By incorporating consistency learning, the compressor generates summaries that remain coherent and aligned with the semantic representations of a teacher model while staying faithful to the original retrieved documents. The proposed method is empirically validated across multiple datasets, demonstrating notable improvements in precision and efficiency on question-answering tasks. It outperforms existing baselines and showcases the synergistic effect of combining contrastive and consistency learning paradigms within the retrieval-augmented generation framework.
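A minimal sketch of how the two training signals could be combined is given below, assuming embedding-based encoders; the shapes, loss forms, and names are illustrative assumptions, not the paper's implementation. An InfoNCE-style contrastive term separates a relevant passage from distractors, and a consistency term pulls the compressed summary's representation toward a frozen teacher's.

```python
# Illustrative sketch (assumed architecture, not the paper's code) of the
# two-stage idea: a contrastive loss to separate pertinent from extraneous
# passages, plus a consistency loss aligning the student's compressed
# summary with a frozen teacher's semantic representation.
import torch
import torch.nn.functional as F

def contrastive_loss(q, pos, negs, tau=0.05):
    """InfoNCE: q and pos are (d,) embeddings, negs is (n, d)."""
    sims = torch.cat([F.cosine_similarity(q, pos, dim=0).view(1),
                      F.cosine_similarity(q.unsqueeze(0), negs, dim=1)]) / tau
    return F.cross_entropy(sims.unsqueeze(0), torch.zeros(1, dtype=torch.long))

def consistency_loss(student_summary_emb, teacher_emb):
    """Pull the compressed summary toward the teacher's representation."""
    return 1.0 - F.cosine_similarity(student_summary_emb, teacher_emb, dim=0)

# Toy tensors standing in for encoder outputs.
d = 16
q, pos, negs = torch.randn(d), torch.randn(d), torch.randn(8, d)
total = contrastive_loss(q, pos, negs) + consistency_loss(torch.randn(d), torch.randn(d))
print(total.item())
```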
Improving Bridge estimators via $f$-GAN
Xing, Hanwen
Bridge sampling is a powerful Monte Carlo method for estimating ratios of normalizing constants. Various methods have been introduced to improve its efficiency; they aim to increase the overlap between the two densities by applying appropriate transformations to them without changing their normalizing constants. In this paper, we first give a new estimator of the asymptotic relative mean square error (RMSE) of the optimal Bridge estimator by equivalently estimating an $f$-divergence between the two densities. We then utilize this framework and propose the $f$-GAN-Bridge estimator ($f$-GB), based on a bijective transformation that maps one density to the other. The transformation is chosen to minimize a specific $f$-divergence between the densities using an $f$-GAN (Nowozin et al., 2016). We show that this is equivalent to minimizing the asymptotic RMSE of the optimal Bridge estimator with respect to the densities. In other words, $f$-GB is optimal in the sense that, asymptotically, it achieves an RMSE lower than that of Bridge estimators based on any transformed density within the class generated by the candidate transformations. Numerical experiments show that $f$-GB outperforms existing methods on simulated and real-world examples. In addition, we discuss how Bridge estimators naturally arise from the problem of $f$-divergence estimation.
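For context, the classical bridge sampling identity (Meng and Wong, 1996) on which the estimator builds: for unnormalised densities $q_1, q_2$ with normalising constants $Z_1, Z_2$ and any suitable bridge function $\alpha$,

$$ r = \frac{Z_1}{Z_2} = \frac{\mathbb{E}_{p_2}\left[q_1(X)\,\alpha(X)\right]}{\mathbb{E}_{p_1}\left[q_2(X)\,\alpha(X)\right]}, \qquad p_i = q_i / Z_i, $$

and the Bridge estimator replaces the two expectations with Monte Carlo averages over samples from $p_2$ and $p_1$ respectively. Its accuracy degrades as the overlap between $p_1$ and $p_2$ shrinks, which is exactly what the transformation chosen by $f$-GB is designed to counteract.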