Goto

Collaborating Authors

 blob


Scalable Variational Bayesian Fine-Tuning of LLMs via Orthogonalized Low-Rank Adapters

Xiang, Haotian, Li, Bingcong, Lu, Qin

arXiv.org Machine Learning

When deploying large language models (LLMs) to safety-critical applications, uncertainty quantification (UQ) is of utmost importance to self-assess the reliability of the LLM-based decisions. However, such decisions typically suffer from overconfidence, particularly after parameter-efficient fine-tuning (PEFT) for downstream domain-specific tasks with limited data. Existing methods to alleviate this issue either rely on Laplace approximation based post-hoc framework, which may yield suboptimal calibration depending on the training trajectory, or variational Bayesian training that requires multiple complete forward passes through the entire LLM backbone at inference time for Monte Carlo estimation, posing scalability challenges for deployment. To address these limitations, we build on the Bayesian last layer (BLL) model, where the LLM-based deterministic feature extractor is followed by random last layer parameters for uncertainty reasoning. Since existing low-rank adapters (LoRA) for PEFT have limited expressiveness due to rank collapse, we address this with Polar-decomposed Low-rank Adapter Representation (PoLAR), an orthogonalized parameterization paired with Riemannian optimization to enable more stable and expressive adaptation. Building on this PoLAR-BLL model, we leverage the variational (V) inference framework to put forth a scalable Bayesian fine-tuning approach which jointly seeks the PoLAR parameters and approximate posterior of the last layer parameters via alternating optimization. The resulting PoLAR-VBLL is a flexible framework that nicely integrates architecture-enhanced optimization with scalable Bayesian inference to endow LLMs with well-calibrated UQ. Our empirical results verify the effectiveness of PoLAR-VBLL in terms of generalization and uncertainty estimation on both in-distribution and out-of-distribution data for various common-sense reasoning tasks.




BLoB: Bayesian Low-Rank Adaptation by Backpropagation for Large Language Models

Neural Information Processing Systems

Large Language Models (LLMs) often suffer from overconfidence during inference, particularly when adapted to downstream domain-specific tasks with limited data. Previous work addresses this issue by employing approximate Bayesian estimation after the LLMs are trained, enabling them to quantify uncertainty. However, such post-training approaches' performance is severely limited by the parameters learned during training. In this paper, we go beyond post-training Bayesianization and propose Bayesian Low-Rank Adaptation by Backpropagation (BLoB), an algorithm that continuously and jointly adjusts both the mean and covariance of LLM parameters throughout the whole fine-tuning process. Our empirical results verify the effectiveness of BLoB in terms of generalization and uncertainty estimation, when evaluated on both in-distribution and out-of-distribution data.


Hubble spots massive sandwich shaped blob in deep-space

Popular Science

Nicknamed Dracula's Chivito, the disk is 1,000 light-years away from Earth. Breakthroughs, discoveries, and DIY tips sent every weekday. Scientists are leaving space fans with one more treat before the year comes to a close. Using the Hubble Space Telescope, astronomers captured a stunning image of the largest protoplanetary disk ever observed, which just happens to be shaped like a giant celestial sandwich. The massive formation of dust and gas, which astronomers call Dracula's Chivito, resides about 1,000 light-years from Earth and spans roughly 400 billion miles.


There Is Only One AI Company. Welcome to the Blob

WIRED

There Is Only One AI Company. As Nvidia, OpenAI, Google, and Microsoft forge partnerships and deals, the AI industry is looking more like one interconnected machine. What does that mean for all of us? It all began, as many things do, with Elon Musk . In the early 2010s he realized that AI was on a track to become perhaps the most powerful technology of all time.


C-LoRA: Contextual Low-Rank Adaptation for Uncertainty Estimation in Large Language Models

Rahmati, Amir Hossein, Jantre, Sanket, Zhang, Weifeng, Wang, Yucheng, Yoon, Byung-Jun, Urban, Nathan M., Qian, Xiaoning

arXiv.org Artificial Intelligence

Low-Rank Adaptation (LoRA) offers a cost-effective solution for fine-tuning large language models (LLMs), but it often produces overconfident predictions in data-scarce few-shot settings. To address this issue, several classical statistical learning approaches have been repurposed for scalable uncertainty-aware LoRA fine-tuning. However, these approaches neglect how input characteristics affect the predictive uncertainty estimates. To address this limitation, we propose Contextual Low-Rank Adaptation (C-LoRA) as a novel uncertainty-aware and parameter efficient fine-tuning approach, by developing new lightweight LoRA modules contextualized to each input data sample to dynamically adapt uncertainty estimates. Incorporating data-driven contexts into the parameter posteriors, C-LoRA mitigates overfitting, achieves well-calibrated uncertainties, and yields robust predictions. Extensive experiments on LLaMA2-7B models demonstrate that C-LoRA consistently outperforms the state-of-the-art uncertainty-aware LoRA methods in both uncertainty quantification and model generalization. Ablation studies further confirm the critical role of our contextual modules in capturing sample-specific uncertainties. C-LoRA sets a new standard for robust, uncertainty-aware LLM fine-tuning in few-shot regimes. Although our experiments are limited to 7B models, our method is architecture-agnostic and, in principle, applies beyond this scale; studying its scaling to larger models remains an open problem. Our code is available at https://github.com/ahra99/c_lora.


BiMax: Bidirectional MaxSim Score for Document-Level Alignment

Wang, Xiaotian, Utsuro, Takehito, Nagata, Masaaki

arXiv.org Artificial Intelligence

Document alignment is necessary for the hierarchical mining (Bañón et al., 2020; Morishita et al., 2022), which aligns documents across source and target languages within the same web domain. Several high precision sentence embedding-based methods have been developed, such as TK-PERT (Thompson and Koehn, 2020) and Optimal Transport (OT) (Clark et al., 2019; El-Kishky and Guzmán, 2020). However, given the massive scale of web mining data, both accuracy and speed must be considered. In this paper, we propose a cross-lingual Bidirectional Maxsim score (BiMax) for computing doc-to-doc similarity, to improve efficiency compared to the OT method. Consequently, on the WMT16 bilingual document alignment task, BiMax attains accuracy comparable to OT with an approximate 100-fold speed increase. Meanwhile, we also conduct a comprehensive analysis to investigate the performance of current state-of-the-art multilingual sentence embedding models. All the alignment methods in this paper are publicly available as a tool called EmbDA (https://github.com/EternalEdenn/EmbDA).



BioBlobs: Differentiable Graph Partitioning for Protein Representation Learning

Wang, Xin, Oliver, Carlos

arXiv.org Artificial Intelligence

Protein function is driven by coherent substructures which vary in size and topology, yet current protein representation learning models (PRL) distort these signals by relying on rigid substructures such as k-hop and fixed radius neighbourhoods. We introduce BioBlobs, a plug-and-play, fully differentiable module that represents proteins by dynamically partitioning structures into flexibly-sized, non-overlapping substructures ("blobs"). The resulting blobs are quantized into a shared and interpretable codebook, yielding a discrete vocabulary of function-relevant protein substructures used to compute protein embeddings. We show that BioBlobs representations improve the performance of widely used protein encoders such as GVP-GNN across various PRL tasks. Our approach highlights the value of architectures that directly capture function-relevant protein substructures, enabling both improved predictive performance and mechanistic insight into protein function.