AITopics | adapter layer


adapter layer

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

COMPACTER: Efficient Low-Rank Hypercomplex Adapter Layers

Neural Information Processing Systems | Apr-24-2026, 13:31:16 GMT

Adapting large-scale pretrained language models to downstream tasks via fine-tuning is the standard method for achieving state-of-the-art performance on NLP benchmarks. However, fine-tuning all weights of models with millions or billions of parameters is sample-inefficient, unstable in low-resource settings, and wasteful, as it requires storing a separate copy of the model for each task. Recent work has developed parameter-efficient fine-tuning methods, but these approaches either still require a relatively large number of parameters or underperform standard fine-tuning.
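The title refers to low-rank hypercomplex adapter layers. As we understand COMPACTER, an adapter weight matrix is built as a sum of Kronecker products W = Σᵢ Aᵢ ⊗ Bᵢ, where each Bᵢ is low-rank (rank 1 here, Bᵢ = sᵢtᵢᵀ) and the small n × n matrices Aᵢ are shared across layers. The sketch below is an illustrative pure-Python reconstruction under those assumptions, not the authors' code; `phm_weight` and `param_counts` are hypothetical helper names, and the toy values are arbitrary.

```python
def kron(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    rb, cb = len(b), len(b[0])
    return [[a[i // rb][j // cb] * b[i % rb][j % cb]
             for j in range(len(a[0]) * cb)]
            for i in range(len(a) * rb)]


def outer(s, t):
    """Rank-1 matrix s t^T."""
    return [[si * tj for tj in t] for si in s]


def phm_weight(A, S, T):
    """W = sum_i A_i (x) (s_i t_i^T): a Kronecker sum with rank-1 right factors.

    A: n matrices of shape n x n; S: n vectors of length k/n;
    T: n vectors of length d/n. Returns a k x d weight matrix.
    """
    W = None
    for A_i, s_i, t_i in zip(A, S, T):
        term = kron(A_i, outer(s_i, t_i))
        W = term if W is None else [[x + y for x, y in zip(rw, rt)]
                                    for rw, rt in zip(W, term)]
    return W


def param_counts(n, k, d):
    """Dense k x d matrix vs. n^3 (all A_i) + n * (k/n + d/n) (all s_i, t_i)."""
    assert k % n == 0 and d % n == 0
    dense = k * d
    compact = n ** 3 + n * (k // n + d // n)
    return dense, compact


# Toy example with n = 2, k = 4, d = 6, so W is 4 x 6.
A = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
S = [[1.0, 2.0], [0.5, -1.0]]
T = [[1.0, 0.0, -1.0], [2.0, 1.0, 0.0]]
W = phm_weight(A, S, T)
print(len(W), len(W[0]))
print(param_counts(4, 768, 48))
```

For a 768 × 48 projection with n = 4, the Kronecker form needs 880 trainable values instead of 36,864; if the Aᵢ are shared across layers as assumed above, their n³ cost is paid only once for the whole model.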

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Incorporating BERT into Parallel Sequence Decoding with Adapters

Neural Information Processing Systems | Oct-3-2025, 08:01:36 GMT

Each component in the framework can be considered as a plug-in unit, making the framework flexible and task agnostic.
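One way to see why an adapter can act as a plug-in unit: a bottleneck adapter in the style of Houlsby et al. (2019) adds a small residual branch, so with its up-projection initialized to zero it leaves the host layer's output unchanged, and training only moves it away from the identity as needed. This is a generic pure-Python illustration; the exact adapter form and placement used in this paper are not given in the excerpt and may differ, and `adapter` is a hypothetical name.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


def adapter(h, W_down, W_up):
    """Bottleneck adapter with a residual connection: h + W_up relu(W_down h).

    W_down: r x d (project down to bottleneck size r);
    W_up:   d x r (project back up to the hidden size d).
    """
    z = relu(matvec(W_down, h))
    return [hi + ui for hi, ui in zip(h, matvec(W_up, z))]


h = [1.0, -1.0, 0.5]
W_down = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]   # d = 3 -> r = 2
W_up_zero = [[0.0, 0.0] for _ in range(3)]     # zero-initialized up-projection
W_up = [[2.0, 0.0], [0.0, 1.0], [0.0, 0.0]]

print(adapter(h, W_down, W_up_zero))  # same as h: the adapter starts as a no-op
print(adapter(h, W_down, W_up))       # h plus a learned correction
```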

adapter, arxiv preprint arxiv, bert, (13 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.04)
Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
Asia > China > Anhui Province (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)


Gradient Inversion Attacks on Parameter-Efficient Fine-Tuning

Sami, Hasin Us, Sen, Swapneel, Roy-Chowdhury, Amit K., Krishnamurthy, Srikanth V., Guler, Basak

arXiv.org Artificial Intelligence | Jun-6-2025

Federated learning (FL) allows multiple data-owners to collaboratively train machine learning models by exchanging local gradients, while keeping their private data on-device. To enhance both privacy and training efficiency, parameter-efficient fine-tuning (PEFT) of large-scale pretrained models has recently gained substantial attention in FL. The pretrained (backbone) model is kept frozen, and each user fine-tunes only a few lightweight modules used in conjunction with it to fit specific downstream applications. Accordingly, only the gradients with respect to these lightweight modules are shared with the server. In this work, we investigate how the privacy of users' fine-tuning data can be compromised via a malicious design of the pretrained model and trainable adapter modules. We demonstrate gradient inversion attacks on a popular PEFT mechanism, the adapter, which allow an attacker to reconstruct local data samples of a target user using only the accessible adapter gradients. Via extensive experiments, we demonstrate that a large batch of fine-tuning images can be retrieved with high fidelity. Our attack highlights the need for privacy-preserving mechanisms for PEFT, while opening up several future directions. Our code is available at https://github.com/info-ucr/PEFTLeak.
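Why gradients of a trainable module can reveal its input is easiest to see on a single fully connected layer y = Wx + b: the weight gradient is the outer product (∂L/∂y)xᵀ and the bias gradient is ∂L/∂y, so dividing any row of ∂L/∂W by the matching bias gradient returns x exactly. The sketch below shows only this textbook single-sample case, with a squared-error loss chosen for illustration; it is not the paper's attack, which relies on a malicious design of the pretrained model and adapters to recover large batches. The helper names are hypothetical.

```python
def linear_grads(W, b, x, target):
    """Forward y = Wx + b with loss L = 0.5 * ||y - target||^2.

    Returns (dL/dW, dL/db) for a single input sample x.
    """
    y = [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]
    g = [yi - ti for yi, ti in zip(y, target)]      # dL/dy
    dW = [[gi * xj for xj in x] for gi in g]        # outer product g x^T
    db = list(g)                                    # dL/db = dL/dy
    return dW, db


def invert_linear(dW, db, eps=1e-12):
    """Recover x from shared gradients: x_j = dW[i][j] / db[i]."""
    for row, gi in zip(dW, db):
        if abs(gi) > eps:                           # any row with nonzero gradient works
            return [v / gi for v in row]
    raise ValueError("all bias gradients are zero; x cannot be recovered this way")


x_private = [0.5, -1.5, 2.0]                        # the client's private input
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b = [0.0, 0.0]
dW, db = linear_grads(W, b, x_private, target=[0.0, 0.0])
print(invert_linear(dW, db))                        # reconstructs x_private
```

With a batch, the per-sample gradients are summed before sharing, so this simple division no longer separates individual inputs; that mixing is presumably what a batch-scale attack such as the one described above has to overcome.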

artificial intelligence, machine learning, natural language, (13 more...)

arXiv.org Artificial Intelligence

2506.04453

Country: North America > United States > California (0.28)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


PaCA: Partial Connection Adaptation for Efficient Fine-Tuning

Woo, Sunghyeon, Namkung, Sol, Lee, Sunwoo, Jeong, Inho, Kim, Beomseok, Jeon, Dongsuk

arXiv.org Artificial Intelligence | Mar-11-2025

Prior parameter-efficient fine-tuning (PEFT) algorithms reduce memory usage and computational costs of fine-tuning large neural network models by training only a few additional adapter parameters, rather than the entire model. However, the reduction in computational costs due to PEFT does not necessarily translate to a reduction in training time; although the computational costs of the adapter layers are much smaller than those of the pretrained layers, it is well known that the two types of layers are processed sequentially on GPUs, resulting in significant latency overhead. LoRA and its variants merge low-rank adapter matrices with pretrained weights during inference to avoid latency overhead, but during training, the pretrained weights remain frozen while the adapter matrices are continuously updated, preventing such merging. To mitigate this issue, we propose Partial Connection Adaptation (PaCA), which fine-tunes randomly selected partial connections within the pretrained weights instead of introducing adapter layers in the model. PaCA not only enhances training speed by eliminating the time overhead due to the sequential processing of the adapter and pretrained layers but also reduces activation memory, since only partial activations, rather than full activations, need to be stored for gradient computation. Compared to LoRA, PaCA reduces training time by 22% and total memory usage by 16%, while maintaining comparable accuracy across various fine-tuning scenarios, such as fine-tuning on the MMLU dataset and instruction tuning on the Oasst1 dataset. PaCA can also be combined with quantization, enabling the fine-tuning of large models such as LLaMA3.1-70B. In addition, PaCA enables training with 23% longer sequences and improves throughput by 16% on both NVIDIA A100 GPU and INTEL Gaudi2 HPU compared to LoRA. The code is available at https://github.com/WooSunghyeon/paca.
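The activation-memory argument can be checked on a linear layer Y = XWᵀ, whose weight gradient is ∂L/∂W = (∂L/∂Y)ᵀX. If only some connections of W are trainable, only the activations feeding those connections must be saved for the backward pass. This sketch assumes, for illustration, that the "partial connections" are whole input columns of W chosen at random; the paper's exact selection granularity may differ, and all helper names are hypothetical.

```python
import random


def weight_grad(dY, X):
    """dL/dW = dY^T X for Y = X W^T. dY: batch x d_out, X: batch x d_in."""
    return [[sum(dY[n][o] * X[n][i] for n in range(len(X)))
             for i in range(len(X[0]))]
            for o in range(len(dY[0]))]


def select_columns(X, idx):
    """Keep only the activation columns that feed the trainable connections."""
    return [[row[i] for i in idx] for row in X]


def sgd_update_columns(W, grad, idx, lr):
    """Update only columns idx of W in place; every other weight stays frozen."""
    for o, row in enumerate(W):
        for k, i in enumerate(idx):
            row[i] -= lr * grad[o][k]


rng = random.Random(0)
batch, d_in, d_out = 3, 6, 4
idx = sorted(rng.sample(range(d_in), 2))          # randomly chosen trainable columns
W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
X = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(batch)]
dY = [[rng.gauss(0, 1) for _ in range(d_out)] for _ in range(batch)]  # upstream gradient

X_saved = select_columns(X, idx)                  # batch x 2 stored instead of batch x 6
g_partial = weight_grad(dY, X_saved)              # d_out x 2
g_full = weight_grad(dY, X)                       # computed here only for comparison
W_before = [row[:] for row in W]                  # snapshot to show frozen weights are untouched
sgd_update_columns(W, g_partial, idx, lr=0.1)
print(idx, len(X_saved[0]), len(X[0]))
```

Because no extra adapter branch exists, the forward pass runs the pretrained layer alone, which is the sequential-processing overhead the abstract says PaCA removes.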

paca, pretrained weight, training time, (16 more...)

arXiv.org Artificial Intelligence

2503.01905

Country:

Europe > Austria > Vienna (0.14)
North America > Canada > Ontario > Toronto (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.14)
(14 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Slot-Guided Adaptation of Pre-trained Diffusion Models for Object-Centric Learning and Compositional Generation

Akan, Adil Kaan, Yemez, Yucel

arXiv.org Artificial Intelligence | Jan-28-2025

We present SlotAdapt, an object-centric learning method that combines slot attention with pretrained diffusion models by introducing adapters for slot-based conditioning. Our method preserves the generative power of pretrained diffusion models, while avoiding their text-centric conditioning bias. We also incorporate an additional guidance loss into our architecture to align cross-attention from adapter layers with slot attention. This enhances the alignment of our model with the objects in the input image without using external supervision. Experimental results show that our method outperforms state-of-the-art techniques in object discovery and image generation tasks across multiple datasets, including those with real images. Furthermore, we demonstrate through experiments that our method performs remarkably well on complex real-world images for compositional generation, in contrast to other slot-based generative methods in the literature.

The real world is inherently structured with distinct, composable parts and objects that can be combined in various ways; this compositional characteristic is essential for cognitive functions like reasoning, understanding causality, and the ability to generalize beyond training data (Lake et al., 2017; Bottou, 2014; Schölkopf et al., 2021; Bahdanau et al., 2019; Fodor & Pylyshyn, 1988). While language clearly reflects this modularity through sentences made up of distinct words and tokens, the compositional structure is less obvious in visual data. Object-centric learning (OCL) offers a promising approach to uncover this latent structure by grouping related features into coherent object representations without supervision (Kahneman et al., 1992; Greff et al., 2020). By decomposing complex scenes into separate objects and their interactions, OCL mimics how humans interpret their environment (Spelke & Kinzler, 2007), potentially improving the robustness and interpretability of AI systems (Lake et al., 2017; Schölkopf et al., 2021).
This approach shifts from traditional pixel-based feature extraction to a more meaningful segmentation of visual data, which is key for better generalization and supporting high-level reasoning tasks. Recent advances in OCL have shown the potential to incorporate powerful generative models, such as transformers and diffusion models, into the OCL framework as image decoders. Notably, models such as Latent Slot Diffusion (LSD) (Jiang et al., 2023) and SlotDiffusion (Wu et al., 2023b) have considerably improved performance in object discovery and visual generation tasks in real-world settings by employing slot-conditioned diffusion models.
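Slot attention, which SlotAdapt builds on, differs from ordinary cross-attention in where it normalizes: the softmax runs over the slots for each input feature, so slots compete to explain each feature, and each slot is then updated with a weighted mean of the features it claims. The sketch is a simplified single iteration with identity query/key/value projections and without the learned GRU and MLP update that standard slot attention includes; it also omits SlotAdapt's diffusion adapters and guidance loss. The toy inputs are arbitrary.

```python
import math


def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


def slot_attention_step(inputs, slots, eps=1e-8):
    """One simplified slot-attention iteration.

    inputs: N x D feature vectors; slots: K x D slot vectors.
    Returns (updated slots, attention weights of shape N x K).
    """
    scale = 1.0 / math.sqrt(len(inputs[0]))
    # Softmax over the slot axis: for each input, the K slots compete for it.
    attn = [softmax([scale * sum(a * b for a, b in zip(x, s)) for s in slots])
            for x in inputs]
    new_slots = []
    for k in range(len(slots)):
        w = [attn[n][k] for n in range(len(inputs))]
        total = sum(w) + eps                      # normalize: weighted mean over inputs
        new_slots.append([sum(w[n] * inputs[n][d] for n in range(len(inputs))) / total
                          for d in range(len(inputs[0]))])
    return new_slots, attn


inputs = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]   # two toy "objects"
slots = [[1.0, 0.0], [0.0, 1.0]]
new_slots, attn = slot_attention_step(inputs, slots)
print(new_slots)
```

After one step, each slot's update is dominated by the cluster of features it already resembles, which is the grouping behaviour object-centric learning relies on.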

artificial intelligence, diffusion model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2501.15878

Genre: Research Report > Promising Solution (1.00)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


Navigating the Designs of Privacy-Preserving Fine-tuning for Large Language Models

Shi, Haonan, Ouyang, Tu, Wang, An

arXiv.org Artificial Intelligence | Jan-8-2025

Instruction tuning has proven effective in enhancing Large Language Models' (LLMs) performance on downstream tasks. However, real-world fine-tuning faces inherent conflicts between model providers' intellectual property protection, clients' data privacy requirements, and tuning costs. While recent approaches like split learning and offsite tuning demonstrate promising architectures for privacy-preserving fine-tuning, there is a gap in systematically addressing the multidimensional trade-offs required for diverse real-world deployments. We propose several indicative evaluation metrics to guide design trade-offs for privacy-preserving fine-tuning and a series of example designs, collectively named GuardedTuning; they result from novel combinations of system architectures with adapted privacy-enhancement methods and emerging computation techniques. Each design represents distinct trade-offs across model utility, privacy guarantees, and costs. Experimental results demonstrate that these designs protect against data reconstruction attacks while maintaining competitive fine-tuning performance.

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2501.04323

Genre: Research Report (0.70)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)


SALSA: Speedy ASR-LLM Synchronous Aggregation

Mittal, Ashish, Prabhu, Darshan, Sarawagi, Sunita, Jyothi, Preethi

arXiv.org Artificial Intelligence | Aug-29-2024

Harnessing pre-trained LLMs to improve ASR systems, particularly for low-resource languages, is now an emerging area of research. Existing methods range from using LLMs for ASR error correction to tightly coupled systems that replace the ASR decoder with the LLM. These approaches either increase decoding time or require expensive training of the cross-attention layers. We propose SALSA, which couples the decoder layers of the ASR to the LLM decoder, while synchronously advancing both decoders. Such coupling is performed with a simple projection of the last decoder state, and is thus significantly more training efficient than earlier approaches. A challenge of our proposed coupling is handling the mismatch between the tokenizers of the LLM and ASR systems. We handle this mismatch using cascading tokenization with respect to the LLM and ASR vocabularies. We evaluate SALSA on 8 low-resource languages in the FLEURS benchmark, yielding substantial WER reductions of up to 38%.
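The abstract states only that the coupling is "a simple projection of the last decoder state". A minimal sketch, assuming (hypothetically) that the projected ASR state is added to the LLM decoder's hidden state and that the projection matrix is the only trained component, shows why such coupling is cheap to train: its parameter count is just d_llm × d_asr. The additive fusion, the toy dimensions, and the names `couple` and `trainable_params` are illustrative assumptions, not SALSA's published design.

```python
def project(P, h):
    """Linear projection P h. P: d_llm x d_asr, h: length d_asr."""
    return [sum(p * x for p, x in zip(row, h)) for row in P]


def couple(h_llm, h_asr, P):
    """Hypothetical additive coupling of the two decoders at one decoding step."""
    return [a + b for a, b in zip(h_llm, project(P, h_asr))]


def trainable_params(P):
    """Only the projection is trained; both decoders stay frozen in this sketch."""
    return len(P) * len(P[0])


h_asr = [2.0, 0.0]                     # last ASR decoder state (d_asr = 2)
h_llm = [1.0, 1.0, 1.0]                # LLM decoder state (d_llm = 3)
P = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(couple(h_llm, h_asr, P))         # LLM state shifted by the projected ASR evidence
print(trainable_params(P))
```

Because both decoders advance together, the real system must also reconcile their different tokenizers, which the paper handles with cascading tokenization; that part is not modeled here.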

decoder, llm, salsa, (14 more...)

arXiv.org Artificial Intelligence

2408.16542

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)


HyperLoader: Integrating Hypernetwork-Based LoRA and Adapter Layers into Multi-Task Transformers for Sequence Labelling

Ortiz-Barajas, Jesus-German, Gomez-Adorno, Helena, Solorio, Thamar

arXiv.org Artificial Intelligence | Jul-2-2024

We use the encoder-decoder T5 model (Raffel et al., 2020) for all experiments to take advantage of modelling the tasks as sequence-to-sequence tasks. We test our model in seven datasets from two Sequence Labelling tasks. The first task is Named Entity Recognition, a valuable tool in various real-world scenarios in the era of large language models such as healthcare and medical research (Raza et al., 2022; Hu et al., 2024), Finance …

… only a small number of parameters is updated to a downstream task (Houlsby et al., 2019; Stickland and Murray, 2019; Karimi Mahabadi et al., 2021a). These methods aim to achieve comparable performance to full fine-tuning by updating as few parameters as possible. However, a less studied research direction related to these methods is whether one can perform better than full fine-tuning with fewer parameters (Mao et al., 2022).
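The title describes a hypernetwork that produces LoRA and adapter weights for multi-task models. The general idea can be sketched as one shared linear hypernetwork that maps a task embedding to the flattened weights of a bottleneck adapter, so tasks share the hypernetwork's parameters while each task receives its own adapter. Everything here (the linear hypernetwork, the sizes, the embeddings, `AdapterHypernet`) is a hypothetical illustration rather than HyperLoader's architecture, which according to its title also generates LoRA weights.

```python
import random


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class AdapterHypernet:
    """Hypothetical linear hypernetwork: task embedding -> adapter weights."""

    def __init__(self, task_dim, d, r, seed=0):
        rng = random.Random(seed)
        self.d, self.r = d, r
        n_out = 2 * d * r                              # down (r x d) + up (d x r)
        self.H = [[rng.gauss(0.0, 0.1) for _ in range(task_dim)]
                  for _ in range(n_out)]

    def generate(self, task_emb):
        flat = matvec(self.H, task_emb)
        d, r = self.d, self.r
        down = [flat[i * d:(i + 1) * d] for i in range(r)]                  # r x d
        up = [flat[d * r + i * r:d * r + (i + 1) * r] for i in range(d)]    # d x r
        return down, up


def adapter(h, down, up):
    """Residual bottleneck adapter: h + up relu(down h)."""
    z = [max(0.0, v) for v in matvec(down, h)]
    return [a + b for a, b in zip(h, matvec(up, z))]


hyper = AdapterHypernet(task_dim=4, d=6, r=2)
task_a = [1.0, 0.0, 0.0, 0.0]                          # hypothetical task embeddings
task_b = [0.0, 1.0, 0.0, 0.0]
down_a, up_a = hyper.generate(task_a)
down_b, up_b = hyper.generate(task_b)
h = [0.1, -0.2, 0.3, 0.0, 0.5, -0.1]
print(adapter(h, down_a, up_a))
print(adapter(h, down_b, up_b))
```

The shared hypernetwork is the only trained component in this sketch, so adding a task costs one new embedding vector rather than a full set of adapter weights.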

computational linguistic, hypernetwork, proceedings, (16 more...)

arXiv.org Artificial Intelligence

2407.01411

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Singapore (0.04)
North America > Dominican Republic (0.04)
(8 more...)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine (0.86)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.55)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.35)


LEXTREME: A Multi-Lingual and Multi-Task Benchmark for the Legal Domain

Niklaus, Joel, Matoshi, Veton, Rani, Pooja, Galassi, Andrea, Stürmer, Matthias, Chalkidis, Ilias

arXiv.org Artificial Intelligence | Jan-8-2024

Lately, propelled by the phenomenal advances around the transformer architecture, the legal NLP field has enjoyed spectacular growth. To measure progress, well-curated and challenging benchmarks are crucial. However, most benchmarks are English-only, and in legal NLP specifically there is no multilingual benchmark available yet. Additionally, many benchmarks are saturated, with the best models clearly outperforming the best humans and achieving near-perfect scores. We survey the legal NLP literature and select 11 datasets covering 24 languages, creating LEXTREME. To provide a fair comparison, we propose two aggregate scores, one based on the datasets and one on the languages. The best baseline (XLM-R large) achieves both a dataset aggregate score and a language aggregate score of 61.3. This indicates that LEXTREME is still very challenging and leaves ample room for improvement. To make it easy for researchers and practitioners to use, we release LEXTREME on Hugging Face together with all the code required to evaluate models and a public Weights and Biases project with all the runs.
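To make the two aggregate scores concrete, here is a minimal sketch over a table of (dataset, language) → score entries. It assumes each aggregate is a harmonic mean of per-group harmonic means, an illustrative choice that penalizes weak groups more than an arithmetic mean would; the benchmark's released evaluation code is the authority on the exact definition. The scores are made-up toy values, not LEXTREME results.

```python
from collections import defaultdict
from statistics import harmonic_mean


def aggregate(scores, key_index):
    """Group (dataset, language) -> score entries by dataset (key_index=0)
    or by language (key_index=1), then take a harmonic mean of the group means."""
    groups = defaultdict(list)
    for key, value in scores.items():
        groups[key[key_index]].append(value)
    per_group = {g: harmonic_mean(vals) for g, vals in groups.items()}
    return harmonic_mean(list(per_group.values())), per_group


# Toy, made-up scores; not LEXTREME results.
scores = {
    ("dataset_a", "de"): 50.0,
    ("dataset_a", "fr"): 50.0,
    ("dataset_b", "de"): 25.0,
}
dataset_score, by_dataset = aggregate(scores, key_index=0)
language_score, by_language = aggregate(scores, key_index=1)
print(round(dataset_score, 2), round(language_score, 2))
```

Even on three toy entries the two aggregates differ: dataset_b's weak score counts as a whole dataset in the first, but only pulls down the German group in the second. This is why reporting both views is informative.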

benchmark, computational linguistic, dataset, (12 more...)

arXiv.org Artificial Intelligence

doi: 10.18653/v1/2023.findings-emnlp.200

2301.13126

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Dominican Republic (0.04)
North America > United States > New York > New York County > New York City (0.04)
(16 more...)

Genre:

Research Report (1.00)
Overview (0.67)

Industry:

Law (1.00)
Government > Regional Government > Europe Government (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Teaching Specific Scientific Knowledge into Large Language Models through Additional Training

Hatakeyama-Sato, Kan, Igarashi, Yasuhiko, Katakami, Shun, Nabae, Yuta, Hayakawa, Teruaki

arXiv.org Artificial Intelligence | Dec-17-2023

Through additional training, we explore embedding specialized scientific knowledge into the Llama 2 Large Language Model (LLM). Key findings reveal that effective knowledge integration requires reading texts from multiple perspectives, especially in instructional formats. We utilize text augmentation, including style conversions and translations, to tackle the scarcity of specialized texts. Hyperparameter optimization proves crucial, with models of different sizes (7b, 13b, and 70b) reasonably undergoing additional training. To validate our methods, we construct a dataset of 65,000 scientific papers. Although we have succeeded in partially embedding knowledge, the study highlights the complexities and limitations of incorporating specialized information into LLMs, suggesting areas for further improvement.

additional training, hatakeyama, llm, (17 more...)

arXiv.org Artificial Intelligence

2312.0336

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Europe > Italy (0.04)
Asia > Japan > Honshū > Kantō > Ibaraki Prefecture > Tsukuba (0.04)

Genre: Research Report > New Finding (0.93)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
