pfeiffer
How to Tune a Multilingual Encoder Model for Germanic Languages: A Study of PEFT, Full Fine-Tuning, and Language Adapters
This paper investigates the optimal use of the multilingual encoder model mDeBERTa for tasks in three Germanic languages -- German, Swedish, and Icelandic -- representing varying levels of presence and likely data quality in mDeBERTa's pre-training data. We compare full fine-tuning with the parameter-efficient fine-tuning (PEFT) methods LoRA and Pfeiffer bottleneck adapters, finding that PEFT is more effective for the higher-resource language, German. However, results for Swedish and Icelandic are less consistent. We also observe differences between tasks: while PEFT tends to work better for question answering, full fine-tuning is preferable for named entity recognition. Inspired by previous research on modular approaches that combine task and language adapters, we evaluate the impact of adding PEFT modules trained on unstructured text, finding that this approach is not beneficial.
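For readers unfamiliar with the two PEFT methods named in this abstract, the following is a minimal sketch of their generic forms, not the paper's implementation. It assumes the usual LoRA formulation (a frozen weight plus a scaled low-rank update) and a plain bottleneck adapter (down-projection, ReLU, up-projection, residual). All function names, the dimensions, and the `alpha` value are illustrative assumptions; layer normalization and the adapter's placement inside the Transformer layer are omitted.

```python
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix, standing in for trained or pre-trained weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

def lora_forward(W, A, B, x, alpha=8.0):
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""
    r = len(A)  # rank of the update (assumed hyperparameter)
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / r) * u for b, u in zip(base, update)]

def bottleneck_adapter(h, W_down, W_up):
    """Down-project, apply ReLU, up-project, then add the residual."""
    z = [max(0.0, v) for v in matvec(W_down, h)]
    return [hi + ui for hi, ui in zip(h, matvec(W_up, z))]

rng = random.Random(0)
d, r = 6, 2                      # hidden size and bottleneck/rank (illustrative)
W = rand_matrix(d, d, rng)       # frozen pre-trained weight
A = rand_matrix(r, d, rng)       # trainable, random init
B = zeros(d, r)                  # trainable, zero init: the update starts as a no-op
x = [rng.gauss(0.0, 1.0) for _ in range(d)]

print("LoRA output:   ", lora_forward(W, A, B, x))
print("Adapter output:", bottleneck_adapter(x, rand_matrix(r, d, rng), rand_matrix(d, r, rng)))
print("Trainable LoRA parameters:", 2 * d * r, "of", d * d, "in the full matrix")
```

In both cases only the small matrices are trained, which is what makes the methods parameter-efficient relative to full fine-tuning.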
Adapter-based Approaches to Knowledge-enhanced Language Models -- A Survey
Fichtl, Alexander, Vladika, Juraj, Groh, Georg
Knowledge-enhanced language models (KELMs) have emerged as promising tools to bridge the gap between large-scale language models and domain-specific knowledge. KELMs can achieve higher factual accuracy and mitigate hallucinations by leveraging knowledge graphs (KGs). They are frequently combined with adapter modules to reduce the computational load and risk of catastrophic forgetting. In this paper, we conduct a systematic literature review (SLR) on adapter-based approaches to KELMs. We provide a structured overview of existing methodologies in the field through quantitative and qualitative analysis and explore the strengths and potential shortcomings of individual approaches. We show that general knowledge and domain-specific approaches have been frequently explored along with various adapter architectures and downstream tasks. We focus in particular on the popular biomedical domain, where we provide an insightful performance comparison of existing KELMs. We outline the main trends and propose promising future directions.
Mixture-of-Domain-Adapters: Decoupling and Injecting Domain Knowledge to Pre-trained Language Models Memories
Diao, Shizhe, Xu, Tianyang, Xu, Ruijia, Wang, Jiawei, Zhang, Tong
Pre-trained language models (PLMs) demonstrate excellent abilities to understand texts in the generic domain while struggling in a specific domain. Although continued pre-training on a large domain-specific corpus is effective, it is costly to tune all the parameters on the domain. In this paper, we investigate whether we can adapt PLMs both effectively and efficiently by only tuning a few parameters. Specifically, we decouple the feed-forward networks (FFNs) of the Transformer architecture into two parts: the original pre-trained FFNs to maintain the old-domain knowledge and our novel domain-specific adapters to inject domain-specific knowledge in parallel. Then we adopt a mixture-of-adapters gate to fuse the knowledge from different domain adapters dynamically. Our proposed Mixture-of-Domain-Adapters (MixDA) employs a two-stage adapter-tuning strategy that leverages both unlabeled data and labeled data to help the domain adaptation: i) domain-specific adapter on unlabeled data; followed by ii) the task-specific adapter on labeled data. MixDA can be seamlessly plugged into the pretraining-finetuning paradigm and our experiments demonstrate that MixDA achieves superior performance on in-domain tasks (GLUE), out-of-domain tasks (ChemProt, RCT, IMDB, Amazon), and knowledge-intensive tasks (KILT). Further analyses demonstrate the reliability, scalability, and efficiency of our method. The code is available at https://github.com/Amano-Aki/Mixture-of-Domain-Adapters.
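The parallel-adapter idea in this abstract can be illustrated with a small sketch. It assumes the gate is a softmax over one score per domain adapter and that the gated adapter outputs are added to the frozen FFN output. The abstract does not specify the gate's form, so this choice, along with the names `softmax` and `mix_adapters`, is an assumption for illustration only.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mix_adapters(ffn_out, adapter_outs, gate_scores):
    """Add a gate-weighted sum of adapter outputs to the frozen FFN output.

    ffn_out      -- output of the original (frozen) feed-forward network
    adapter_outs -- one output vector per domain adapter, computed in parallel
    gate_scores  -- one unnormalized score per adapter (assumed gate input)
    """
    weights = softmax(gate_scores)
    mixed = list(ffn_out)
    for w, out in zip(weights, adapter_outs):
        mixed = [m + w * o for m, o in zip(mixed, out)]
    return mixed, weights

# Two hypothetical domain adapters with equal gate scores.
mixed, weights = mix_adapters(
    ffn_out=[1.0, 2.0],
    adapter_outs=[[1.0, 0.0], [0.0, 1.0]],
    gate_scores=[0.0, 0.0],
)
print(weights)  # [0.5, 0.5]
print(mixed)    # [1.5, 2.5]
```

Because the FFN output passes through unchanged, the original model's behavior is recovered whenever the adapter outputs are zero.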
Sequential Hierarchical Least-Squares Programming for Prioritized Non-Linear Optimal Control
Pfeiffer, Kai, Kheddar, Abderrahmane
We present a sequential hierarchical least-squares programming solver with trust-region and hierarchical step-filter tailored to prioritized non-linear optimal control. It is based on a hierarchical step-filter which resolves each priority level of a non-linear hierarchical least-squares program via a globally convergent sequential quadratic programming step-filter. Leveraging a condition on the trust-region or the filter initialization, our hierarchical step-filter maintains this global convergence property. The hierarchical least-squares programming sub-problems are solved via a sparse interior point method based on the nullspace method. It relies on an efficient implementation of the turnback algorithm for the computation of nullspace bases for banded matrices. It is also here that we propose a nullspace trust region adaptation method towards a comprehensive hierarchical step-filter. We demonstrate the computational efficiency of the hierarchical solver on typical test functions such as the Rosenbrock and Himmelblau functions, inverse kinematics problems and optimal control.
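The two test functions named in this abstract have standard closed forms, and both can be written as sums of squared residuals, which is the shape a least-squares solver consumes. The sketch below uses the usual Rosenbrock parameters (a = 1, b = 100); the residual splitting and the function names are illustrative choices, not taken from the paper.

```python
def rosenbrock_residuals(x, y):
    """Residuals whose squared sum is (1 - x)^2 + 100 * (y - x^2)^2."""
    return [1.0 - x, 10.0 * (y - x * x)]

def himmelblau_residuals(x, y):
    """Residuals whose squared sum is (x^2 + y - 11)^2 + (x + y^2 - 7)^2."""
    return [x * x + y - 11.0, x + y * y - 7.0]

def sum_of_squares(residuals):
    return sum(r * r for r in residuals)

def rosenbrock(x, y):
    return sum_of_squares(rosenbrock_residuals(x, y))

def himmelblau(x, y):
    return sum_of_squares(himmelblau_residuals(x, y))

print(rosenbrock(1.0, 1.0))   # 0.0, the global minimum
print(himmelblau(3.0, 2.0))   # 0.0, one of four minima
print(rosenbrock(0.0, 0.0))   # 1.0
print(himmelblau(0.0, 0.0))   # 170.0
```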
SPARTAN: Sparse Hierarchical Memory for Parameter-Efficient Transformers
Deshpande, Ameet, Sultan, Md Arafat, Ferritto, Anthony, Kalyan, Ashwin, Narasimhan, Karthik, Sil, Avirup
Fine-tuning pre-trained language models (PLMs) achieves impressive performance on a range of downstream tasks, and their sizes have consequently been getting bigger. Since a different copy of the model is required for each task, this paradigm is infeasible for storage-constrained edge devices like mobile phones. In this paper, we propose SPARTAN, a parameter efficient (PE) and computationally fast architecture for edge devices that adds hierarchically organized sparse memory after each Transformer layer. SPARTAN freezes the PLM parameters and fine-tunes only its memory, thus significantly reducing storage costs by re-using the PLM backbone for different tasks. SPARTAN contains two levels of memory, with only a sparse subset of parents being chosen in the first level for each input, and children cells corresponding to those parents being used to compute an output representation. This sparsity combined with other architecture optimizations improves SPARTAN's throughput by over 90% during inference on a Raspberry Pi 4 when compared to PE baselines (adapters) while also outperforming the latter by 0.1 points on the GLUE benchmark. Further, it can be trained 34% faster in a few-shot setting, while performing within 0.9 points of adapters. Qualitative analysis shows that different parent cells in SPARTAN specialize in different topics, thus dividing responsibility efficiently.
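The two-level lookup described in this abstract can be sketched as follows. The scoring rule (dot product), the top-k selection, and the softmax over children are assumptions made for illustration, as are all names; the abstract only states that a sparse subset of parents is chosen and that their children produce the output representation.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_memory_lookup(query, parent_keys, child_keys, child_values, top_k=1):
    """Two-level memory read: pick top_k parents, then attend over their children."""
    # Level 1: score every parent but keep only the top_k (the sparse step).
    scores = [dot(query, k) for k in parent_keys]
    chosen = sorted(range(len(parent_keys)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Level 2: only the children of the chosen parents are ever touched.
    keys = [k for p in chosen for k in child_keys[p]]
    values = [v for p in chosen for v in child_values[p]]
    weights = softmax([dot(query, k) for k in keys])
    dim = len(values[0])
    out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return out, chosen

# Two hypothetical parents, each with two child cells.
parent_keys = [[1.0, 0.0], [0.0, 1.0]]
child_keys = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
child_values = [[[2.0, 0.0], [4.0, 0.0]], [[9.0, 9.0], [9.0, 9.0]]]

out, chosen = sparse_memory_lookup([1.0, 0.0], parent_keys, child_keys, child_values)
print(chosen)  # [0]
print(out)     # [3.0, 0.0]
```

The cost of the second level scales with the number of selected parents rather than with the total memory size, which is the intuition behind the throughput gain the abstract reports.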
A Synthetic Prediction Market for Estimating Confidence in Published Work
Rajtmajer, Sarah, Griffin, Christopher, Wu, Jian, Fraleigh, Robert, Balaji, Laxmaan, Squicciarini, Anna, Kwasnica, Anthony, Pennock, David, McLaughlin, Michael, Fritton, Timothy, Nakshatri, Nishanth, Menon, Arjun, Modukuri, Sai Ajay, Nivargi, Rajal, Wei, Xin, Giles, C. Lee
Explainably estimating confidence in published scholarly work offers opportunity for faster and more robust scientific progress. We develop a synthetic prediction market to assess the credibility of published claims in the social and behavioral sciences literature. We demonstrate our system and detail our findings using a collection of known replication projects. We suggest that this work lays the foundation for a research agenda that creatively uses AI for peer review.
Cascading Adaptors to Leverage English Data to Improve Performance of Question Answering for Low-Resource Languages
Pandya, Hariom A., Ardeshna, Bhavik, Bhatt, Dr. Brijesh S.
Transformer-based architectures have shown notable results on many downstream tasks, including question answering. The limited availability of data, on the other hand, impedes obtaining adequate performance for low-resource languages. In this paper, we investigate the applicability of pre-trained multilingual models to improve the performance of question answering in low-resource languages. We tested four combinations of language and task adapters using multilingual transformer architectures on seven languages, similar to the MLQA dataset. Additionally, we propose zero-shot transfer learning of low-resource question answering using language and task adapters. We observed that stacking the language and the task adapters improves the multilingual transformer models' performance significantly for low-resource languages.
Boardroom diversity proves mission critical in data security, AI and beyond - SiliconANGLE
The past two years have seen a record number of women elected to board positions. According to a report on U.S. Board Diversity Trends posted by Harvard Law School, 46% of newly elected directors in 2019 were female and women now hold 27% of directorships across the S&P 500 companies. One of those newly elected members is Wendy Pfeiffer, chief information officer of Nutanix Inc. and board director with Qualys Inc. and Girls in Tech Inc. "When I was recruited for the board [of Qualys] … we didn't talk about the fact that I am female at all. We talked about the fact that I'm an operator, that I'm a technologist," Pfeiffer told Jeff Frick (@JeffFrick), host of theCUBE, SiliconANGLE Media's mobile livestreaming studio during the Qualys Security Conference in Las Vegas. During the interview, Pfeiffer and Frick discussed how the growth of artificial intelligence is helping data security, making having a diverse workforce more critical than ever.
China's Tech Firms Are Mapping Pig Faces
China's biggest tech firms want to pamper pigs, too. Alibaba, the e-commerce giant, and JD.com, its rival, are using cameras to track pigs' faces. Alibaba also uses voice-recognition software to monitor their coughs. Many in China are quick to embrace high-tech solutions to just about any problem. A digital revolution has transformed China into a place where nearly anything -- financial services, spicy takeout, manicures and dog grooming, to name a few -- can be summoned with a smartphone.
Theory and Tools for the Conversion of Analog to Spiking Convolutional Neural Networks
Rueckauer, Bodo, Lungu, Iulia-Alexandra, Hu, Yuhuang, Pfeiffer, Michael
Deep convolutional neural networks (CNNs) have shown great potential for numerous real-world machine learning applications, but performing inference in large CNNs in real-time remains a challenge. We have previously demonstrated that traditional CNNs can be converted into deep spiking neural networks (SNNs), which exhibit similar accuracy while reducing both latency and computational load as a consequence of their data-driven, event-based style of computing. Here we provide a novel theory that explains why this conversion is successful, and derive from it several new tools to convert a larger and more powerful class of deep networks into SNNs. We identify the main sources of approximation errors in previous conversion methods, and propose simple mechanisms to fix these issues. Furthermore, we develop spiking implementations of common CNN operations such as max-pooling, softmax, and batch-normalization, which allow almost loss-less conversion of arbitrary CNN architectures into the spiking domain. Empirical evaluation of different network architectures on the MNIST and CIFAR10 benchmarks leads to the best SNN results reported to date.
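The intuition behind converting an analog unit into a spiking one can be shown with a single rate-coded integrate-and-fire neuron. This is a simplified sketch under stated assumptions: a constant input per time step, a threshold of 1, and reset by subtraction. The function name and parameters are illustrative and not taken from the paper's toolbox.

```python
def spike_rate(drive, steps=1000, threshold=1.0):
    """Firing rate of an integrate-and-fire neuron under a constant input.

    The membrane potential accumulates `drive` each step; when it reaches the
    threshold the neuron emits one spike and the threshold is subtracted.
    """
    v = 0.0
    spikes = 0
    for _ in range(steps):
        v += drive
        if v >= threshold:
            v -= threshold  # reset by subtraction keeps the leftover charge
            spikes += 1
    return spikes / steps

for drive in (-0.5, 0.25, 0.5, 2.0):
    print(drive, spike_rate(drive))
```

For inputs between 0 and 1 the rate tracks the input, negative inputs give a rate of 0 as a ReLU would, and inputs above 1 saturate at one spike per step, which is one source of approximation error in rate-based conversion.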