Hyper Compressed Fine-Tuning of Large Foundation Models with Quantum Inspired Adapters
Fine-tuning pre-trained large foundation models for specific tasks has become increasingly challenging due to the computational and storage demands associated with full parameter updates. Parameter-Efficient Fine-Tuning (PEFT) methods address this issue by updating only a small subset of model parameters using adapter modules. In this work, we propose \emph{Quantum-Inspired Adapters}, a PEFT approach inspired by Hamming-weight-preserving quantum circuits from the quantum machine learning literature. These models can be both expressive and parameter-efficient because they operate in a combinatorially large space while preserving orthogonality in the weight parameters. We test the proposed adapters by adapting large language models and large vision transformers on benchmark datasets. Our method achieves 99.2\% of the performance of existing fine-tuning methods such as LoRA with a 44x parameter compression on language and vision benchmarks such as GLUE and VTAB. Compared to existing orthogonal fine-tuning methods such as OFT and BOFT, we achieve 98\% relative performance with 25x fewer parameters, demonstrating competitive performance paired with a significant reduction in trainable parameters. Through ablation studies, we determine that combining multiple Hamming-weight orders with orthogonality and matrix compounding is essential for performant fine-tuning. Our findings suggest that Quantum-Inspired Adapters offer a promising direction for the efficient adaptation of language and vision models in resource-constrained environments.
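The abstract gives no implementation details, but the core ingredients it names (a multiplicative orthogonal update with very few trainable parameters) can be sketched. Below is a minimal, hypothetical PyTorch sketch built from Givens rotations, the classical analogue of the Hamming-weight-preserving RBS gates; the class name, the pair-selection scheme, and the parameter budget are illustrative assumptions, not the paper's construction.

    # Hypothetical sketch: an adapter whose update is a product of Givens
    # rotations, the classical analogue of Hamming-weight-preserving (RBS)
    # gates. Only the rotation angles are trained, so the update stays
    # exactly orthogonal by construction. Not the paper's implementation.
    import torch
    import torch.nn as nn

    class GivensAdapter(nn.Module):
        def __init__(self, frozen_weight: torch.Tensor, n_rotations: int = 8):
            super().__init__()
            self.weight = frozen_weight            # frozen (d, d) layer weight
            d = frozen_weight.shape[0]
            # Fixed coordinate pairs; rotation k mixes dimensions (i_k, j_k).
            self.pairs = [((2 * k) % d, (2 * k + 1) % d) for k in range(n_rotations)]
            self.angles = nn.Parameter(torch.zeros(n_rotations))  # all trainable params

        def rotation(self) -> torch.Tensor:
            d = self.weight.shape[0]
            Q = torch.eye(d)
            for (i, j), theta in zip(self.pairs, self.angles):
                G = torch.eye(d)
                c, s = torch.cos(theta), torch.sin(theta)
                G[i, i] = c; G[j, j] = c; G[i, j] = -s; G[j, i] = s
                Q = G @ Q                          # a product of rotations is orthogonal
            return Q

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Multiplicative orthogonal update of the frozen weight: (Q W) x.
            return x @ (self.rotation() @ self.weight).T

With the angles initialized to zero the update is the identity, so training starts from the pre-trained model, and the trainable parameter count is just n_rotations, which is the kind of extreme compression the abstract claims.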
- Europe > France > Île-de-France > Paris > Paris (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
- Europe > Albania > Fier County (0.04)
ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections
Bini, Massimo, Roth, Karsten, Akata, Zeynep, Khoreva, Anna
Parameter-efficient finetuning (PEFT) has become ubiquitous to adapt foundation models to downstream task requirements while retaining their generalization ability. However, the amount of additionally introduced parameters and compute for successful adaptation and hyperparameter searches can explode quickly, especially when deployed at scale to serve numerous individual requests. To ensure effective, parameter-efficient, and hyperparameter-robust adaptation, we propose the ETHER transformation family, which performs Efficient fineTuning via HypErplane Reflections. By design, ETHER transformations require a minimal number of parameters, are less likely to deteriorate model performance, and exhibit robustness to hyperparameter and learning rate choices. In particular, we introduce ETHER and its relaxation ETHER+, which match or outperform existing PEFT methods with significantly fewer parameters ($\sim$$10$-$100$ times lower than LoRA or OFT) across multiple image synthesis and natural language tasks without exhaustive hyperparameter tuning. Finally, we investigate the recent emphasis on Hyperspherical Energy retention for adaptation and raise questions on its practical utility. The code is available at https://github.com/mwbini/ether.
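For intuition, a hyperplane reflection is the Householder matrix H = I - 2uu^T/||u||^2, which is orthogonal by construction, so multiplying a frozen weight by it cannot inflate its norm. The sketch below is a minimal illustration of that idea, not the released implementation (see the linked repository); the module name, initialization, and placement are assumptions.

    # Minimal sketch of a hyperplane-reflection weight update: only the
    # vector u is trained, and H = I - 2*u*u^T (with u unit-norm) is always
    # orthogonal. Illustrative only; see https://github.com/mwbini/ether.
    import torch
    import torch.nn as nn

    class ReflectionAdapter(nn.Module):
        def __init__(self, frozen_weight: torch.Tensor):
            super().__init__()
            self.weight = frozen_weight                      # frozen (d_out, d_in)
            d_out = frozen_weight.shape[0]
            self.u = nn.Parameter(torch.randn(d_out) / d_out ** 0.5)  # d_out params

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            u = self.u / self.u.norm()                       # unit normal of the hyperplane
            H = torch.eye(len(u)) - 2.0 * torch.outer(u, u)  # Householder reflection
            return x @ (H @ self.weight).T                   # reflect, then apply

Because ||H - I|| is bounded, the distance between H W and the frozen W is bounded as well, which is one way to read the abstract's claim that these transformations are less likely to deteriorate model performance.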
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
- Europe > Austria > Vienna (0.14)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- (5 more...)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)
Query Complexity of Tournament Solutions
A directed graph where there is exactly one edge between every pair of vertices is called a {\em tournament}. Finding the "best" set of vertices of a tournament is a well studied problem in social choice theory. A {\em tournament solution} takes a tournament as input and outputs a subset of vertices of the input tournament. However, in many applications, for example, choosing the best set of drugs from a given set of drugs, the edges of the tournament are given only implicitly and knowing the orientation of an edge is costly. In such scenarios, we would like to know the best set of vertices (according to some tournament solution) by "querying" as few edges as possible. In this paper, we study precisely this problem for commonly used tournament solutions: given oracle access to the edges of a tournament T, find $f(T)$ by querying as few edges as possible, for a tournament solution f. We first show that the set of Condorcet non-losers in a tournament can be found by querying only $2n-\lfloor \log n \rfloor -2$ edges, and this is tight in the sense that every algorithm for finding the set of Condorcet non-losers needs to query at least $2n-\lfloor \log n \rfloor -2$ edges in the worst case, where $n$ is the number of vertices in the input tournament. We then move on to study other popular tournament solutions and show that any algorithm for finding the Copeland set, the Slater set, the Markov set, the bipartisan set, the uncovered set, the Banks set, and the top cycle must query $\Omega(n^2)$ edges in the worst case. On the positive side, we are able to circumvent our strong query complexity lower bound results by proving that, if the size of the top cycle of the input tournament is at most $k$, then we can find all the tournament solutions mentioned above by querying only $O(nk + \frac{n\log n}{\log(1-\frac{1}{k})})$ edges.
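The flavour of the upper bound can be seen in the classic knockout argument for the closely related Condorcet-winner question: a single-elimination pass pinpoints the only possible winner in n-1 queries, and since the champion already beat roughly log n opponents on its bracket path, verification needs only about n - 1 - log n more. The sketch below, with a query-counting oracle, is this textbook argument, not the paper's algorithm for Condorcet non-losers.

    # Toy illustration of a 2n - O(log n) query bound: knockout to find the
    # only possible Condorcet winner, then verify it. Not the paper's
    # algorithm for Condorcet non-losers.
    class TournamentOracle:
        def __init__(self, beats):          # beats[i][j] is True iff edge i -> j
            self.beats, self.queries = beats, 0
        def query(self, i, j):              # one edge probe, counted
            self.queries += 1
            return self.beats[i][j]

    def condorcet_winner(oracle, n):
        alive = list(range(n))
        beaten = {i: set() for i in range(n)}
        while len(alive) > 1:               # knockout: n - 1 queries in total
            survivors = []
            for a, b in zip(alive[::2], alive[1::2]):
                w, l = (a, b) if oracle.query(a, b) else (b, a)
                beaten[w].add(l)
                survivors.append(w)
            if len(alive) % 2:              # odd one out gets a bye
                survivors.append(alive[-1])
            alive = survivors
        champ = alive[0]                    # everyone else lost at least once
        for v in range(n):                  # verify the ~n - 1 - log n unchecked edges
            if v != champ and v not in beaten[champ]:
                if not oracle.query(champ, v):
                    return None             # champ loses some edge: no Condorcet winner
        return champ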
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Virginia (0.04)
- (4 more...)
Controlling Text-to-Image Diffusion by Orthogonal Finetuning
Qiu, Zeju, Liu, Weiyang, Feng, Haiwen, Xue, Yuxuan, Feng, Yao, Liu, Zhen, Zhang, Dan, Weller, Adrian, Schölkopf, Bernhard
Large text-to-image diffusion models have impressive capabilities in generating photorealistic images from text prompts. How to effectively guide or control these powerful models to perform different downstream tasks has become an important open problem. To tackle this challenge, we introduce a principled finetuning method, Orthogonal Finetuning (OFT), for adapting text-to-image diffusion models to downstream tasks. Unlike existing methods, OFT can provably preserve hyperspherical energy, which characterizes the pairwise neuron relationship on the unit hypersphere. We find that this property is crucial for preserving the semantic generation ability of text-to-image diffusion models. To improve finetuning stability, we further propose Constrained Orthogonal Finetuning (COFT), which imposes an additional radius constraint on the hypersphere. Specifically, we consider two important finetuning text-to-image tasks: subject-driven generation, where the goal is to generate subject-specific images given a few images of a subject and a text prompt, and controllable generation, where the goal is to enable the model to take in additional control signals. We empirically show that our OFT framework outperforms existing methods in generation quality and convergence speed.
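Concretely, the multiplicative update is W' = W R with R orthogonal: rotating every neuron by the same orthogonal map leaves all pairwise neuron angles, and hence the hyperspherical energy, unchanged. One standard way to keep R orthogonal during training is the Cayley transform of a skew-symmetric matrix, sketched below; the block structure and other efficiency details of the actual OFT implementation are omitted, and the names here are illustrative.

    # Sketch of an orthogonal multiplicative update. R = (I + S)(I - S)^{-1}
    # is orthogonal whenever S is skew-symmetric (Cayley transform), and at
    # S = 0 the update is the identity. Illustrative, not the paper's code.
    import torch
    import torch.nn as nn

    class OrthogonalFinetune(nn.Module):
        def __init__(self, frozen_weight: torch.Tensor):
            super().__init__()
            self.weight = frozen_weight                   # frozen (d_out, d_in)
            d_in = frozen_weight.shape[1]
            self.s = nn.Parameter(torch.zeros(d_in, d_in))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            S = self.s - self.s.T                         # enforce skew-symmetry
            I = torch.eye(S.shape[0])
            R = (I + S) @ torch.linalg.inv(I - S)         # orthogonal by construction
            # W R rotates every neuron (row of W) identically, preserving
            # all pairwise angles between neurons.
            return x @ (self.weight @ R).T

The full d_in x d_in matrix here is only for clarity; a practical version would restrict R (for example, block-diagonally) to keep the parameter count and the matrix inverse tractable.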
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- Pacific Ocean > North Pacific Ocean > San Francisco Bay > Golden Gate (0.04)
- North America > United States > Texas > Loving County (0.04)
- (5 more...)
CRaSh: Clustering, Removing, and Sharing Enhance Fine-tuning without Full Large Language Model
Zhang, Kaiyan, Ding, Ning, Qi, Biqing, Zhu, Xuekai, Long, Xinwei, Zhou, Bowen
Instruction tuning has recently been recognized as an effective way of aligning Large Language Models (LLMs) to enhance their generalization ability across various tasks. However, when tuning publicly accessible, centralized LLMs with private instruction data, privacy concerns are inevitable. While direct transfer of parameterized modules between models is a plausible approach to address this, its implications and effectiveness need further exploration. This paper focuses on Offsite-Tuning (OFT), a representative technique that transfers transformer blocks between centralized LLMs and downstream emulators. Given the limited understanding of the underlying mechanism of OFT, we perform an empirical analysis on LLMs from the perspectives of representation and functional similarity. Interestingly, our findings reveal a unique modular structure within the layers of LLMs that appears to emerge as the model size expands. Simultaneously, we note subtle but potentially significant changes in representation and intermediate predictions across the layers. Inspired by these observations, we propose CRaSh, a training-free strategy involving Clustering, Removing, and Sharing to derive improved emulators from LLMs. CRaSh significantly boosts the performance of OFT with billions of parameters. Furthermore, we investigate the optimal solutions yielded by fine-tuning with and without the full model through the lens of the loss landscape. Our findings demonstrate linear connectivity among these optima, which fall within the same basin, thereby highlighting the effectiveness of CRaSh and OFT. The source code is publicly available at https://github.com/TsinghuaC3I/CRaSh.
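Reading the abstract, the three steps suggest a simple recipe: cluster layers whose representations are similar, remove all but one representative per cluster, and share that representative in the emulator. The toy sketch below follows that reading; the similarity measure, threshold, and layer interface are all assumptions rather than the paper's procedure.

    # Toy reading of Clustering / Removing / Sharing: group adjacent layers
    # with similar hidden states, keep one representative per group, and
    # repeat (share) it to preserve depth. All details are assumptions.
    import numpy as np

    def cluster_layers(layer_reprs, threshold=0.95):
        """Greedily merge adjacent layers whose flattened hidden states have
        cosine similarity >= threshold; returns lists of layer indices."""
        clusters = [[0]]
        for i in range(1, len(layer_reprs)):
            a = layer_reprs[clusters[-1][0]].ravel()
            b = layer_reprs[i].ravel()
            cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
            if cos >= threshold:
                clusters[-1].append(i)       # representation barely moved: merge
            else:
                clusters.append([i])         # representation shifted: new cluster
        return clusters

    def build_emulator(layers, clusters):
        emulator = []
        for c in clusters:
            rep = layers[c[0]]               # removing: keep one layer per cluster
            emulator.extend([rep] * len(c))  # sharing: alias it, don't copy
        return emulator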
- North America > United States > Washington > King County > Seattle (0.28)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- (18 more...)
- Leisure & Entertainment (1.00)
- Information Technology > Security & Privacy (1.00)
Relational Programming
A programming language needs simple and well-defined semantics. The two favoured theoretical bases for languages have been lambda calculus, as advocated by Landin and others, and predicate calculus, as advocated by Kowalski (see Landin (1966) and Kowalski (1973)). In this paper I adopt an approach based on predicate calculus, but in a manner that differs from the existing PROLOG language (Warren 1975 and Battani & Meloni 1973) in that I adopt a "forward inference" approach -- inferring conclusions from premises -- rather than the "backward inference" approach of PROLOG, which starts with a desired conclusion and tries to find ways of inferring it. This difference is reflected in the internal structure of the associated implementations: that of PROLOG is a "backtrack search" kind of implementation, while the most obvious implementation of the system proposed here involves a kind of mass operation on tables of data, reminiscent of APL (Iverson 1962) but in fact identical in many respects with the work of Codd (Codd 1970) on relational data bases. Indeed, from one perspective this paper can be seen as an extension of Codd's work into the realm of general-purpose computing. As in the case of PROLOG, it is necessary for the user of the relational programming system to make statements which are not associated with the logical structure of the problem but reflect the need to control the computation. In PROLOG these are effected by the use of extra-logical control primitives, but in our system control is exercised by the introduction of predicates for that purpose, which have exactly the same semantics as the predicates relevant to the logical structure of the problem. In later sections I deal with the problem of introducing equality into the system, in a way that reflects the normal mathematical usage of equality.
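The computational model described here, forward inference realized as mass operations on tables, is essentially what later became naive bottom-up evaluation of Datalog. A minimal sketch of the idea, with relations as sets of tuples and a rule applied as a join until a fixed point is reached, is given below; the example and names are illustrative, not from the paper.

    # Sketch of forward inference as mass operations on tables: each rule is
    # applied to whole relations (a join plus a projection) until nothing new
    # is derived. Example rules: parent(X,Y) => ancestor(X,Y), and
    # ancestor(X,Y), parent(Y,Z) => ancestor(X,Z). Illustrative only.
    def join_project(r, s):
        # Natural join on the shared middle column, projecting the endpoints.
        return {(a, c) for (a, b1) in r for (b2, c) in s if b1 == b2}

    def forward_infer(parent):
        ancestor = set(parent)             # base rule fires on all facts at once
        while True:
            new = join_project(ancestor, parent)
            if new <= ancestor:            # fixed point: no new conclusions
                return ancestor
            ancestor |= new

    facts = {("alice", "bob"), ("bob", "carol")}
    assert ("alice", "carol") in forward_infer(facts)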
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- North America > United States > New York (0.04)