 Optimization


Task Arithmetic in Trust Region: A Training-Free Model Merging Approach to Navigate Knowledge Conflicts

arXiv.org Artificial Intelligence

Multi-task model merging offers an efficient solution for integrating knowledge from multiple fine-tuned models, mitigating the significant computational and storage demands associated with multi-task training. Despite the promising performance of task arithmetic (TA), conflicts can arise among the task vectors, particularly when different tasks require distinct model adaptations. In this paper, we formally define this issue as knowledge conflicts, characterized by the performance degradation of one task after merging with a model fine-tuned for another task. By restricting parameter merging to a trust region, TATR can effectively alleviate knowledge conflicts. Moreover, TATR serves as both an independent approach and a plug-and-play module compatible with a wide range of TA-based methods. Extensive empirical evaluations on eight distinct datasets robustly demonstrate that TATR improves the multi-task performance of several TA-based model merging methods by an observable margin. The growing adoption of large foundation models is accompanied by significant practical challenges in terms of computational and storage demands (Kaplan et al., 2020). To address these challenges, multi-task model merging (Matena & Raffel, 2022) has emerged as a promising solution. Here, task vectors are the difference in model parameters between the pre-trained foundation model and its fine-tuned version on a specific task. This approach builds a high-performance multi-task model by simple arithmetic operations in the model parameter space, thereby reducing computational overheads associated with fine-tuning on multiple tasks. Despite their successes, task arithmetic and its variants (Yadav et al., 2023; Wang et al., 2024; Yang et al., 2024b;a) still suffer from conflicts between task vectors.
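The task-vector arithmetic described above can be illustrated with a minimal sketch. It shows plain task arithmetic only, not the trust-region restriction of TATR. The function names, the `scale` coefficient, and the toy weights are assumptions made for illustration and do not come from the paper.

```python
import numpy as np

def task_vector(pretrained, finetuned):
    """Task vector: fine-tuned weights minus pre-trained weights, per parameter."""
    return {name: finetuned[name] - pretrained[name] for name in pretrained}

def merge_task_arithmetic(pretrained, finetuned_models, scale=1.0):
    """Add the scaled sum of all task vectors to the pre-trained weights.

    `scale` is a hypothetical merging coefficient chosen for this sketch.
    """
    merged = {name: w.copy() for name, w in pretrained.items()}
    for finetuned in finetuned_models:
        tv = task_vector(pretrained, finetuned)
        for name in merged:
            merged[name] += scale * tv[name]
    return merged

# Toy weights (assumed): one parameter tensor, two fine-tuned models.
pretrained = {"w": np.array([1.0, 2.0, 3.0])}
task_a = {"w": np.array([1.5, 2.0, 3.0])}  # changes only the first entry
task_b = {"w": np.array([1.0, 2.0, 2.0])}  # changes only the last entry

merged = merge_task_arithmetic(pretrained, [task_a, task_b], scale=0.5)
print(merged["w"])  # [1.25 2.   2.5 ]
```

In this toy case the two task vectors touch different entries, so they do not interfere. Conflicts of the kind the abstract describes would appear when two task vectors push the same parameter in incompatible directions.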


Iterative Feature Space Optimization through Incremental Adaptive Evaluation

arXiv.org Artificial Intelligence

Iterative feature space optimization involves systematically evaluating and adjusting the feature space to improve downstream task performance. However, existing works suffer from three key limitations: 1) overlooking differences among data samples leads to evaluation bias; 2) tailoring feature spaces to specific machine learning models results in overfitting and poor generalization; 3) requiring the evaluator to be retrained from scratch during each optimization iteration significantly reduces the overall efficiency of the optimization process. To bridge these gaps, we propose a gEneralized Adaptive feature Space Evaluator (EASE) to efficiently produce optimal and generalized feature spaces. This framework consists of two key components: Feature-Sample Subspace Generator and Contextual Attention Evaluator. The first component aims to decouple the information distribution within the feature space to mitigate evaluation bias. To achieve this, we first identify features most relevant to prediction tasks and samples most challenging for evaluation based on feedback from the subsequent evaluator. This decoupling strategy makes the evaluator consistently target the most challenging aspects of the feature space. The second component intends to incrementally capture evolving patterns of the feature space for efficient evaluation. We propose a weighted-sharing multi-head attention mechanism to encode key characteristics of the feature space into an embedding vector for evaluation. Moreover, the evaluator is updated incrementally, retaining prior evaluation knowledge while incorporating new insights, as consecutive feature spaces during the optimization process share partial information. Extensive experiments on fourteen real-world datasets demonstrate the effectiveness of the proposed framework. Our code and data are publicly available.


Federated Domain Generalization with Data-free On-server Gradient Matching

arXiv.org Artificial Intelligence

Domain Generalization (DG) aims to learn from multiple known source domains a model that can generalize well to unknown target domains. One of the key approaches in DG is training an encoder which generates domain-invariant representations. However, this approach is not applicable in Federated Domain Generalization (FDG), where data from various domains are distributed across different clients. In this paper, we introduce a novel approach, dubbed Federated Learning via On-server Matching Gradient (FedOMG), which can efficiently leverage domain information from distributed domains. Specifically, we utilize the local gradients as information about the distributed models to find an invariant gradient direction across all domains through gradient inner product maximization. The advantages are two-fold: 1) FedOMG can aggregate the characteristics of distributed models on the centralized server without incurring any additional communication cost, and 2) FedOMG is orthogonal to many existing FL/FDG methods, allowing for additional performance improvements by being seamlessly integrated with them. Extensive experimental evaluations on various settings demonstrate the robustness of FedOMG compared to other FL/FDG baselines. Our method outperforms recent SOTA baselines on four FL benchmark datasets (MNIST, EMNIST, CIFAR-10, and CIFAR-100), and three FDG benchmark datasets (PACS, VLCS, and OfficeHome).


Light3R-SfM: Towards Feed-forward Structure-from-Motion

arXiv.org Artificial Intelligence

Structure-from-Motion (SfM) is the task of jointly recovering camera poses and reconstructing the 3D scene structure from a set of unconstrained images. This longstanding problem is essential to many computer vision applications, including novel view synthesis via NeRFs [3, 29] and 3DGS [20], multi-view stereo (MVS) reconstruction [31, 49], and visual localization [34, 36]. Traditional SfM methods generally follow two main approaches: incremental [37, 41, 56] and global [8, 30, 55] SfM. Both paradigms rely on key components such as feature detection and matching for correspondence search, 3D triangulation to reconstruct geometry from 2D correspondences, and joint optimization of camera poses and scene geometry through bundle adjustment. A major research direction has been to replace these components with learning-based modules, progressing towards fully end-to-end SfM [7, 40, 50].

To perform SfM from an image collection, DUSt3R works [22, 51] first compute stereo reconstruction exhaustively for all image pairs and then obtain globally aligned pointmaps for all cameras through joint optimization of pairwise rigid transformations and local pointmaps. This baseline has been significantly improved by the concurrent work MASt3R-SfM [12] that leverages image retrieval to drastically reduce the computation overhead, boosts optimization efficiency by optimizing only over the sparse pixel correspondences, and appends a global bundle adjustment stage for accuracy refinement. While optimization-based alignment has been proven to be the key to accurate 3D reconstruction by DUSt3R, MASt3R-SfM and classical SfM methods [25, 30, 37], this comes at the cost of slow runtime and extensive memory footprint even for moderately-sized image collections.


Optimal Transport Barycenter via Nonconvex-Concave Minimax Optimization

arXiv.org Machine Learning

The optimal transport barycenter (a.k.a. Wasserstein barycenter) is a fundamental notion of averaging that extends from the Euclidean space to the Wasserstein space of probability distributions. Computation of the unregularized barycenter for discretized probability distributions on point clouds is a challenging task when the domain dimension $d > 1$. Most practical algorithms for approximating the barycenter problem are based on entropic regularization. In this paper, we introduce a nearly linear time $O(m \log{m})$ and linear space complexity $O(m)$ primal-dual algorithm, the Wasserstein-Descent $\dot{\mathbb{H}}^1$-Ascent (WDHA) algorithm, for computing the exact barycenter when the input probability density functions are discretized on an $m$-point grid. The key success of the WDHA algorithm hinges on alternating between two different yet closely related Wasserstein and Sobolev optimization geometries for the primal barycenter and dual Kantorovich potential subproblems. Under reasonable assumptions, we establish the convergence rate and iteration complexity of WDHA to its stationary point when the step size is appropriately chosen. Superior computational efficacy, scalability, and accuracy over the existing Sinkhorn-type algorithms are demonstrated on high-resolution (e.g., $1024 \times 1024$ images) 2D synthetic and real data.


Permutation-based multi-objective evolutionary feature selection for high-dimensional data

arXiv.org Artificial Intelligence

Feature selection is a critical step in the analysis of high-dimensional data, where the number of features often vastly exceeds the number of samples. Effective feature selection not only improves model performance and interpretability but also reduces computational costs and mitigates the risk of overfitting. In this context, we propose a novel feature selection method for high-dimensional data, based on the well-known permutation feature importance approach, but extending it to evaluate subsets of attributes rather than individual features. This extension more effectively captures how interactions among features influence model performance. The proposed method employs a multi-objective evolutionary algorithm to search for candidate feature subsets, with the objectives of maximizing the degradation in model performance when the selected features are shuffled, and minimizing the cardinality of the feature subset. The effectiveness of our method has been validated on a set of 24 publicly available high-dimensional datasets for classification and regression tasks, and compared against 9 well-established feature selection methods designed for high-dimensional problems, including the conventional permutation feature importance method. The results demonstrate the ability of our approach to balance accuracy and computational efficiency, providing a powerful tool for feature selection in complex, high-dimensional datasets.
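The first objective above, the performance degradation when a whole subset of features is shuffled, can be sketched as follows. This is an illustrative reading, not the authors' implementation: the helper name, the choice of mean squared error, the joint shuffling of the subset's columns with one row permutation, and the synthetic linear data are all assumptions. The evolutionary search itself is not shown.

```python
import numpy as np

def subset_permutation_importance(predict, X, y, subset, n_repeats=10, seed=0):
    """Mean increase in MSE when the columns in `subset` are shuffled together.

    Hypothetical helper: the columns are shuffled with a single shared row
    permutation per repeat, which is one possible design choice.
    """
    rng = np.random.default_rng(seed)

    def mse(pred):
        return float(np.mean((y - pred) ** 2))

    baseline = mse(predict(X))
    increases = []
    for _ in range(n_repeats):
        X_perm = X.copy()
        perm = rng.permutation(X.shape[0])
        X_perm[:, subset] = X[perm][:, subset]  # break the link between subset and target
        increases.append(mse(predict(X_perm)) - baseline)
    return float(np.mean(increases))

# Synthetic data (assumed): only features 0 and 1 influence the target.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1]

# A least-squares linear model stands in for any fitted predictor.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
predict = lambda data: data @ coef

imp_informative = subset_permutation_importance(predict, X, y, [0, 1])
imp_noise = subset_permutation_importance(predict, X, y, [2, 3])

# Each candidate yields a pair (degradation to maximize, cardinality to minimize).
objectives = {
    "informative": (imp_informative, 2),
    "noise": (imp_noise, 2),
}
print(objectives)
```

Under these assumptions, shuffling the informative pair degrades the model far more than shuffling the two irrelevant columns. That difference is the signal a multi-objective search would trade off against subset size.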


Random-Key Algorithms for Optimizing Integrated Operating Room Scheduling

arXiv.org Artificial Intelligence

Efficient surgery room scheduling is essential for hospital efficiency, patient satisfaction, and resource utilization. This study addresses this challenge by introducing a novel concept of Random-Key Optimizer (RKO), rigorously tested on literature and new, real-world inspired instances. Our combinatorial optimization problem incorporates multi-room scheduling, equipment scheduling, and complex availability constraints for rooms, patients, and surgeons, facilitating rescheduling and enhancing operational flexibility. The RKO approach represents solutions as points in a continuous space, which are then mapped into the problem solution space via a deterministic function known as a decoder. The core idea is to operate metaheuristics and heuristics in the random-key space, unaware of the original solution space. We design the Biased Random-Key Genetic Algorithm with $Q$-Learning, Simulated Annealing, and Iterated Local Search for use within an RKO framework, employing a single decoder function. The proposed metaheuristics are complemented by lower-bound formulations, providing optimality gaps for evaluating the effectiveness of the heuristic results. Our results demonstrate significant lower- and upper-bound improvements for the literature instances, notably proving one optimal result. Furthermore, the best-proposed metaheuristic efficiently generates schedules for the newly introduced instances, even in highly constrained scenarios. This research offers valuable insights and practical solutions for improving surgery scheduling processes, offering tangible benefits to hospitals by optimising resource allocation, reducing patient wait times, and enhancing overall operational efficiency.
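The decoder idea can be illustrated with the classic sorting decoder used for permutation problems. This is only a sketch of the general random-key technique. The actual decoder for the operating room problem is not described in this abstract, and the function name and the interpretation of the result as a surgery order are assumptions.

```python
import numpy as np

def decode_to_sequence(keys):
    """Deterministic decoder: sort positions by key value to get a permutation.

    Hypothetical example decoder; a real scheduling decoder would also assign
    rooms, equipment, and time slots.
    """
    return np.argsort(keys, kind="stable").tolist()

# One random-key vector in [0, 1)^4, read here as priorities of four surgeries.
keys = np.array([0.72, 0.15, 0.48, 0.91])
order = decode_to_sequence(keys)
print(order)  # [1, 2, 0, 3]

# Any point of the continuous space decodes to a valid permutation, so a
# search method can perturb keys freely without producing invalid sequences.
rng = np.random.default_rng(0)
random_order = decode_to_sequence(rng.random(6))
print(sorted(random_order) == list(range(6)))  # True
```

Because every key vector decodes to a feasible sequence, the metaheuristics can work entirely in the continuous space, which matches the "unaware of the original solution space" description above.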


Feasible Learning

arXiv.org Artificial Intelligence

We introduce Feasible Learning (FL), a sample-centric learning paradigm where models are trained by solving a feasibility problem that bounds the loss for each training sample. In contrast to the ubiquitous Empirical Risk Minimization (ERM) framework, which optimizes for average performance, FL demands satisfactory performance on every individual data point. Since any model that meets the prescribed performance threshold is a valid FL solution, the choice of optimization algorithm and its dynamics play a crucial role in shaping the properties of the resulting solutions. In particular, we study a primal-dual approach which dynamically re-weights the importance of each sample during training. To address the challenge of setting a meaningful threshold in practice, we introduce a relaxation of FL that incorporates slack variables of minimal norm. Our empirical analysis, spanning image classification, age regression, and preference optimization in large language models, demonstrates that models trained via FL can learn from data while displaying improved tail behavior compared to ERM, with only a marginal impact on average performance.
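A minimal sketch of the primal-dual idea, assuming a linear model with per-sample squared loss: each sample i carries a multiplier lam[i] >= 0 for the constraint loss_i <= eps, the model parameters descend on the multiplier-weighted loss, and the multipliers ascend on the constraint violation. The function name, step sizes, threshold, and synthetic data are assumptions for illustration; the paper's own algorithm and its slack-variable relaxation may differ.

```python
import numpy as np

def feasible_learning_linear(X, y, eps=0.01, lr_theta=0.01, lr_lam=0.05, steps=2000):
    """Gradient descent-ascent on L(theta, lam) = sum_i lam_i * (loss_i(theta) - eps).

    Hypothetical sketch: loss_i is the squared error of a linear model, and the
    primal gradient is averaged over samples to keep the step size stable.
    """
    n, d = X.shape
    theta = np.zeros(d)
    lam = np.zeros(n)  # one multiplier (sample weight) per training point
    for _ in range(steps):
        residual = X @ theta - y
        losses = residual ** 2
        # Primal step: samples with larger multipliers weigh more in the gradient.
        grad_theta = (2.0 / n) * (X.T @ (lam * residual))
        theta = theta - lr_theta * grad_theta
        # Dual step: raise weights of violating samples, lower the rest, keep >= 0.
        lam = np.maximum(0.0, lam + lr_lam * (losses - eps))
    return theta, lam

# Synthetic, exactly realizable data (assumed), so a feasible model exists.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true

theta, lam = feasible_learning_linear(X, y)
final_losses = (X @ theta - y) ** 2
print(final_losses.max() <= 0.01)
```

The multipliers act as the dynamic per-sample weights mentioned above: a sample whose loss stays above the threshold keeps gaining weight until the model fits it.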


Reviews: Learning Reward Machines for Partially Observable Reinforcement Learning

Neural Information Processing Systems

The authors propose a novel approach for solving POMDPs by simultaneously learning and solving reward machines. The method relies on building a finite state machine which properly predicts possible observations and rewards. The authors demonstrate that their method outperforms baselines in three different partially observable gridworlds. Overall, I found the paper clear and well motivated. Learning to solve POMDPs is a very challenging problem and any progress or insight has the potential to have a big impact.


Reviews: High-Dimensional Optimization in Adaptive Random Subspaces

Neural Information Processing Systems

Post-rebuttal update: The authors' rebuttal addresses my (minor) concerns well, and my overall score remains the same. The approach is similar to earlier work such as: - M. Pilanci and M. J. Wainwright. The main innovations here are to extend this sketching technique to a wider class of convex objectives and to introduce a data-adaptive sketching technique that greatly improves the error bounds on the solution relative to a data-oblivious sketch. The proposed technique can also be performed iteratively to improve the accuracy of the solution without having to change the sketch matrix, so the sketch on the data only has to be performed once. Overall, I thought this was a high-quality paper.