Goto

Collaborating Authors

 Optimization


Probabilistic Federated Prompt-Tuning with Non-IID and Imbalanced Data

arXiv.org Artificial Intelligence

Fine-tuning pre-trained models is a popular approach in machine learning for solving complex tasks with moderate data. However, fine-tuning the entire pre-trained model is ineffective in federated data scenarios where local data distributions are diversely skewed. To address this, we explore integrating federated learning with a more effective prompt-tuning method, optimizing for a small set of input prefixes to reprogram the pre-trained model's behavior. Our approach transforms federated learning into a distributed set modeling task, aggregating diverse sets of prompts to globally fine-tune the pre-trained model. We benchmark various baselines based on direct adaptations of existing federated model aggregation techniques and introduce a new probabilistic prompt aggregation method that substantially outperforms these baselines. Our reported results on a variety of computer vision datasets confirm that the proposed method is most effective to combat extreme data heterogeneity in federated learning.


Learning Surrogates for Offline Black-Box Optimization via Gradient Matching

arXiv.org Artificial Intelligence

Offline design optimization problem arises in numerous science and engineering applications including material and chemical design, where expensive online experimentation necessitates the use of in silico surrogate functions to predict and maximize the target objective over candidate designs. Although these surrogates can be learned from offline data, their predictions are often inaccurate outside the offline data regime. This challenge raises a fundamental question about the impact of imperfect surrogate model on the performance gap between its optima and the true optima, and to what extent the performance loss can be mitigated. Although prior work developed methods to improve the robustness of surrogate models and their associated optimization processes, a provably quantifiable relationship between an imperfect surrogate and the corresponding performance gap, as well as whether prior methods directly address it, remain elusive. To shed light on this important question, we present a theoretical framework to understand offline black-box optimization, by explicitly bounding the optimization quality based on how well the surrogate matches the latent gradient field that underlines the offline data. Inspired by our theoretical analysis, we propose a principled black-box gradient matching algorithm to create effective surrogate models for offline optimization, improving over prior approaches on various real-world benchmarks.


ARENA: Adaptive Risk-aware and Energy-efficient NAvigation for Multi-Objective 3D Infrastructure Inspection with a UAV

arXiv.org Artificial Intelligence

-- Autonomous robotic inspection missions require balancing multiple conflicting objectives while navigating near costly obstacles. Current multi-objective path planning (MOPP) methods struggle to adapt to evolving risks like localization errors, weather, battery state, and communication issues. This letter presents an Adaptive Risk-aware and Energy-efficient NA vigation (ARENA) MOPP approach for UA Vs in complex 3D environments. Our method enables online trajectory adaptation by optimizing safety, time, and energy using 4D NURBS representation and a genetic-based algorithm to generate the Pareto front. A novel risk-aware voting algorithm ensures adaptivity. Simulations and real-world tests demonstrate the planner's ability to produce diverse, optimized trajectories covering 95% or more of the range defined by single-objective benchmarks and its ability to estimate power consumption with a mean error representing 14% of the full power range. The ARENA framework enhances UA V autonomy and reliability in critical, evolving 3D missions. Uncrewed aerial vehicles (UA Vs) are becoming crucial tools in various scenarios where human involvement can become too risky or incur high costs, such as search and rescue [1], surveillance [2], and inspection [3], [4]. Achieving autonomy in these scenarios heavily relies on the path planning module to generate safe and feasible trajectories. Numerous approaches have been proposed to find the shortest or safest path in a cluttered environment.


CPG-Based Manipulation with Multi-Module Origami Robot Surface

arXiv.org Artificial Intelligence

--Robotic manipulators often face challenges in handling objects of different sizes and materials, limiting their effectiveness in practical applications. This issue is particularly pronounced when manipulating meter-scale objects or those with varying stiffness, as traditional gripping techniques and strategies frequently prove inadequate. In this letter, we introduce a novel surface-based multi-module robotic manipulation framework that utilizes a Central Pattern Generator (CPG)-based motion generator, combined with a simulation-based optimization method to determine the optimal manipulation parameters for a multi-module origami robotic surface (Ori-Pixel). This approach allows for the manipulation of objects ranging from centimeters to meters in size, with varying stiffness and shape. The optimized CPG parameters are tested through both dynamic simulations and a series of prototype experiments involving a wide range of objects differing in size, weight, shape, and material, demonstrating robust manipulation capabilities. Robotic manipulation has made significant strides in recent years, leveraging advanced control and planning algorithms to demonstrate a variety of automated and precise tasks.


Quantum Annealing Feature Selection on Light-weight Medical Image Datasets

arXiv.org Artificial Intelligence

We investigate the use of quantum computing algorithms on real quantum hardware to tackle the computationally intensive task of feature selection for light-weight medical image datasets. Feature selection is often formulated as a k of n selection problem, where the complexity grows binomially with increasing k and n. As problem sizes grow, classical approaches struggle to scale efficiently. Quantum computers, particularly quantum annealers, are well-suited for such problems, offering potential advantages in specific formulations. We present a method to solve larger feature selection instances than previously presented on commercial quantum annealers. Our approach combines a linear Ising penalty mechanism with subsampling and thresholding techniques to enhance scalability. The method is tested in a toy problem where feature selection identifies pixel masks used to reconstruct small-scale medical images. The results indicate that quantum annealing-based feature selection is effective for this simplified use case, demonstrating its potential in high-dimensional optimization tasks. However, its applicability to broader, real-world problems remains uncertain, given the current limitations of quantum computing hardware.


A Multi-Agent DRL-Based Framework for Optimal Resource Allocation and Twin Migration in the Multi-Tier Vehicular Metaverse

arXiv.org Artificial Intelligence

Although multi-tier vehicular Metaverse promises to transform vehicles into essential nodes -- within an interconnected digital ecosystem -- using efficient resource allocation and seamless vehicular twin (VT) migration, this can hardly be achieved by the existing techniques operating in a highly dynamic vehicular environment, since they can hardly balance multi-objective optimization problems such as latency reduction, resource utilization, and user experience (UX). To address these challenges, we introduce a novel multi-tier resource allocation and VT migration framework that integrates Graph Convolutional Networks (GCNs), a hierarchical Stackelberg game-based incentive mechanism, and Multi-Agent Deep Reinforcement Learning (MADRL). The GCN-based model captures both spatial and temporal dependencies within the vehicular network; the Stackelberg game-based incentive mechanism fosters cooperation between vehicles and infrastructure; and the MADRL algorithm jointly optimizes resource allocation and VT migration in real time. By modeling this dynamic and multi-tier vehicular Metaverse as a Markov Decision Process (MDP), we develop a MADRL-based algorithm dubbed the Multi-Objective Multi-Agent Deep Deterministic Policy Gradient (MO-MADDPG), which can effectively balances the various conflicting objectives. Extensive simulations validate the effectiveness of this algorithm that is demonstrated to enhance scalability, reliability, and efficiency while considerably improving latency, resource utilization, migration cost, and overall UX by 12.8%, 9.7%, 14.2%, and 16.1%, respectively.


One Set to Rule Them All: How to Obtain General Chemical Conditions via Bayesian Optimization over Curried Functions

arXiv.org Artificial Intelligence

General parameters are highly desirable in the natural sciences - e.g., chemical reaction conditions that enable high yields across a range of related transformations. This has a significant practical impact since those general parameters can be transferred to related tasks without the need for laborious and time-intensive re-optimization. While Bayesian optimization (BO) is widely applied to find optimal parameter sets for specific tasks, it has remained underused in experiment planning towards such general optima. In this work, we consider the real-world problem of condition optimization for chemical reactions to study how performing generality-oriented BO can accelerate the identification of general optima, and whether these optima also translate to unseen examples. This is achieved through a careful formulation of the problem as an optimization over curried functions, as well as systematic evaluations of generality-oriented strategies for optimization tasks on real-world experimental data. We find that for generality-oriented optimization, simple myopic optimization strategies that decouple parameter and task selection perform comparably to more complex ones, and that effective optimization is merely determined by an effective exploration of both parameter and task space.


Distributed Online Task Assignment via Inexact ADMM for unplanned online tasks and its Applications to Security

arXiv.org Artificial Intelligence

In multi-robot system (MRS) applications, efficient task assignment is essential not only for coordinating agents and ensuring mission success but also for maintaining overall system security. In this work, we first propose an optimization-based distributed task assignment algorithm that dynamically assigns mandatory security-critical tasks and optional tasks among teams. Leveraging an inexact Alternating Direction Method of Multipliers (ADMM)-based approach, we decompose the task assignment problem into separable and non-separable subproblems. The non-separable subproblems are transformed into an inexact ADMM update by projected gradient descent, which can be performed through several communication steps within the team. In the second part of this paper, we formulate a comprehensive framework that enables MRS under plan-deviation attacks to handle online tasks without compromising security. The process begins with a security analysis that determines whether an online task can be executed securely by a robot and, if so, the required time and location for the robot to rejoin the team. Next, the proposed task assignment algorithm is used to allocate security-related tasks and verified online tasks. Finally, task fulfillment is managed using a Control Lyapunov Function (CLF)-based controller, while security enforcement is ensured through a Control Barrier Function (CBF)-based security filter. Through simulations, we demonstrate that the proposed framework allows MRS to effectively respond to unplanned online tasks while maintaining security guarantees.


LORENZA: Enhancing Generalization in Low-Rank Gradient LLM Training via Efficient Zeroth-Order Adaptive SAM

arXiv.org Machine Learning

We study robust parameter-efficient fine-tuning (PEFT) techniques designed to improve accuracy and generalization while operating within strict computational and memory hardware constraints, specifically focusing on large-language models (LLMs). Existing PEFT methods often lack robustness and fail to generalize effectively across diverse tasks, leading to suboptimal performance in real-world scenarios. To address this, we present a new highly computationally efficient framework called AdaZo-SAM, combining Adam and Sharpness-Aware Minimization (SAM) while requiring only a single-gradient computation in every iteration. This is achieved using a stochastic zeroth-order estimation to find SAM's ascent perturbation. We provide a convergence guarantee for AdaZo-SAM and show that it improves the generalization ability of state-of-the-art PEFT methods. Additionally, we design a low-rank gradient optimization method named LORENZA, which is a memory-efficient version of AdaZo-SAM. LORENZA utilizes a randomized SVD scheme to efficiently compute the subspace projection matrix and apply optimization steps onto the selected subspace. This technique enables full-parameter fine-tuning with adaptive low-rank gradient updates, achieving the same reduced memory consumption as gradient-low-rank-projection methods. We provide a convergence analysis of LORENZA and demonstrate its merits for pre-training and fine-tuning LLMs.


Extremely Greedy Equivalence Search

arXiv.org Machine Learning

The goal of causal discovery is to learn a directed acyclic graph from data. One of the most well-known methods for this problem is Greedy Equivalence Search (GES). GES searches for the graph by incrementally and greedily adding or removing edges to maximize a model selection criterion. It has strong theoretical guarantees on infinite data but can fail in practice on finite data. In this paper, we first identify some of the causes of GES's failure, finding that it can get blocked in local optima, especially in denser graphs. We then propose eXtremely Greedy Equivalent Search (XGES), which involves a new heuristic to improve the search strategy of GES while retaining its theoretical guarantees. In particular, XGES favors deleting edges early in the search over inserting edges, which reduces the possibility of the search ending in local optima. A further contribution of this work is an efficient algorithmic formulation of XGES (and GES). We benchmark XGES on simulated datasets with known ground truth. We find that XGES consistently outperforms GES in recovering the correct graphs, and it is 10 times faster. XGES implementations in Python and C++ are available at https://github.com/ANazaret/XGES.