Optimization
Information Templates: A New Paradigm for Intelligent Active Feature Acquisition
Huang, Hung-Tien, Dinh, Dzung, Oliva, Junier B.
Active feature acquisition (AFA) is an instance-adaptive paradigm in which, at test time, a policy sequentially chooses which features to acquire (at a cost) before predicting. Existing approaches either train reinforcement learning (RL) policies, which deal with a difficult MDP, or greedy policies that cannot account for the joint informativeness of features or require knowledge about the underlying data distribution. To overcome this, we propose Template-based AFA (TAFA), a non-greedy framework that learns a small library of feature templates--a set of features that are jointly informative--and uses this library of templates to guide the next feature acquisitions. Through identifying feature templates, the proposed framework not only significantly reduces the action space considered by the policy but also alleviates the need to estimate the underlying data distribution. Extensive experiments on synthetic and real-world datasets show that TAFA outperforms the existing state-of-the-art baselines while achieving lower overall acquisition cost and computation.
Facilitating Matches on Allocation Platforms
Trabelsi, Yohai, Adiga, Abhijin, Aumann, Yonatan, Kraus, Sarit, Ravi, S. S.
We consider a setting where goods are allocated to agents by way of an allocation platform (e.g., a matching platform). An "allocation facilitator" aims to increase the overall utility/social-good of the allocation by encouraging (some of the) agents to relax (some of) their restrictions. At the same time, the advice must not hurt agents who would otherwise be better off. Additionally, the facilitator may be constrained by a "bound" (a.k.a. 'budget'), limiting the number and/or type of restrictions it may seek to relax. We consider the facilitator's optimization problem of choosing an optimal set of restrictions to request to relax under the aforementioned constraints. Our contributions are three-fold: (i) We provide a formal definition of the problem, including the participation guarantees to which the facilitator should adhere. We define a hierarchy of participation guarantees and also consider several social-good functions.
Trajectory Optimization for UAV-Based Medical Delivery with Temporal Logic Constraints and Convex Feasible Set Collision Avoidance
Chen, Kaiyuan, Suo, Yuhan, Cui, Shaowei, Xia, Yuanqing, Liang, Wannian, Wang, Shuo
This paper addresses the problem of trajectory optimization for unmanned aerial vehicles (UAVs) performing time-sensitive medical deliveries in urban environments. Specifically, we consider a single UAV with 3 degree-of-freedom dynamics tasked with delivering blood packages to multiple hospitals, each with a predefined time window and priority. Mission objectives are encoded using Signal Temporal Logic (STL), enabling the formal specification of spatial-temporal constraints. To ensure safety, city buildings are modeled as 3D convex obstacles, and obstacle avoidance is handled through a Convex Feasible Set (CFS) method. The entire planning problem-combining UAV dynamics, STL satisfaction, and collision avoidance-is formulated as a convex optimization problem that ensures tractability and can be solved efficiently using standard convex programming techniques. Simulation results demonstrate that the proposed method generates dynamically feasible, collision-free trajectories that satisfy temporal mission goals, providing a scalable and reliable approach for autonomous UAV-based medical logistics.
MOCA-HESP: Meta High-dimensional Bayesian Optimization for Combinatorial and Mixed Spaces via Hyper-ellipsoid Partitioning
Ngo, Lam, Ha, Huong, Chan, Jeffrey, Zhang, Hongyu
High-dimensional Bayesian Optimization (BO) has attracted significant attention in recent research. However, existing methods have mainly focused on optimizing in continuous domains, while combinatorial (ordinal and categorical) and mixed domains still remain challenging. In this paper, we first propose MOCA-HESP, a novel high-dimensional BO method for combinatorial and mixed variables. The key idea is to leverage the hyper-ellipsoid space partitioning (HESP) technique with different categorical encoders to work with high-dimensional, combinatorial and mixed spaces, while adaptively selecting the optimal encoders for HESP using a multi-armed bandit technique. Our method, MOCA-HESP, is designed as a \textit{meta-algorithm} such that it can incorporate other combinatorial and mixed BO optimizers to further enhance the optimizers' performance. Finally, we develop three practical BO methods by integrating MOCA-HESP with state-of-the-art BO optimizers for combinatorial and mixed variables: standard BO, CASMOPOLITAN, and Bounce. Our experimental results on various synthetic and real-world benchmarks show that our methods outperform existing baselines. Our code implementation can be found at https://github.com/LamNgo1/moca-hesp
VFOG: Variance-Reduced Fast Optimistic Gradient Methods for a Class of Nonmonotone Generalized Equations
Tran-Dinh, Quoc, Nguyen-Trung, Nghia
We develop a novel optimistic gradient-type algorithmic framework, combining both Nesterov's acceleration and variance-reduction techniques, to solve a class of generalized equations involving possibly nonmonotone operators in data-driven applications. Our framework covers a wide class of stochastic variance-reduced schemes, including mini-batching, and control variate unbiased and biased estimators. We establish that our method achieves $\mathcal{O}(1/k^2)$ convergence rates in expectation on the squared norm of residual under the Lipschitz continuity and a ``co-hypomonotonicity-type'' assumptions, improving upon non-accelerated counterparts by a factor of $1/k$. We also prove faster $o(1/k^2)$ convergence rates, both in expectation and almost surely. In addition, we show that the sequence of iterates of our method almost surely converges to a solution of the underlying problem. We demonstrate the applicability of our method using general error bound criteria, covering mini-batch stochastic estimators as well as three well-known control variate estimators: loopless SVRG, SAGA, and loopless SARAH, for which the last three variants attain significantly better oracle complexity compared to existing methods. We validate our framework and theoretical results through two numerical examples. The preliminary results illustrate promising performance of our accelerated method over its non-accelerated counterparts.
On the sample complexity of semi-supervised multi-objective learning
Wegel, Tobias, So, Geelon, Park, Junhyung, Yang, Fanny
In multi-objective learning (MOL), several possibly competing prediction tasks must be solved jointly by a single model. Achieving good trade-offs may require a model class $\mathcal{G}$ with larger capacity than what is necessary for solving the individual tasks. This, in turn, increases the statistical cost, as reflected in known MOL bounds that depend on the complexity of $\mathcal{G}$. We show that this cost is unavoidable for some losses, even in an idealized semi-supervised setting, where the learner has access to the Bayes-optimal solutions for the individual tasks as well as the marginal distributions over the covariates. On the other hand, for objectives defined with Bregman losses, we prove that the complexity of $\mathcal{G}$ may come into play only in terms of unlabeled data. Concretely, we establish sample complexity upper bounds, showing precisely when and how unlabeled data can significantly alleviate the need for labeled data. These rates are achieved by a simple, semi-supervised algorithm via pseudo-labeling.
Predictability Enables Parallelization of Nonlinear State Space Models
Gonzalez, Xavier, Kozachkov, Leo, Zoltowski, David M., Clarkson, Kenneth L., Linderman, Scott W.
The rise of parallel computing hardware has made it increasingly important to understand which nonlinear state space models can be efficiently parallelized. Recent advances like DEER (arXiv:2309.12252) or DeepPCR (arXiv:2309.16318) have shown that evaluating a state space model can be recast as solving a parallelizable optimization problem, and sometimes this approach can yield dramatic speed-ups in evaluation time. However, the factors that govern the difficulty of these optimization problems remain unclear, limiting the larger adoption of the technique. In this work, we establish a precise relationship between the dynamics of a nonlinear system and the conditioning of its corresponding optimization formulation. We show that the predictability of a system, defined as the degree to which small perturbations in state influence future behavior, impacts the number of optimization steps required for evaluation. In predictable systems, the state trajectory can be computed in $O((\log T)^2)$ time, where $T$ is the sequence length, a major improvement over the conventional sequential approach. In contrast, chaotic or unpredictable systems exhibit poor conditioning, with the consequence that parallel evaluation converges too slowly to be useful. Importantly, our theoretical analysis demonstrates that for predictable systems, the optimization problem is always well-conditioned, whereas for unpredictable systems, the conditioning degrades exponentially as a function of the sequence length. We validate our claims through extensive experiments, providing practical guidance on when nonlinear dynamical systems can be efficiently parallelized, and highlighting predictability as a key design principle for parallelizable models.
Frequency Response Identification of Low-Order Systems: Finite-Sample Analysis
Honarpisheh, Arya, Sznaier, Mario
This paper proposes a frequency-domain system identification method for learning low-order systems. The identification problem is formulated as the minimization of the l2 norm between the identified and measured frequency responses, with the nuclear norm of the Loewner matrix serving as a regularization term. This formulation results in an optimization problem that can be efficiently solved using standard convex optimization techniques. We derive an upper bound on the sampled-frequency complexity of the identification process and subsequently extend this bound to characterize the identification error over all frequencies. A detailed analysis of the sample complexity is provided, along with a thorough interpretation of its terms and dependencies. Finally, the efficacy of the proposed method is demonstrated through an example, along with numerical simulations validating the growth rate of the sample complexity bound.
Introduction to Regularization and Learning Methods for Inverse Problems
Bednarski, Danielle, Roith, Tim
These lecture notes evolve around mathematical concepts arising in inverse problems. We start by introducing inverse problems through examples such as differentiation, deconvolution, computed tomography and phase retrieval. This then leads us to the framework of well-posedness and first considerations regarding reconstruction and inversion approaches. The second chapter then first deals with classical regularization theory of inverse problems in Hilbert spaces. After introducing the pseudo-inverse, we review the concept of convergent regularization. Within this chapter we then proceed to ask the question of how to realize practical reconstruction algorithms. Here, we mainly focus on Tikhonov and sparsity promoting regularization in finite dimensional spaces. In the third chapter, we dive into modern deep-learning methods, which allow solving inverse problems in a data-dependent approach. The intersection between inverse problems and machine learning is a rapidly growing field and our exposition here restricts itself to a very limited selection of topics. Among them are learned regularization, fully-learned Bayesian estimation, post-processing strategies and plug-n-play methods.
Riemannian Optimization for LoRA on the Stiefel Manifold
Park, Juneyoung, Kang, Minjae, Lee, Seongbae, Lee, Haegang, Kim, Seongwan, Lee, Jaeho
While powerful, large language models (LLMs) present significant fine-tuning challenges due to their size. Parameter-efficient fine-tuning (PEFT) methods like LoRA provide solutions, yet suffer from critical optimizer inefficiencies; notably basis redundancy in LoRA's $B$ matrix when using AdamW, which fundamentally limits performance. We address this by optimizing the $B$ matrix on the Stiefel manifold, imposing explicit orthogonality constraints that achieve near-perfect orthogonality and full effective rank. This geometric approach dramatically enhances parameter efficiency and representational capacity. Our Stiefel optimizer consistently outperforms AdamW across benchmarks with both LoRA and DoRA, demonstrating that geometric constraints are the key to unlocking LoRA's full potential for effective LLM fine-tuning.