AITopics | Optimization

Collaborating Authors

Optimization

News Overviews Instructional Materials AI-Alerts Classics

DUET: Optimizing Training Data Mixtures via Feedback from Unseen Evaluation Tasks

Chen, Zhiliang, Lau, Gregory Kang Ruey, Foo, Chuan-Sheng, Low, Bryan Kian Hsiang

arXiv.org Machine LearningJan-31-2025

The performance of a machine learning (ML) model depends heavily on the relevance of its training data to the domain of the downstream evaluation task. However, in practice, the data involved in an unseen evaluation task is often not known to us (e.g., conversations between an LLM and a user are end-to-end encrypted). So, it is not obvious what data would be relevant for training/fine-tuning the ML model to maximize its task performance. Instead, one can only deploy the ML model in the unseen evaluation task to gather multiple rounds of coarse feedback on how well the model has performed. This paper presents a novel global-to-local algorithm called DUET that can exploit the feedback loop by interleaving a data selection method with Bayesian optimization. As a result, DUET can efficiently refine the training data mixture from a pool of data domains to maximize the model's performance on the unseen evaluation task and its convergence to the optimal data mixture can be theoretically guaranteed by analyzing its cumulative regret. Empirical evaluation on image and LLM evaluation tasks shows that DUET finds better training data mixtures than conventional baselines.

data mixture, large language model, machine learning, (17 more...)

arXiv.org Machine Learning

2502.0027

Country:

Asia > Singapore (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Locality-aware Surrogates for Gradient-based Black-box Optimization

Momeni, Ali, Uhlich, Stefan, Venkitaraman, Arun, Hsieh, Chia-Yu, Bonetti, Andrea, Matsuo, Ryoga, Ohbuchi, Eisaku, Servadei, Lorenzo

arXiv.org Artificial IntelligenceJan-31-2025

In physics and engineering, many processes are modeled using non-differentiable black-box simulators, making the optimization of such functions particularly challenging. To address such cases, inspired by the Gradient Theorem, we propose locality-aware surrogate models for active model-based black-box optimization. We first establish a theoretical connection between gradient alignment and the minimization of a Gradient Path Integral Equation (GradPIE) loss, which enforces consistency of the surrogate's gradients in local regions of the design space. Leveraging this theoretical insight, we develop a scalable training algorithm that minimizes the GradPIE loss, enabling both offline and online learning while maintaining computational efficiency. We evaluate our approach on three real-world tasks - spanning automated in silico experiments such as coupled nonlinear oscillators, analog circuits, and optical systems - and demonstrate consistent improvements in optimization efficiency under limited query budgets. Our results offer dependable solutions for both offline and online optimization tasks where reliable gradient estimation is needed.

artificial intelligence, machine learning, surrogate model, (17 more...)

arXiv.org Artificial Intelligence

2501.19161

Country:

Europe > Switzerland (0.04)
North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(3 more...)

Genre: Research Report > New Finding (0.66)

Industry:

Transportation > Air (0.87)
Education > Educational Setting > Online (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

HoP: Homeomorphic Polar Learning for Hard Constrained Optimization

Deng, Ke, Zhang, Hanwen, Lu, Jin, Sun, Haijian

arXiv.org Artificial IntelligenceJan-31-2025

Constrained optimization demands highly efficient solvers which promotes the development of learn-to-optimize (L2O) approaches. As a data-driven method, L2O leverages neural networks to efficiently produce approximate solutions. However, a significant challenge remains in ensuring both optimality and feasibility of neural networks' output. To tackle this issue, we introduce Homeomorphic Polar Learning (HoP) to solve the star-convex hard-constrained optimization by embedding homeomorphic mapping in neural networks. The bijective structure enables end-to-end training without extra penalty or correction. For performance evaluation, we evaluate HoP's performance across a variety of synthetic optimization tasks and real-world applications in wireless communications. In all cases, HoP achieves solutions closer to the optimum than existing L2O methods while strictly maintaining feasibility.

artificial intelligence, constraint, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2502.00304

Country: North America > United States > Georgia > Clarke County > Athens (0.04)

Genre: Research Report (0.50)

Industry: Energy (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

How to Build a Quantum Supercomputer: Scaling from Hundreds to Millions of Qubits

Mohseni, Masoud, Scherer, Artur, Johnson, K. Grace, Wertheim, Oded, Otten, Matthew, Aadit, Navid Anjum, Alexeev, Yuri, Bresniker, Kirk M., Camsari, Kerem Y., Chapman, Barbara, Chatterjee, Soumitra, Dagnew, Gebremedhin A., Esposito, Aniello, Fahim, Farah, Fiorentino, Marco, Gajjar, Archit, Khalid, Abdullah, Kong, Xiangzhou, Kulchytskyy, Bohdan, Kyoseva, Elica, Li, Ruoyu, Lott, P. Aaron, Markov, Igor L., McDermott, Robert F., Pedretti, Giacomo, Rao, Pooja, Rieffel, Eleanor, Silva, Allyson, Sorebo, John, Spentzouris, Panagiotis, Steiner, Ziv, Torosov, Boyan, Venturelli, Davide, Visser, Robert J., Webb, Zak, Zhan, Xin, Cohen, Yonatan, Ronagh, Pooya, Ho, Alan, Beausoleil, Raymond G., Martinis, John M.

arXiv.org Artificial IntelligenceJan-31-2025

In the span of four decades, quantum computation has evolved from an intellectual curiosity to a potentially realizable technology. Today, small-scale demonstrations have become possible for quantum algorithmic primitives on hundreds of physical qubits and proof-of-principle error-correction on a single logical qubit. Nevertheless, despite significant progress and excitement, the path toward a full-stack scalable technology is largely unknown. There are significant outstanding quantum hardware, fabrication, software architecture, and algorithmic challenges that are either unresolved or overlooked. These issues could seriously undermine the arrival of utility-scale quantum computers for the foreseeable future. Here, we provide a comprehensive review of these scaling challenges. We show how the road to scaling could be paved by adopting existing semiconductor technology to build much higher-quality qubits, employing system engineering approaches, and performing distributed quantum computation within heterogeneous high-performance computing infrastructures. These opportunities for research and development could unlock certain promising applications, in particular, efficient quantum simulation/learning of quantum data generated by natural or engineered quantum systems. To estimate the true cost of such promises, we provide a detailed resource and sensitivity analysis for classically hard quantum chemistry calculations on surface-code error-corrected quantum computers given current, target, and desired hardware specifications based on superconducting qubits, accounting for a realistic distribution of errors. Furthermore, we argue that, to tackle industry-scale classical optimization and machine learning problems in a cost-effective manner, heterogeneous quantum-probabilistic computing with custom-designed accelerators should be considered as a complementary path toward scalability.

data mining, machine learning, programming language, (23 more...)

arXiv.org Artificial Intelligence

2411.10406

Country:

North America > United States > Wisconsin > Dane County > Madison (0.13)
North America > United States > California > Santa Barbara County > Santa Barbara (0.13)
Asia > Japan (0.13)
(12 more...)

Genre:

Research Report > Promising Solution (1.00)
Research Report > New Finding (1.00)
Workflow (0.92)
Overview (0.85)

Industry:

Semiconductors & Electronics (1.00)
Information Technology (1.00)
Energy (1.00)
Government > Regional Government > North America Government > United States Government (0.45)

Technology:

Information Technology > Scientific Computing (1.00)
Information Technology > Hardware (1.00)
Information Technology > Data Science > Data Mining (1.00)
(6 more...)

Add feedback

Fantastic Multi-Task Gradient Updates and How to Find Them In a Cone

Hassanpour, Negar, Janjua, Muhammad Kamran, Zhang, Kunlin, Lavasani, Sepehr, Zhang, Xiaowen, Zhou, Chunhua, Gao, Chao

arXiv.org Artificial IntelligenceJan-31-2025

Balancing competing objectives remains a fundamental challenge in multi-task learning (MTL), primarily due to conflicting gradients across individual tasks. A common solution relies on computing a dynamic gradient update vector that balances competing tasks as optimization progresses. Building on this idea, we propose ConicGrad, a principled, scalable, and robust MTL approach formulated as a constrained optimization problem. Our method introduces an angular constraint to dynamically regulate gradient update directions, confining them within a cone centered on the reference gradient of the overall objective. By balancing task-specific gradients without over-constraining their direction or magnitude, ConicGrad effectively resolves inter-task gradient conflicts. Moreover, our framework ensures computational efficiency and scalability to high-dimensional parameter spaces. We conduct extensive experiments on standard supervised learning and reinforcement learning MTL benchmarks, and demonstrate that ConicGrad achieves state-of-the-art performance across diverse tasks.

artificial intelligence, machine learning, optimization problem, (15 more...)

arXiv.org Artificial Intelligence

2502.00217

Country:

North America > Canada (0.04)
Europe > Italy > Tuscany > Florence (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.93)

Add feedback

Optimizing Through Change: Bounds and Recommendations for Time-Varying Bayesian Optimization Algorithms

Bardou, Anthony, Thiran, Patrick

arXiv.org Machine LearningJan-31-2025

Time-Varying Bayesian Optimization (TVBO) is the go-to framework for optimizing a time-varying, expensive, noisy black-box function. However, most of the solutions proposed so far either rely on unrealistic assumptions on the nature of the objective function or do not offer any theoretical guarantees. We propose the first analysis that asymptotically bounds the cumulative regret of TVBO algorithms under mild and realistic assumptions only. In particular, we provide an algorithm-independent lower regret bound and an upper regret bound that holds for a large class of TVBO algorithms. Based on this analysis, we formulate recommendations for TVBO algorithms and show how an algorithm (BOLT) that follows them performs better than the state-of-the-art of TVBO through experiments on synthetic and real-world problems.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

2501.18963

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Switzerland > Vaud > Lausanne (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications > Networks (0.93)

Add feedback

Differentiable Simulation of Soft Robots with Frictional Contacts

Ménager, Etienne, Montaut, Louis, Lidec, Quentin Le, Carpentier, Justin

arXiv.org Artificial IntelligenceJan-31-2025

In recent years, soft robotics simulators have evolved to offer various functionalities, including the simulation of different material types (e.g., elastic, hyper-elastic) and actuation methods (e.g., pneumatic, cable-driven, servomotor). These simulators also provide tools for various tasks, such as calibration, design, and control. However, efficiently and accurately computing derivatives within these simulators remains a challenge, particularly in the presence of physical contact interactions. Incorporating these derivatives can, for instance, significantly improve the convergence speed of control methods like reinforcement learning and trajectory optimization, enable gradient-based techniques for design, or facilitate end-to-end machine-learning approaches for model reduction. This paper addresses these challenges by introducing a unified method for computing the derivatives of mechanical equations within the finite element method framework, including contact interactions modeled as a nonlinear complementarity problem. The proposed approach handles both collision and friction phases, accounts for their nonsmooth dynamics, and leverages the sparsity introduced by mesh-based models. Its effectiveness is demonstrated through several examples of controlling and calibrating soft systems.

artificial intelligence, machine learning, robot, (19 more...)

arXiv.org Artificial Intelligence

2501.18956

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
Europe > Czechia > Prague (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Pareto-frontier Entropy Search with Variational Lower Bound Maximization

Ishikura, Masanori, Karasuyama, Masayuki

arXiv.org Machine LearningJan-31-2025

This study considers multi-objective Bayesian optimization (MOBO) through the information gain of the Pareto-frontier. To calculate the information gain, a predictive distribution conditioned on the Pareto-frontier plays a key role, which is defined as a distribution truncated by the Pareto-frontier. However, it is usually impossible to obtain the entire Pareto-frontier in a continuous domain, and therefore, the complete truncation cannot be known. We consider an approximation of the truncate distribution by using a mixture distribution consisting of two possible approximate truncation obtainable from a subset of the Pareto-frontier, which we call over- and under-truncation. Since the optimal balance of the mixture is unknown beforehand, we propose optimizing the balancing coefficient through the variational lower bound maximization framework, by which the approximation error of the information gain can be minimized. Our empirical evaluation demonstrates the effectiveness of the proposed method particularly when the number of objective functions is large.

approximation, artificial intelligence, optimization problem, (12 more...)

arXiv.org Machine Learning

2501.19073

Country:

Europe > Germany (0.04)
Asia > Japan > Kyūshū & Okinawa > Kyūshū > Fukuoka Prefecture > Fukuoka (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Add feedback

Trajectory Optimization Under Stochastic Dynamics Leveraging Maximum Mean Discrepancy

Sharma, Basant, Singh, Arun Kumar

arXiv.org Artificial IntelligenceJan-31-2025

This paper addresses sampling-based trajectory optimization for risk-aware navigation under stochastic dynamics. Typically such approaches operate by computing $\tilde{N}$ perturbed rollouts around the nominal dynamics to estimate the collision risk associated with a sequence of control commands. We consider a setting where it is expensive to estimate risk using perturbed rollouts, for example, due to expensive collision-checks. We put forward two key contributions. First, we develop an algorithm that distills the statistical information from a larger set of rollouts to a reduced-set with sample size $N<<\tilde{N}$. Consequently, we estimate collision risk using just $N$ rollouts instead of $\tilde{N}$. Second, we formulate a novel surrogate for the collision risk that can leverage the distilled statistical information contained in the reduced-set. We formalize both algorithmic contributions using distribution embedding in Reproducing Kernel Hilbert Space (RKHS) and Maximum Mean Discrepancy (MMD). We perform extensive benchmarking to demonstrate that our MMD-based approach leads to safer trajectories at low sample regime than existing baselines using Conditional Value-at Risk (CVaR) based collision risk estimate.

artificial intelligence, machine learning, optimization, (18 more...)

arXiv.org Artificial Intelligence

2501.19045

Country: Europe > Estonia > Tartu County > Tartu (0.04)

Genre: Research Report (0.82)

Industry: Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Add feedback

ProxSparse: Regularized Learning of Semi-Structured Sparsity Masks for Pretrained LLMs

Liu, Hongyi, Saha, Rajarshi, Jia, Zhen, Park, Youngsuk, Huang, Jiaji, Sabach, Shoham, Wang, Yu-Xiang, Karypis, George

arXiv.org Artificial IntelligenceJan-31-2025

Large Language Models (LLMs) have demonstrated exceptional performance in natural language processing tasks, yet their massive size makes serving them inefficient and costly. Semi-structured pruning has emerged as an effective method for model acceleration, but existing approaches are suboptimal because they focus on local, layer-wise optimizations using heuristic rules, failing to leverage global feedback. We present ProxSparse, a learning-based framework for mask selection enabled by regularized optimization. ProxSparse transforms the rigid, non-differentiable mask selection process into a smoother optimization procedure, allowing gradual mask exploration with flexibility. ProxSparse does not involve additional weight updates once the mask is determined. Our extensive evaluations on 7 widely used models show that ProxSparse consistently outperforms previously proposed semi-structured mask selection methods with significant improvement, demonstrating the effectiveness of our learned approach towards semi-structured pruning.

artificial intelligence, large language model, natural language, (15 more...)

arXiv.org Artificial Intelligence

2502.00258

Country: North America > United States > New York (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback