AITopics | Optimization

Collaborating Authors

Optimization

News Overviews Instructional Materials AI-Alerts Classics

Redeeming intrinsic rewards via constrained optimization

Neural Information Processing SystemsMay-26-2025, 19:26:40 GMT

State-of-the-art reinforcement learning (RL) algorithms typically use random sampling (e.g., \epsilon -greedy) for exploration, but this method fails on hard exploration tasks like Montezuma's Revenge. To address the challenge of exploration, prior works incentivize exploration by rewarding the agent when it visits novel states. Such intrinsic rewards (also called exploration bonus or curiosity) often lead to excellent performance on hard exploration tasks. However, on easy exploration tasks, the agent gets distracted by intrinsic rewards and performs unnecessary exploration even when sufficient task (also called extrinsic) reward is available. Consequently, such an overly curious agent performs worse than an agent trained with only task reward. Such inconsistency in performance across tasks prevents the widespread use of intrinsic rewards with RL algorithms.

intrinsic reward, machine learning, reinforcement learning, (8 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.86)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.58)

Add feedback

Acceleration Exists! Optimization Problems When Oracle Can Only Compare Objective Function Values

Neural Information Processing SystemsMay-26-2025, 19:08:06 GMT

Frequently, the burgeoning field of black-box optimization encounters challenges due to a limited understanding of the mechanisms of the objective function. To address such problems, in this work we focus on the deterministic concept of Order Oracle, which only utilizes order access between function values (possibly with some bounded noise), but without assuming access to their values. As theoretical results, we propose a new approach to create non-accelerated optimization algorithms (obtained by integrating Order Oracle into existing optimization "tools") in non-convex, convex, and strongly convex settings that are as good as both SOTA coordinate algorithms with first-order oracle and SOTA algorithms with Order Oracle up to logarithm factor. Moreover, using the proposed approach, we provide the first accelerated optimization algorithm using the Order Oracle. And also, using an already different approach we provide the asymptotic convergence of the first algorithm with the stochastic Order Oracle concept.

algorithm, artificial intelligence, optimization problem, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Add feedback

Globally Q-linear Gauss-Newton Method for Overparameterized Non-convex Matrix Sensing

Neural Information Processing SystemsMay-26-2025, 18:56:25 GMT

This paper focuses on the optimization of overparameterized, non-convex low-rank matrix sensing (LRMS)--an essential component in contemporary statistics and machine learning. Recent years have witnessed significant breakthroughs in first-order methods, such as gradient descent, for tackling this non-convex optimization problem. However, the presence of numerous saddle points often prolongs the time required for gradient descent to overcome these obstacles. In this paper, we introduce an approximated Gauss-Newton (AGN) method for tackling the non-convex LRMS problem. Notably, AGN incurs a computational cost comparable to gradient descent per iteration but converges much faster without being slowed down by saddle points. We prove that, despite the non-convexity of the objective function, AGN achieves Q-linear convergence from random initialization to the global optimal solution.

artificial intelligence, machine learning, overparameterized non-convex matrix sensing, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.63)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.40)

Add feedback

FERERO: A Flexible Framework for Preference-Guided Multi-Objective Learning

Neural Information Processing SystemsMay-26-2025, 18:36:50 GMT

Finding specific preference-guided Pareto solutions that represent different trade-offs among multiple objectives is critical yet challenging in multi-objective problems. Existing methods are restrictive in preference definitions and/or their theoretical guarantees.In this work, we introduce a Flexible framEwork for pREfeRence-guided multi-Objective learning (FERERO) by casting it as a constrained vector optimization problem.Specifically, two types of preferences are incorporated into this formulation -- the relative preference defined by the partial ordering induced by a polyhedral cone, and the absolute preference defined by constraints that are linear functions of the objectives. To solve this problem, convergent algorithms are developed with both single-loop and stochastic variants. Notably, this is the first single-loop primal algorithm for constrained optimization to our knowledge. The proposed algorithms adaptively adjust to both constraint and objective values, eliminating the need to solve different subproblems at different stages of constraint satisfaction.

ferero, flexible framework, preference-guided multi-objective learning, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.82)

Add feedback

Fairness-Aware Estimation of Graphical Models

Neural Information Processing SystemsMay-26-2025, 18:22:18 GMT

This paper examines the issue of fairness in the estimation of graphical models (GMs), particularly Gaussian, Covariance, and Ising models. These models play a vital role in understanding complex relationships in high-dimensional data. However, standard GMs can result in biased outcomes, especially when the underlying data involves sensitive characteristics or protected groups. To address this, we introduce a comprehensive framework designed to reduce bias in the estimation of GMs related to protected attributes. Our approach involves the integration of the pairwise graph disparity error and a tailored loss function into a nonsmooth multi-objective optimization problem, striving to achieve fairness across different sensitive groups while maintaining the effectiveness of the GMs.

artificial intelligence, fairness-aware estimation, optimization problem, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Systems & Languages (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)

Add feedback

A Globally Optimal Portfolio for m-Sparse Sharpe Ratio Maximization

Neural Information Processing SystemsMay-26-2025, 18:12:23 GMT

The Sharpe ratio is an important and widely-used risk-adjusted return in financial engineering. In modern portfolio management, one may require an m-sparse (no more than m active assets) portfolio to save managerial and financial costs. We propose to convert the m-sparse fractional optimization problem into an equivalent m-sparse quadratic programming problem. The semi-algebraic property of the resulting objective function allows us to exploit the Kurdyka-Lojasiewicz property to develop an efficient Proximal Gradient Algorithm (PGA) that leads to a portfolio which achieves the globally optimal m-sparse Sharpe ratio under certain conditions. The convergence rates of PGA are also provided.

artificial intelligence, m-sparse sharpe ratio maximization, optimization problem, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.65)

Add feedback

CoBo: Collaborative Learning via Bilevel Optimization

Neural Information Processing SystemsMay-26-2025, 17:52:55 GMT

Collaborative learning is an important tool to train multiple clients more effectively by enabling communication among clients. Identifying helpful clients, however, presents challenging and often introduces significant overhead. In this paper, we model client-selection and model-training as two interconnected optimization problems, proposing a novel bilevel optimization problem for collaborative learning.We introduce CoBo, a scalable and elastic, SGD-type alternating optimization algorithm that efficiently addresses these problem with theoretical convergence guarantees. Empirically, CoBo achieves superior performance, surpassing popular personalization algorithms by 9.3% in accuracy on a task with high heterogeneity, involving datasets distributed among 80 clients.

artificial intelligence, collaborative learning, optimization problem, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.98)
Information Technology > Communications > Collaboration (0.97)

Add feedback

Catastrophic Goodhart: regularizing RLHF with KL divergence does not mitigate heavy-tailed reward misspecification

Neural Information Processing SystemsMay-26-2025, 17:41:21 GMT

When applying reinforcement learning from human feedback (RLHF), the reward is learned from data and, therefore, always has some error. It is common to mitigate this by regularizing the policy with KL divergence from a base model, with the hope that balancing reward with regularization will achieve desirable outcomes despite this reward misspecification. We show that when the reward function has light-tailed error, optimal policies under less restrictive KL penalties achieve arbitrarily high utility. However, if error is heavy-tailed, some policies obtain arbitrarily high reward despite achieving no more utility than the base model--a phenomenon we call catastrophic Goodhart. We adapt a discrete optimization method to measure the tails of reward models, finding that they are consistent with light-tailed error. However, the pervasiveness of heavy-tailed distributions in many real-world applications indicates that future sources of RL reward could have heavy-tailed error, increasing the likelihood of reward hacking even with KL regularization.

artificial intelligence, machine learning, reinforcement learning, (9 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.65)

Add feedback

Boundary Decomposition for Nadir Objective Vector Estimation

Neural Information Processing SystemsMay-26-2025, 17:37:37 GMT

The nadir objective vector plays a key role in solving multi-objective optimization problems (MOPs), where it is often used to normalize the objective space and guide the search. The current methods for estimating the nadir objective vector perform effectively only on specific MOPs. This paper reveals the limitations of these methods: exact methods can only work on discrete MOPs, while heuristic methods cannot deal with the MOP with a complicated feasible objective region. To fill this gap, we propose a general and rigorous method, namely boundary decomposition for nadir objective vector estimation (BDNE). By utilizing bilevel optimization, boundary subproblems are optimized and adjusted alternately, thereby refining their optimal solutions to align with the nadir objective vector.

artificial intelligence, nadir objective vector estimation, optimization problem, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Add feedback

Pretrained Optimization Model for Zero-Shot Black Box Optimization

Neural Information Processing SystemsMay-26-2025, 17:37:09 GMT

Zero-shot optimization involves optimizing a target task that was not seen during training, aiming to provide the optimal solution without or with minimal adjustments to the optimizer. It is crucial to ensure reliable and robust performance in various applications. Current optimizers often struggle with zero-shot optimization and require intricate hyperparameter tuning to adapt to new tasks. To address this, we propose a Pretrained Optimization Model (POM) that leverages knowledge gained from optimizing diverse tasks, offering efficient solutions to zero-shot optimization through direct application or fine-tuning with few-shot samples. Evaluation on the BBOB benchmark and two robot control tasks demonstrates that POM outperforms state-of-the-art black-box optimization methods, especially for high-dimensional tasks.

artificial intelligence, large language model, natural language, (4 more...)

Neural Information Processing Systems

Industry: Transportation > Air (0.65)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback