Goto

Collaborating Authors

 Optimization


Review for NeurIPS paper: Finding Second-Order Stationary Points Efficiently in Smooth Nonconvex Linearly Constrained Optimization Problems

Neural Information Processing Systems

All the reviewers were positive towards the paper. Originally, the paper got 687, with relatively high confidences. The major concern about the paper was that it may not fit for large scale problems. During discussion, Reviewer #3 deemed that the rebuttal is "clear and makes sense", hence raised his/her score from 7 to 8. The AC deemed that the theoretical contribution of the paper is good and agreed to forgive the weakness in experiments. Thus the AC recommended acceptance.


Reviews: Regularizing Trajectory Optimization with Denoising Autoencoders

Neural Information Processing Systems

The paper addresses the problem of reducing the exploitation of inaccuracies of learned dynamics models by trajectory optimization algorithms in model-based Reinforcement Learning. For this, it proposes to add a regularizer to the optimization cost which writes as an estimation of the log probability (in a local window) of sampling the optimized trajectory from the distribution of known trajectories. The idea is to avoid trajectories deviating too much from the data used to learn the dynamics model, and hence avoid unreliable solutions. The authors propose to estimate the log probability term with a denoising autoencoder network. They provide multiple experiments comparing their method to other state-of-the-art approaches on known environments/datasets.


Review for NeurIPS paper: Bayesian Robust Optimization for Imitation Learning

Neural Information Processing Systems

Clarity: Overall, I think the paper is fairly well written. I understand that the authors are working within the page restrictions of the conference. With that said, I think there is substantial room for improvement in the paper presentation. First, I think there are more specific ways to describe the contributions (copied from summary): 1) a linear programming formulation to compute the optimal policy for CVaR; 2) show how to use this to implement robust policy optimization under a prior and robust imitation learning; 3) demonstrate favorable comparisons with existing risk-sensitive and risk neutral algorithms for both settings. Right now I think that the description of the contributions hides the most useful contribution.


Reviews: Globally Optimal Learning for Structured Elliptical Losses

Neural Information Processing Systems

Lemma 1) is incrementally built on previous work. Based on related work, it's hard to contextualize their work in the existing literature. The readers may have the following important questions. A clear explanation is needed for this. This work is significant in that they provide optimality proof that leads to more efficient optimization method for a wide range of robust elliptical losses including Gaussian, Generalized Gaussian, Huber, etc.


AnyNav: Visual Neuro-Symbolic Friction Learning for Off-road Navigation

arXiv.org Artificial Intelligence

Off-road navigation is essential for a wide range of applications in field robotics such as planetary exploration and disaster response. However, it remains an unresolved challenge due to the unstructured environments and inherent complexity of terrain-vehicle interactions. Traditional physics-based methods struggle to accurately model the nonlinear dynamics of these interactions, while data-driven approaches often suffer from overfitting to specific motion patterns, vehicle sizes, and types, limiting their generalizability. To overcome these challenges, we introduce a vision-based friction estimation framework grounded in neuro-symbolic principles, integrating neural networks for visual perception with symbolic reasoning for physical modeling. This enables significantly improved generalization abilities through explicit physical reasoning incorporating the predicted friction. Additionally, we develop a physics-informed planner that leverages the learned friction coefficient to generate physically feasible and efficient paths, along with corresponding speed profiles. We refer to our approach as AnyNav and evaluate it in both simulation and real-world experiments, demonstrating its utility and robustness across various off-road scenarios and multiple types of four-wheeled vehicles. These results mark an important step toward developing neuro-symbolic spatial intelligence to reason about complex, unstructured environments and enable autonomous off-road navigation in challenging scenarios. Video demonstrations are available at https://sairlab.org/anynav/, where the source code will also be released.


Grid-based Submap Joining: An Efficient Algorithm for Simultaneously Optimizing Global Occupancy Map and Local Submap Frames

arXiv.org Artificial Intelligence

Optimizing robot poses and the map simultaneously has been shown to provide more accurate SLAM results. However, for non-feature based SLAM approaches, directly optimizing all the robot poses and the whole map will greatly increase the computational cost, making SLAM problems difficult to solve in large-scale environments. To solve the 2D non-feature based SLAM problem in large-scale environments more accurately and efficiently, we propose the grid-based submap joining method. Specifically, we first formulate the 2D grid-based submap joining problem as a non-linear least squares (NLLS) form to optimize the global occupancy map and local submap frames simultaneously. We then prove that in solving the NLLS problem using Gauss-Newton (GN) method, the increments of the poses in each iteration are independent of the occupancy values of the global occupancy map. Based on this property, we propose a poseonly GN algorithm equivalent to full GN method to solve the NLLS problem. The proposed submap joining algorithm is very efficient due to the independent property and the pose-only solution. Evaluations using simulations and publicly available practical 2D laser datasets confirm the outperformance of our proposed method compared to the state-of-the-art methods in terms of efficiency and accuracy, as well as the ability to solve the grid-based SLAM problem in very large-scale environments.


Offline Critic-Guided Diffusion Policy for Multi-User Delay-Constrained Scheduling

arXiv.org Artificial Intelligence

Effective multi-user delay-constrained scheduling is crucial in various real-world applications, such as instant messaging, live streaming, and data center management. In these scenarios, schedulers must make real-time decisions to satisfy both delay and resource constraints without prior knowledge of system dynamics, which are often time-varying and challenging to estimate. Current learning-based methods typically require interactions with actual systems during the training stage, which can be difficult or impractical, as it is capable of significantly degrading system performance and incurring substantial service costs. To address these challenges, we propose a novel offline reinforcement learning-based algorithm, named \underline{S}cheduling By \underline{O}ffline Learning with \underline{C}ritic Guidance and \underline{D}iffusion Generation (SOCD), to learn efficient scheduling policies purely from pre-collected \emph{offline data}. SOCD innovatively employs a diffusion-based policy network, complemented by a sampling-free critic network for policy guidance. By integrating the Lagrangian multiplier optimization into the offline reinforcement learning, SOCD effectively trains high-quality constraint-aware policies exclusively from available datasets, eliminating the need for online interactions with the system. Experimental results demonstrate that SOCD is resilient to various system dynamics, including partially observable and large-scale environments, and delivers superior performance compared to existing methods.


Reinforcement learning Based Automated Design of Differential Evolution Algorithm for Black-box Optimization

arXiv.org Artificial Intelligence

Differential evolution (DE) algorithm is recognized as one of the most effective evolutionary algorithms, demonstrating remarkable efficacy in black-box optimization due to its derivative-free nature. Numerous enhancements to the fundamental DE have been proposed, incorporating innovative mutation strategies and sophisticated parameter tuning techniques to improve performance. However, no single variant has proven universally superior across all problems. To address this challenge, we introduce a novel framework that employs reinforcement learning (RL) to automatically design DE for black-box optimization through meta-learning. RL acts as an advanced meta-optimizer, generating a customized DE configuration that includes an optimal initialization strategy, update rule, and hyperparameters tailored to a specific black-box optimization problem. This process is informed by a detailed analysis of the problem characteristics. In this proof-of-concept study, we utilize a double deep Q-network for implementation, considering a subset of 40 possible strategy combinations and parameter optimizations simultaneously. The framework's performance is evaluated against black-box optimization benchmarks and compared with state-of-the-art algorithms. The experimental results highlight the promising potential of our proposed framework.


Optimizing Return Distributions with Distributional Dynamic Programming

arXiv.org Artificial Intelligence

We introduce distributional dynamic programming (DP) methods for optimizing statistical functionals of the return distribution, with standard reinforcement learning as a special case. Previous distributional DP methods could optimize the same class of expected utilities as classic DP. To go beyond expected utilities, we combine distributional DP with stock augmentation, a technique previously introduced for classic DP in the context of risk-sensitive RL, where the MDP state is augmented with a statistic of the rewards obtained so far (since the first time step). We find that a number of recently studied problems can be formulated as stock-augmented return distribution optimization, and we show that we can use distributional DP to solve them. We analyze distributional value and policy iteration, with bounds and a study of what objectives these distributional DP methods can or cannot optimize. We describe a number of applications outlining how to use distributional DP to solve different stock-augmented return distribution optimization problems, for example maximizing conditional value-at-risk, and homeostatic regulation. To highlight the practical potential of stock-augmented return distribution optimization and distributional DP, we combine the core ideas of distributional value iteration with the deep RL agent DQN, and empirically evaluate it for solving instances of the applications discussed.


Information-theoretic Bayesian Optimization: Survey and Tutorial

arXiv.org Machine Learning

Several scenarios require the optimization of non-convex black-box functions, that are noisy expensive to evaluate functions with unknown analytical expression, whose gradients are hence not accessible. For example, the hyper-parameter tuning problem of machine learning models. Bayesian optimization is a class of methods with state-of-the-art performance delivering a solution to this problem in real scenarios. It uses an iterative process that employs a probabilistic surrogate model, typically a Gaussian process, of the objective function to be optimized computing a posterior predictive distribution of the black-box function. Based on the information given by this posterior predictive distribution, Bayesian optimization includes the computation of an acquisition function that represents, for every input space point, the utility of evaluating that point in the next iteraiton if the objective of the process is to retrieve a global extremum. This paper is a survey of the information theoretical acquisition functions, whose performance typically outperforms the rest of acquisition functions. The main concepts of the field of information theory are also described in detail to make the reader aware of why information theory acquisition functions deliver great results in Bayesian optimization and how can we approximate them when they are intractable. We also cover how information theory acquisition functions can be adapted to complex optimization scenarios such as the multi-objective, constrained, non-myopic, multi-fidelity, parallel and asynchronous settings and provide further lines of research.