 Optimization


Accept More, Reject Less: Reducing up to 19% Unnecessary Desk-Rejections over 11 Years of ICLR Data

arXiv.org Artificial Intelligence

The explosive growth of AI research has driven paper submissions at flagship AI conferences to unprecedented levels, prompting many venues in 2025 (e.g., CVPR, ICCV, KDD, AAAI, IJCAI, WSDM) to enforce strict per-author submission limits and to desk-reject any excess papers by simple ID order. While this policy helps reduce reviewer workload, it may unintentionally discard valuable papers and penalize authors' efforts. In this paper, we ask whether it is possible to follow submission limits while minimizing needless rejections. We first formalize the current desk-rejection policies as an optimization problem, and then develop a practical algorithm based on linear programming relaxation and a rounding scheme. In an extensive evaluation on 11 years of real-world ICLR (International Conference on Learning Representations) data, our method preserves up to $19.23\%$ more papers without violating any author limits. Moreover, our algorithm is highly efficient in practice, with all results on ICLR data computed within at most 53.64 seconds. Our work provides a simple and practical desk-rejection strategy that significantly reduces unnecessary rejections, demonstrating strong potential to improve current CS conference submission policies.
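To make the gap between ID-order rejection and an optimal selection concrete, here is a minimal sketch on a toy instance. It is not the paper's LP-relaxation-and-rounding algorithm: the optimum is found by brute force, which only works for a handful of papers. The function names, the toy data, and the exact reading of "ID order" (a paper is kept only if every one of its authors still has quota when it is reached) are assumptions made for illustration.

```python
from itertools import combinations

def id_order_policy(papers, limit):
    """Assumed baseline: scan papers by ascending ID and keep a paper
    only if every author on it is still below the per-author limit."""
    counts = {}
    kept = []
    for pid in sorted(papers):
        authors = papers[pid]
        if all(counts.get(a, 0) < limit for a in authors):
            kept.append(pid)
            for a in authors:
                counts[a] = counts.get(a, 0) + 1
    return kept

def is_feasible(papers, subset, limit):
    """True if no author appears on more than `limit` papers in `subset`."""
    counts = {}
    for pid in subset:
        for a in papers[pid]:
            counts[a] = counts.get(a, 0) + 1
    return all(c <= limit for c in counts.values())

def brute_force_optimum(papers, limit):
    """Largest feasible set of papers, by exhaustive search (toy sizes only)."""
    ids = sorted(papers)
    for size in range(len(ids), -1, -1):
        for subset in combinations(ids, size):
            if is_feasible(papers, subset, limit):
                return list(subset)
    return []

# Hypothetical instance: paper ID -> set of authors, limit of one paper per author.
papers = {1: {"A", "B"}, 2: {"A"}, 3: {"B"}}
baseline = id_order_policy(papers, limit=1)
optimum = brute_force_optimum(papers, limit=1)
print(baseline, optimum)
```

In this instance the ID-order scan keeps only paper 1, because it uses up the quota of both authors, while keeping papers 2 and 3 instead satisfies the same limit with one more paper retained.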


VoxelOpt: Voxel-Adaptive Message Passing for Discrete Optimization in Deformable Abdominal CT Registration

arXiv.org Artificial Intelligence

Recent developments in neural networks have improved deformable image registration (DIR) by amortizing iterative optimization, enabling fast and accurate DIR results. However, learning-based methods often struggle with limited training data and large deformations, and tend to underperform compared to iterative approaches when label supervision is unavailable. While iterative methods can achieve higher accuracy in such scenarios, they are considerably slower than learning-based methods. To address these limitations, we propose VoxelOpt, a discrete optimization-based DIR framework that combines the strengths of learning-based and iterative methods to achieve a better balance between registration accuracy and runtime. VoxelOpt uses displacement entropy from local cost volumes to measure displacement signal strength at each voxel, which differs from earlier approaches in three key aspects. First, it introduces voxel-wise adaptive message passing, where voxels with lower entropy receive less influence from their neighbors. Second, it employs a multi-level image pyramid with 27-neighbor cost volumes at each level, avoiding exponential complexity growth. Third, it replaces hand-crafted features or contrastive learning with a pretrained foundational segmentation model for feature extraction. In abdominal CT registration, these changes allow VoxelOpt to outperform leading iterative methods in both efficiency and accuracy, while matching state-of-the-art learning-based methods trained with label supervision.
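The idea of a per-voxel displacement entropy can be illustrated with a short sketch. The abstract does not say how costs are turned into a distribution, so the softmax over negative costs, the `temperature` parameter, and the function name below are assumptions; the 27 candidates mirror the 27-neighbor cost volume mentioned above.

```python
import math

def displacement_entropy(costs, temperature=1.0):
    """Shannon entropy of softmax(-cost / temperature) over the candidate
    displacements of one voxel. Low entropy means one displacement clearly
    wins, i.e. the voxel carries a strong displacement signal."""
    logits = [-c / temperature for c in costs]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical voxels: one with no preferred displacement, one with a clear match.
flat_voxel = [1.0] * 27
peaked_voxel = [0.0] + [10.0] * 26
flat_entropy = displacement_entropy(flat_voxel)
peaked_entropy = displacement_entropy(peaked_voxel)
print(flat_entropy, peaked_entropy)
```

Under this reading, a voxel like `peaked_voxel` would be the kind that receives less influence from its neighbors during message passing, since its own cost volume is already informative.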


Evolutionary Gait Reconfiguration in Damaged Legged Robots

arXiv.org Artificial Intelligence

To assess the algorithm's effectiveness, each best solution corresponding to a damage scenario is implemented on the damaged robot in the lab environment. We compare the predicted simulation result with real-world data obtained from the robot's IMU and the motion-tracker system. Robot orientation was observed using both IMU and motion-tracker data, while translational motion was evaluated exclusively using motion-tracker recordings due to the high noise typically associated with IMU-based linear acceleration measurements. The straight motion recovery performance (averaged over 10 best solutions) is summarized in Table II. In addition, Figures 4 and 5 (damage: Legs 1 & 6 missing), 7 and 8 (damage: Legs 3 & 4 missing), 10 and 11 (damage: Leg 1 missing), and 13 and 14 (damage: Leg 4 missing) illustrate the main body's orientation and CoM position when a best solution is implemented on the damaged robot.


AYLA: Amplifying Gradient Sensitivity via Loss Transformation in Non-Convex Optimization

arXiv.org Artificial Intelligence

Stochastic Gradient Descent (SGD) and its variants, such as ADAM, are foundational to deep learning optimization, adjusting model parameters through fixed or adaptive learning rates based on loss function gradients. However, these methods often struggle to balance adaptability and efficiency in high-dimensional, non-convex settings. This paper introduces AYLA, a novel optimization framework that enhances training dynamics via loss function transformation. AYLA applies a tunable power-law transformation to the loss, preserving critical points while scaling loss values to amplify gradient sensitivity and accelerate convergence. Additionally, we propose an effective learning rate that dynamically adapts to the transformed loss, further improving optimization efficiency. Empirical evaluations on minimizing a synthetic non-convex polynomial, solving a non-convex curve-fitting task, and performing digit classification (MNIST) and image recognition (CIFAR-100) demonstrate that AYLA consistently outperforms SGD and ADAM in both convergence speed and training stability. By reshaping the loss landscape, AYLA provides a model-agnostic enhancement to existing optimization methods, offering a promising advancement in deep neural network training.
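Why a power-law transformation preserves critical points while rescaling gradients follows from the chain rule, as the sketch below shows on a one-dimensional example. The exact AYLA transformation and its effective learning rate are not given in the abstract; the form `loss ** power` for a strictly positive loss, and the toy loss itself, are assumptions for illustration.

```python
def loss(x):
    """Toy strictly positive loss with its minimum at x = 2."""
    return (x - 2.0) ** 2 + 1.0

def loss_grad(x):
    """Derivative of the toy loss."""
    return 2.0 * (x - 2.0)

def transformed_grad(x, power):
    """Derivative of loss(x) ** power via the chain rule:
    power * loss(x) ** (power - 1) * loss'(x)."""
    return power * loss(x) ** (power - 1.0) * loss_grad(x)

# At the critical point the transformed gradient is still zero.
grad_at_minimum = transformed_grad(2.0, power=2.0)
# Away from it, the gradient is scaled by power * loss ** (power - 1).
plain = loss_grad(3.0)
amplified = transformed_grad(3.0, power=2.0)
print(grad_at_minimum, plain, amplified)
```

Because the scaling factor `power * loss(x) ** (power - 1)` is strictly positive whenever the loss is, the transformed gradient vanishes exactly where the original one does, and it is larger than the original wherever that factor exceeds one.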


Robust OOD Graph Learning via Mean Constraints and Noise Reduction

arXiv.org Artificial Intelligence

Graph Out-of-Distribution (OOD) classification often suffers from sharp performance drops, particularly under category imbalance and structural noise. This work tackles two pressing challenges in this context: (1) the underperformance of minority classes due to skewed label distributions, and (2) their heightened sensitivity to structural noise in graph data. To address these problems, we propose two complementary solutions. First, Constrained Mean Optimization (CMO) improves minority class robustness by encouraging similarity-based instance aggregation under worst-case conditions. Second, the Neighbor-Aware Noise Reweighting (NNR) mechanism assigns dynamic weights to training samples based on local structural consistency, mitigating noise influence. We provide theoretical justification for our methods, and validate their effectiveness with extensive experiments on both synthetic and real-world datasets, showing significant improvements in Graph OOD generalization and classification accuracy. The code for our method is available at: https://anonymous.4open.science/r/CMO-NNR-2F30.


Fully lifted \emph{blirp} interpolation -- a large deviation view

arXiv.org Machine Learning

In [104] a powerful fully lifted (fl) probabilistic blirp interpolating mechanism was introduced. It arrived as a strong upgrade on partially lifted concepts from [100, 101] and the basic ones from [49, 84] (see also, e.g., [31, 32, 60, 106] for early considerations as well as [5, 64, 67, 101, 107] for a brief history, relevance, and development overview). While the range of applicability in a variety of scientific fields is rather wide, applications in random optimizations are of our prevalent interest. They became particularly fruitful over the last two decades (some of the most prominent examples include compressed sensing, machine learning, and neural network statistical studies; see, e.g., [50, 72-75, 86-91, 108]). Characterizing typical behavior of their various features ranging from standard optimization metrics (objective values, optimal solutions, relations between optimizing variables) to associated algorithmic ones (accuracy, speed, convergence) became possible in large part due to the strong progress made in understanding and developing powerful comparison mechanisms. For example, many of the above performance metrics often exhibit the so-called phase-transition (PT) phenomenon where they undergo an abrupt change as one moves from one region of system parameters to another.


The Gittins Index: A Design Principle for Decision-Making Under Uncertainty

arXiv.org Machine Learning

The Gittins index is a tool that optimally solves a variety of decision-making problems involving uncertainty, including multi-armed bandit problems, minimizing mean latency in queues, and search problems like the Pandora's box model. However, despite the above examples and later extensions thereof, the space of problems that the Gittins index can solve perfectly optimally is limited, and its definition is rather subtle compared to those of other multi-armed bandit algorithms. As a result, the Gittins index is often regarded as being primarily a concept of theoretical importance, rather than a practical tool for solving decision-making problems. The aim of this tutorial is to demonstrate that the Gittins index can be fruitfully applied to practical problems. We start by giving an example-driven introduction to the Gittins index, then walk through several examples of problems it solves - some optimally, some suboptimally but still with excellent performance. Two practical highlights in the latter category are applying the Gittins index to Bayesian optimization, and applying the Gittins index to minimizing tail latency in queues.
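For the Pandora's box model mentioned above, the index of a box is commonly described as its reservation value: the threshold `r` at which the expected gain over `r` from opening the box equals the opening cost. The sketch below computes it by bisection for a discrete reward distribution. The function names and the example box are hypothetical, and a strictly positive cost with reward maximization is assumed.

```python
def expected_gain(values, probs, threshold):
    """E[max(V - threshold, 0)] for a discrete reward V."""
    return sum(p * max(v - threshold, 0.0) for v, p in zip(values, probs))

def reservation_value(values, probs, cost, iterations=100):
    """Solve expected_gain(r) = cost for r by bisection (assumes cost > 0).
    expected_gain is continuous and non-increasing in r, so the root is
    bracketed by [min(values) - cost, max(values)]."""
    lo = min(values) - cost   # expected gain here is at least `cost`
    hi = max(values)          # expected gain here is 0 < cost
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if expected_gain(values, probs, mid) > cost:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical box: reward 0 or 10 with equal probability, opening cost 1.
index = reservation_value([0.0, 10.0], [0.5, 0.5], cost=1.0)
print(index)
```

For this box the equation reads 0.5 * (10 - r) = 1, so the index is 8; the index-based policy opens boxes in decreasing order of this value and stops once the best reward found exceeds every remaining index.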


Noise Consistency Training: A Native Approach for One-Step Generator in Learning Additional Controls

arXiv.org Machine Learning

The pursuit of efficient and controllable high-quality content generation remains a central challenge in artificial intelligence-generated content (AIGC). While one-step generators, enabled by diffusion distillation techniques, offer excellent generation quality and computational efficiency, adapting them to new control conditions--such as structural constraints, semantic guidelines, or external inputs--poses a significant challenge. Conventional approaches often necessitate computationally expensive modifications to the base model and subsequent diffusion distillation. This paper introduces Noise Consistency Training (NCT), a novel and lightweight approach to directly integrate new control signals into pre-trained one-step generators without requiring access to original training images or retraining the base diffusion model. NCT introduces an adapter module and employs a noise consistency loss in the noise space of the generator. This loss aligns the adapted model's generation behavior across noises that are conditionally dependent to varying degrees, implicitly guiding it to adhere to the new control. Theoretically, this training objective can be understood as minimizing the distributional distance between the adapted generator and the conditional distribution induced by the new conditions. NCT is modular, data-efficient, and easily deployable, relying only on the pre-trained one-step generator and a control signal model. Extensive experiments demonstrate that NCT achieves state-of-the-art controllable generation in a single forward pass, surpassing existing multi-step and distillation-based methods in both generation quality and computational efficiency. Code is available at https://github.com/Luo-Yihong/NCT


A large deviation view of \emph{stationarized} fully lifted blirp interpolation

arXiv.org Machine Learning

We consider \emph{bilinearly indexed random processes} (blirp) and study their interpolating comparative mechanisms. Generic introduction of the \emph{fully lifted} (fl) blirp interpolation in [105] was followed by a corresponding stationarization counterpart in [103]. A \emph{large deviation} upgrade of [105] introduced in companion paper [106] is complemented here with the corresponding one of [103]. Similarly to [106], the mechanism that we introduce extends the range of [103]'s applicability so that it encompasses random structures' \emph{atypical} features. Among others, these include the \emph{local entropies} (LE), which explain atypical solution clusterings in hard random optimization problems believed to be directly responsible for the presumable existence of the so-called \emph{computational gaps}. Moreover (and similar to [105]), despite on occasion somewhat involved technical considerations, the final forms of the uncovered fundamental interpolating parameter relations are rather elegant and as such provide a valuable tool readily available for further use.


Conservative quantum offline model-based optimization

arXiv.org Machine Learning

Offline model-based optimization (MBO) refers to the task of optimizing a black-box objective function using only a fixed set of prior input-output data, without any active experimentation. Recent work has introduced quantum extremal learning (QEL), which leverages the expressive power of variational quantum circuits to learn accurate surrogate functions by training on a few data points. However, as widely studied in the classical machine learning literature, predictive models may incorrectly extrapolate objective values in unexplored regions, leading to the selection of overly optimistic solutions. In this paper, we propose integrating QEL with conservative objective models (COM) - a regularization technique aimed at ensuring cautious predictions on out-of-distribution inputs. The resulting hybrid algorithm, COM-QEL, builds on the expressive power of quantum neural networks while safeguarding generalization via conservative modeling. Empirical results on benchmark optimization tasks demonstrate that COM-QEL reliably finds solutions with higher true objective values compared to the original QEL, validating its superiority for offline design problems.