Optimization
Bridging Visualization and Optimization: Multimodal Large Language Models on Graph-Structured Combinatorial Optimization
Zhao, Jie, Cheong, Kang Hao, Pedrycz, Witold
Graph-structured combinatorial challenges are inherently difficult due to their nonlinear and intricate nature, often rendering traditional computational methods ineffective or expensive. However, these challenges can be more naturally tackled by humans through visual representations that harness our innate ability for spatial reasoning. In this study, we propose transforming graphs into images to preserve their higher-order structural features accurately, revolutionizing the representation used in solving graph-structured combinatorial tasks. This approach allows machines to emulate human-like processing in addressing complex combinatorial challenges. By combining the innovative paradigm powered by multimodal large language models (MLLMs) with simple search techniques, we aim to develop a novel and effective framework for tackling such problems. Our investigation into MLLMs spanned a variety of graph-based tasks, from combinatorial problems like influence maximization to sequential decision-making in network dismantling, as well as addressing six fundamental graph-related issues. Our findings demonstrate that MLLMs exhibit exceptional spatial intelligence and a distinctive capability for handling these problems, significantly advancing the potential for machines to comprehend and analyze graph-structured data with a depth and intuition akin to human cognition. These results also imply that integrating MLLMs with simple optimization strategies could form a novel and efficient approach for navigating graph-structured combinatorial challenges without complex derivations, computationally demanding training and fine-tuning.
Highly Efficient Rotation-Invariant Spectral Embedding for Scalable Incomplete Multi-View Clustering
Wang, Xinxin, Zhang, Yongshan, Zhou, Yicong
Incomplete multi-view clustering presents significant challenges due to missing views. Although many existing graph-based methods aim to recover missing instances or complete similarity matrices with promising results, they still face several limitations: (1) Recovered data may be unsuitable for spectral clustering, as these methods often ignore guidance from spectral analysis; (2) Complex optimization processes require high computational burden, hindering scalability to large-scale problems; (3) Most methods do not address the rotational mismatch problem in spectral embeddings. To address these issues, we propose a highly efficient rotation-invariant spectral embedding (RISE) method for scalable incomplete multi-view clustering. RISE learns view-specific embeddings from incomplete bipartite graphs to capture the complementary information. Meanwhile, a complete consensus representation with second-order rotation-invariant property is recovered from these incomplete embeddings in a unified model. Moreover, we design a fast alternating optimization algorithm with linear complexity and promising convergence to solve the proposed formulation. Extensive experiments on multiple datasets demonstrate the effectiveness, scalability, and efficiency of RISE compared to the state-of-the-art methods.
Reinforcement Learning Constrained Beam Search for Parameter Optimization of Paper Drying Under Flexible Constraints
Chen, Siyuan, Yu, Hanshen, Yagoobi, Jamal, Shao, Chenhui
Existing approaches to enforcing design constraints in Reinforcement Learning (RL) applications often rely on training-time penalties in the reward function or training/inference-time invalid action masking, but these methods either cannot be modified after training, or are limited in the types of constraints that can be implemented. To address this limitation, we propose Reinforcement Learning Constrained Beam Search (RLCBS) for inference-time refinement in combinatorial optimization problems. This method respects flexible, inference-time constraints that support exclusion of invalid actions and forced inclusion of desired actions, and employs beam search to maximize sequence probability for more sensible constraint incorporation. RLCBS is extensible to RL-based planning and optimization problems that do not require real-time solution, and we apply the method to optimize process parameters for a novel modular testbed for paper drying. An RL agent is trained to minimize energy consumption across varying machine speed levels by generating optimal dryer module and air supply temperature configurations. Our results demonstrate that RLCBS outperforms NSGA-II under complex design constraints on drying module configurations at inference-time, while providing a 2.58-fold or higher speed improvement.
Test-time regression: a unifying framework for designing sequence models with associative memory
Wang, Ke Alexander, Shi, Jiaxin, Fox, Emily B.
Sequences provide a remarkably general way to represent and process information. This powerful abstraction has placed sequence modeling at the center of modern deep learning applications, inspiring numerous architectures from transformers to recurrent networks. While this fragmented development has yielded powerful models, it has left us without a unified framework to understand their fundamental similarities and explain their effectiveness. We present a unifying framework motivated by an empirical observation: effective sequence models must be able to perform associative recall. Our key insight is that memorizing input tokens through an associative memory is equivalent to performing regression at test-time. This regression-memory correspondence provides a framework for deriving sequence models that can perform associative recall, offering a systematic lens to understand seemingly ad-hoc architectural choices. We show numerous recent architectures -- including linear attention models, their gated variants, state-space models, online learners, and softmax attention -- emerge naturally as specific approaches to test-time regression. Each architecture corresponds to three design choices: the relative importance of each association, the regressor function class, and the optimization algorithm. This connection leads to new understanding: we provide theoretical justification for QKNorm in softmax attention, and we motivate higher-order generalizations of softmax attention. Beyond unification, our work unlocks decades of rich statistical tools that can guide future development of more powerful yet principled sequence models.
Reviews: Proximal Deep Structured Models
The paper makes a nice conceptual step of embedding proximal optimization algorithms in a deep neural network framework as an RNN. This conceptual move is new as far as I know. It is illuminating for me and I believe it will be found useful in further work. The immediate benefits are nice but not too significant: the new view enables a natural GPU implementation of proximal optimization methods, and bring small (close to insignificant) empirical result improvements. The week side of the paper is presentation quality: Some notation is used without definitions, central concepts (like the dual function) are used without definition, which makes the paper hard to read for people not very familiar with proximal methods.
Negative-Weight Single-Source Shortest Paths in Near-Linear Time
In the single-source shortest paths problem, the goal is to compute the shortest path tree from a designated source vertex in a weighted, directed graph. We present the first nearlinear-time algorithm for the problem that can also handle negative edge-weights; the runtime is O(m log8(n) log W). In contrast to all recent developments that rely on sophisticated continuous optimization methods and dynamic algorithms, our algorithm is simple: it requires only a simple graph decomposition and elementary combinatorial tools. In fact, ours is the first combinatorial algorithm for negative-weight single-source shortest paths to break through the classic O(m n log W) bound from more than three decades ago (Gabow and Tarjan SICOMP'89). We consider the single-source shortest paths (SSSP) problem with (possibly negative) integer weights. Given an m-edge n-vertex directed weighted graph G (V,E,w) with integral edge weight w(e) for every edge e E and a source vertex s V, the goal is to compute a shortest path from s. Two textbook algorithms for SSSP are Bellman-Ford and Dijkstra's algorithm.
Reviews: Homotopy Smoothing for Non-Smooth Problems with Lower Complexity than O(1/\epsilon)
The submission considers algorithms for solving a specific class of optimization problems, namely min_{x in Omega_1} F(x), where F(x) max_{u in Omega_2} \langle Ax, u \rangle - phi(u) g(x). Here, g is convex, Omega_1 is closed and convex, Omega_2 is closed, convex, and bounded, and the set of optimal solutions Omega_* \subset Omega_1 is convex, compact, and non-empty. The submission also assumes a proximal mapping for g can be computed efficiently. The above framework is apparently general enough to capture a number of applications, including various natural regularized empirical loss minimization problems that arise in machine learning. Classic work of Nesterov combined a smooth approximation technique with accelerated proximal gradient descent to converge to a solution with epsilon of optimal in O(1/epsilon) iterations.
Reviews: SEBOOST - Boosting Stochastic Learning Using Subspace Optimization Techniques
The paper is overall clearly written, but one important aspect of the algorithm remains not sufficiently expounded: how precisely the subspace optimization is carried over. The paper only mentions in passing that it uses conjugate gradient (CG), but a number of points would deserve further clarification: a) is CG done over a *single* larger minibatch? And how precisely is this minibatch chosen. Which version/implementation do you use? The computational cost *and* additional memory requirement (as this can constitute a practical limitation for large nets) for the subspace optimization would need to be disclosed and made precise.
Reviews: Solving Marginal MAP Problems with NP Oracles and Parity Constraints
There are pros and cons in this article. On the one hand, the idea of approximating complex MMAP queries using a series of random constrained optimization problems is conceptually simple and opens the door to many potential improvements. Furthermore, I really appreciated the diversity of experimental domains used to evaluate the method. On the other hand, I have several concerns about the approximation bounds and the optimization oracle. For the unweighted case, we have a bound of 4 c and, to ensure that the denominator \alpha(c) of the complexity parameter T is greater or equal than 1, c must be greater or equal than 5, which roughly means an approximation constant about one thousand.
Reviews: Stochastic Structured Prediction under Bandit Feedback
Summary: This paper proposes a stochastic online learning method for the task of structured prediction. In this setting, the learner doest not get the correct structured output during training. Instead, it only gets bandit feedback from the labeler. The paper first proposes an online learning algorithm that learns model parameters via stochastic gradient descent; generalizes the learning method to pair-wise comparison of structured outputs; provides an optimization approach with Cross-Entropy Minimization; and theoretically analyzes the convergence property of the optimization approach. Pros: The paper proposes an online stochastic learning algorithm for minimizing the expected loss of structured predictions; gives a method of learning from pair-wise comparisons; and theoretical analyze the convergence rate.