Goto

Collaborating Authors

 Optimization


GOLFS: Feature Selection via Combining Both Global and Local Information for High Dimensional Clustering

arXiv.org Machine Learning

It is important to identify the discriminative features for high dimensional clustering. However, due to the lack of cluster labels, the regularization methods developed for supervised feature selection can not be directly applied. To learn the pseudo labels and select the discriminative features simultaneously, we propose a new unsupervised feature selection method, named GlObal and Local information combined Feature Selection (GOLFS), for high dimensional clustering problems. The GOLFS algorithm combines both local geometric structure via manifold learning and global correlation structure of samples via regularized self-representation to select the discriminative features. The combination improves the accuracy of both feature selection and clustering by exploiting more comprehensive information. In addition, an iterative algorithm is proposed to solve the optimization problem and the convergency is proved. Simulations and two real data applications demonstrate the excellent finite-sample performance of GOLFS on both feature selection and clustering.


Tree-Structured Parzen Estimator Can Solve Black-Box Combinatorial Optimization More Efficiently

arXiv.org Artificial Intelligence

Tree-structured Parzen estimator (TPE) is a versatile hyperparameter optimization (HPO) method supported by popular HPO tools. Since these HPO tools have been developed in line with the trend of deep learning (DL), the problem setups often used in the DL domain have been discussed for TPE such as multi-objective optimization and multi-fidelity optimization. However, the practical applications of HPO are not limited to DL, and black-box combinatorial optimization is actively utilized in some domains, e.g., chemistry and biology. As combinatorial optimization has been an untouched, yet very important, topic in TPE, we propose an efficient combinatorial optimization algorithm for TPE. In this paper, we first generalize the categorical kernel with the numerical kernel in TPE, enabling us to introduce a distance structure to the categorical kernel. Then we discuss modifications for the newly developed kernel to handle a large combinatorial search space. These modifications reduce the time complexity of the kernel calculation with respect to the size of a combinatorial search space. In the experiments using synthetic problems, we verified that our proposed method identifies better solutions with fewer evaluations than the original TPE. Our algorithm is available in Optuna, an open-source framework for HPO.


Exploring and Improving Initialization for Deep Graph Neural Networks: A Signal Propagation Perspective

arXiv.org Artificial Intelligence

Graph Neural Networks (GNNs) often suffer from performance degradation as the network depth increases. This paper addresses this issue by introducing initialization methods that enhance signal propagation (SP) within GNNs. We propose three key metrics for effective SP in GNNs: forward propagation, backward propagation, and graph embedding variation (GEV). While the first two metrics derive from classical SP theory, the third is specifically designed for GNNs. We theoretically demonstrate that a broad range of commonly used initialization methods for GNNs, which exhibit performance degradation with increasing depth, fail to control these three metrics simultaneously. To deal with this limitation, a direct exploitation of the SP analysis--searching for weight initialization variances that optimize the three metrics--is shown to significantly enhance the SP in deep GCNs. This approach is called Signal Propagation on Graph-guided Initialization (SPoGInit). Our experiments demonstrate that SPoGInit outperforms commonly used initialization methods on various tasks and architectures. Notably, SPoGInit enables performance improvements as GNNs deepen, which represents a significant advancement in addressing depth-related challenges and highlights the validity and effectiveness of the SP analysis framework.


A Parallelizable Approach for Characterizing NE in Zero-Sum Games After a Linear Number of Iterations of Gradient Descent

arXiv.org Artificial Intelligence

We study online optimization methods for zero-sum games, a fundamental problem in adversarial learning in machine learning, economics, and many other domains. Traditional methods approximate Nash equilibria (NE) using either regret-based methods (time-average convergence) or contraction-map-based methods (last-iterate convergence). We propose a new method based on Hamiltonian dynamics in physics and prove that it can characterize the set of NE in a finite (linear) number of iterations of alternating gradient descent in the unbounded setting, modulo degeneracy, a first in online optimization. Unlike standard methods for computing NE, our proposed approach can be parallelized and works with arbitrary learning rates, both firsts in algorithmic game theory. Experimentally, we support our results by showing our approach drastically outperforms standard methods.


Data-Driven Differential Evolution in Tire Industry Extrusion: Leveraging Surrogate Models

arXiv.org Artificial Intelligence

The optimization of industrial processes remains a critical challenge, particularly when no mathematical formulation of objective functions or constraints is available. This study addresses this issue by proposing a surrogate-based, data-driven methodology for optimizing complex real-world manufacturing systems using only historical process data. Machine learning models are employed to approximate system behavior and construct surrogate models, which are integrated into a tailored metaheuristic approach: Data-Driven Differential Evolution with Multi-Level Penalty Functions and Surrogate Models, an adapted version of Differential Evolution suited to the characteristics of the studied process. The methodology is applied to an extrusion process in the tire manufacturing industry, with the goal of optimizing initialization parameters to reduce waste and production time. Results show that the surrogate-based optimization approach outperforms historical best configurations, achieving a 65\% reduction in initialization and setup time, while also significantly minimizing material waste. These findings highlight the potential of combining data-driven modeling and metaheuristic optimization for industrial processes where explicit formulations are unavailable.


A Mathematical Optimization Approach to Multisphere Support Vector Data Description

arXiv.org Artificial Intelligence

We present a novel mathematical optimization framework for outlier detection in multimodal datasets, extending Support Vector Data Description approaches. We provide a primal formulation, in the shape of a Mixed Integer Second Order Cone model, that constructs Euclidean hyperspheres to identify anomalous observations. Building on this, we develop a dual model that enables the application of the kernel trick, thus allowing for the detection of outliers within complex, non-linear data structures. An extensive computational study demonstrates the effectiveness of our exact method, showing clear advantages over existing heuristic techniques in terms of accuracy and robustness.


A Distance Metric for Mixed Integer Programming Instances

arXiv.org Artificial Intelligence

Mixed-integer linear programming (MILP) is a powerful tool for addressing a wide range of real-world problems, but it lacks a clear structure for comparing instances. A reliable similarity metric could establish meaningful relationships between instances, enabling more effective evaluation of instance set heterogeneity and providing better guidance to solvers, particularly when machine learning is involved. Existing similarity metrics often lack precision in identifying instance classes or rely heavily on labeled data, which limits their applicability and generalization. To bridge this gap, this paper introduces the first mathematical distance metric for MILP instances, derived directly from their mathematical formulations. By discretizing right-hand sides, weights, and variables into classes, the proposed metric draws inspiration from the Earth mover's distance to quantify mismatches in weight-variable distributions for constraint comparisons. This approach naturally extends to enable instance-level comparisons. We evaluate both an exact and a greedy variant of our metric under various parameter settings, using the StrIPLIB dataset. Results show that all components of the metric contribute to class identification, and that the greedy version achieves accuracy nearly identical to the exact formulation while being nearly 200 times faster. Compared to state-of-the-art baselines--including feature-based, image-based, and neural network models--our unsupervised method consistently outperforms all non-learned approaches and rivals the performance of a supervised classifier on class and subclass grouping tasks.


Fast Non-Episodic Adaptive Tuning of Robot Controllers with Online Policy Optimization

arXiv.org Artificial Intelligence

We study online algorithms to tune the parameters of a robot controller in a setting where the dynamics, policy class, and optimality objective are all time-varying. The system follows a single trajectory without episodes or state resets, and the time-varying information is not known in advance. Focusing on nonlinear geometric quadrotor controllers as a test case, we propose a practical implementation of a single-trajectory model-based online policy optimization algorithm, M-GAPS,along with reparameterizations of the quadrotor state space and policy class to improve the optimization landscape. In hardware experiments,we compare to model-based and model-free baselines that impose artificial episodes. We show that M-GAPS finds near-optimal parameters more quickly, especially when the episode length is not favorable. We also show that M-GAPS rapidly adapts to heavy unmodeled wind and payload disturbances, and achieves similar strong improvement on a 1:6-scale Ackermann-steered car. Our results demonstrate the hardware practicality of this emerging class of online policy optimization that offers significantly more flexibility than classic adaptive control, while being more stable and data-efficient than model-free reinforcement learning.


Mixed Discrete and Continuous Planning using Shortest Walks in Graphs of Convex Sets

arXiv.org Artificial Intelligence

We study the Shortest-Walk Problem (SWP) in a Graph of Convex Sets (GCS). A GCS is a graph where each vertex is paired with a convex program, and each edge couples adjacent programs via additional costs and constraints. A walk in a GCS is a sequence of vertices connected by edges, where vertices may be repeated. The length of a walk is given by the cumulative optimal value of the corresponding convex programs. To solve the SWP in GCS, we first synthesize a piecewise-quadratic lower bound on the problem's cost-to-go function using semidefinite programming. Then we use this lower bound to guide an incremental-search algorithm that yields an approximate shortest walk. We show that the SWP in GCS is a natural language for many mixed discrete-continuous planning problems in robotics, unifying problems that typically require specialized solutions while delivering high performance and computational efficiency. We demonstrate this through experiments in collision-free motion planning, skill chaining, and optimal control of hybrid systems.


Convex Clustering

arXiv.org Machine Learning

This survey reviews a clustering method based on solving a convex optimization problem. Despite the plethora of existing clustering methods, convex clustering has several uncommon features that distinguish it from prior art. The optimization problem is free of spurious local minima, and its unique global minimizer is stable with respect to all its inputs, including the data, a tuning parameter, and weight hyperparameters. Its single tuning parameter controls the number of clusters and can be chosen using standard techniques from penalized regression. We give intuition into the behavior and theory for convex clustering as well as practical guidance. We highlight important algorithms and give insight into how their computational costs scale with the problem size. Finally, we highlight the breadth of its uses and flexibility to be combined and integrated with other inferential methods.