Optimization
Latent Mode Decomposition
Morante, Manuel, Rehman, Naveed ur
--We introduce V ariational Latent Mode Decomposition (VLMD), a new algorithm for extracting oscillatory modes and associated connectivity structures from multivariate signals. VLMD addresses key limitations of existing Multivariate Mode Decomposition (MMD) techniques--including high computational cost, sensitivity to parameter choices, and weak modeling of interchannel dependencies. Its improved performance is driven by a novel underlying model, Latent Mode Decomposition (LMD), which blends sparse coding and mode decomposition to represent multichannel signals as sparse linear combinations of shared latent components composed of AM-FM oscillatory modes. This formulation enables VLMD to operate in a lower-dimensional latent space, enhancing robustness to noise, scalability, and interpretability. The algorithm solves a constrained variational optimization problem that jointly enforces reconstruction fidelity, sparsity, and frequency regularization. Experiments on synthetic and real-world datasets demonstrate that VLMD outperforms state-of-the-art MMD methods in accuracy, efficiency, and the interpretability of extracted structures. ONST A TIONARY signal decomposition techniques constitute an essential tool in Signal Processing for analyzing complex signals. Among them, Mode Decomposition (MD) has emerged as a fundamental framework, enabling the extraction of meaningful intrinsic oscillatory components [1]. Over the last couple of decades, a wide range of MD methods and algorithms have been developed and successfully applied across a wide range of interdisciplinary applications, such as biomedical signal analysis, structural health monitoring, and financial time-series analysis. This particular trend dates back to the late nineties with the introduction of Empirical Mode Decomposition (EMD) [1], which was followed by the development of other similar alternatives, such as Synchro-squeezed Transform (SST) [2], V ariational Mode Decomposition (VMD) [3] and Sliding-window Singular Spectrum Analysis (SSA) [4]. Originally designed for single-channel time series analysis, some of these methods were later extended to handle multivariate time series. Notable multivariate algorithms include Multivariate Empirical Mode Decomposition (MEMD) [5], multivariate nonlinear chirp mode decomposition [6], iterative filtering [7], as well as Multivariate V ariational Mode Decomposition (MVMD) [8].
Joker: Joint Optimization Framework for Lightweight Kernel Machines
Kernel methods are powerful tools for nonlinear learning with well-established theory. The scalability issue has been their long-standing challenge. Despite the existing success, there are two limitations in large-scale kernel methods: (i) The memory overhead is too high for users to afford; (ii) existing efforts mainly focus on kernel ridge regression (KRR), while other models lack study. In this paper, we propose Joker, a joint optimization framework for diverse kernel models, including KRR, logistic regression, and support vector machines. We design a dual block coordinate descent method with trust region (DBCD-TR) and adopt kernel approximation with randomized features, leading to low memory costs and high efficiency in large-scale learning. Experiments show that Joker saves up to 90\% memory but achieves comparable training time and performance (or even better) than the state-of-the-art methods.
Adaptive Semantic Token Communication for Transformer-based Edge Inference
Devoto, Alessio, Pomponi, Jary, Merluzzi, Mattia, Di Lorenzo, Paolo, Scardapane, Simone
--This paper presents an adaptive framework for edge inference based on a dynamically configurable transformer-powered deep joint source channel coding (DJSCC) architecture. Motivated by a practical scenario where a resource constrained edge device engages in goal oriented semantic communication, such as selectively transmitting essential features for object detection to an edge server, our approach enables efficient task aware data transmission under varying bandwidth and channel conditions. T o achieve this, input data is tokenized into compact high level semantic representations, refined by a transformer, and transmitted over noisy wireless channels. As part of the DJSCC pipeline, we employ a semantic token selection mechanism that adaptively compresses informative features into a user specified number of tokens per sample. These tokens are then further compressed through the JSCC module, enabling a flexible token communication strategy that adjusts both the number of transmitted tokens and their embedding dimensions. We incorporate a resource allocation algorithm based on Lyapunov stochastic optimization to enhance robustness under dynamic network conditions, effectively balancing compression efficiency and task performance. Experimental results demonstrate that our system consistently outperforms existing baselines, highlighting its potential as a strong foundation for AI native semantic communication in edge intelligence applications. ECENTL Y, there was a surge of interest in semantic and goal-oriented communications, establishing this paradigm as crucial for the development of 6G AI-native communication networks [1], [2]. A. Devoto is with the Department of Computer, Control, and Management Engineering (DIAG) at Sapienza University of Rome, Italy. He is also a member of the Consorzio Nazionale Interuniversitario per le Telecomunicazioni (CNIT), Parma, Italy. Jary Pomponi, Paolo Di Lorenzo, and Simone Scardapane are with the Department of Information Engineering, Electronics, and Telecommunications (DIET) at Sapienza University of Rome, Italy, and they are also affiliated with CNIT. Mat-tia Merluzzi is with CEA-Leti, Universit e Grenoble Alpes, located in Grenoble, France. This work has been supported by the SNS JU project 6G-GOALS under the EU's Horizon program Grant Agreement No 101139232, by Sapienza grant RG123188B3EF6A80 (CENTS), and by European Union under the Italian National Recovery and Resilience Plan of NextGenerationEU, partnership on Telecommunications of the Future (PE00000001 - program REST ART).
Efficient compression of neural networks and datasets
Barth, Lukas Silvester, von Petersenn, Paulo
We compare, improve, and contribute methods that substantially decrease the number of parameters of neural networks while maintaining high test accuracy. When applying our methods to minimize description length, we obtain very effective data compression algorithms. In particular, we develop a probabilistic reformulation of $\ell_0$ regularized optimization for nonlinear models that does not require Monte-Carlo sampling and thus improves upon previous methods. We also improve upon methods involving smooth approximations to the $\ell_0$ norm, and investigate layerwise methods. We compare the methods on different architectures and datasets, including convolutional networks trained on image datasets and transformers trained on parts of Wikipedia. We also created a synthetic teacher-student setup to investigate compression in a controlled continuous setting. Finally, we conceptually relate compression algorithms to Solomonoff's theory of inductive inference and empirically verify the prediction that regularized models can exhibit more sample-efficient convergence.
Corporate Needs You to Find the Difference: Revisiting Submodular and Supermodular Ratio Optimization Problems
Harb, Elfarouk, Yassin, Yousef, Chekuri, Chandra
We study the problem of minimizing or maximizing the average value $ f(S)/|S| $ of a submodular or supermodular set function $ f: 2^V \to \mathbb{R} $ over non-empty subsets $ S \subseteq V $. This generalizes classical problems such as Densest Subgraph (DSG), Densest Supermodular Set (DSS), and Submodular Function Minimization (SFM). Motivated by recent applications, we introduce two broad formulations: Unrestricted Sparsest Submodular Set (USSS) and Unrestricted Densest Supermodular Set (UDSS), which allow for negative and non-monotone functions. We show that DSS, SFM, USSS, UDSS, and the Minimum Norm Point (MNP) problem are equivalent under strongly polynomial-time reductions, enabling algorithmic crossover. In particular, viewing these through the lens of the MNP in the base polyhedron, we connect Fujishige's theory with dense decomposition, and show that both Fujishige-Wolfe's algorithm and the heuristic \textsc{SuperGreedy++} act as universal solvers for all these problems, including sub-modular function minimization. Theoretically, we explain why \textsc{SuperGreedy++} is effective beyond DSS, including for tasks like submodular minimization and minimum $ s $-$ t $ cut. Empirically, we test several solvers, including the Fujishige-Wolfe algorithm on over 400 experiments across seven problem types and large-scale real/synthetic datasets. Surprisingly, general-purpose convex and flow-based methods outperform task-specific baselines, demonstrating that with the right framing, general optimization techniques can be both scalable and state-of-the-art for submodular and supermodular ratio problems.
Designing an efficient and equitable humanitarian supply chain dynamically via reinforcement learning
Specifically, it is a policy gradient method, often used for deep learning when the policy network is very large. The predecessor to PPO, Trust Region Policy Optimization (TRPO), was published in 2015 by Schulman et al . It addressed the instability issue of another algorithm, the Deep Q - Network (DQN).
SEvoBench : A C++ Framework For Evolutionary Single-Objective Optimization Benchmarking
Yang, Yongkang, Zhao, Jian, Yang, Tengfei
We present SEvoBench, a modern C++ framework for evolutionary computation (EC), specifically designed to systematically benchmark evolutionary single-objective optimization algorithms. The framework features modular implementations of Particle Swarm Optimization (PSO) and Differential Evolution (DE) algorithms, organized around three core components: (1) algorithm construction with reusable modules, (2) efficient benchmark problem suites, and (3) parallel experimental analysis. Experimental evaluations demonstrate the framework's superior performance in benchmark testing and algorithm comparison. Case studies further validate its capabilities in algorithm hybridization and parameter analysis. Compared to existing frameworks, SEvoBench demonstrates three key advantages: (i) highly efficient and reusable modular implementations of PSO and DE algorithms, (ii) accelerated benchmarking through parallel execution, and (iii) enhanced computational efficiency via SIMD (Single Instruction Multiple Data) vectorization for large-scale problems.
Dual Ascent Diffusion for Inverse Problems
Kim, Minseo, Levy, Axel, Wetzstein, Gordon
Ill-posed inverse problems are fundamental in many domains, ranging from astrophysics to medical imaging. Emerging diffusion models provide a powerful prior for solving these problems. Existing maximum-a-posteriori (MAP) or posterior sampling approaches, however, rely on different computational approximations, leading to inaccurate or suboptimal samples. To address this issue, we introduce a new approach to solving MAP problems with diffusion model priors using a dual ascent optimization framework. Our framework achieves better image quality as measured by various metrics for image restoration problems, it is more robust to high levels of measurement noise, it is faster, and it estimates solutions that represent the observations more faithfully than the state of the art.
Robust Invariant Representation Learning by Distribution Extrapolation
Yoshida, Kotaro, Slavakis, Konstantinos
Invariant risk minimization (IRM) aims to enable out-of-distribution (OOD) generalization in deep learning by learning invariant representations. As IRM poses an inherently challenging bi-level optimization problem, most existing approaches -- including IRMv1 -- adopt penalty-based single-level approximations. However, empirical studies consistently show that these methods often fail to outperform well-tuned empirical risk minimization (ERM), highlighting the need for more robust IRM implementations. This work theoretically identifies a key limitation common to many IRM variants: their penalty terms are highly sensitive to limited environment diversity and over-parameterization, resulting in performance degradation. To address this issue, a novel extrapolation-based framework is proposed that enhances environmental diversity by augmenting the IRM penalty through synthetic distributional shifts. Extensive experiments -- ranging from synthetic setups to realistic, over-parameterized scenarios -- demonstrate that the proposed method consistently outperforms state-of-the-art IRM variants, validating its effectiveness and robustness.
Tuning for Trustworthiness -- Balancing Performance and Explanation Consistency in Neural Network Optimization
Hinterleitner, Alexander, Bartz-Beielstein, Thomas
Despite the growing interest in Explainable Artificial Intelligence (XAI), explainability is rarely considered during hyperparameter tuning or neural architecture optimization, where the focus remains primarily on minimizing predictive loss. In this work, we introduce the novel concept of XAI consistency, defined as the agreement among different feature attribution methods, and propose new metrics to quantify it. For the first time, we integrate XAI consistency directly into the hyperparameter tuning objective, creating a multi-objective optimization framework that balances predictive performance with explanation robustness. Implemented within the Sequential Parameter Optimization Toolbox (SPOT), our approach uses both weighted aggregation and desirability-based strategies to guide model selection. Through our proposed framework and supporting tools, we explore the impact of incorporating XAI consistency into the optimization process. This enables us to characterize distinct regions in the architecture configuration space: one region with poor performance and comparatively low interpretability, another with strong predictive performance but weak interpretability due to low \gls{xai} consistency, and a trade-off region that balances both objectives by offering high interpretability alongside competitive performance. Beyond introducing this novel approach, our research provides a foundation for future investigations into whether models from the trade-off zone-balancing performance loss and XAI consistency-exhibit greater robustness by avoiding overfitting to training performance, thereby leading to more reliable predictions on out-of-distribution data.