AITopics

2508.0541

Country:

Europe (0.67)
Asia > Japan > Honshū (0.28)
North America > United States (0.28)

Genre: Research Report (0.50)

Industry: Materials > Chemicals > Commodity Chemicals > Petrochemicals > Polymers & Plastics (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Bendahi, Abderrahim, Doerr, Benjamin, Fradin, Adrien, Lutzeyer, Johannes F.

Speeding Up Hyper-Heuristics With Markov-Chain Operator Selection and the Only-Worsening Acceptance Operator

arXiv.org Artificial Intelligence, Sep-3-2025

The move-acceptance hyper-heuristic was recently shown to be able to leave local optima with astonishing efficiency (Lissovoi et al., Artificial Intelligence (2023)). In this work, we propose two modifications to this algorithm that demonstrate impressive performance on a large class of benchmarks, including the classic Cliff$_d$ and Jump$_m$ function classes. (i) Instead of randomly choosing between the only-improving and all-moves acceptance operators, we make this choice via a simple two-state Markov chain. This modification alone reduces the runtime on Jump$_m$ functions with gap parameter $m$ from $\Omega(n^{2m-1})$ to $O(n^{m+1})$. (ii) We then replace the all-moves acceptance operator with an operator that accepts only worsenings. Such a counter-intuitive operator has not been used before in the literature. However, our proofs show that the only-worsening operator can greatly help in leaving local optima, reducing, e.g., the runtime on Jump functions to $O(n^3 \log n)$, independent of the gap size. In general, we prove a remarkably good runtime of $O(n^{k+1} \log n)$ for our Markov move-acceptance hyper-heuristic on all members of a new benchmark class SEQOPT$_k$, which contains a large number of functions having $k$ successive local optima, and which contains the commonly studied Jump$_m$ and Cliff$_d$ functions for $k=2$.
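The two modifications in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the one-bit-flip mutation, the switching probability `p`, the strict acceptance tests, the starting state, and the names `jump` and `markov_mahh` are all assumptions made for illustration. A two-state Markov chain decides whether the current acceptance operator is only-improving or only-worsening, and the search runs on a Jump$_m$ function.

```python
import random

def jump(x, m):
    """Jump_m fitness: OneMax plus m, with a gap of width m just below the optimum."""
    n, ones = len(x), sum(x)
    if ones <= n - m or ones == n:
        return m + ones
    return n - ones

def markov_mahh(n, m, p, max_iters, seed=0):
    """Sketch of a move-acceptance hyper-heuristic with Markov-chain operator selection.

    Assumptions (not taken from the paper): one-bit-flip mutation, a symmetric
    switching probability p, and an initial only-improving state.
    """
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = jump(x, m)
    only_improving = True  # state of the two-state Markov chain
    for t in range(max_iters):
        if fx == n + m:  # global optimum (all ones) reached
            return x, t
        if rng.random() < p:  # switch acceptance operator with probability p
            only_improving = not only_improving
        i = rng.randrange(n)
        x[i] ^= 1  # flip one uniformly chosen bit
        fy = jump(x, m)
        accept = fy > fx if only_improving else fy < fx
        if accept:
            fx = fy
        else:
            x[i] ^= 1  # reject: undo the flip
    return x, max_iters

best, steps = markov_mahh(n=10, m=3, p=0.1, max_iters=200_000, seed=1)
print(sum(best), steps)
```

In the only-worsening state the search walks down into the gap and waits next to the optimum, so a single switch back to only-improving can complete the jump. This is the intuition behind a runtime that does not depend on the gap size.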

artificial intelligence, evolutionary algorithm, machine learning, (17 more...)

2506.01107

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.61)

C-Flat++: Towards a More Efficient and Powerful Framework for Continual Learning

Li, Wei, Yuan, Hangjie, Zhao, Zixiang, Zhu, Yifan, Lu, Aojun, Feng, Tao, Sun, Yanan

Balancing sensitivity to new tasks and stability for retaining past knowledge is crucial in continual learning (CL). Recently, sharpness-aware minimization has proven effective in transfer learning and has also been adopted in CL to improve memory retention and learning efficiency. However, relying on zeroth-order sharpness alone may favor sharper minima over flatter ones in certain settings, leading to less robust and potentially suboptimal solutions. In this paper, we propose Continual Flatness (C-Flat), a method that promotes flatter loss landscapes tailored for CL. C-Flat offers plug-and-play compatibility, enabling easy integration with minimal modifications to the code pipeline. In addition, we present a general framework that integrates C-Flat into all major CL paradigms and conduct comprehensive comparisons with loss-minima optimizers and flat-minima-based CL methods. Our results show that C-Flat consistently improves performance across a wide range of settings. We also introduce C-Flat++, an efficient yet effective framework that leverages selective flatness-driven promotion, significantly reducing the update cost required by C-Flat. Extensive experiments across multiple CL methods, datasets, and scenarios demonstrate the effectiveness and efficiency of our proposed approaches. Code is available at https://github.com/WanNaa/C-Flat.

artificial intelligence, machine learning, optimization problem, (16 more...)

2508.1886

Country: Asia > China (0.46)

Genre: Research Report > New Finding (0.86)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.34)

OptMark: Robust Multi-bit Diffusion Watermarking via Inference Time Optimization

Xing, Jiazheng, Ci, Hai, Xu, Hongbin, Yuan, Hangjie, Liu, Yong, Shou, Mike Zheng

Current diffusion watermarking methods face significant limitations: zero-bit watermarking systems lack the capacity for large-scale user tracking, while multi-bit methods are highly sensitive to certain image transformations or generative attacks, resulting in a lack of comprehensive robustness. In this paper, we propose OptMark, an optimization-based approach that embeds a robust multi-bit watermark into the intermediate latents of the diffusion denoising process. OptMark strategically inserts a structural watermark early to resist generative attacks and a detail watermark late to withstand image transformations, with tailored regularization terms to preserve image quality and ensure imperceptibility. To address the challenge of memory consumption growing linearly with the number of denoising steps during optimization, OptMark incorporates adjoint gradient methods, reducing memory usage from $O(N)$ to $O(1)$. Experimental results demonstrate that OptMark achieves invisible multi-bit watermarking while ensuring robust resilience against valuemetric transformations, geometric transformations, editing, and regeneration attacks.

artificial intelligence, machine learning, watermark, (15 more...)

2508.21727

Country: Europe > Switzerland (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(3 more...)

Integrating Large Language Models with Network Optimization for Interactive and Explainable Supply Chain Planning: A Real-World Case Study

Venkatachalam, Saravanan

This paper presents an integrated framework that combines traditional network optimization models with large language models (LLMs) to deliver interactive, explainable, and role-aware decision support for supply chain planning. The proposed system bridges the gap between complex operations research outputs and business stakeholder understanding by generating natural language summaries, contextual visualizations, and tailored key performance indicators (KPIs). The core optimization model addresses tactical, multi-period, multi-item inventory redistribution across a network of distribution centers using a mixed-integer formulation. The technical architecture incorporates AI agents, RESTful APIs, and a dynamic user interface to support real-time interaction, configuration updates, and simulation-based insights. A case study demonstrates how the system improves planning outcomes by preventing stockouts, reducing costs, and maintaining service levels. Future extensions include integrating private LLMs, transfer learning, reinforcement learning, and Bayesian neural networks to enhance explainability, adaptability, and real-time decision-making.

large language model, machine learning, natural language, (18 more...)

2508.21622

Genre: Research Report (0.40)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

PMODE: Theoretically Grounded and Modular Mixture Modeling

Vandermeulen, Robert A.

We introduce PMODE (Partitioned Mixture Of Density Estimators), a general and modular framework for mixture modeling with both parametric and nonparametric components. PMODE builds mixtures by partitioning the data and fitting separate estimators to each subset. It attains near-optimal rates for this estimator class and remains valid even when the mixture components come from different distribution families. As an application, we develop MV-PMODE, which scales a previously theoretical approach to high-dimensional density estimation to settings with thousands of dimensions. Despite its simplicity, it performs competitively against deep baselines on CIFAR-10 anomaly detection.
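The partition-then-fit idea described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the PMODE method itself: the partition (`labels`) is taken as given rather than optimized, every component is a 1-D Gaussian fitted by maximum likelihood, and mixture weights are set to the subset proportions. The function names are hypothetical.

```python
import math
from statistics import fmean, pstdev

def fit_partitioned_mixture(data, labels):
    """Fit one Gaussian per subset of a given partition; weight by subset size.

    Assumes every subset holds at least two distinct points, so that the
    fitted standard deviation is positive.
    """
    groups = {}
    for x, k in zip(data, labels):
        groups.setdefault(k, []).append(x)
    n = len(data)
    # Each component is (weight, mean, standard deviation).
    return [(len(g) / n, fmean(g), pstdev(g)) for g in groups.values()]

def mixture_pdf(x, components):
    """Evaluate the resulting mixture density at x."""
    total = 0.0
    for w, mu, sd in components:
        z = (x - mu) / sd
        total += w * math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))
    return total

components = fit_partitioned_mixture(
    [0.0, 1.0, 2.0, 10.0, 11.0, 12.0, 13.0],
    [0, 0, 0, 1, 1, 1, 1],
)
print(components)
```

Because each subset is fitted independently, nothing prevents mixing estimator types across subsets, for example a parametric fit on one subset and a kernel density estimate on another.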

artificial intelligence, data mining, machine learning, (17 more...)

2508.21396

Country: North America > United States (0.46)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

FUTURE: Flexible Unlearning for Tree Ensemble

Chen, Ziheng, Huang, Jin, Cheng, Jiali, Guo, Yuchan, Wang, Mengjie, Morishetti, Lalitesh, Nag, Kaushiki, Amiri, Hadi

Tree ensembles are widely recognized for their effectiveness in classification tasks, achieving state-of-the-art performance across diverse domains, including bioinformatics, finance, and medical diagnosis. With increasing emphasis on data privacy and the "right to be forgotten", several unlearning algorithms have been proposed to enable tree ensembles to forget sensitive information. However, existing methods are often tailored to a particular model or rely on the discrete tree structure, making them difficult to generalize to complex ensembles and inefficient for large-scale datasets. To address these limitations, we propose FUTURE, a novel unlearning algorithm for tree ensembles. Specifically, we formulate the problem of forgetting samples as a gradient-based optimization task. To accommodate the non-differentiability of tree ensembles, we adopt probabilistic model approximations within the optimization framework. This enables end-to-end unlearning in an effective and efficient manner. Extensive experiments on real-world datasets show that FUTURE yields significant and successful unlearning performance.

artificial intelligence, machine learning, natural language, (18 more...)

2508.21181

Country: North America > United States > Massachusetts > Middlesex County > Lowell (0.14)

Genre: Research Report (0.50)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)

Rubio, Natalia L., Darve, Eric F., Marsden, Alison L.

Data-Driven Bifurcation Handling in Physics-Based Reduced-Order Vascular Hemodynamic Models

Three-dimensional (3D) finite-element simulations of cardiovascular flows provide high-fidelity predictions to support cardiovascular medicine, but their high computational cost limits clinical practicality. Reduced-order models (ROMs) offer computationally efficient alternatives but suffer reduced accuracy, particularly at vessel bifurcations where complex flow physics are inadequately captured by standard Poiseuille flow assumptions. We present an enhanced numerical framework that integrates machine learning-predicted bifurcation coefficients into zero-dimensional (0D) hemodynamic ROMs to improve accuracy while maintaining computational efficiency. We develop a resistor-resistor-inductor (RRI) model that uses neural networks to predict pressure-flow relationships from bifurcation geometry, incorporating linear and quadratic resistances along with inductive effects. The method employs non-dimensionalization to reduce training data requirements and a priori flow split prediction for improved bifurcation characterization. We incorporate the RRI model into a 0D model using an optimization-based solution strategy. We validate the approach in isolated bifurcations and vascular trees, across Reynolds numbers from 0 to 5,500, defining ROM accuracy by comparison to 3D finite element simulation. Results demonstrate substantial accuracy improvements: averaged across all trees and Reynolds numbers, the RRI method reduces inlet pressure errors from 54 mmHg (45%) for standard 0D models to 25 mmHg (17%), while a simplified resistor-inductor (RI) variant achieves 31 mmHg (26%) error. The enhanced 0D models show particular effectiveness at high Reynolds numbers and in extensive vascular networks. This hybrid numerical approach enables accurate, real-time hemodynamic modeling for clinical decision support, uncertainty quantification, and digital twins in cardiovascular biomedical engineering.
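A pressure-flow relationship with linear and quadratic resistances plus an inductive term could take the following form. The abstract does not give the equation, so this is an assumed generic form, not the paper's model: the term `r_quad * q * abs(q)` (chosen so the quadratic loss opposes the flow direction), the parameter names, and the function name are all hypothetical. In the paper the coefficients are predicted by neural networks from bifurcation geometry; here they are plain inputs.

```python
def rri_pressure_drop(q, dqdt, r_lin, r_quad, inductance):
    """Assumed RRI-style pressure drop across one bifurcation outlet.

    q          : flow rate through the outlet
    dqdt       : time derivative of the flow rate
    r_lin      : linear (viscous) resistance coefficient
    r_quad     : quadratic resistance coefficient
    inductance : inertial coefficient
    """
    return r_lin * q + r_quad * q * abs(q) + inductance * dqdt

# Setting r_quad to zero gives a resistor-inductor (RI) style relationship.
print(rri_pressure_drop(q=2.0, dqdt=1.0, r_lin=3.0, r_quad=0.5, inductance=4.0))
```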

artificial intelligence, bifurcation, machine learning, (20 more...)

2508.21165

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)

arXiv.org Machine Learning, Aug-29-2025

Distributed optimization: designed for federated learning

Guo, Wenyou, Qu, Ting, Pan, Chunrong, Huang, George Q.

Federated Learning (FL), as a distributed collaborative Machine Learning (ML) framework under privacy-preserving constraints, has garnered increasing research attention in cross-organizational data collaboration scenarios. This paper proposes a class of distributed optimization algorithms based on the augmented Lagrangian technique, designed to accommodate diverse communication topologies in both centralized and decentralized FL settings. Furthermore, we develop multiple termination criteria and parameter update mechanisms to enhance computational efficiency, accompanied by rigorous theoretical guarantees of convergence. By generalizing the augmented Lagrangian relaxation through the incorporation of proximal relaxation and quadratic approximation, our framework systematically recovers a broad class of classical unconstrained optimization methods, including the proximal algorithm, classic gradient descent, and stochastic gradient descent, among others. Notably, the convergence properties of these methods can be naturally derived within the proposed theoretical framework. Numerical experiments demonstrate that the proposed algorithm exhibits strong performance in large-scale settings with significant statistical heterogeneity across clients.

Such formulations, commonly referred to as consensus optimization problems, find widespread applications in interdisciplinary domains including distributed ML, collaborative sensing in sensor networks, and distributed parameter estimation [1].

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant 52375498, and in part by the Fundamental Research Funds for the Central Universities under Grant 21623111. Ting Qu is with Guangdong International Cooperation Base of Science and Technology for GBA Smart Logistics, Jinan University, Zhuhai 519070, China, also with School of Intelligent Systems Science and Engineering, Jinan University, Zhuhai 519070, China, and also with Institute of Physical Internet, Jinan University, Zhuhai 519070, China (e-mail: quting@jnu.edu.cn).
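For readers unfamiliar with augmented Lagrangian methods for consensus problems, the sketch below shows the textbook scaled-form consensus ADMM on a toy problem. It is the classic algorithm, not the one proposed in the paper. The assumptions are: each client $i$ holds a scalar loss $\tfrac{1}{2}(x - a_i)^2$, a central server averages the updates, and `rho` is a fixed penalty parameter. The function name is hypothetical.

```python
def consensus_admm(targets, rho=1.0, iters=100):
    """Classic consensus ADMM for: minimize sum_i 0.5 * (x_i - a_i)^2  s.t.  x_i = z.

    targets : list of client-local values a_i (each client's private data)
    rho     : augmented Lagrangian penalty parameter (assumed fixed)
    Returns the consensus variable z.
    """
    n = len(targets)
    z = 0.0
    u = [0.0] * n  # scaled dual variables, one per client
    for _ in range(iters):
        # Local step: each client minimizes its loss plus the quadratic penalty.
        x = [(a + rho * (z - ui)) / (1.0 + rho) for a, ui in zip(targets, u)]
        # Server step: average the clients' primal-plus-dual values.
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # Dual step: accumulate each client's disagreement with the consensus.
        u = [ui + xi - z for ui, xi in zip(u, x)]
    return z

print(consensus_admm([1.0, 2.0, 6.0]))
```

For this quadratic toy problem the consensus value converges to the mean of the client targets, and only `x` and `u` updates leave each client, never the raw `a_i`.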

artificial intelligence, machine learning, optimization, (16 more...)


2508.08606

Country:

Asia > China > Guangdong Province > Zhuhai (0.64)
Asia > China > Hong Kong (0.05)
Asia > China > Guangdong Province > Guangzhou (0.04)
(11 more...)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.75)

arXiv.org Artificial Intelligence, Aug-29-2025

CoFormer: Collaborating with Heterogeneous Edge Devices for Scalable Transformer Inference

Xu, Guanyu, Hao, Zhiwei, Shen, Li, Luo, Yong, Sun, Fuhui, Wang, Xiaoyan, Hu, Han, Wen, Yonggang

The impressive performance of transformer models has sparked the deployment of intelligent applications on resource-constrained edge devices. However, ensuring high-quality service for real-time edge systems is a significant challenge due to the considerable computational demands and resource requirements of these models. Existing strategies typically either offload transformer computations to other devices or directly deploy compressed models on individual edge devices. To tackle these challenges, we propose a collaborative inference system for general transformer models, termed CoFormer. The central idea behind CoFormer is to exploit the divisibility and integrability of transformers. An off-the-shelf large transformer can be decomposed into multiple smaller models for distributed inference, and their intermediate results are aggregated to generate the final output. We formulate an optimization problem to minimize both inference latency and accuracy degradation under heterogeneous hardware constraints. The DeBo algorithm is proposed to first solve the optimization problem to derive the decomposition policy, and then progressively calibrate the decomposed models to restore performance. We demonstrate the capability to support a wide range of transformer models on heterogeneous edge devices, achieving up to 3.1× inference speedup with large transformer models. Notably, CoFormer enables the efficient inference of GPT2-XL with 1.6 billion parameters on edge devices, reducing memory requirements by 76.3%. CoFormer can also reduce energy consumption by approximately 40% while maintaining satisfactory inference performance.

Guanyu Xu, Zhiwei Hao and Han Hu are with the School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China. Li Shen is with the School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China. Yong Luo is with the School of Computer Science, National Engineering Research Center for Multimedia Software, Wuhan University, Wuhan 430072, China. Fuhui Sun and Xiaoyan Wang are with Information Technology Service Center of People's Court, Beijing, 100745, China. Yonggang Wen is with the College of Computing and Data Science, Nanyang Technological University, Singapore 639798.

CoFormer significantly outperforms other methods. Specifically, CoFormer accelerates inference by 3.1× compared to Swin-L [4] with only 1.7% accuracy sacrifice.

large language model, machine learning, natural language, (19 more...)

2508.20375

Country:

Asia > China > Beijing > Beijing (0.64)
Asia > China > Hubei Province > Wuhan (0.44)

Genre: Research Report (1.00)

Industry:

Information Technology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.44)
Health & Medicine > Therapeutic Area > Immunology (0.44)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.86)