AITopics | Optimization

Optimization

Sketching Low-Rank Plus Diagonal Matrices

Fernandez, Andres, Dangel, Felix, Hennig, Philipp, Schneider, Frank

arXiv.org Artificial Intelligence, Oct-3-2025

Many relevant machine learning and scientific computing tasks involve high-dimensional linear operators accessible only via costly matrix-vector products. In this context, recent advances in sketched methods have enabled the construction of *either* low-rank *or* diagonal approximations from few matrix-vector products. This provides substantial speedup and scalability, but approximation errors arise from the simpler structure that is assumed. This work introduces SKETCHLORD, a method that simultaneously estimates both low-rank *and* diagonal components, targeting the broader class of Low-Rank *plus* Diagonal (LoRD) linear operators. We demonstrate theoretically and empirically that this joint estimation is also superior to any sequential variant (diagonal-then-low-rank or low-rank-then-diagonal). Then, we cast SKETCHLORD as a convex optimization problem, leading to a scalable algorithm. Comprehensive experiments on synthetic (approximate) LoRD matrices confirm SKETCHLORD's performance in accurately recovering these structures. This positions it as a valuable addition to the structured approximation toolkit, particularly when high-fidelity approximations are desired for large-scale operators, such as the deep learning Hessian.
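To make the setting concrete, the following is a minimal sketch of one of the sequential baselines the abstract mentions (low-rank-then-diagonal), not of SKETCHLORD itself. It assumes a symmetric operator reachable only through a block matrix-vector product, uses a standard randomized range finder for the low-rank part, and a Hutchinson-style estimator with Rademacher probes for the diagonal of the residual. The function name, its parameters, and the choice of estimators are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sketch_low_rank_then_diagonal(matvec, n, rank, num_probes, seed=0):
    """Sequential baseline: low-rank part first, then diagonal of the residual.

    `matvec(X)` must return A @ X for an (n, k) block X; A is assumed symmetric.
    All names here are hypothetical and only serve the illustration.
    """
    rng = np.random.default_rng(seed)

    # Step 1: randomized range finder for the low-rank part.
    omega = rng.standard_normal((n, rank))
    Q, _ = np.linalg.qr(matvec(omega))   # orthonormal basis, shape (n, rank)
    B = Q.T @ matvec(Q)                  # small (rank, rank) core matrix
    # The low-rank estimate is Q @ B @ Q.T; it is never formed densely here.

    # Step 2: Hutchinson-style diagonal estimate of the residual A - Q B Q^T.
    V = rng.choice([-1.0, 1.0], size=(n, num_probes))   # Rademacher probes
    residual_V = matvec(V) - Q @ (B @ (Q.T @ V))
    diag = np.mean(V * residual_V, axis=1)
    return Q, B, diag
```

On a true low-rank-plus-diagonal operator, the first step of such a pipeline tends to absorb part of the diagonal into the low-rank factor, which is the kind of error the abstract argues joint estimation avoids.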

artificial intelligence, machine learning, recovery, (17 more...)

2509.23587

Country: Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

StepORLM: A Self-Evolving Framework With Generative Process Supervision For Operations Research Language Models

Zhou, Chenyu, Xu, Tianyi, Lin, Jianghao, Ge, Dongdong

arXiv.org Artificial Intelligence, Oct-3-2025

Large Language Models (LLMs) have shown promising capabilities for solving Operations Research (OR) problems. While reinforcement learning serves as a powerful paradigm for LLM training on OR problems, existing works generally face two key limitations. First, outcome reward suffers from the credit assignment problem, where correct final answers can reinforce flawed reasoning. Second, conventional discriminative process supervision is myopic, failing to evaluate the interdependent steps of OR modeling holistically. To this end, we introduce StepORLM, a novel self-evolving framework with generative process supervision. At its core, StepORLM features a co-evolutionary loop where a policy model and a generative process reward model (GenPRM) iteratively improve on each other. This loop is driven by a dual-feedback mechanism: definitive, outcome-based verification from an external solver, and nuanced, holistic process evaluation from the GenPRM. The combined signal is used to align the policy via Weighted Direct Preference Optimization (W-DPO) and simultaneously refine the GenPRM. Our resulting 8B-parameter StepORLM establishes a new state-of-the-art across six benchmarks, significantly outperforming vastly larger generalist models, agentic methods, and specialized baselines. Moreover, the co-evolved GenPRM is able to act as a powerful and universally applicable process verifier, substantially boosting the inference scaling performance of both our own model and other existing LLMs.
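The abstract does not say how W-DPO weights its preference pairs, so the following is only a sketch of one plausible reading: the standard DPO loss with a per-pair weight applied before averaging. The function name, the `weights` argument, and the normalization by the weight sum are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def weighted_dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected,
                      weights, beta=0.1):
    """Weighted DPO loss over a batch of preference pairs.

    The first four inputs are 1-D arrays of sequence log-probabilities under the
    policy and the frozen reference model; `weights` is a hypothetical per-pair
    weight (for example, derived from solver and process-reward feedback).
    """
    # Implicit reward margin between the chosen and the rejected response.
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)), computed stably as log(1 + exp(-margin)).
    per_pair = np.logaddexp(0.0, -margin)
    # Assumed weighting scheme: a weighted average over the pairs.
    return np.sum(weights * per_pair) / np.sum(weights)
```

With all weights equal, this reduces to the usual DPO objective; a pair whose chosen response is already strongly preferred by the policy contributes a loss close to zero.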

large language model, machine learning, natural language, (20 more...)

2509.22558

Country: Europe > Austria (0.28)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

scSiameseClu: A Siamese Clustering Framework for Interpreting single-cell RNA Sequencing Data

Xu, Ping, Ning, Zhiyuan, Li, Pengjiang, Liu, Wenhao, Wang, Pengyang, Cui, Jiaxu, Zhou, Yuanchun, Wang, Pengfei

arXiv.org Artificial Intelligence, Oct-3-2025

Single-cell RNA sequencing (scRNA-seq) reveals cell heterogeneity, with cell clustering playing a key role in identifying cell types and marker genes. Recent advances, especially graph neural network (GNN)-based methods, have significantly improved clustering performance. However, the analysis of scRNA-seq data remains challenging due to noise, sparsity, and high dimensionality. Compounding these challenges, GNNs often suffer from over-smoothing, limiting their ability to capture complex biological information. In response, we propose scSiameseClu, a novel Siamese Clustering framework for interpreting single-cell RNA-seq data, comprising three key steps: (1) Dual Augmentation Module, which applies biologically informed perturbations to the gene expression matrix and cell graph relationships to enhance representation robustness; (2) Siamese Fusion Module, which combines cross-correlation refinement and adaptive information fusion to capture complex cellular relationships while mitigating over-smoothing; and (3) Optimal Transport Clustering, which utilizes Sinkhorn distance to efficiently align cluster assignments with predefined proportions while maintaining balance. Comprehensive evaluations on seven real-world datasets demonstrate that scSiameseClu outperforms state-of-the-art methods in single-cell clustering, cell type annotation, and cell type classification, providing a powerful tool for scRNA-seq data interpretation.
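The third step, aligning cluster assignments with predefined proportions, can be illustrated with plain Sinkhorn iterations for entropy-regularized optimal transport. This is a generic sketch rather than the paper's implementation: the cost matrix (for instance, negative log soft-assignment scores), the uniform mass per cell, and all names and default values below are assumptions.

```python
import numpy as np

def sinkhorn_assignments(cost, proportions, epsilon=0.1, num_iters=200):
    """Entropy-regularized transport plan between cells (rows) and clusters (columns).

    cost: (n_cells, n_clusters) array; proportions: target cluster shares summing to 1.
    Returns a plan whose column sums approach `proportions` as the iterations converge.
    """
    n_cells, _ = cost.shape
    row_marginal = np.full(n_cells, 1.0 / n_cells)   # each cell carries equal mass
    K = np.exp(-cost / epsilon)                      # Gibbs kernel
    u = np.ones(n_cells)
    for _ in range(num_iters):
        v = proportions / (K.T @ u)                  # rescale to match cluster marginals
        u = row_marginal / (K @ v)                   # rescale to match cell marginals
    return u[:, None] * K * v[None, :]               # transport plan
```

Multiplying the plan by the number of cells gives per-cell soft assignments that sum to one. For large costs or small `epsilon` the kernel can underflow, in which case a log-domain variant of the same iteration is the usual remedy.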

data mining, machine learning, natural language, (21 more...)

2505.12626

Country: Asia > China (0.28)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
(2 more...)

5938b4d054136e5d59ada6ec9c295d7a-Paper.pdf

Neural Information Processing Systems, Oct-2-2025, 23:52:16 GMT

algorithm, artificial intelligence, machine learning, (16 more...)

Country: Asia (0.14)

Industry: Education (0.53)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)

Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems, Oct-2-2025, 23:33:41 GMT

So, informally speaking, there is a duality between approximation factors of parallel rounding and multiplicative bounds of move-making. This duality is also invariant to applying the rounding on an interval of labels, or over a hierarchical clustering of labels.

algorithm, distance function, move-making algorithm, (13 more...)

Country: North America > Canada > Quebec > Montreal (0.04)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)

Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems, Oct-2-2025, 23:21:30 GMT

The authors present a flexible variational inference method geared toward Gaussian process models with various likelihoods. Specifically, they derive an inference method for models where a fixed number of latent functions (with GP priors that depend on the input covariate) parameterize a likelihood for conditionally independent observations. They use variational inference to obtain the posterior over the latent functions, where the variational family is a mixture of Gaussians with a fixed number of components and a chosen covariance structure (full, diagonal, block diagonal, etc.). The paper derives the standard evidence lower bound (ELBO), which decomposes into a negative KL term and an expected log-likelihood term, and the authors note some convenient properties of this decomposition (regarding the optimization of covariance function parameters). The paper is well written, very clear, and technically sound.
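For reference, the decomposition the review describes can be written as follows. The notation is assumed here rather than taken from the reviewed paper: $\mathbf{f}$ collects the latent function values, $\mathbf{f}_n$ those at the $n$-th input, $y_n$ is the $n$-th observation, and $q$ is a mixture of $K$ Gaussians with weights $\pi_k$.

```latex
\begin{align}
  q(\mathbf{f}) &= \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{f}; \mathbf{m}_k, \mathbf{S}_k), \\
  \log p(\mathbf{y}) \;\ge\; \mathcal{L}(q)
    &= -\,\mathrm{KL}\bigl(q(\mathbf{f}) \,\|\, p(\mathbf{f})\bigr)
       + \sum_{n=1}^{N} \mathbb{E}_{q(\mathbf{f}_n)}\bigl[\log p(y_n \mid \mathbf{f}_n)\bigr].
\end{align}
```

The sum over $n$ follows from the conditional independence of the observations. Under the usual assumption that the covariance function parameters enter only through the prior $p(\mathbf{f})$, they appear only in the KL term, which is presumably the convenient property the review alludes to.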

experiment, gradient, variational inference, (11 more...)

Country: North America > Canada > Quebec > Montreal (0.04)

Genre: