AITopics | spö

Collaborating Authors

spö

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

SPO: Sequential Monte Carlo Policy Optimisation

Neural Information Processing Systems | Mar-17-2026, 19:56:19 GMT

Leveraging planning during learning and decision-making is central to the long-term development of intelligent agents. Recent works have successfully combined tree-based search methods and self-play learning mechanisms to this end. However, these methods typically face scaling challenges due to the sequential nature of their search. While practical engineering solutions can partly overcome this, they often result in a negative impact on performance. In this paper, we introduce SPO: Sequential Monte Carlo Policy Optimisation, a model-based reinforcement learning algorithm grounded within the Expectation Maximisation (EM) framework. We show that SPO provides robust policy improvement and efficient scaling properties. The sample-based search makes it directly applicable to both discrete and continuous action spaces without modifications. We demonstrate statistically significant improvements in performance relative to model-free and model-based baselines across both continuous and discrete environments. Furthermore, the parallel nature of SPO's search enables effective utilisation of hardware accelerators, yielding favourable scaling laws.
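As a rough illustration of what a sample-based, parallel search can look like, the sketch below runs a generic sequential Monte Carlo (particle) planner: particles sample actions from a prior policy, are weighted by exponentiated reward, and are resampled. It is not the SPO algorithm from the paper. The function name `smc_first_action_distribution`, the `prior` and `model` callables, the temperature, and the choice of multinomial resampling at every step are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def smc_first_action_distribution(state, prior, model, num_actions,
                                  num_particles=256, depth=3,
                                  temperature=1.0, rng=None):
    """Generic SMC search sketch (hypothetical helper, not the paper's method).

    prior(state) -> list of action probabilities (assumed interface)
    model(state, action) -> (next_state, reward) (assumed interface)
    Returns an empirical distribution over the first action of the
    particles that survive resampling.
    """
    rng = rng or random.Random(0)
    states = [state] * num_particles
    first_actions = [None] * num_particles
    for t in range(depth):
        log_weights = []
        # Each particle is advanced independently of the others.
        for i in range(num_particles):
            probs = prior(states[i])
            action = rng.choices(range(num_actions), weights=probs)[0]
            if t == 0:
                first_actions[i] = action
            states[i], reward = model(states[i], action)
            log_weights.append(reward / temperature)
        # Subtract the maximum before exponentiating for numerical stability.
        top = max(log_weights)
        weights = [math.exp(lw - top) for lw in log_weights]
        # Multinomial resampling: high-reward particles are duplicated.
        keep = rng.choices(range(num_particles), weights=weights,
                           k=num_particles)
        states = [states[j] for j in keep]
        first_actions = [first_actions[j] for j in keep]
    counts = Counter(first_actions)
    return [counts[a] / num_particles for a in range(num_actions)]


# Toy problem (assumed): action 1 always pays reward 1, action 0 pays 0.
def toy_prior(state):
    return [0.5, 0.5]


def toy_model(state, action):
    return state, float(action)


dist = smc_first_action_distribution(0, toy_prior, toy_model, num_actions=2,
                                     temperature=0.1)
```

Between resampling steps the particles do not interact, so the inner loop is the part that could be batched on an accelerator.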

machine learning, proceedings, reinforcement learning, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.61)

Generalization Bounds in the Predict-then-Optimize Framework

Othman El Balghiti, Adam Elmachtoub, Paul Grigas, Ambuj Tewari

Neural Information Processing Systems | Feb-13-2026, 10:56:49 GMT

Neural Information Processing Systems http://nips.cc/

generalization, spo loss, spö, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > California > Alameda County > Berkeley (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Risk Bounds and Calibration for a Smart Predict-then-Optimize Method

Neural Information Processing Systems | Feb-10-2026, 21:33:22 GMT

Moreover, since the SPO loss is neither continuous nor convex in general [Elmachtoub and Grigas, 2021], which makes the training of a prediction model computationally intractable, Elmachtoub and Grigas [2021] introduced a novel convex surrogate loss, referred to as the SPO+ loss.
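To make the two losses concrete, here is a minimal sketch that evaluates them for a linear objective minimized over a finite list of candidate decisions. It assumes the usual definitions, SPO loss = c·w*(ĉ) − z*(c) and SPO+ loss = max_w {c·w − 2ĉ·w} + 2ĉ·w*(c) − z*(c), where w*(·) is the minimizing decision and z*(c) the optimal cost. The `vertices` list is a hypothetical stand-in for the feasible region, and ties are broken by taking the first minimizer, which is an assumption of this example.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def best_decision(cost, vertices):
    """Decision minimizing cost . w over the candidates (first one on ties)."""
    return min(vertices, key=lambda w: dot(cost, w))


def spo_loss(c_hat, c, vertices):
    """Excess true cost of acting on the predicted cost vector c_hat."""
    w_hat = best_decision(c_hat, vertices)
    z_star = dot(c, best_decision(c, vertices))
    return dot(c, w_hat) - z_star


def spo_plus_loss(c_hat, c, vertices):
    """Convex surrogate: max_w {c.w - 2 c_hat.w} + 2 c_hat.w*(c) - z*(c)."""
    w_star = best_decision(c, vertices)
    z_star = dot(c, w_star)
    worst = max(dot(c, w) - 2 * dot(c_hat, w) for w in vertices)
    return worst + 2 * dot(c_hat, w_star) - z_star


# Two candidate decisions (assumed toy instance).
vertices = [(1, 0), (0, 1)]
c_true = (1, 2)   # the first decision is truly cheaper
c_pred = (3, 1)   # the prediction ranks the decisions the wrong way round

print(spo_loss(c_pred, c_true, vertices))       # 1
print(spo_plus_loss(c_pred, c_true, vertices))  # 5
```

In this instance the surrogate (5) sits above the SPO loss (1), and both are zero when the prediction equals the true cost vector.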

artificial intelligence, loss function, machine learning, (18 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

51311013e51adebc3c34d2cc591fefee-AuthorFeedback.pdf

Neural Information Processing Systems | Feb-8-2026, 10:27:41 GMT

formulation, gradient, relaxation, (13 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.72)

Decision-Focused Sequential Experimental Design: A Directional Uncertainty-Guided Approach

Wan, Beichen, Liu, Mo, Grigas, Paul, Shen, Zuo-Jun Max

arXiv.org Machine Learning | Feb-6-2026

We consider the sequential experimental design problem in the predict-then-optimize paradigm. In this paradigm, the outputs of the prediction model are used as coefficient vectors in a downstream linear optimization problem. Traditional sequential experimental design aims to control the input variables (features) so that the improvement in prediction accuracy from each experimental outcome (label) is maximized. However, in the predict-then-optimize setting, performance is ultimately evaluated based on the decision loss induced by the downstream optimization, rather than by prediction error. This mismatch between prediction accuracy and decision loss renders traditional decision-blind designs inefficient. To address this issue, we propose a directional-based metric to quantify predictive uncertainty. This metric does not require solving an optimization oracle and is therefore computationally tractable. We show that the resulting sequential design criterion enjoys strong consistency and convergence guarantees. Under a broad class of distributions, we demonstrate that our directional uncertainty-based design attains an earlier stopping time than decision-blind designs. This advantage is further supported by real-world experiments on an LLM job allocation problem.

large language model, machine learning, natural language, (20 more...)

arXiv.org Machine Learning

2602.0534

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.86)

Industry:

Media > Film (0.46)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

On the Tension Between Optimality and Adversarial Robustness in Policy Optimization

Li, Haoran, Lv, Jiayu, Han, Congying, Zhang, Zicheng, Li, Anqi, Liu, Yan, Guo, Tiande, Jiang, Nan

arXiv.org Artificial Intelligence | Dec-2-2025

Optimality and adversarial robustness in deep reinforcement learning have long been regarded as conflicting goals. Nonetheless, recent theoretical insights presented in CAR suggest a potential alignment, raising the important question of how to realize this in practice. This paper first identifies a key gap between theory and practice by comparing standard policy optimization (SPO) and adversarially robust policy optimization (ARPO). Although they share theoretical consistency, a fundamental tension between robustness and optimality arises in practical policy gradient methods. SPO tends toward convergence to vulnerable first-order stationary policies (FOSPs) with strong natural performance, whereas ARPO typically favors more robust FOSPs at the expense of reduced returns. Furthermore, we attribute this tradeoff to the reshaping effect of the strongest adversary in ARPO, which significantly complicates the global landscape by inducing deceptive sticky FOSPs. This improves robustness but makes navigation more challenging. To alleviate this, we develop BARPO, a bilevel framework unifying SPO and ARPO by modulating adversary strength, thereby facilitating navigability while preserving global optima. Extensive empirical results demonstrate that BARPO consistently outperforms vanilla ARPO, providing a practical approach to reconcile theoretical and empirical performance.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2512.01228

Country: North America > United States > Illinois (0.27)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (0.93)
Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

SPO: Sequential Monte Carlo Policy Optimisation

Neural Information Processing Systems | Oct-9-2025, 17:04:42 GMT

Leveraging planning during learning and decision-making is central to the long-term development of intelligent agents.

algorithm, international conference, objective, (16 more...)

Neural Information Processing Systems

Country:

North America > Puerto Rico > San Juan > San Juan (0.04)
Europe > Slovenia > Upper Carniola > Municipality of Bled > Bled (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)
Overview (0.92)

Industry: Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

51311013e51adebc3c34d2cc591fefee-AuthorFeedback.pdf

Neural Information Processing Systems | Oct-2-2025, 22:16:59 GMT

formulation, gradient, relaxation, (13 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.72)

Single-stream Policy Optimization

Xu, Zhongwen, Ding, Zihan

arXiv.org Machine Learning | Sep-24-2025

We revisit policy-gradient optimization for Large Language Models (LLMs) from a single-stream perspective. Prevailing group-based methods like GRPO reduce variance with on-the-fly baselines but suffer from critical flaws: frequent degenerate groups erase learning signals, and synchronization barriers hinder scalability. We introduce Single-stream Policy Optimization (SPO), which eliminates these issues by design. SPO replaces per-group baselines with a persistent, KL-adaptive value tracker and normalizes advantages globally across the batch, providing a stable, low-variance learning signal for every sample. Being group-free, SPO enables higher throughput and scales effectively in long-horizon or tool-integrated settings where generation times vary. Furthermore, the persistent value tracker naturally enables an adaptive curriculum via prioritized sampling. Experiments using Qwen3-8B show that SPO converges more smoothly and attains higher accuracy than GRPO, while eliminating computation wasted on degenerate groups. Ablation studies confirm that SPO's gains stem from its principled approach to baseline estimation and advantage normalization, offering a more robust and efficient path for LLM reasoning. Across five hard math benchmarks with Qwen3-8B, SPO improves the average maj@32 by +3.4 percentage points (pp) over GRPO, driven by substantial absolute gains on challenging datasets, including +7.3 pp on BRUMO 25, +4.4 pp on AIME 25, and +3.3 pp on HMMT 25, and it achieves consistent relative gains in pass@$k$ across the evaluated $k$ values. SPO's success challenges the prevailing trend of adding incidental complexity to RL algorithms, highlighting a path where fundamental principles, not architectural workarounds, drive the next wave of progress in LLM reasoning.
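The contrast between per-group baselines and a persistent baseline can be sketched in a few lines. The example below is only an illustration under stated assumptions: the tracker is a plain exponential moving average per prompt rather than the KL-adaptive tracker the abstract describes, and the names `ValueTracker`, `group_advantages`, and `single_stream_advantages`, along with the initial value 0.5 and step size 0.1, are hypothetical.

```python
import math


def group_advantages(rewards):
    """Group-style baseline: standardize rewards within one prompt's group."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    if std == 0.0:
        # Degenerate group: identical rewards carry no learning signal.
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]


class ValueTracker:
    """Persistent per-prompt baseline (simple EMA, an assumed stand-in)."""

    def __init__(self, init=0.5, step=0.1):
        self.init = init
        self.step = step
        self.values = {}

    def baseline(self, prompt_id):
        return self.values.get(prompt_id, self.init)

    def update(self, prompt_id, reward):
        old = self.baseline(prompt_id)
        self.values[prompt_id] = old + self.step * (reward - old)


def single_stream_advantages(batch, tracker, eps=1e-8):
    """One sample per prompt: subtract the tracked baseline, then
    normalize across the whole batch instead of within a group."""
    raw = [reward - tracker.baseline(pid) for pid, reward in batch]
    mean = sum(raw) / len(raw)
    std = math.sqrt(sum((a - mean) ** 2 for a in raw) / len(raw))
    advantages = [(a - mean) / (std + eps) for a in raw]
    # Update baselines only after advantages are computed.
    for pid, reward in batch:
        tracker.update(pid, reward)
    return advantages


tracker = ValueTracker()
degenerate = group_advantages([1.0, 1.0, 1.0, 1.0])
stream = single_stream_advantages([("a", 1.0), ("b", 0.0)], tracker)
```

A group whose samples all receive the same reward yields zero advantages, while the single-stream variant still produces a signed signal for each sample because its baseline persists across batches.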

arxivpreprintarxiv, grpo, spö, (12 more...)

arXiv.org Machine Learning

2509.13232

Country: Asia > China > Jiangsu Province > Yancheng (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
