AITopics

2607.0101

Genre: Research Report (0.40)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

arXiv.org Machine Learning Jul-1-2026

On the Convergence of Self-Improving Online LLM Alignment

Wu, Xudong, Liu, Pangpang, Aggarwal, Vaneet, Chen, Jiayu

Abstract: The Self-Improving Alignment (SAIL) algorithm addresses distribution shift by reducing a bilevel formulation of the problem to an efficient, single-level [...]. Empirically, SAIL has demonstrated strong performance on this task. However, a formal analysis of its convergence properties has been lacking. We identify a key theoretical challenge: the standard SAIL objective function is not guaranteed [...]. To address this limitation, we propose a regularized objective, SAIL-RevKL, which incorporates a reverse Kullback-Leibler (KL) divergence penalty to improve the optimization landscape. Our central theoretical contribution is to prove that this regularized objective satisfies the Polyak-Lojasiewicz (PL) condition within a bounded parameter space. We establish global convergence guarantees, achieving a near-linear sample complexity.

[...] limitations, recent work explores online RLHF that iterates between generating on-policy responses and collecting preferences [Lee et al., 2024, Park et al., 2022]. Among online approaches, SAIL reduces a bilevel alignment formulation to a computationally efficient single-level surrogate and reports strong empirical gains [Ding et al., 2024]. Existing online pipelines are largely heuristic and do not analytically control the distributional shift induced by iterative data collection [Chakraborty et al., 2024, Shen et al., 2024], which has been linked to suboptimal performance in practice [Sharma et al., 2024]. A growing line of work argues that the coupling between reward learning and policy updates is fundamentally bilevel and should be modeled as such [Chakraborty et al., 2024]. As a follow-up, Ding et al. [2024] reduces the bilevel alignment objective to a tractable single-level surrogate and reports strong empirical gains, yet it lacks formal convergence guarantees. Related theoretical analyses in bilevel/RLHF-style problems exist [e.g., Yang et al., 2025, Chakraborty et al., 2024, Gaur et al., 2025], yet they either focus on [...]
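For readers unfamiliar with the Polyak-Lojasiewicz condition named in the abstract, its textbook form and the linear rate it gives for gradient descent can be written as below. This is the standard statement for a generic smooth minimization problem, not the paper's theorem: the symbols $f$, $f^{*}$, $\mu$, and $L$ are assumed notation, and the rate shown requires the inequality to hold along all iterates.

```latex
% Assumed notation (not from the listing): f is an objective to be minimized,
% f^* its infimum, \mu > 0 the PL constant, L the Lipschitz constant of \nabla f.
\frac{1}{2}\,\lVert \nabla f(\theta) \rVert^{2} \;\ge\; \mu \bigl( f(\theta) - f^{*} \bigr)
\quad \text{for all } \theta \text{ in the region considered},
\qquad
f(\theta_{k}) - f^{*} \;\le\; \Bigl( 1 - \tfrac{\mu}{L} \Bigr)^{k} \bigl( f(\theta_{0}) - f^{*} \bigr)
\quad \text{for } \theta_{k+1} = \theta_{k} - \tfrac{1}{L}\, \nabla f(\theta_{k}).
```

The point of the condition is that it yields this geometric decrease without requiring convexity, which is why it is a natural target for a regularized, nonconvex alignment objective.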

large language model, machine learning, natural language, (20 more...)

2606.31524

Country: North America > United States (0.46)

Genre: Research Report (0.83)

Industry:

Health & Medicine (0.46)
Law Enforcement & Public Safety (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine Learning Jun-30-2026

Liquidity-Based Audit of Algorithmic Trading Strategies

Aldridge, Irene

Market microstructure has long classified trading activity by its informational role: an informed trader demands liquidity by trading in the direction of private information, while a market maker supplies liquidity by absorbing that order flow and earning the spread in compensation (Kyle, 1985; Glosten and Milgrom, 1985). This classification is typically recovered from the data the classifier requires: signed order flow, quote revisions, or the sequential-trade structure of the market. It is harder to apply to an algorithmic strategy whose internal logic is unobservable: the signals or optimization problems generating the decisions of a typical quantitative fund are not visible, even though the trades and reported positions may be available. This paper shows that the liquidity role of such a strategy (consumer or provider) can be recovered from realized portfolio costs and trade decisions alone, without observing quotes, order flow, or any other microstructure-specific signal.

artificial intelligence, correction, machine learning, (19 more...)

2606.29018

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)

Kim, Kunwoong, Kim, Dongha

What Drives the Inlier-Memorization Effect? A Theory of Outlier Detection via Early Training Dynamics

arXiv.org Machine Learning Jun-30-2026

Outlier detection (OD) aims to identify anomalous instances by learning the underlying structure of normal data (inliers), and is particularly challenging in fully unsupervised settings where no information about anomalies is available during training. Recent advances have leveraged the inlier-memorization (IM) effect, a phenomenon in which deep models memorize inlier patterns earlier than those of outliers, as a powerful signal for distinguishing outliers. However, despite its empirical success, the theoretical understanding of the IM effect remains limited. In this work, we present a theoretical study of the IM effect. Focusing on a simple autoencoder, we show that, under mild assumptions, the model can successfully memorize inliers while failing to memorize outliers during certain stages of early training. In particular, we characterize not only the emergence of the IM effect, but also its strength and persistence, and analyze how these properties depend on the data distribution and parameter initialization. In addition, building on these insights, we derive simple yet practical guidelines for enhancing the IM effect, including data preprocessing and parameter initialization schemes, achieving state-of-the-art performance on the ADBench datasets. Our findings provide a theoretical foundation for the IM effect and offer actionable directions for improving IM-based outlier detection methods.

artificial intelligence, data mining, machine learning, (19 more...)

2606.29791

Country:

Europe (0.92)
North America > United States > California (0.27)

Genre: Research Report > New Finding (0.87)

Industry: Health & Medicine (0.69)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Liu, Shixiang, Yang, Hanming

Adversarial Contamination Meets Hard Thresholding: An Iterative Algorithm with Signal Adaptivity and Minimax Optimality

arXiv.org Machine Learning Jun-29-2026

Pervasive data contamination -- stemming from measurement errors, outliers, or adversarial corruption -- has motivated the development of robust statistical methods. In this context, we propose a two-stage Adversarial Contamination-resistant Iterative Hard Thresholding (AC-IHT) algorithm for high-dimensional regression with contamination. Our nonconvex algorithm achieves minimax near-optimal (up to logarithmic terms) estimation by iteratively updating the coefficient vector and the contamination vector with different thresholding scales. We further demonstrate that our AC-IHT estimator is signal-adaptive: under proper signal conditions, it adaptively attains a sharper estimation rate and more accurate support recovery. Moreover, it enjoys the strong oracle property, laying a theoretical foundation for asymptotic inference. Numerical experiments confirm its superior finite-sample performance. Finally, we discuss theoretical extensions of the proposed procedure to generalized linear models and to heavy-tailed noise settings.
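The idea of updating a coefficient vector and a contamination vector with different thresholding scales can be illustrated with a minimal sketch. This is a generic alternating hard-thresholding loop for the model y = X beta + gamma + noise, not the authors' two-stage AC-IHT: the function names, the step-size rule, and the threshold parameters `beta_level` and `gamma_level` are all hypothetical choices made for illustration.

```python
import numpy as np

def hard_threshold(v, level):
    """Zero out every entry whose magnitude is below `level`."""
    out = v.copy()
    out[np.abs(out) < level] = 0.0
    return out

def alternating_iht(X, y, beta_level, gamma_level, step=None, n_iter=200):
    """Illustrative alternating scheme (hypothetical, not AC-IHT itself).

    beta  : regression coefficients, thresholded at `beta_level`
    gamma : per-observation contamination, thresholded at `gamma_level`
    """
    n, p = X.shape
    if step is None:
        # Assumed step size: inverse squared spectral norm of X.
        step = 1.0 / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    gamma = np.zeros(n)
    for _ in range(n_iter):
        resid = y - X @ beta - gamma
        # Gradient step on the coefficients, then hard thresholding.
        beta = hard_threshold(beta + step * (X.T @ resid), beta_level)
        # Contamination absorbs only the large residuals.
        gamma = hard_threshold(y - X @ beta, gamma_level)
    return beta, gamma

# Toy usage: a constant model with one grossly corrupted response.
X = np.ones((5, 1))
y = np.array([1.0, 1.0, 1.0, 1.0, 11.0])
beta_hat, gamma_hat = alternating_iht(X, y, beta_level=0.5, gamma_level=5.0)
```

In this toy case the corrupted fifth observation is absorbed by `gamma_hat`, so `beta_hat` settles at the value supported by the four clean observations rather than at the contaminated mean.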

artificial intelligence, data mining, machine learning, (19 more...)

2606.27685

Genre: Research Report > New Finding (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science > Data Mining (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.62)

Kim, Jung-hun, Grebennikova, Anna, Perchet, Vianney

Asymptotically Optimal Learning for Parametric Prophet Inequalities

arXiv.org Machine Learning Jun-26-2026

We study learning in prophet inequalities with i.i.d. rewards drawn from an exponential-type parametric family with an unknown parameter $\theta$, a class that includes exponential, Pareto, and bounded-support power-family distributions. We first characterize the optimal full-information asymptotic competitive ratio for this family. In the unbounded-support case, the limit is $\left(\theta/(\theta-c_+)\right)^{c_+/\theta} / \Gamma(1-c_+/\theta)$, while in the bounded-support case, the limit is $1$. We then propose a confidence-based dynamic-programming policy for online learning. By exploiting the explicit parametric structure, the policy achieves the same optimal asymptotic competitive ratio using only online observations, without external offline samples. We further derive distribution-specific convergence rates for canonical examples. Finally, numerical experiments on synthetic instances illustrate the performance of our algorithm.
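The unbounded-support limit quoted above is a closed-form expression, so it can be evaluated directly. The sketch below only transcribes that formula; the function name and the domain check `0 <= c_plus < theta` are assumptions made here so that every term is well defined, and the abstract does not say how `c_+` is defined or which values it may take.

```python
import math

def asymptotic_ratio(theta: float, c_plus: float) -> float:
    """Evaluate (theta / (theta - c_plus)) ** (c_plus / theta) / Gamma(1 - c_plus / theta).

    Hypothetical helper: the domain 0 <= c_plus < theta is an assumption of
    this sketch, chosen so the base is positive and the Gamma argument is in (0, 1].
    """
    if not (0.0 <= c_plus < theta):
        raise ValueError("this sketch assumes 0 <= c_plus < theta")
    a = c_plus / theta  # the expression depends on theta and c_plus only through a
    return (theta / (theta - c_plus)) ** a / math.gamma(1.0 - a)

# Example: c_plus / theta = 1/2 gives sqrt(2) / Gamma(1/2) = sqrt(2 / pi), about 0.7979.
ratio_half = asymptotic_ratio(2.0, 1.0)
```

Since the formula depends on the two parameters only through the quotient `c_plus / theta`, a plug-in estimate of that quotient is enough to evaluate it.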

artificial intelligence, logn, machine learning, (18 more...)

2606.26893

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.40)

Neural Information Processing Systems Jun-23-2026, 12:23:08 GMT

Learning Counterfactual Outcomes Under Rank Preservation

Counterfactual inference aims to estimate the counterfactual outcome at the individual level given knowledge of an observed treatment and the factual outcome, with broad applications in fields such as epidemiology, econometrics, and management science. Previous methods rely on a known structural causal model (SCM) or assume the homogeneity of the exogenous variable and strict monotonicity between the outcome and exogenous variable. In this paper, we propose a principled approach for identifying and estimating the counterfactual outcome. We first introduce a simple and intuitive rank preservation assumption to identify the counterfactual outcome without relying on a known structural causal model. Building on this, we propose a novel ideal loss for theoretically unbiased learning of the counterfactual outcome and further develop a kernel-based estimator for its empirical estimation. Our theoretical analysis shows that the rank preservation assumption is not stronger than the homogeneity and strict monotonicity assumptions, that the proposed ideal loss is convex, and that the proposed estimator is unbiased. Extensive semi-synthetic and real-world experiments are conducted to demonstrate the effectiveness of the proposed method.

artificial intelligence, machine learning, neural information processing system, (14 more...)

Country: North America > United States (0.92)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Neural Information Processing Systems Jun-23-2026, 04:47:35 GMT

ffab50f3cad7cb5733ca324e5be20976-Paper-Conference.pdf

The capacity of deep learning models is often large enough to both learn the underlying statistical signal and overfit to noise in the training set. This noise memorization can be harmful, especially for data with a low signal-to-noise ratio (SNR), leading to poor generalization. Inspired by prior observations that label noise provides implicit regularization that improves generalization, in this work we investigate whether introducing label noise to the gradient updates can enhance the test performance of neural networks (NNs) in the low SNR regime. Specifically, we consider training a two-layer NN with a simple label noise gradient descent (GD) algorithm, in an idealized signal-noise data setting. We prove that adding label noise during training suppresses noise memorization, preventing it from dominating the learning process; consequently, label noise GD enjoys rapid signal growth while the overfitting remains controlled, thereby achieving good generalization despite the low SNR. In contrast, we also show that an NN trained with standard GD tends to overfit to noise in the same low SNR setting and establish a non-vanishing lower bound on its test error, thus demonstrating the benefit of introducing label noise in gradient-based training.
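A rough sketch of what "label noise in the gradient updates" can look like: at every step the labels are flipped independently with some probability before the gradient is computed. This is an illustration under stated assumptions, not the paper's algorithm: a linear model with logistic loss is swapped in for the two-layer NN, and `flip_prob`, `lr`, `n_steps`, and the function name are hypothetical.

```python
import numpy as np

def label_noise_gd(X, y, flip_prob=0.1, lr=0.5, n_steps=100, seed=0):
    """Gradient descent on logistic loss with fresh random label flips each step.

    Assumptions of this sketch: labels y are in {-1, +1}, the model is linear
    (a stand-in for the two-layer NN in the abstract), and flips are i.i.d.
    Setting flip_prob=0.0 recovers standard GD.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(n_steps):
        # Resample which labels are flipped at this step only.
        flips = rng.random(len(y)) < flip_prob
        y_noisy = np.where(flips, -y, y)
        margins = y_noisy * (X @ w)
        # Gradient of the mean of log(1 + exp(-margin)) with respect to w.
        grad = -(X * (y_noisy / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
        w -= lr * grad
    return w

# Toy usage on four linearly separable points.
X = np.array([[2.0, 0.0], [1.0, 1.0], [-2.0, 0.0], [-1.0, -1.0]])
y = np.array([1, 1, -1, -1])
w_clean = label_noise_gd(X, y, flip_prob=0.0)
w_noisy = label_noise_gd(X, y, flip_prob=0.2)
```

Because the flips are redrawn at every step, no single corrupted label is seen consistently across updates, which is the intuition behind noise acting as a regularizer rather than as a fixed mislabeling.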

artificial intelligence, deep learning, machine learning, (15 more...)

Country: North America > United States (0.45)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.69)

Neural Information Processing Systems Jun-23-2026, 04:10:17 GMT

Shortcuts and Identifiability in Concept-based Models from a Neuro-Symbolic Lens

Concept-based Models are neural networks that learn a concept extractor to map inputs to high-level concepts and an inference layer to translate these into predictions. Ensuring that these modules produce interpretable concepts and behave reliably in out-of-distribution settings is crucial, yet the conditions for achieving this remain unclear. We study this problem by establishing a novel connection between Concept-based Models and reasoning shortcuts (RSs), a common issue where models achieve high accuracy by learning low-quality concepts, even when the inference layer is fixed and provided upfront. Specifically, we extend RSs to the more complex setting of Concept-based Models and derive theoretical conditions for identifying both the concepts and the inference layer. Our empirical results highlight the impact of RSs and show that existing methods, even combined with multiple natural mitigation strategies, often fail to meet these conditions in practice.

artificial intelligence, machine learning, natural language, (17 more...)

Country: Europe (0.28)

Genre:

Research Report > Experimental Study (0.67)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neural Information Processing Systems Jun-23-2026, 03:51:36 GMT

Exploring Landscapes for Better Minima along Valleys

Most existing optimizers stop searching the parameter space once they reach a local minimum. Given the complex geometric properties of the loss landscape, it is difficult to guarantee that such a point is the lowest or provides the best generalization. To address this, we propose an adaptor "E" for gradient-based optimizers. The adapted optimizer tends to continue exploring along landscape valleys (areas with low and nearly identical losses) in order to search for potentially [...]

large language model, machine learning, natural language, (22 more...)

Country: Asia (0.27)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)