AITopics | rrm

Recursive partitioning methods provide computationally efficient surrogates for the Wasserstein distance, yet their statistical behavior and their resolution in the small-discrepancy regime remain insufficiently understood. We study Recursive Rank Matching (RRM) as a representative instance of this class under a population-anchored reference. In this setting, we establish consistency and an explicit convergence rate for the anchored empirical RRM under the quadratic cost. We then identify a dominant mismatch mechanism responsible for the loss of resolution in the small-discrepancy regime. Based on this analysis, we introduce Selective Recursive Rank Matching (SRRM), which suppresses the resulting dominant mismatches and yields a higher-fidelity practical surrogate for the Wasserstein distance at moderate additional computational cost.

artificial intelligence, machine learning, rrm, (19 more...)

arXiv.org Machine Learning

2603.18781

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Add feedback

Riemannian stochastic optimization methods avoid strict saddle points Anonymous Author(s) Affiliation Address email

Neural Information Processing SystemsFeb-12-2026, 10:57:13 GMT

Riemannian setting is a highly challenging affair.

artificial intelligence, machine learning, saddle point, (17 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

5e809ba53f34d9170386ebfc8b60300f-Paper-Conference.pdf

Neural Information Processing SystemsFeb-12-2026, 10:57:10 GMT

algorithm, manifold, saddle point, (14 more...)

Neural Information Processing Systems

Country:

Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

5e809ba53f34d9170386ebfc8b60300f-Supplemental-Conference.pdf

Neural Information Processing SystemsOct-8-2025, 18:47:25 GMT

artificial intelligence, machine learning, manifold, (16 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Add feedback

Riemannian stochastic optimization methods avoid strict saddle points

Neural Information Processing SystemsOct-8-2025, 18:47:21 GMT

Informally, our main result may be stated as follows: Under any stochastic Riemannian Robbins-Monro method, the probability of converging to a strict saddle point (or a submanifold thereof) is zero.

artificial intelligence, machine learning, saddle point, (16 more...)

Neural Information Processing Systems

Country:

Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Residual Reward Models for Preference-based Reinforcement Learning

Cao, Chenyang, Rogel-García, Miguel, Nabail, Mohamed, Wang, Xueqian, Rhinehart, Nicholas

arXiv.org Artificial IntelligenceJul-2-2025

Preference-based Reinforcement Learning (PbRL) provides a way to learn high-performance policies in environments where the reward signal is hard to specify, avoiding heuristic and time-consuming reward design. However, PbRL can suffer from slow convergence speed since it requires training in a reward model. Prior work has proposed learning a reward model from demonstrations and fine-tuning it using preferences. However, when the model is a neural network, using different loss functions for pre-training and fine-tuning can pose challenges to reliable optimization. In this paper, we propose a method to effectively leverage prior knowledge with a Residual Reward Model (RRM). An RRM assumes that the true reward of the environment can be split into a sum of two parts: a prior reward and a learned reward. The prior reward is a term available before training, for example, a user's ``best guess'' reward function, or a reward function learned from inverse reinforcement learning (IRL), and the learned reward is trained with preferences. We introduce state-based and image-based versions of RRM and evaluate them on several tasks in the Meta-World environment suite. Experimental results show that our method substantially improves the performance of a common PbRL method. Our method achieves performance improvements for a variety of different types of prior rewards, including proxy rewards, a reward obtained from IRL, and even a negated version of the proxy reward. We also conduct experiments with a Franka Panda to show that our method leads to superior performance on a real robot. It significantly accelerates policy learning for different tasks, achieving success in fewer steps than the baseline. The videos are presented at https://sunlighted.github.io/RRM-web/.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2507.00611

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

RRM: Robust Reward Model Training Mitigates Reward Hacking

Liu, Tianqi, Xiong, Wei, Ren, Jie, Chen, Lichang, Wu, Junru, Joshi, Rishabh, Gao, Yang, Shen, Jiaming, Qin, Zhen, Yu, Tianhe, Sohn, Daniel, Makarova, Anastasiia, Liu, Jeremiah, Liu, Yuan, Piot, Bilal, Ittycheriah, Abe, Kumar, Aviral, Saleh, Mohammad

arXiv.org Artificial IntelligenceSep-19-2024

Reward models (RMs) play a pivotal role in aligning large language models (LLMs) with human preferences. However, traditional RM training, which relies on response pairs tied to specific prompts, struggles to disentangle prompt-driven preferences from prompt-independent artifacts, such as response length and format. In this work, we expose a fundamental limitation of current RM training methods, where RMs fail to effectively distinguish between contextual signals and irrelevant artifacts when determining preferences. To address this, we introduce a causal framework that learns preferences independent of these artifacts and propose a novel data augmentation technique designed to eliminate them. Extensive experiments show that our approach successfully filters out undesirable artifacts, yielding a more robust reward model (RRM). Our RRM improves the performance of a pairwise reward model trained on Gemma-2-9b-it, on Reward-Bench, increasing accuracy from 80.61% to 84.15%. Additionally, we train two DPO policies using both the RM and RRM, demonstrating that the RRM significantly enhances DPO-aligned policies, improving MT-Bench scores from 7.27 to 8.31 and length-controlled win-rates in AlpacaEval-2 from 33.46% to 52.49%. Reinforcement Learning from Human Feedback (RLHF) has become a cornerstone in aligning large language models (LLMs) with human preferences to produce responses that are more helpful, honest, and harmless (Ouyang et al., 2022; Bai et al., 2022a). This approach involves training a reward model (RM) on human feedback, which then guides the LLM to generate high-quality responses through reinforcement learning.

artifact, arxiv preprint arxiv, reward model, (14 more...)

arXiv.org Artificial Intelligence

2409.13156

Country:

North America > United States > Virginia (0.04)
North America > United States > Maryland > Prince George's County > College Park (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)
(3 more...)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Mitigating the Impact of Labeling Errors on Training via Rockafellian Relaxation

Chen, Louis L., Chern, Bobbie, Eckstrand, Eric, Mahapatra, Amogh, Royset, Johannes O.

arXiv.org Artificial IntelligenceMay-30-2024

Labeling errors in datasets are common, if not systematic, in practice. They naturally arise in a variety of contexts--human labeling, noisy labeling, and weak labeling (i.e., image classification), for example. This presents a persistent and pervasive stress on machine learning practice. In particular, neural network (NN) architectures can withstand minor amounts of dataset imperfection with traditional countermeasures such as regularization, data augmentation, and batch normalization. However, major dataset imperfections often prove insurmountable. We propose and study the implementation of Rockafellian Relaxation (RR), a new loss reweighting, architecture-independent methodology, for neural network training. Experiments indicate RR can enhance standard neural network methods to achieve robust performance across classification tasks in computer vision and natural language processing (sentiment analysis). We find that RR can mitigate the effects of dataset corruption due to both (heavy) labeling error and/or adversarial perturbation, demonstrating effectiveness across a variety of data domains and machine learning tasks.

architecture, dataset, rrm, (17 more...)

arXiv.org Artificial Intelligence

2405.20531

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Monterey County > Monterey (0.04)
North America > United States > Oregon > Multnomah County > Portland (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Riemannian stochastic optimization methods avoid strict saddle points

Hsieh, Ya-Ping, Karimi, Mohammad Reza, Krause, Andreas, Mertikopoulos, Panayotis

arXiv.org Artificial IntelligenceNov-4-2023

Many modern machine learning applications - from online principal component analysis to covariance matrix identification and dictionary learning - can be formulated as minimization problems on Riemannian manifolds, and are typically solved with a Riemannian stochastic gradient method (or some variant thereof). However, in many cases of interest, the resulting minimization problem is not geodesically convex, so the convergence of the chosen solver to a desirable solution - i.e., a local minimizer - is by no means guaranteed. In this paper, we study precisely this question, that is, whether stochastic Riemannian optimization algorithms are guaranteed to avoid saddle points with probability 1. For generality, we study a family of retraction-based methods which, in addition to having a potentially much lower per-iteration cost relative to Riemannian gradient descent, include other widely used algorithms, such as natural policy gradient methods and mirror descent in ordinary convex spaces. In this general setting, we show that, under mild assumptions for the ambient manifold and the oracle providing gradient information, the policies under study avoid strict saddle points / submanifolds with probability 1, from any initial condition. This result provides an important sanity check for the use of gradient methods on manifolds as it shows that, almost always, the limit state of a stochastic Riemannian algorithm can only be a local minimizer.

algorithm, manifold, saddle point, (16 more...)

arXiv.org Artificial Intelligence

2311.02374

Country:

Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Jordan (0.04)
(8 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Add feedback

Business Metric-Aware Forecasting for Inventory Management

Zhou, Helen, Arik, Sercan O., Wang, Jingtao

arXiv.org Artificial IntelligenceAug-24-2023

Time-series forecasts play a critical role in business planning. However, forecasters typically optimize objectives that are agnostic to downstream business goals and thus can produce forecasts misaligned with business preferences. In this work, we demonstrate that optimization of conventional forecasting metrics can often lead to sub-optimal downstream business performance. Focusing on the inventory management setting, we derive an efficient procedure for computing and optimizing proxies of common downstream business metrics in an end-to-end differentiable manner. We explore a wide range of plausible cost trade-off scenarios, and empirically demonstrate that end-to-end optimization often outperforms optimization of standard business-agnostic forecasting metrics (by up to 45.7% for a simple scaling model, and up to 54.0% for an LSTM encoder-decoder model). Finally, we discuss how our findings could benefit other business contexts.

artificial intelligence, forecasting, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2308.13118

Country: North America > United States (0.67)

Genre: Research Report > New Finding (0.34)

Industry: Energy > Oil & Gas > Downstream (1.00)

Technology: