AITopics | edit distance

Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback

Neural Information Processing Systems, Jun-23-2026, 05:42:00 GMT

Reinforcement Learning from Human Feedback (RLHF) is essential for aligning large language models (LLMs) with human preferences and values. While recent research has primarily focused on algorithmic advancements (such as reducing computational overhead or strengthening reward models to mitigate reward hacking), the critical role of prompt-data construction and its scalability has received comparatively less attention. In this paper, we address this gap by systematically exploring data-driven bottlenecks that currently hinder RLHF performance scaling, focusing specifically on the challenges posed by reward hacking and decreasing response diversity. To mitigate reward hacking, we introduce a hybrid reward system combining reasoning task verifiers (RTV) and a generative reward model (GenRM). This approach enables accurate assessment of responses against clearly defined ground-truth solutions. Additionally, to ensure response diversity and enhance learning effectiveness, we propose a novel prompt-selection method named Pre-PPO, explicitly identifying training prompts that are inherently challenging and thus less prone to reward hacking.

large language model, machine learning, natural language, (18 more...)

Country: Asia > China (0.46)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry:

Water & Waste Management > Solid Waste Management (1.00)
Law (1.00)
Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

The quest for the GRAph Level autoEncoder (GRALE)

Neural Information Processing Systems, Jun-17-2026, 08:36:53 GMT

Although graph-based learning has attracted a lot of attention, graph representation learning is still a challenging task whose resolution may impact key application fields such as chemistry or biology. To this end, we introduce GRALE, a novel graph autoencoder that encodes and decodes graphs of varying sizes into a shared embedding space. GRALE is trained using an Optimal Transport-inspired loss that compares the original and reconstructed graphs and leverages a differentiable node matching module, which is trained jointly with the encoder and decoder. The proposed attention-based architecture relies on Evoformer, the core component of AlphaFold, which we extend to support both graph encoding and decoding. We show, in numerical experiments on simulated and molecular data, that GRALE enables a highly general form of pre-training, applicable to a wide range of downstream tasks, from classification and regression to more complex tasks such as graph interpolation, editing, matching, and prediction.

artificial intelligence, graph, machine learning, (18 more...)

Country: Europe (0.67)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine (0.67)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

b81a352c156ca123c30c740f147a4496-Paper-Conference.pdf

Neural Information Processing Systems, Apr-30-2026, 01:06:09 GMT

data mining, machine learning, natural language, (21 more...)

Country: Europe (0.46)

Genre: Research Report > Experimental Study (1.00)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

3473decccb0509fb264818a7512a8b9b-Supplemental.pdf

Neural Information Processing Systems, Apr-25-2026, 10:46:37 GMT

artificial intelligence, dataset, natural language, (13 more...)

Country:

Asia > China (0.16)
Oceania > Australia (0.15)
North America > United States (0.15)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.72)

Reward Augmented Maximum Likelihood for Neural Structured Prediction

Mohammad Norouzi, Samy Bengio, Zhifeng Chen, Navdeep Jaitly, Mike Schuster, Yonghui Wu, Dale Schuurmans

Neural Information Processing Systems, Mar-23-2026, 05:44:09 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, reinforcement learning, (20 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
(2 more...)

Clustering Billions of Reads for DNA Data Storage

Neural Information Processing Systems, Mar-17-2026, 16:47:36 GMT

Storing data in synthetic DNA offers the possibility of improving information density and durability by several orders of magnitude compared to current storage technologies. However, DNA data storage requires a computationally intensive process to retrieve the data. In particular, a crucial step in the data retrieval pipeline involves clustering billions of strings with respect to edit distance. Datasets in this domain have many notable properties, such as containing a very large number of small clusters that are well-separated in the edit distance metric space. In this regime, existing algorithms are unsuitable because of either their long running time or low accuracy. To address this issue, we present a novel distributed algorithm for approximately computing the underlying clusters.
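
To make "clustering strings with respect to edit distance" concrete, here is a minimal single-machine sketch. It is not the distributed algorithm the abstract refers to: it pairs the textbook Levenshtein dynamic program with a naive greedy pass that is quadratic in the worst case, which is the kind of running time the abstract calls unsuitable at this scale. The function names, the `radius` parameter, and the sample reads are all illustrative assumptions.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def greedy_cluster(reads: List[str], radius: int) -> List[List[str]]:
    """Assumed baseline: put each read in the first cluster whose
    representative lies within `radius` edits, else start a new cluster."""
    clusters: List[Tuple[str, List[str]]] = []
    for read in reads:
        for representative, members in clusters:
            if edit_distance(read, representative) <= radius:
                members.append(read)
                break
        else:
            clusters.append((read, [read]))
    return [members for _, members in clusters]


# Hypothetical noisy reads drawn from two underlying strands.
reads = ["ACGTACGT", "ACGTACGA", "TTTTGGGG", "TTTAGGGG", "ACGACGT"]
clusters = greedy_cluster(reads, radius=2)
```

With well-separated clusters, as the abstract describes, a small radius is enough to group noisy copies of the same strand while keeping different strands apart. The pairwise comparisons are what a scalable method would need to avoid.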

artificial intelligence, machine learning, proceedings, (6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)

Any2Graph: Deep End-To-End Supervised Graph Prediction With An Optimal Transport Loss

Neural Information Processing Systems, Feb-17-2026, 17:13:38 GMT

We propose Any2Graph, a generic framework for end-to-end Supervised Graph Prediction (SGP), i.e., a deep learning model that predicts an entire graph for any kind of input.

data mining, machine learning, natural language, (20 more...)

Country: Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Government (0.46)

Technology: