AITopics | levine

Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models

Neural Information Processing Systems, Jun-23-2026, 06:57:20 GMT

A long-standing goal in AI is to develop agents capable of solving diverse tasks across a range of environments, including those never seen during training. Two dominant paradigms address this challenge: (i) reinforcement learning (RL), which learns policies via trial and error, and (ii) optimal control, which plans actions using a known or learned dynamics model. However, their comparative strengths in the offline setting--where agents must learn from reward-free trajectories--remain underexplored. In this work, we systematically evaluate RL and control-based methods on a suite of navigation tasks, using offline datasets of varying quality. On the RL side, we consider goal-conditioned and zero-shot methods. On the control side, we train a latent dynamics model using the Joint Embedding Predictive Architecture (JEPA) and employ it for planning. We investigate how factors such as data diversity, trajectory quality, and environment variability influence the performance of these approaches. Our results show that model-free RL benefits most from large amounts of high-quality data, whereas model-based planning generalizes better to unseen layouts and is more data-efficient, while achieving trajectory stitching performance comparable to leading model-free methods. Notably, planning with a latent dynamics model proves to be a strong approach for handling suboptimal offline data and adapting to diverse environments.
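The abstract does not name the planner, so what follows is only a minimal sketch of what planning with a latent dynamics model can look like, assuming the cross-entropy method (CEM) as the optimizer. The functions `encode`, `predict`, `rollout_cost`, and `cem_plan` are hypothetical: `encode` and `predict` stand in for a trained JEPA encoder and latent predictor and are replaced here by toy maps (an identity encoder and point-mass dynamics) so that the code runs. Candidate action sequences are scored purely by unrolling the predictor in latent space and measuring the distance to the encoded goal, so no reward is required.

```python
import numpy as np


# Hypothetical stand-ins for a trained JEPA encoder and latent predictor.
# In practice both would be neural networks; simple maps keep this sketch runnable.
def encode(obs):
    """Map an observation to a latent state (identity here, by assumption)."""
    return np.asarray(obs, dtype=float)


def predict(z, a):
    """Toy latent dynamics z_{t+1} = z_t + a_t (a point mass moving in 2D)."""
    return z + a


def rollout_cost(z0, z_goal, actions):
    """Unroll the latent model over an action sequence; cost = squared distance to goal latent."""
    z = z0
    for a in actions:
        z = predict(z, a)
    return float(np.sum((z - z_goal) ** 2))


def cem_plan(obs, goal_obs, horizon=5, action_dim=2, n_samples=200,
             n_elite=20, n_iters=10, max_action=1.0, seed=0):
    """Cross-entropy method over action sequences, scored entirely in latent space."""
    rng = np.random.default_rng(seed)
    z0, z_goal = encode(obs), encode(goal_obs)
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(n_iters):
        # Sample candidate action sequences and clip them to the (assumed) action bounds.
        samples = mean + std * rng.standard_normal((n_samples, horizon, action_dim))
        samples = np.clip(samples, -max_action, max_action)
        costs = np.array([rollout_cost(z0, z_goal, seq) for seq in samples])
        # Refit the sampling distribution to the lowest-cost (elite) sequences.
        elite = samples[np.argsort(costs)[:n_elite]]
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6
    return mean


plan = cem_plan(obs=[0.0, 0.0], goal_obs=[2.0, 1.0])
print(plan[0])  # first action to execute
print(rollout_cost(encode([0.0, 0.0]), encode([2.0, 1.0]), plan))
```

In a model-predictive control loop, one would execute only `plan[0]`, observe the new state, and replan from there.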

machine learning, reinforcement learning, trajectory, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.86)

Industry: Leisure & Entertainment > Games (0.45)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

Pretraining a Shared Q-Network for Data-Efficient Offline Reinforcement Learning

Neural Information Processing Systems, Jun-22-2026, 12:06:04 GMT

Offline reinforcement learning (RL) aims to learn a policy from a fixed dataset without additional environment interaction. However, effective offline policy learning often requires a large and diverse dataset to mitigate epistemic uncertainty. Collecting such data demands substantial online interaction, which is costly or infeasible in many real-world domains. Therefore, improving policy learning from limited offline data, that is, achieving high data efficiency, is critical for practical offline RL. In this paper, we propose a simple yet effective plug-and-play pretraining framework that initializes the feature representation of a Q-network to enhance data efficiency in offline RL. Our approach employs a shared Q-network architecture trained in two stages: (1) pretraining a backbone feature extractor with a transition prediction head, and (2) training a Q-network, which combines the backbone feature extractor with a Q-value head, using any offline RL objective. Extensive experiments on the D4RL, Robomimic, V-D4RL, and ExoRL benchmarks show that our method substantially improves both performance and data efficiency across diverse datasets and domains. Remarkably, with only 10% of the dataset, our approach outperforms standard offline RL baselines trained on the full data.
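A minimal NumPy sketch of the two-stage recipe, under stated assumptions: a single tanh layer stands in for the backbone, the transition head predicts the next state from the shared features, and stage 2 uses semi-gradient TD(0) with the backbone frozen as a placeholder for "any offline RL objective" (the paper may instead fine-tune the backbone). All names (`init_params`, `pretrain_step`, `q_step`, `demo`) and the toy linear dataset are hypothetical.

```python
import numpy as np


def init_params(state_dim, action_dim, feat_dim, rng):
    return {
        "W": rng.normal(scale=0.3, size=(feat_dim, state_dim + action_dim)),  # shared backbone
        "H": rng.normal(scale=0.1, size=(state_dim, feat_dim)),  # transition-prediction head
        "q": np.zeros(feat_dim),                                  # Q-value head
    }


def features(params, s, a):
    """Backbone phi(s, a) = tanh(W [s; a]), shared by both heads."""
    x = np.concatenate([s, a], axis=1)
    return np.tanh(x @ params["W"].T)


def pretrain_step(params, s, a, s_next, lr=0.05):
    """Stage 1: one gradient step on mean ||H phi(s, a) - s'||^2 (backbone + transition head)."""
    x = np.concatenate([s, a], axis=1)
    phi = np.tanh(x @ params["W"].T)
    err = phi @ params["H"].T - s_next
    loss = float(np.mean(np.sum(err ** 2, axis=1)))
    g_pred = 2.0 * err / len(s)                        # dLoss/dPrediction
    g_H = g_pred.T @ phi
    g_z = (g_pred @ params["H"]) * (1.0 - phi ** 2)    # backprop through tanh
    g_W = g_z.T @ x
    params["H"] -= lr * g_H
    params["W"] -= lr * g_W
    return loss


def q_step(params, s, a, r, s_next, a_next, gamma=0.99, lr=0.05):
    """Stage 2: semi-gradient TD(0) on the Q-value head with the backbone frozen.
    A stand-in for 'any offline RL objective'; the paper's choice may differ."""
    phi = features(params, s, a)
    target = r + gamma * features(params, s_next, a_next) @ params["q"]
    td = phi @ params["q"] - target
    params["q"] -= lr * phi.T @ td / len(td)
    return float(np.mean(td ** 2))


def demo(n=512, state_dim=4, action_dim=2, feat_dim=16, seed=0):
    rng = np.random.default_rng(seed)
    # Toy offline dataset from linear dynamics s' = 0.9 s + B a (an assumption for illustration).
    B = rng.normal(scale=0.5, size=(state_dim, action_dim))
    s = rng.standard_normal((n, state_dim))
    a = rng.uniform(-1, 1, size=(n, action_dim))
    s_next = 0.9 * s + a @ B.T
    a_next = rng.uniform(-1, 1, size=(n, action_dim))
    r = -np.sum(s_next ** 2, axis=1)
    params = init_params(state_dim, action_dim, feat_dim, rng)
    pre_losses = [pretrain_step(params, s, a, s_next) for _ in range(300)]
    q_losses = [q_step(params, s, a, r, s_next, a_next) for _ in range(50)]
    return pre_losses, q_losses


pre_losses, q_losses = demo()
print(f"transition loss: {pre_losses[0]:.3f} -> {pre_losses[-1]:.3f}")
```

Because both heads read the same `features`, whatever the backbone learns about dynamics in stage 1 is available to the Q-value head in stage 2.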

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Normalizing Flows are Capable Models for Continuous Control

Neural Information Processing Systems, Jun-18-2026, 09:58:14 GMT

Modern reinforcement learning (RL) algorithms have found success by using probabilistic models, such as transformers, energy-based models, and diffusion/flow-based models. In doing so, researchers often pay the price of incorporating these models into their algorithms: diffusion models are expressive but computationally intensive due to their reliance on solving differential equations, while autoregressive transformer models are scalable but typically require learning discrete representations. Normalizing flows (NFs), by contrast, seem to provide an appealing alternative, as they enable likelihood evaluation and sampling without solving differential equations or relying on autoregressive architectures. However, their potential in RL has received limited attention, partly due to the prevailing belief that normalizing flows lack sufficient expressivity. We show that this is not the case. Building on recent work in NFs, we propose a single NF architecture that integrates seamlessly into RL algorithms, serving as a policy, Q-function, and occupancy measure. Our approach leads to much simpler algorithms and achieves higher performance in imitation learning, offline RL, goal-conditioned RL, and unsupervised RL.
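To make the point about likelihoods and sampling without differential equations concrete, here is a minimal NumPy sketch of a generic RealNVP-style affine coupling flow. It is not the architecture proposed in the paper; the class names, layer sizes, and random (untrained) weights are assumptions for illustration. The log-likelihood follows from the change-of-variables formula, log p(x) = log N(f(x); 0, I) + log|det J_f(x)|, and sampling is a single closed-form inverse pass.

```python
import numpy as np


class AffineCoupling:
    """RealNVP-style affine coupling layer: z2 = x2 * exp(s(x1)) + t(x1), z1 = x1."""

    def __init__(self, dim, hidden, rng, flip=False):
        self.d = dim // 2   # size of the pass-through half
        self.flip = flip    # reverse coordinates so alternate layers transform the other half
        self.W1 = rng.normal(scale=0.5, size=(self.d, hidden))
        self.Ws = rng.normal(scale=0.5, size=(hidden, dim - self.d))
        self.Wt = rng.normal(scale=0.5, size=(hidden, dim - self.d))

    def _split(self, x):
        if self.flip:
            x = x[:, ::-1]
        return x[:, :self.d], x[:, self.d:]

    def _merge(self, x1, x2):
        x = np.concatenate([x1, x2], axis=1)
        return x[:, ::-1] if self.flip else x

    def _scale_shift(self, x1):
        h = np.tanh(x1 @ self.W1)
        return np.tanh(h @ self.Ws), h @ self.Wt  # bounded log-scale keeps exp() stable

    def forward(self, x):
        """Data -> latent; also returns log|det Jacobian| per sample."""
        x1, x2 = self._split(x)
        s, t = self._scale_shift(x1)
        return self._merge(x1, x2 * np.exp(s) + t), s.sum(axis=1)

    def inverse(self, z):
        """Latent -> data in closed form (no ODE solve, no autoregressive loop)."""
        z1, z2 = self._split(z)
        s, t = self._scale_shift(z1)
        return self._merge(z1, (z2 - t) * np.exp(-s))


class Flow:
    def __init__(self, dim, n_layers=4, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.dim = dim
        self.layers = [AffineCoupling(dim, hidden, rng, flip=(i % 2 == 1))
                       for i in range(n_layers)]

    def forward(self, x):
        logdet = np.zeros(x.shape[0])
        for layer in self.layers:
            x, ld = layer.forward(x)
            logdet += ld
        return x, logdet

    def inverse(self, z):
        for layer in reversed(self.layers):
            z = layer.inverse(z)
        return z

    def log_prob(self, x):
        """Exact log-likelihood via change of variables with a standard normal base."""
        z, logdet = self.forward(x)
        log_base = -0.5 * np.sum(z ** 2, axis=1) - 0.5 * self.dim * np.log(2 * np.pi)
        return log_base + logdet

    def sample(self, n, rng):
        return self.inverse(rng.standard_normal((n, self.dim)))


flow = Flow(dim=4)
samples = flow.sample(3, np.random.default_rng(1))
print(flow.log_prob(samples))
```

Because each coupling layer's Jacobian is triangular (up to a coordinate permutation), its log-determinant is simply the sum of the log-scales `s`, so evaluating an exact likelihood costs one forward pass.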

artificial intelligence, machine learning, reinforcement learning, (11 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

3D-IntPhys: Learning 3D Visual Intuitive Physics for Fluids, Rigid Bodies, and Granular Materials: Supplementary Material

Neural Information Processing Systems, Apr-25-2026, 06:52:37 GMT

Embed to control: A locally linear latent dynamics model for control from raw images.

artificial intelligence, corr, machine learning, (15 more...)

Neural Information Processing Systems

Country: North America (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Robots (0.97)
Information Technology > Artificial Intelligence > Vision (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

099fe6b0b444c23836c4a5d07346082b-Paper.pdf

Neural Information Processing Systems, Apr-24-2026, 14:36:22 GMT

Mildly Conservative Q-Learning for Offline Reinforcement Learning

Neural Information Processing Systems, Apr-24-2026, 12:32:39 GMT

Offline reinforcement learning (RL) defines the task of learning from a static logged dataset without continually interacting with the environment. The distribution shift between the learned policy and the behavior policy makes it necessary for the value function to stay conservative so that out-of-distribution (OOD) actions are not severely overestimated. However, existing approaches, which penalize unseen actions or regularize toward the behavior policy, are too pessimistic; this suppresses the generalization of the value function and hinders performance improvement. This paper explores mild but sufficient conservatism for offline learning that does not harm generalization. We propose Mildly Conservative Q-learning (MCQ), in which OOD actions are actively trained by assigning them proper pseudo Q-values. We theoretically show that MCQ induces a policy that performs at least as well as the behavior policy and that no erroneous overestimation occurs for OOD actions. Experimental results on the D4RL benchmarks demonstrate that MCQ achieves remarkable performance compared with prior work. Furthermore, MCQ shows superior generalization ability when transferring from offline to online, and significantly outperforms baselines. Our code is publicly available at https://github.com/dmksjfl/MCQ.
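The core idea, training OOD actions toward pseudo Q-values instead of simply pushing their values down, can be sketched as follows. This is our reading of the abstract, not the authors' code (which is at the linked repository): we assume the pseudo target at a state is the maximum Q-value over actions sampled from a learned behavior model, and that the objective mixes the standard TD loss with an OOD regression term via a weight `lam`. The helpers `mcq_pseudo_targets` and `mcq_loss`, the toy critic `q_fn`, and the sampler `behavior` are hypothetical; details such as clipped double-Q ensembles and stopping gradients through the targets are omitted.

```python
import numpy as np


def mcq_pseudo_targets(q_fn, sample_behavior, states, n_samples=10, rng=None):
    """Pseudo Q-value for OOD actions at each state: the max Q over actions drawn from an
    (assumed) learned model of the behavior policy, i.e. the best in-support value."""
    rng = np.random.default_rng() if rng is None else rng
    q_samples = np.stack([q_fn(states, sample_behavior(states, rng))
                          for _ in range(n_samples)])  # shape (n_samples, N)
    return q_samples.max(axis=0)


def mcq_loss(q_fn, states, actions, td_targets, ood_actions, pseudo_targets, lam=0.7):
    """Weighted sum of the usual TD loss on dataset actions and a regression loss that
    pulls Q at OOD (policy) actions toward the pseudo targets."""
    in_dist = np.mean((q_fn(states, actions) - td_targets) ** 2)
    ood = np.mean((q_fn(states, ood_actions) - pseudo_targets) ** 2)
    return float(lam * in_dist + (1.0 - lam) * ood)


def q_fn(s, a):
    """Hypothetical toy critic: Q(s, a) = -||a - 1||^2 (ignores s)."""
    return -np.sum((a - 1.0) ** 2, axis=1)


def behavior(s, rng):
    """Hypothetical behavior model: in-support actions lie in [0, 0.5)^2."""
    return rng.uniform(0.0, 0.5, size=(len(s), 2))


states = np.zeros((4, 3))
targets = mcq_pseudo_targets(q_fn, behavior, states, rng=np.random.default_rng(0))
print(targets)
```

Unlike a penalty that lowers every OOD value, this target never exceeds the best value the critic assigns to in-support actions, which is the sense in which the conservatism is mild.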

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Effective Diversity in Population Based Reinforcement Learning

Neural Information Processing Systems, Feb-19-2026, 07:23:45 GMT

international conference on learning representation, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > New York > New York County > New York City (0.05)
Europe > Sweden > Stockholm > Stockholm (0.05)
Asia > Middle East > Jordan (0.05)
(6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.51)

41bd71e7bf7f9fe68f1c936940fd06bd-Paper-Conference.pdf

Neural Information Processing Systems, Feb-19-2026, 02:05:57 GMT

In the appendix, we ... Reward Optimization. We propose ...

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Texas (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)

LobsDICE: Offline Learning from Observation via Stationary Distribution Correction Estimation

Neural Information Processing Systems, Feb-19-2026, 00:52:29 GMT

We additionally assume that the agent cannot interact with the environment but has access to action-labeled transition data collected by agents of unknown quality.

artificial intelligence, demonstration, machine learning, (15 more...)

Neural Information Processing Systems

Technology: