Sample from What You See: Visuomotor Policy Learning via Diffusion Bridge with Observation-Embedded Stochastic Differential Equation

Liu, Zhaoyang, Pan, Mokai, Wang, Zhongyi, Zhu, Kaizhen, Lu, Haotao, Wang, Jingya, Shi, Ye

arXiv.org Artificial Intelligence

Imitation learning with diffusion models has advanced robotic control by capturing multi-modal action distributions. However, existing approaches typically treat observations as high-level conditioning inputs to the denoising network, rather than integrating them into the stochastic dynamics of the diffusion process itself. As a result, sampling must begin from random Gaussian noise, weakening the coupling between perception and control and often yielding suboptimal performance. We introduce BridgePolicy, a generative visuomotor policy that explicitly embeds observations within the stochastic differential equation via a diffusion-bridge formulation. By constructing an observation-informed trajectory, BridgePolicy enables sampling to start from a rich, informative prior rather than random noise, substantially improving precision and reliability in control. A key challenge is that classical diffusion bridges connect distributions with matched dimensionality, whereas robotic observations are heterogeneous and multi-modal and do not naturally align with the action space. To address this, we design a multi-modal fusion module and a semantic aligner that unify visual and state inputs and align observation and action representations, making the bridge applicable to heterogeneous robot data. Extensive experiments across 52 simulation tasks on three benchmarks and five real-world tasks demonstrate that BridgePolicy consistently outperforms state-of-the-art generative policies.
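The core idea of the abstract, starting the sampler at an observation-derived embedding and integrating a bridge-style SDE toward an action, can be sketched as follows. This is a minimal illustrative sketch, not the paper's model: `drift_fn`, the toy pinning drift, the noise schedule, and the target are all assumptions standing in for the learned networks.

```python
import numpy as np

def bridge_sample(z_obs, drift_fn, n_steps=50, sigma=0.1, rng=None):
    """Euler-Maruyama sampler for a Brownian-bridge-style SDE.

    Starts from the observation embedding z_obs (an informative prior)
    instead of Gaussian noise and integrates toward an action sample.
    drift_fn(x, t) stands in for the learned drift/denoising network.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(z_obs, dtype=float).copy()  # start at the observation prior
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        noise = rng.standard_normal(x.shape)
        # bridge noise shrinks as t -> 1, pinning the endpoint distribution
        x = x + drift_fn(x, t) * dt + sigma * np.sqrt(dt * (1.0 - t)) * noise
    return x

# toy drift pulling samples toward a fixed "action" endpoint
target = np.array([0.5, -0.2])
drift = lambda x, t: (target - x) / max(1.0 - t, 1e-3)
action = bridge_sample(np.zeros(2), drift, n_steps=200, sigma=0.05)
```

With the pinning drift above the sampler behaves like a Brownian bridge: the endpoint lands near `target` regardless of the start point, which is the property that lets sampling begin from an informative prior rather than pure noise.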


DiffTORI: Differentiable Trajectory Optimization for Deep Reinforcement and Imitation Learning

Wan, Weikang

Neural Information Processing Systems

This paper introduces DiffTORI, which utilizes Differentiable Trajectory Optimization as the policy representation to generate actions for deep Reinforcement and Imitation learning. Trajectory optimization is a powerful and widely used algorithm in control, parameterized by a cost and a dynamics function.
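The policy representation described here, an inner trajectory optimization over a cost and a dynamics function, can be sketched as below. This is an illustrative sketch, not the DiffTORI implementation: the linear integrator dynamics, quadratic cost, and finite-difference gradient descent are all assumptions standing in for the learned models and the differentiable solver.

```python
import numpy as np

def traj_opt_policy(x0, dynamics, cost, horizon=5, iters=100, lr=0.1):
    """Minimal trajectory-optimization-as-policy sketch.

    Rolls a candidate action sequence through a (learned) dynamics model,
    scores it with a (learned) cost, and refines it by finite-difference
    gradient descent. The first optimized action is executed, MPC-style.
    """
    actions = np.zeros((horizon, x0.shape[0]))

    def total_cost(acts):
        x, c = x0, 0.0
        for a in acts:
            x = dynamics(x, a)
            c += cost(x, a)
        return c

    eps = 1e-4
    for _ in range(iters):
        base = total_cost(actions)
        grad = np.zeros_like(actions)
        for idx in np.ndindex(actions.shape):  # finite-difference gradient
            pert = actions.copy()
            pert[idx] += eps
            grad[idx] = (total_cost(pert) - base) / eps
        actions -= lr * grad
    return actions[0]  # receding-horizon action

# toy check: drive a single-integrator state toward the origin
dyn = lambda x, a: x + a
cst = lambda x, a: float(x @ x + 0.01 * a @ a)
a0 = traj_opt_policy(np.array([1.0, -1.0]), dyn, cst, horizon=3)
```

With a near-zero control penalty the optimizer learns to cancel the initial state in one step, so `a0` is close to `-x0`; the actual method differentiates through the optimization itself so the cost and dynamics can be trained end-to-end.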


A Additional Experimental Results

Neural Information Processing Systems

Robot action primitives are agnostic to the exact geometry of the underlying robot, provided the robot is a manipulator arm. As noted in the related works section, Dynamic Motion Primitives (DMPs) are an alternative skill formulation that is common in the robotics literature. Each primitive ran 200 low-level actions with a path length of five high-level actions, while the low-level path length was 500. With raw actions, each episode took 16.49. We run an ablation to measure how often RAPS uses each primitive.



Learning Parameterized Skills from Demonstrations

Gupta, Vedant, Fu, Haotian, Luo, Calvin, Jiang, Yiding, Konidaris, George

arXiv.org Artificial Intelligence

We present DEPS, an end-to-end algorithm for discovering parameterized skills from expert demonstrations. Our method learns parameterized skill policies jointly with a meta-policy that selects the appropriate discrete skill and continuous parameters at each timestep. Using a combination of temporal variational inference and information-theoretic regularization methods, we address the challenge of degeneracy common in latent variable models, ensuring that the learned skills are temporally extended, semantically meaningful, and adaptable. We empirically show that learning parameterized skills from multitask expert demonstrations significantly improves generalization to unseen tasks. Our method outperforms multitask as well as skill learning baselines on both LIBERO and MetaWorld benchmarks. We also demonstrate that DEPS discovers interpretable parameterized skills, such as an object grasping skill whose continuous arguments define the grasp location.
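The factorization described above, a meta-policy that picks a discrete skill and its continuous arguments at each timestep, can be sketched as follows. This is a toy illustration in the spirit of DEPS, not the paper's architecture: the linear heads, the `grasp_skill` stub, and all dimensions are assumptions standing in for the learned networks.

```python
import numpy as np

rng = np.random.default_rng(0)

class MetaPolicy:
    """Toy discrete-skill + continuous-parameter policy factorization.

    A categorical head chooses a skill index z; a per-skill head maps
    the state to the skill's continuous arguments theta. Linear maps
    stand in for the learned networks.
    """
    def __init__(self, state_dim, n_skills, param_dim):
        self.W_z = rng.standard_normal((n_skills, state_dim)) * 0.1
        self.W_theta = rng.standard_normal((n_skills, param_dim, state_dim)) * 0.1

    def act(self, s):
        logits = self.W_z @ s
        p = np.exp(logits - logits.max())
        p /= p.sum()                      # softmax over discrete skills
        z = rng.choice(len(p), p=p)       # sample a skill index
        theta = self.W_theta[z] @ s       # continuous arguments (mean)
        return z, theta

def grasp_skill(state, theta):
    # parameterized skill stub: theta encodes a target (e.g. grasp location)
    return theta - state[:theta.shape[0]]  # move toward the target

policy = MetaPolicy(state_dim=4, n_skills=3, param_dim=2)
z, theta = policy.act(np.ones(4))
action = grasp_skill(np.ones(4), theta)
```

The point of the factorization is that the same discrete skill (e.g. grasping) is reused across tasks, with the continuous arguments supplying task-specific detail such as where to grasp.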


Appendix Table of Contents

Neural Information Processing Systems

The actor losses used in DoubleGum, SAC, and DDPG are all derived from the same principle; Section B.1 shows this for the actor losses of DoubleGum, SAC, and DDPG. SAC (Haarnoja et al., 2018a,b) has a policy with learned variance. We then relate the critic losses to each other, starting from the most general case, DoubleGum; the SAC noise model is derived from Equation 16 in three ways. In continuous control, Fujimoto et al. (2018) introduced Twin Networks, and follow-up work selects a quantile estimate from an ensemble (Kuznetsov et al., 2020; Chen et al., 2021; Ball et al., 2023). Moskovitz et al. (2021) and Ball et al. (2023) also studied this, and Garg et al. (2023) present a method of estimating its value using Gumbel regression.


STAIR: Addressing Stage Misalignment through Temporal-Aligned Preference Reinforcement Learning

Luan, Yao, Mu, Ni, Yang, Yiqin, Xu, Bo, Jia, Qing-Shan

arXiv.org Artificial Intelligence

Preference-based reinforcement learning (PbRL) bypasses complex reward engineering by learning rewards directly from human preferences, enabling better alignment with human intentions. However, its effectiveness in multi-stage tasks, where agents sequentially perform sub-tasks (e.g., navigation, grasping), is limited by stage misalignment: Comparing segments from mismatched stages, such as movement versus manipulation, results in uninformative feedback, thus hindering policy learning. In this paper, we validate the stage misalignment issue through theoretical analysis and empirical experiments. To address this issue, we propose STage-AlIgned Reward learning (STAIR), which first learns a stage approximation based on temporal distance, then prioritizes comparisons within the same stage. Temporal distance is learned via contrastive learning, which groups temporally close states into coherent stages, without predefined task knowledge, and adapts dynamically to policy changes. Extensive experiments demonstrate STAIR's superiority in multi-stage tasks and competitive performance in single-stage tasks. Furthermore, human studies show that stages approximated by STAIR are consistent with human cognition, confirming its effectiveness in mitigating stage misalignment.
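The comparison-filtering step described above, assigning segments to stages and querying preferences only within a stage, can be sketched as follows. This is an illustrative sketch, not the STAIR implementation: the identity `embed` standing in for the contrastively learned temporal-distance representation, the uniform stage binning, and the toy segments are all assumptions.

```python
import numpy as np

def stage_of(segment, embed, n_stages):
    """Assign a segment to a stage via its mean embedding (toy stand-in
    for a learned temporal-distance representation)."""
    e = np.mean([embed(s) for s in segment])
    return min(int(e * n_stages), n_stages - 1)

def same_stage_pairs(segments, embed, n_stages=3):
    """Keep only preference-query pairs whose segments share a stage,
    avoiding uninformative cross-stage comparisons."""
    stages = [stage_of(seg, embed, n_stages) for seg in segments]
    return [(i, j)
            for i in range(len(segments))
            for j in range(i + 1, len(segments))
            if stages[i] == stages[j]]

# toy embedding: states here are already progress values in [0, 1)
embed = lambda s: s
segments = [[0.05, 0.1], [0.12, 0.2], [0.7, 0.8], [0.75, 0.9]]
pairs = same_stage_pairs(segments, embed, n_stages=3)
```

Here the two early-progress segments are grouped into one stage and the two late-progress segments into another, so only within-stage pairs survive, mirroring how stage-aligned querying avoids comparing, say, a navigation segment against a manipulation segment.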