AITopics

Country: North America > United States (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education > Educational Setting > Online (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.67)

Neural Information Processing SystemsFeb-9-2026, 15:45:50 GMT

2f0435cffef91068ced08d7c7d8e643e-Paper-Conference.pdf

demonstration, learning, regularization, (14 more...)

Country:

Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Asia > Japan > Honshū > Chūbu > Toyama Prefecture > Toyama (0.04)

Genre: Instructional Material > Course Syllabus & Notes (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Neural Information Processing SystemsFeb-7-2026, 08:05:27 GMT

Landmark-RxR: SolvingVision-and-Language NavigationwithFine-GrainedAlignmentSupervision

In Vision-and-Language Navigation (VLN) task, an agent is asked to navigate inside 3D indoor environments following given instructions. Cross-modal alignment is one of the most critical challenges in VLN because the predicted trajectory needs to match the given instruction accurately.

machine learning, natural language, trajectory, (18 more...)

Country: Asia > China > Beijing > Beijing (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.68)

Neural Information Processing SystemsNov-21-2025, 04:03:33 GMT

Robust Imitation of Diverse Behaviors

Ziyu Wang, Josh S. Merel, Scott E. Reed, Nando de Freitas, Gregory Wayne, Nicolas Heess

In this paper, we show how to combine the favourable aspects of these two approaches.

artificial intelligence, machine learning, trajectory, (15 more...)

Country:

North America > United States > California > Los Angeles County > Long Beach (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceNov-20-2025

RoboTidy : A 3D Gaussian Splatting Household Tidying Benchmark for Embodied Navigation and Action

Sun, Xiaoquan, Zhang, Ruijian, Pang, Kang, Miao, Bingchen, Tan, Yuxiang, Yang, Zhen, Li, Ming, Chen, Jiayu

Household tidying is an important application area, yet current benchmarks neither model user preferences nor support mobility, and they generalize poorly, making it hard to comprehensively assess integrated language-to-action capabilities. To address this, we propose RoboTidy, a unified benchmark for language-guided household tidying that supports Vision-Language-Action (VLA) and Vision-Language-Navigation (VLN) training and evaluation. RoboTidy provides 500 photorealistic 3D Gaussian Splatting (3DGS) household scenes (covering 500 objects and containers) with collisions, formulates tidying as an "Action (Object, Container)" list, and supplies 6.4k high-quality manipulation demonstration trajectories and 1.5k naviagtion trajectories to support both few-shot and large-scale training. We also deploy RoboTidy in the real world for object tidying, establishing an end-to-end benchmark for household tidying. RoboTidy offers a scalable platform and bridges a key gap in embodied AI by enabling holistic and realistic evaluation of language-guided robots.

container, large language model, machine learning, (15 more...)

2511.14161

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(2 more...)

arXiv.org Artificial IntelligenceNov-7-2025

Learning from Online Videos at Inference Time for Computer-Use Agents

Liu, Yujian, Wang, Ze, Chen, Hao, Sun, Ximeng, Yu, Xiaodong, Wu, Jialian, Liu, Jiang, Barsoum, Emad, Liu, Zicheng, Chang, Shiyu

Computer-use agents can operate computers and automate laborious tasks, but despite recent rapid progress, they still lag behind human users, especially when tasks require domain-specific procedural knowledge about particular applications, platforms, and multi-step workflows. Humans can bridge this gap by watching video tutorials: we search, skim, and selectively imitate short segments that match our current subgoal. In this paper, we study how to enable computer-use agents to learn from online videos at inference time effectively. We propose a framework that retrieves and filters tutorial videos, converts them into structured demonstration trajectories, and dynamically selects trajectories as in-context guidance during execution. Particularly, using a VLM, we infer UI actions, segment videos into short subsequences of actions, and assign each subsequence a textual objective. At inference time, a two-stage selection mechanism dynamically chooses a single trajectory to add in context at each step, focusing the agent on the most helpful local guidance for its next decision. Experiments on two widely used benchmarks show that our framework consistently outperforms strong base agents and variants that use only textual tutorials or transcripts. Analyses highlight the importance of trajectory segmentation and selection, action filtering, and visual information, suggesting that abundant online videos can be systematically distilled into actionable guidance that improves computer-use agents at inference time. Our code is available at https://github.com/UCSB-NLP-Chang/video_demo.

large language model, machine learning, natural language, (21 more...)

2511.04137

Country: Europe > Austria (0.28)

Genre:

Workflow (1.00)
Research Report > New Finding (0.67)
Instructional Material > Course Syllabus & Notes (0.46)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)

Neural Information Processing SystemsOct-8-2025, 09:24:14 GMT

Coherent Soft Imitation Learning Joe Watson Sandy H. Huang Nicolas Heess

Imitation learning methods seek to learn from an expert either through behavioral cloning (BC) for the policy or inverse reinforcement learning (IRL) for the reward. Such methods enable agents to learn complex tasks from humans that are difficult to capture with hand-designed reward functions.

demonstration, learning, regularization, (14 more...)

Country:

Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Asia > Japan > Honshū > Chūbu > Toyama Prefecture > Toyama (0.04)

Genre: Instructional Material > Course Syllabus & Notes (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

arXiv.org Artificial IntelligenceOct-8-2025

DeLTa: Demonstration and Language-Guided Novel Transparent Object Manipulation

Lee, Taeyeop, Kang, Gyuree, Wen, Bowen, Kim, Youngho, Back, Seunghyeok, Kweon, In So, Shim, David Hyunchul, Yoon, Kuk-Jin

Abstract-- Despite the prevalence of transparent object interactions in human everyday life, transparent robotic manipulation research remains limited to short-horizon tasks and basic grasping capabilities. Although some methods have partially addressed these issues, most of them have limitations in generalizability to novel objects and are insufficient for precise long-horizon robot manipulation. T o address this limitation, we propose DeL T a (Demonstration and Language-Guided Novel Transparent Object Manipulation), a novel framework that integrates depth estimation, 6D pose estimation, and vision-language planning for precise long-horizon manipulation of transparent objects guided by natural task instructions. A key advantage of our method is its single-demonstration approach, which generalizes 6D trajectories to novel transparent objects without requiring category-level priors or additional training. Additionally, we present a task planner that refines the VLM-generated plan to account for the constraints of a single-arm, eye-in-hand robot for long-horizon object manipulation tasks. Through comprehensive evaluation, we demonstrate that our method significantly outperforms existing transparent object manipulation approaches, particularly in long-horizon scenarios requiring precise manipulation capabilities. I. INTRODUCTION Transparent objects are prevalent across real-world environments, including laboratories, kitchens, and manufacturing facilities. However, conventional depth sensors often fail to perceive these objects accurately.

artificial intelligence, large language model, natural language, (19 more...)

2510.05662

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.67)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)

arXiv.org Artificial IntelligenceOct-8-2025

mindmap: Spatial Memory in Deep Feature Maps for 3D Action Policies

Steiner, Remo, Millane, Alexander, Tingdahl, David, Volk, Clemens, Ramasamy, Vikram, Yao, Xinjie, Du, Peter, Pouya, Soha, Sheng, Shiwei

End-to-end learning of robot control policies, structured as neural networks, has emerged as a promising approach to robotic manipulation. To complete many common tasks, relevant objects are required to pass in and out of a robot's field of view. In these settings, spatial memory - the ability to remember the spatial composition of the scene - is an important competency. However, building such mechanisms into robot learning systems remains an open research problem. We introduce mindmap (Spatial Memory in Deep Feature Maps for 3D Action Policies), a 3D diffusion policy that generates robot trajectories based on a semantic 3D reconstruction of the environment. We show in simulation experiments that our approach is effective at solving tasks where state-of-the-art approaches without memory mechanisms struggle. We release our reconstruction system, training code, and evaluation tasks to spur research in this direction.

artificial intelligence, machine learning, reconstruction, (17 more...)

2509.20297

Country: Europe > Switzerland (0.28)

Genre:

Research Report > Promising Solution (0.54)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Jiang, Sunshine, Fang, Xiaolin, Roy, Nicholas, Lozano-Pérez, Tomás, Kaelbling, Leslie Pack, Ancha, Siddharth

Streaming Flow Policy: Simplifying diffusion/flow-matching policies by treating action trajectories as flow trajectories

arXiv.org Artificial IntelligenceSep-26-2025

Recent advances in diffusion$/$flow-matching policies have enabled imitation learning of complex, multi-modal action trajectories. However, they are computationally expensive because they sample a trajectory of trajectories: a diffusion$/$flow trajectory of action trajectories. They discard intermediate action trajectories, and must wait for the sampling process to complete before any actions can be executed on the robot. We simplify diffusion$/$flow policies by treating action trajectories as flow trajectories. Instead of starting from pure noise, our algorithm samples from a narrow Gaussian around the last action. Then, it incrementally integrates a velocity field learned via flow matching to produce a sequence of actions that constitute a single trajectory. This enables actions to be streamed to the robot on-the-fly during the flow sampling process, and is well-suited for receding horizon policy execution. Despite streaming, our method retains the ability to model multi-modal behavior. We train flows that stabilize around demonstration trajectories to reduce distribution shift and improve imitation learning performance. Streaming flow policy outperforms prior methods while enabling faster policy execution and tighter sensorimotor loops for learning-based robot control. Project website: https://streaming-flow-policy.github.io/

artificial intelligence, machine learning, trajectory, (18 more...)

2505.21851

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)