AITopics

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.75)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.59)

Neural Information Processing Systems, Feb-13-2026, 15:36:41 GMT

From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces

Peter Shaw

Much of the previous work towards digital agents for graphical user interfaces (GUIs) has relied on text-based representations (derived from HTML or other structured data sources), which are not always readily available.

demonstration, machine learning, reinforcement learning

Country: North America > United States (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (0.67)
Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Communications (1.00)

Pratyusha Sharma, Deepak Pathak, Abhinav Gupta

Third-Person Visual Imitation Learning via Decoupled Hierarchical Controller

Neural Information Processing Systems, Feb-12-2026, 20:56:05 GMT

Neural Information Processing Systems http://nips.cc/

controller, demonstration, robot

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)

Neural Information Processing Systems, Feb-9-2026, 11:46:35 GMT

7b647a7d88f4d6319bf0d600d168dbeb-Paper.pdf

It is therefore difficult to discuss and understand what choices, among the high-level algorithmic options as well as low-level implementation details, matter.

artificial intelligence, machine learning, reinforcement learning

Country: North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

arXiv.org Artificial Intelligence, Dec-10-2025

OSMO: Open-Source Tactile Glove for Human-to-Robot Skill Transfer

Yin, Jessica, Qi, Haozhi, Wi, Youngsun, Kundu, Sayantan, Lambeta, Mike, Yang, William, Wang, Changhao, Wu, Tingfan, Malik, Jitendra, Hellebrekers, Tess

Abstract: Human video demonstrations provide abundant training data for learning robot policies, but video alone cannot capture the rich contact signals critical for mastering manipulation. We introduce OSMO, an open-source wearable tactile glove designed for human-to-robot skill transfer. The glove features 12 three-axis tactile sensors across the fingertips and palm and is designed to be compatible with state-of-the-art hand-tracking methods for in-the-wild data collection. We demonstrate that a robot policy trained exclusively on human demonstrations collected with OSMO, without any real robot data, is capable of executing a challenging contact-rich manipulation task. On a real-world wiping task requiring sustained contact pressure, our tactile-aware policy achieves a 72% success rate, outperforming vision-only baselines by eliminating contact-related failure modes. We release complete hardware designs, firmware, and assembly instructions to support community adoption. Tactile sensing enables humans to excel at manipulation by providing real-time feedback about contact forces that vision alone cannot capture. Consider trying to dice a carrot from video alone; one cannot observe the nuanced force control that makes the task successful. Many different applied forces can result in nearly identical visual appearances, leaving critical information about force control invisible to vision.

artificial intelligence, demonstration, manipulation

2512.0892

Country: North America > United States (0.46)

Genre: Research Report (0.40)

Industry: Education (0.61)

Technology: Information Technology > Artificial Intelligence > Robots > Manipulation (0.91)

arXiv.org Artificial Intelligence, Nov-25-2025

Latent Adaptive Planner for Dynamic Manipulation

Noh, Donghun, Kong, Deqian, Zhao, Minglu, Lizarraga, Andrew, Xie, Jianwen, Wu, Ying Nian, Hong, Dennis

We present the Latent Adaptive Planner (LAP), a trajectory-level latent-variable policy for dynamic nonprehensile manipulation (e.g., box catching) that formulates planning as inference in a low-dimensional latent space and is learned effectively from human demonstration videos. During execution, LAP achieves real-time adaptation by maintaining a posterior over the latent plan and performing variational replanning as new observations arrive. To bridge the embodiment gap between humans and robots, we introduce a model-based proportional mapping that regenerates accurate kinematic-dynamic joint states and object positions from human demonstrations. Through challenging box catching experiments with varying object properties, LAP demonstrates superior success rates, trajectory smoothness, and energy efficiency by learning human-like compliant motions and adaptive behaviors. Overall, LAP enables dynamic manipulation with real-time adaptation and transfers successfully across heterogeneous robot platforms using the same human demonstration videos.

artificial intelligence, machine learning, manipulation

2505.03077

Country: North America > United States > California (0.28)

Genre: Research Report (0.50)

Industry: Energy (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)

arXiv.org Artificial Intelligence, Nov-24-2025

VLM-SFD: VLM-Assisted Siamese Flow Diffusion Framework for Dual-Arm Cooperative Manipulation

Chen, Jiaming, Jiang, Yiyu, Huang, Aoshen, Li, Yang, Pan, Wei

Dual-arm cooperative manipulation holds great promise for tackling complex real-world tasks that demand seamless coordination and adaptive dynamics. Despite substantial progress in learning-based motion planning, most approaches struggle to generalize across diverse manipulation tasks and adapt to dynamic, unstructured environments, particularly in scenarios involving interactions between two objects such as assembly, tool use, and bimanual grasping. To address these challenges, we introduce a novel VLM-Assisted Siamese Flow Diffusion (VLM-SFD) framework for efficient imitation learning in dual-arm cooperative manipulation. The proposed VLM-SFD framework exhibits outstanding adaptability, significantly enhancing the ability to rapidly adapt and generalize to diverse real-world tasks from only a minimal number of human demonstrations. Specifically, we propose a Siamese Flow Diffusion Network (SFDNet) that employs a dual-encoder-decoder Siamese architecture to embed two target objects into a shared latent space, while a diffusion-based process, conditioned on task instructions, generates two-stream object-centric motion flows that guide dual-arm coordination. We further design a dynamic task assignment strategy that seamlessly maps the predicted 2D motion flows into 3D space and incorporates a pre-trained vision-language model (VLM) to adaptively assign the optimal motion to each robotic arm over time. Experiments validate the effectiveness of the proposed method, demonstrating its ability to generalize to diverse manipulation tasks while maintaining high efficiency and adaptability. The code and demo videos are publicly available on our project website https://sites.google.com/view/vlm-sfd/.

large language model, machine learning, natural language

doi: 10.1109/LRA.2025.3627381

2506.13428

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

arXiv.org Artificial Intelligence, Nov-21-2025

Dexterity from Smart Lenses: Multi-Fingered Robot Manipulation with In-the-Wild Human Demonstrations

Guzey, Irmak, Qi, Haozhi, Urain, Julen, Wang, Changhao, Yin, Jessica, Bodduluri, Krishna, Lambeta, Mike, Pinto, Lerrel, Rai, Akshara, Malik, Jitendra, Wu, Tingfan, Sharma, Akash, Bharadhwaj, Homanga

Learning multi-fingered robot policies from humans performing daily tasks in natural environments has long been a grand goal in the robotics community. Achieving this would mark significant progress toward generalizable robot manipulation in human environments, as it would reduce the reliance on labor-intensive robot data collection. Despite substantial efforts, progress toward this goal has been bottlenecked by the embodiment gap between humans and robots, as well as by difficulties in extracting relevant contextual and motion cues that enable learning of autonomous policies from in-the-wild human videos. We claim that with simple yet sufficiently powerful hardware for obtaining human data and our proposed framework AINA, we are now one significant step closer to achieving this dream. AINA enables learning multi-fingered policies from data collected by anyone, anywhere, and in any environment using Aria Gen 2 glasses. These glasses are lightweight and portable, feature a high-resolution RGB camera, provide accurate on-board 3D head and hand poses, and offer a wide stereo view that can be leveraged for depth estimation of the scene. This setup enables the learning of 3D point-based policies for multi-fingered hands that are robust to background changes and can be deployed directly without requiring any robot data (including online corrections, reinforcement learning, or simulation). We compare our framework against prior human-to-robot policy learning approaches, ablate our design choices, and demonstrate results across nine everyday manipulation tasks. Robot rollouts are best viewed on our website: https://aina-robot.github.io.

artificial intelligence, demonstration, robot

2511.16661

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Robots > Manipulation (1.00)

Pomponi, Vincenzo, Franceschi, Paolo, Baraldo, Stefano, Roveda, Loris, Avram, Oliver, Gambardella, Luca Maria, Valente, Anna

DynaMimicGen: A Data Generation Framework for Robot Learning of Dynamic Tasks

arXiv.org Artificial Intelligence, Nov-21-2025

Learning robust manipulation policies typically requires large and diverse datasets, the collection of which is time-consuming, labor-intensive, and often impractical for dynamic environments. In this work, we introduce DynaMimicGen (D-MG), a scalable dataset generation framework that enables policy training from minimal human supervision while uniquely supporting dynamic task settings. Given only a few human demonstrations, D-MG first segments the demonstrations into meaningful sub-tasks, then leverages Dynamic Movement Primitives (DMPs) to adapt and generalize the demonstrated behaviors to novel and dynamically changing environments. Improving on prior methods that rely on static assumptions or simplistic trajectory interpolation, D-MG produces smooth, realistic, and task-consistent Cartesian trajectories that adapt in real time to changes in object poses, robot states, or scene geometry during task execution. Our method supports different scenarios, including scene layouts, object instances, and robot configurations, making it suitable for both static and highly dynamic manipulation tasks. We show that robot agents trained via imitation learning on D-MG-generated data achieve strong performance across long-horizon and contact-rich benchmarks, including tasks like cube stacking and placing mugs in drawers, even under unpredictable environment changes. By eliminating the need for extensive human demonstrations and enabling generalization in dynamic settings, D-MG offers a powerful and efficient alternative to manual data collection, paving the way toward scalable, autonomous robot learning.

artificial intelligence, demonstration, machine learning

2511.16223

Country: Europe (0.46)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.46)

arXiv.org Artificial Intelligence, Nov-20-2025

Learning Human-Like RL Agents Through Trajectory Optimization With Action Quantization

Guo, Jian-Ting, Chen, Yu-Cheng, Hsieh, Ping-Chun, Ho, Kuo-Hao, Huang, Po-Wei, Wu, Ti-Rong, Wu, I-Chen

Human-like agents have long been one of the goals in pursuing artificial intelligence. Although reinforcement learning (RL) has achieved superhuman performance in many domains, relatively little attention has been focused on designing human-like RL agents. As a result, many reward-driven RL agents often exhibit unnatural behaviors compared to humans, raising concerns for both interpretability and trustworthiness. To achieve human-like behavior in RL, this paper first formulates human-likeness as trajectory optimization, where the objective is to find an action sequence that closely aligns with human behavior while also maximizing rewards, and adapts the classic receding-horizon control to human-like learning as a tractable and efficient implementation. To achieve this, we introduce Macro Action Quantization (MAQ), a human-like RL framework that distills human demonstrations into macro actions via Vector-Quantized VAE. Experiments on D4RL Adroit benchmarks show that MAQ significantly improves human-likeness, increasing trajectory similarity scores, and achieving the highest human-likeness rankings among all RL agents in the human evaluation study. Our results also demonstrate that MAQ can be easily integrated into various off-the-shelf RL algorithms, opening a promising direction for learning human-like RL agents. Our code is available at https://rlg.iis.sinica.edu.tw/papers/MAQ.

artificial intelligence, machine learning, reinforcement learning

2511.15055

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology (0.93)
Leisure & Entertainment > Games > Computer Games (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)