DIJIT: A Robotic Head for an Active Observer

Tabrizi, Mostafa Kamali, Chi, Mingshi, Dey, Bir Bikram, Yuan, Yu Qing, Solbach, Markus D., Liu, Yiqian, Jenkin, Michael, Tsotsos, John K.

arXiv.org Artificial Intelligence

We present DIJIT, a novel binocular robotic head expressly designed for mobile agents that behave as active observers. DIJIT's unique breadth of functionality enables active vision research and the study of human-like eye and head-neck motions, their interrelationships, and how each contributes to visual ability. DIJIT is also being used to explore the differences between how human vision employs eye/head movements to solve visual tasks and current computer vision methods. DIJIT's design features nine mechanical degrees of freedom, while the cameras and lenses provide an additional four optical degrees of freedom. The ranges and speeds of the mechanical design are comparable to human performance. Our design includes the ranges of motion required for convergent stereo, namely, vergence, version, and cyclotorsion. The exploration of the utility of these to both human and machine vision is ongoing. Here, we present the design of DIJIT and evaluate aspects of its performance. We present a new method for saccadic camera movements. In this method, a direct relationship between camera orientation and motor values is developed. The resulting saccadic camera movements are close to human movements in terms of their accuracy.
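The "direct relationship between camera orientation and motor values" could, in its simplest form, be a per-axis calibration fitted from motor/angle pairs and inverted to produce a single open-loop saccade command. The sketch below assumes a linear relationship and uses hypothetical calibration data; the paper's actual mapping and calibration procedure may differ.

```python
# Hypothetical sketch: fit a per-axis linear map angle = a*motor + b from
# calibration pairs, then invert it to command a target gaze angle in one
# open-loop saccade. Data and units are illustrative, not DIJIT's.

def fit_linear(samples):
    """Least-squares fit of angle = a*motor + b from (motor, angle) pairs."""
    n = len(samples)
    sx = sum(m for m, _ in samples)
    sy = sum(a for _, a in samples)
    sxx = sum(m * m for m, _ in samples)
    sxy = sum(m * a for m, a in samples)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

def motor_for_angle(angle_deg, a, b):
    """Invert the calibration to get the motor value for a target angle."""
    return (angle_deg - b) / a

# Example calibration: motor ticks vs. measured pan angle (hypothetical data).
pairs = [(0, -30.0), (1000, -15.0), (2000, 0.0), (3000, 15.0), (4000, 30.0)]
a, b = fit_linear(pairs)
target = motor_for_angle(7.5, a, b)   # one saccade to +7.5 degrees
```

Because the map is inverted once rather than servoed, a single motor command realizes the saccade, which is what makes the resulting movements fast and, per the paper's evaluation, comparable to human saccades in accuracy.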



RealMirror: A Comprehensive, Open-Source Vision-Language-Action Platform for Embodied AI

Tai, Cong, Zheng, Zhaoyu, Long, Haixu, Wu, Hansheng, Xiang, Haodong, Long, Zhengbin, Xiong, Jun, Shi, Rong, Zhang, Shizhuang, Qiu, Gang, Wang, He, Li, Ruifeng, Huang, Jun, Chang, Bin, Feng, Shuai, Shen, Tao

arXiv.org Artificial Intelligence

Abstract -- The emerging field of Vision-Language-Action (VLA) for humanoid robots faces several fundamental challenges, including the high cost of data acquisition, the lack of a standardized benchmark, and the significant gap between simulation and the real world. To overcome these obstacles, we propose RealMirror, a comprehensive, open-source embodied AI VLA platform. RealMirror builds an efficient, low-cost data collection, model training, and inference system that enables end-to-end VLA research without requiring a real robot. To facilitate model evolution and fair comparison, we also introduce a dedicated VLA benchmark for humanoid robots, featuring multiple scenarios, extensive trajectories, and various VLA models. In conclusion, with the unification of these critical components, RealMirror provides a robust framework that significantly accelerates the development of VLA models for humanoid robots. I. INTRODUCTION The rapid evolution of Large Language Models (LLMs) like GPT [1], Qwen [2], and Deepseek [3] has significantly advanced the development of Artificial General Intelligence (AGI). While exhibiting remarkable model performance, they lack the ability to perform tasks in the real world. (Jun Xiong is with The Chinese University of Hong Kong, Shenzhen, China.)


Visuomotor Grasping with World Models for Surgical Robots

Lin, Hongbin, Li, Bin, Au, Kwok Wai Samuel

arXiv.org Artificial Intelligence

Grasping is a fundamental task in robot-assisted surgery (RAS), and automating it can reduce surgeon workload while enhancing efficiency, safety, and consistency beyond teleoperated systems. Most prior approaches rely on explicit object pose tracking or handcrafted visual features, limiting their generalization to novel objects, robustness to visual disturbances, and the ability to handle deformable objects. Visuomotor learning offers a promising alternative, but deploying it in RAS presents unique challenges, such as low signal-to-noise ratio in visual observations, demands for high safety and millimeter-level precision, as well as the complex surgical environment. This paper addresses three key challenges: (i) sim-to-real transfer of visuomotor policies to ex vivo surgical scenes, (ii) visuomotor learning using only a single stereo camera pair -- the standard RAS setup, and (iii) object-agnostic grasping with a single policy that generalizes to diverse, unseen surgical objects without retraining or task-specific models. We introduce Grasp Anything for Surgery V2 (GASv2), a visuomotor learning framework for surgical grasping. GASv2 leverages a world-model-based architecture and a surgical perception pipeline for visual observations, combined with a hybrid control system for safe execution. We train the policy in simulation using domain randomization for sim-to-real transfer and deploy it on a real robot in both phantom-based and ex vivo surgical settings, using only a single pair of endoscopic cameras. Extensive experiments show our policy achieves a 65% success rate in both settings, generalizes to unseen objects and grippers, and adapts to diverse disturbances, demonstrating strong performance, generality, and robustness.
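The domain randomization used for sim-to-real transfer can be pictured as drawing a fresh simulation configuration each training episode. The sketch below is a generic illustration of that recipe; the parameter names and ranges are invented for the example and are not GASv2's actual configuration.

```python
# Minimal sketch of per-episode domain randomization. The parameters and
# ranges below are illustrative placeholders, not the authors' settings.
import random

RANDOMIZATION_RANGES = {
    "camera_noise_std": (0.0, 0.05),   # simulated image noise
    "light_intensity":  (0.5, 1.5),    # scene lighting scale
    "object_scale":     (0.9, 1.1),    # grasp-target size jitter
    "friction":         (0.4, 1.0),    # gripper-object friction
}

def sample_domain(rng):
    """Draw one randomized simulation configuration for an episode."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in RANDOMIZATION_RANGES.items()}

rng = random.Random(0)
cfg = sample_domain(rng)   # new visual/physical conditions for this episode
```

Training across many such draws encourages the policy to rely on features that survive the variation, which is what allows deployment on real endoscopic images without retraining.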


Tackling the 3D Simulation League: an interview with Klaus Dorer and Stefan Glaser

AIHub

A screenshot from the new simulator that will be trialled for a special challenge at RoboCup 2025. The annual RoboCup event, where teams gather from across the globe to take part in competitions across a number of leagues, will this year take place in Brazil, from 15-21 July. In advance of kick-off, we spoke to two members of the RoboCup Soccer 3D Simulation League: Executive Committee Member Klaus Dorer, and Stefan Glaser, who is on the Maintenance Committee and who has been recently developing a new simulator for the League. Could you start by giving us a quick introduction to the Simulation League? Klaus Dorer: There are two Simulation Leagues in Soccer: the 2D Simulation League and the 3D Simulation League. The 2D Simulation League, as the name suggests, is a flat league where the players and ball are simulated with simplified physics and the main focus is on team strategy.


Real-is-Sim: Bridging the Sim-to-Real Gap with a Dynamic Digital Twin

Abou-Chakra, Jad, Sun, Lingfeng, Rana, Krishan, May, Brandon, Schmeckpeper, Karl, Suenderhauf, Niko, Minniti, Maria Vittoria, Herlant, Laura

arXiv.org Artificial Intelligence

We introduce real-is-sim, a new approach to integrating simulation into behavior cloning pipelines. In contrast to real-only methods, which lack the ability to safely test policies before deployment, and sim-to-real methods, which require complex adaptation to cross the sim-to-real gap, our framework allows policies to seamlessly switch between running on real hardware and running in parallelized virtual environments. At the center of real-is-sim is a dynamic digital twin, powered by the Embodied Gaussian simulator, that synchronizes with the real world at 60Hz. This twin acts as a mediator between the behavior cloning policy and the real robot. Policies are trained using representations derived from simulator states and always act on the simulated robot, never the real one. During deployment, the real robot simply follows the simulated robot's joint states, and the simulation is continuously corrected with real world measurements. This setup, where the simulator drives all policy execution and maintains real-time synchronization with the physical world, shifts the responsibility of crossing the sim-to-real gap to the digital twin's synchronization mechanisms, instead of the policy itself. We demonstrate real-is-sim on a long-horizon manipulation task (PushT), showing that virtual evaluations are consistent with real-world results. We further show how real-world data can be augmented with virtual rollouts and compare to policies trained on different representations derived from the simulator state including object poses and rendered images from both static and robot-mounted cameras. Our results highlight the flexibility of the real-is-sim framework across training, evaluation, and deployment stages. Videos available at https://real-is-sim.github.io.
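The mediation loop described above, where the policy always acts on the simulated robot, the real robot tracks the simulated joint states, and the twin is continuously corrected toward real measurements, can be caricatured with toy scalar dynamics. The function and gain below are illustrative assumptions, not the Embodied Gaussian simulator's actual synchronization mechanism.

```python
# Schematic of one real-is-sim cycle (the paper runs this at 60 Hz):
# the policy drives the simulated joint, the twin is nudged toward the
# real measurement, and the real robot follows the twin's state.
# All quantities are toy scalars; names and the gain are illustrative.

def step_twin(sim_q, real_q, action, gain=0.2):
    """One cycle: apply the policy action in sim, then correct toward reality."""
    sim_q = sim_q + action                    # policy acts on the twin only
    sim_q = sim_q + gain * (real_q - sim_q)   # synchronization correction
    real_target = sim_q                       # real robot follows the twin
    return sim_q, real_target

sim_q, real_q = 0.0, 0.05                     # twin slightly out of sync
sim_q, target = step_twin(sim_q, real_q, action=0.1)
```

The key property is visible even in this caricature: the policy never touches the real state directly, so crossing the sim-to-real gap is the correction term's job, not the policy's.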


PAC Bench: Do Foundation Models Understand Prerequisites for Executing Manipulation Policies?

Gundawar, Atharva, Sagar, Som, Senanayake, Ransalu

arXiv.org Artificial Intelligence

Vision-Language Models (VLMs) are increasingly pivotal for generalist robot manipulation, enabling tasks such as physical reasoning, policy generation, and failure detection. However, their proficiency in these high-level applications often assumes a deep understanding of low-level physical prerequisites, a capability that remains largely unverified. For robots to perform actions reliably, they must comprehend intrinsic object properties (e.g., material, weight), action affordances (e.g., graspable, stackable), and physical constraints (e.g., stability, reachability, or an object's state, such as being closed). Despite the widespread use of VLMs in manipulation tasks, we argue that off-the-shelf models may lack this granular, physically grounded understanding, as such prerequisites are often overlooked during training. To address this critical gap, we introduce PAC Bench, a comprehensive benchmark designed to systematically evaluate VLMs on their understanding of core Properties, Affordances, and Constraints (PAC) from a task executability perspective. PAC Bench features a diverse dataset with over 30,000 annotations, comprising 673 real-world images (115 object classes, 15 property types, and 1 to 3 affordances defined per class), 100 real-world humanoid-view scenarios, and 120 unique simulated constraint scenarios across four tasks. Our evaluations reveal significant gaps in the ability of current VLMs to grasp fundamental physical concepts, highlighting limitations in their suitability for reliable robot manipulation and pointing to key areas for targeted research. PAC Bench also serves as a standardized benchmark for rigorously evaluating physical reasoning in VLMs and guiding the development of more robust, physically grounded models for robotic applications. Project Page: https://pacbench.github.io/


RoboTwin: A Robotic Teleoperation Framework Using Digital Twins

Yelchuri, Harsha, Singh, Diwakar Kumar, Gnani, Nithish Krishnabharathi, Prabhakar, T V, Singh, Chandramani

arXiv.org Artificial Intelligence

Robotic surgery imposes a significant cognitive burden on the surgeon. This burden increases in remote robotic surgeries, where the patient side and the surgeon side are geographically separated by hundreds to thousands of kilometres, because latency between the two sites can affect the quality of surgery. Real-time teleoperation of robots requires strict latency bounds for control and feedback. We propose a dual digital twin (DT) framework and describe its simulation environment and teleoperation framework. The surgeon visually controls the locally available DT of the patient side and thus experiences minimal latency. The second digital twin serves two purposes: it provides a layer of safety against operator-related mishaps, and it conveys the coordinates of known and unknown objects back to the operator-side digital twin. We show that teleoperation accuracy and user experience are enhanced with our approach. Experimental results using the NASA-TLX metric show that the quality of surgery is vastly improved with DT, likely due to reduced cognitive burden. The network data rate for identifying objects at the operator side is 25x lower than in the conventional setup.
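The latency-hiding idea behind the dual-twin setup can be caricatured with a delay queue: the operator's command moves the local twin immediately, while the remote side receives it only after the network delay. The class, delay, and command values below are illustrative assumptions, not the authors' protocol.

```python
# Toy model of the dual digital twin: the local twin responds instantly,
# the remote robot lags by the link delay. Values are illustrative.
from collections import deque

class DelayedLink:
    """Delivers messages after a fixed number of ticks (network latency)."""
    def __init__(self, delay_ticks):
        self.delay = delay_ticks
        self.queue = deque()

    def send(self, tick, msg):
        self.queue.append((tick + self.delay, msg))

    def receive(self, tick):
        out = []
        while self.queue and self.queue[0][0] <= tick:
            out.append(self.queue.popleft()[1])
        return out

link = DelayedLink(delay_ticks=3)
local_twin, remote_robot = 0.0, 0.0
for t in range(6):
    cmd = 0.1                  # operator command each tick
    local_twin += cmd          # local twin responds with no latency
    link.send(t, cmd)
    for m in link.receive(t):
        remote_robot += m      # remote side trails by the link delay
```

The operator's visual loop closes on `local_twin`, so the perceived latency is near zero even though `remote_robot` lags by the full network delay, which is the framework's central claim.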


Robust Humanoid Walking on Compliant and Uneven Terrain with Deep Reinforcement Learning

Singh, Rohan P., Morisawa, Mitsuharu, Benallegue, Mehdi, Xie, Zhaoming, Kanehiro, Fumio

arXiv.org Artificial Intelligence

Email: rohan-singh@aist.go.jp. Abstract -- For the deployment of legged robots in real-world environments, it is essential to develop robust locomotion control methods for challenging terrains that may exhibit unexpected deformability and irregularity. In this paper, we explore the application of sim-to-real deep reinforcement learning (RL) to the design of bipedal locomotion controllers for humanoid robots on compliant and uneven terrains. Our key contribution is to show that a simple training curriculum exposing the RL agent to randomized terrains in simulation can achieve robust walking on a real humanoid robot using only proprioceptive feedback. We train an end-to-end bipedal locomotion policy using the proposed approach and show extensive real-robot demonstrations on the HRP-5P humanoid over several difficult terrains inside and outside the lab environment. Further, we argue that the robustness of a bipedal walking policy can be improved if the robot is allowed to exhibit aperiodic motion with variable stepping frequency. We propose a new control policy that enables modification of the observed clock signal, leading to adaptive gait frequencies depending on the terrain and command velocity. Through simulation experiments, we show the effectiveness of this policy specifically for walking over challenging terrains by controlling swing and stance durations. This is primarily due to the strict temporal and spatial assumptions placed by such approaches on the foot trajectories and environmental contacts [1], [2]. When faced with an irregular or compliant (i.e.
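The adaptive-clock idea above can be sketched as a phase oscillator whose frequency the policy modulates: the observed clock is a function of the phase, and slowing the phase lengthens swing and stance durations. This is a toy illustration under assumed names and values, not the authors' network or controller.

```python
# Toy gait clock: the policy scales the phase rate; scale > 1 speeds the
# gait, scale < 1 slows it. Frequencies and dt are illustrative.
import math

def advance_phase(phase, base_freq_hz, policy_scale, dt):
    """Advance the gait phase by one control step, wrapping at 2*pi."""
    phase += 2.0 * math.pi * base_freq_hz * policy_scale * dt
    return phase % (2.0 * math.pi)

def clock_observation(phase):
    """The periodic clock signal the policy observes."""
    return (math.sin(phase), math.cos(phase))

phase = 0.0
for _ in range(50):                     # 50 control steps at dt = 0.02 s
    phase = advance_phase(phase, base_freq_hz=1.0, policy_scale=0.5, dt=0.02)
obs = clock_observation(phase)          # half a gait cycle at half speed
```

Because the scale enters the phase rate rather than the observation directly, the gait stays smooth while its period stretches or shrinks, which is how variable stepping frequency emerges without retraining a fixed-period policy.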