Robots


Memorize What Matters: Emergent Scene Decomposition from Multitraverse. Yiming Li, Zehong Wang, Yue Wang

Neural Information Processing Systems

Humans naturally retain memories of permanent elements, while ephemeral moments often slip through the cracks of memory. This selective retention is crucial for robotic perception, localization, and mapping. To endow robots with this capability, we introduce 3D Gaussian Mapping (3DGM), a self-supervised, camera-only offline mapping framework grounded in 3D Gaussian Splatting.
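To make the "memorize the permanent, forget the ephemeral" intuition concrete, here is a minimal illustrative sketch (not the paper's algorithm): across repeated traversals of the same place, pixels that deviate strongly from a robust cross-traversal estimate are flagged as likely ephemeral. The function name, the threshold, and the assumption of pre-aligned views are all illustrative.

# Hedged illustration (not 3DGM itself) of the underlying intuition: the
# permanent scene is what stays consistent across traversals, while ephemeral
# objects show up as large residuals against a robust cross-traversal estimate.
import numpy as np

def ephemerality_mask(aligned_images, threshold=0.15):
    """aligned_images: (T, H, W, 3) views of the same place over T traversals,
    values in [0, 1]. Returns a (T, H, W) boolean mask of likely-ephemeral pixels."""
    median = np.median(aligned_images, axis=0)            # robust 'permanent' estimate
    residual = np.abs(aligned_images - median).mean(-1)   # per-pixel deviation per traversal
    return residual > threshold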


Learning to Control Self-Assembling Morphologies: A Study of Generalization via Modularity

Neural Information Processing Systems

Contemporary sensorimotor learning approaches typically start with an existing complex agent (e.g., a robotic arm), which they learn to control. In contrast, this paper investigates a modular co-evolution strategy: a collection of primitive agents learns to dynamically self-assemble into composite bodies while also learning to coordinate their behavior to control these bodies. Each primitive agent consists of a limb with a motor attached at one end. Limbs may choose to link up to form collectives. When a limb initiates a link-up action and another limb is nearby, the latter is magnetically connected to the 'parent' limb's motor. This forms a new single agent, which may further link with other agents. In this way, complex morphologies can emerge, controlled by a policy whose architecture is in explicit correspondence with the morphology. We evaluate the performance of these dynamic and modular agents in simulated environments and demonstrate better generalization to test-time changes, both in the environment and in the structure of the agent, compared to static and monolithic baselines.
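As an illustration of a policy whose architecture mirrors the assembled morphology, here is a minimal sketch in which one shared limb module is applied at every node of the current limb tree and messages flow along the physical links. Class names, dimensions, and the tree encoding are assumptions, not the authors' implementation.

# Hedged sketch: one shared per-limb policy, reused at every node of the
# assembled limb tree, with messages passed along the physical connections.
import torch
import torch.nn as nn

class LimbModule(nn.Module):
    def __init__(self, obs_dim=16, msg_dim=8, act_dim=2):
        super().__init__()
        self.msg_dim = msg_dim
        self.encoder = nn.Sequential(nn.Linear(obs_dim + msg_dim, 64), nn.Tanh())
        self.to_msg = nn.Linear(64, msg_dim)   # message passed up to the parent limb
        self.to_act = nn.Linear(64, act_dim)   # torque command for this limb's motor

    def forward(self, obs, child_msgs):
        # Aggregate messages from currently attached child limbs (zero vector if none).
        agg = child_msgs.sum(dim=0) if len(child_msgs) else torch.zeros(self.msg_dim)
        h = self.encoder(torch.cat([obs, agg]))
        return self.to_act(h), self.to_msg(h)

def act(limb_tree, obs, module):
    """One bottom-up pass over the current morphology (dict: limb id -> child ids).
    The same module weights are reused at every limb, so the policy grows and
    shrinks with the assembled body."""
    actions = {}
    def recurse(node):
        msgs = [recurse(c) for c in limb_tree.get(node, [])]
        msgs = torch.stack(msgs) if msgs else torch.empty(0, module.msg_dim)
        action, msg = module(obs[node], msgs)
        actions[node] = action
        return msg
    recurse(0)   # limb 0 assumed to be the root of the assembled tree
    return actions

# Example: four limbs linked into a tree {0: [1, 2], 1: [], 2: [3], 3: []}
# with per-limb observations obs = {i: torch.randn(16) for i in ...}.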


Harmony4D: A Video Dataset for In-The-Wild Close Human Interactions

Neural Information Processing Systems

Understanding how humans interact with each other is key to building realistic multi-human virtual reality systems. This area remains relatively unexplored due to the lack of large-scale datasets. Recent datasets focusing on this issue mainly consist of activities captured entirely in controlled indoor environments with choreographed actions, significantly affecting their diversity. To address this, we introduce Harmony4D, a multi-view video dataset for human-human interaction featuring in-the-wild activities such as wrestling, dancing, MMA, and more. We use a flexible multi-view capture system to record these dynamic activities and provide annotations for human detection, tracking, 2D/3D pose estimation, and mesh recovery for closely interacting subjects. We propose a novel markerless algorithm to track 3D human poses in severe occlusion and close interaction to obtain our annotations with minimal manual intervention. Harmony4D consists of 1.66 million images and 3.32 million human instances from more than 20 synchronized cameras with 208 video sequences spanning diverse environments and 24 unique subjects. We rigorously evaluate existing state-of-the-art methods for mesh recovery and highlight their significant limitations in modeling close interaction scenarios. Additionally, we fine-tune a pre-trained HMR2.0 model on Harmony4D and demonstrate a 54.8% improvement in PVE in scenes with severe occlusion and contact.
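For reference, a minimal sketch of the per-vertex error (PVE) metric mentioned above, assuming root alignment; Harmony4D's exact evaluation protocol (alignment choice, units) may differ.

# Hedged sketch of the standard Per-Vertex Error (PVE) metric: mean Euclidean
# distance between predicted and ground-truth mesh vertices, typically reported
# in millimetres after aligning the two meshes at a root vertex/joint.
import numpy as np

def pve(pred_vertices: np.ndarray, gt_vertices: np.ndarray, root_idx: int = 0) -> float:
    """pred_vertices, gt_vertices: (V, 3) arrays for one mesh."""
    pred = pred_vertices - pred_vertices[root_idx]   # root-align both meshes
    gt = gt_vertices - gt_vertices[root_idx]
    return float(np.linalg.norm(pred - gt, axis=1).mean())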


SS3DM: Benchmarking Street-View Surface Reconstruction with a Synthetic 3D Mesh Dataset. Heng Zhou

Neural Information Processing Systems

Reconstructing accurate 3D surfaces for street-view scenarios is crucial for applications such as digital entertainment and autonomous driving simulation. However, existing street-view datasets, including KITTI, Waymo, and nuScenes, only offer noisy LiDAR points as ground-truth data for geometric evaluation of reconstructed surfaces. These geometric ground-truths often lack the necessary precision to evaluate surface positions and do not provide data for assessing surface normals. To overcome these challenges, we introduce the SS3DM dataset, comprising precise Synthetic Street-view 3D Mesh models exported from the CARLA simulator. These mesh models facilitate accurate position evaluation and include normal vectors for evaluating surface normals. To simulate the input data in realistic driving scenarios for 3D reconstruction, we virtually drive a vehicle equipped with six RGB cameras and five LiDAR sensors in diverse outdoor scenes. Leveraging this dataset, we establish a benchmark for state-of-the-art surface reconstruction methods, providing a comprehensive evaluation of the associated challenges.
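To illustrate what precise mesh ground truth with normals makes possible, here is a hedged sketch of a position-plus-normal evaluation; SS3DM's official metrics may differ, and sampling points from the two surfaces is assumed to happen elsewhere.

# Hedged sketch: point-to-point distance plus normal consistency between points
# sampled on the reconstructed and ground-truth surfaces. Not SS3DM's official
# benchmark code; it only shows why ground-truth normals add an evaluation axis.
import numpy as np
from scipy.spatial import cKDTree

def surface_errors(rec_pts, rec_nrm, gt_pts, gt_nrm):
    """All inputs are (N, 3) arrays; normals are assumed unit length."""
    nn = cKDTree(gt_pts)
    dist, idx = nn.query(rec_pts)                     # nearest GT point per reconstructed point
    pos_error = dist.mean()                           # accuracy term (rec -> GT distance)
    cosine = np.abs((rec_nrm * gt_nrm[idx]).sum(axis=1))
    normal_consistency = cosine.mean()                # 1.0 means perfectly aligned normals
    return pos_error, normal_consistency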


We Bought a 'Peeing' Robot Attack Dog From Temu. It Was Even Weirder Than Expected

WIRED

In my 15 years of reviewing tech, this pellet-firing, story-telling, pretend-urinating robot attack dog is easily the strangest thing I've ever tested. Arriving in a slightly battered box following a series of questionable decisions on Temu, I'm immediately drawn to the words "FIRE BULLETS PET" emblazoned on the box. And there, resting behind the protective plastic window with all the innocence of a newborn lamb, lies the plastic destroyer of worlds that my four-and-a-half-year-old immediately (and inexplicably) names Clippy. Clippy is a robot dog. And he (my son assures me that it's a he) is clearly influenced by the remarkable, and somewhat terrifying, robotic canine creations of Boston Dynamics, a renowned company that's leading the robot revolution.


Multi-Reward Best Policy Identification. Filippo Vannella, Ericsson AB

Neural Information Processing Systems

Rewards are a critical aspect of formulating Reinforcement Learning (RL) problems; often, one may be interested in testing multiple reward functions, or the problem may naturally involve multiple rewards. In this study, we investigate the Multi-Reward Best Policy Identification (MR-BPI) problem, where the goal is to determine the best policy for all rewards in a given set R with minimal sample complexity and a prescribed confidence level. We derive a fundamental instance-specific lower bound on the sample complexity required by any Probably Correct (PC) algorithm in this setting. This bound guides the design of an optimal exploration policy attaining minimal sample complexity. However, this lower bound involves solving a hard non-convex optimization problem. We address this challenge by devising a convex approximation, enabling the design of sample-efficient algorithms. We propose MR-NaS, a PC algorithm with competitive performance on hard-exploration tabular environments. Extending this approach to Deep RL (DRL), we also introduce DBMR-BPI, an efficient algorithm for model-free exploration in multi-reward settings.
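For context, a hedged sketch of the generic form that instance-specific BPI lower bounds take in this literature; the paper's exact alternative sets, confidence terms, and constants may differ. The expected stopping time \tau of any \delta-PC algorithm typically satisfies

\mathbb{E}_M[\tau] \;\ge\; T^*(M)\,\mathrm{kl}(\delta,\,1-\delta),
\qquad
\frac{1}{T^*(M)} \;=\; \sup_{\omega \in \Delta(\mathcal{S}\times\mathcal{A})}\;
\min_{r \in \mathcal{R}}\;
\inf_{M' \in \mathrm{Alt}_r(M)}\;
\sum_{s,a} \omega(s,a)\,\mathrm{KL}\big(P_M(\cdot \mid s,a)\,\|\,P_{M'}(\cdot \mid s,a)\big),

where \omega ranges over exploration distributions, \mathrm{Alt}_r(M) denotes the alternative MDPs whose optimal policy for reward r differs from that of M, and \mathrm{kl} is the binary relative entropy. The sup-min-inf optimization defining T^*(M) is the kind of non-convex problem that the convex approximation described above is designed to sidestep.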


E-Motion: Future Motion Simulation via Event Sequence Diffusion

Neural Information Processing Systems

Forecasting a typical object's future motion is a critical task for interpreting and interacting with dynamic environments in computer vision. Event-based sensors, which can capture changes in the scene with exceptional temporal granularity, may offer a unique opportunity to predict future motion with a level of detail and precision previously unachievable. Inspired by this, we propose to integrate the strong learning capacity of the video diffusion model with the rich motion information of an event camera as a motion simulation framework. Specifically, we first adapt pre-trained stable video diffusion models to the event sequence dataset.
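As a concrete example of how an asynchronous event stream can be handed to an image-based video diffusion backbone, here is a common voxel-grid binning step, sketched with illustrative names; E-Motion's actual conditioning pipeline may differ.

# Hedged sketch: bin an asynchronous event stream into a voxel grid of signed
# per-pixel counts so that an image/video model can consume it. Only the
# representation step is shown, not the diffusion model itself.
import numpy as np

def events_to_voxel_grid(x, y, t, p, H, W, num_bins=5):
    """x, y: integer pixel coords; t: timestamps; p: polarity in {-1, +1}.
    Returns a (num_bins, H, W) float array."""
    grid = np.zeros((num_bins, H, W), dtype=np.float32)
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9)      # normalise time to [0, 1]
    bins = np.minimum((t_norm * num_bins).astype(int), num_bins - 1)
    np.add.at(grid, (bins, y, x), p)                            # accumulate signed events
    return grid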


Toward Approaches to Scalability in 3D Human Pose Estimation

Neural Information Processing Systems

In the field of 3D Human Pose Estimation (HPE), scalability and generalization across diverse real-world scenarios remain significant challenges. This paper addresses two key bottlenecks to scalability: limited data diversity caused by 'popularity bias' and increased 'one-to-many' depth ambiguity arising from greater pose diversity. We introduce the Biomechanical Pose Generator (BPG), which leverages biomechanical principles, specifically the normal range of motion, to autonomously generate a wide array of plausible 3D poses without relying on a source dataset, thus overcoming the restrictions of popularity bias. To address depth ambiguity, we propose the Binary Depth Coordinates (BDC), which simplifies depth estimation into a binary classification of joint positions (front or back). This method decomposes a 3D pose into three core elements (2D pose, bone length, and binary depth decision), substantially reducing depth ambiguity and enhancing model robustness and accuracy, particularly in complex poses. Our results demonstrate that these approaches increase the diversity and volume of pose data while consistently achieving performance gains, even amid the complexities introduced by increased pose diversity.
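To make the Binary Depth Coordinate decomposition concrete, here is a minimal sketch under an orthographic-projection assumption: the depth offset of each joint follows from its bone length, its 2D offset from the parent, and the front/back bit. The function, the root-first joint ordering, and the orthographic camera are illustrative assumptions, not the paper's exact formulation.

# Hedged sketch: lift a 2D pose to 3D from bone lengths and per-joint
# front/back bits via the Pythagorean relation along the kinematic tree.
import numpy as np

def lift_pose(pose_2d, parents, bone_len, front_back):
    """pose_2d: (J, 2); parents[j]: parent index (-1 for the root);
    bone_len[j]: length of the bone parent->j; front_back[j]: +1 (front) or -1 (back)."""
    J = pose_2d.shape[0]
    z = np.zeros(J)
    for j in range(J):                      # assumes joints are ordered root-first
        par = parents[j]
        if par < 0:
            continue                        # root depth fixed to 0
        dxy = pose_2d[j] - pose_2d[par]
        dz2 = max(bone_len[j] ** 2 - float(dxy @ dxy), 0.0)
        z[j] = z[par] + front_back[j] * np.sqrt(dz2)
    return np.concatenate([pose_2d, z[:, None]], axis=1)   # (J, 3) pose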



DiffuBox: Refining 3D Object Detection with Point Diffusion. Katie Z Luo

Neural Information Processing Systems

Ensuring robust 3D object detection and localization is crucial for many applications in robotics and autonomous driving. Recent models, however, face difficulties in maintaining high performance when applied to domains with differing sensor setups or geographic locations, often resulting in poor localization accuracy due to domain shift. To overcome this challenge, we introduce a novel diffusion-based box refinement approach. This method employs a domain-agnostic diffusion model, conditioned on the LiDAR points surrounding a coarse bounding box, to simultaneously refine the box's location, size, and orientation. We evaluate this approach under various domain adaptation settings, and our results reveal significant improvements across different datasets, object classes and detectors.
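For intuition, here is a hedged sketch of what a diffusion-style box refinement loop can look like: starting from the coarse box, a DDPM-style reverse process denoises the seven box parameters with a network conditioned on the surrounding LiDAR points. The denoiser, noise schedule, and (un-normalised) parameterisation are placeholders, not DiffuBox's actual design.

# Hedged sketch: generic DDPM-style reverse process over box parameters
# [x, y, z, l, w, h, yaw], conditioned on nearby LiDAR points. The trained
# denoiser network is assumed to be provided elsewhere.
import torch

def refine_box(coarse_box, lidar_points, denoiser, num_steps=50):
    """coarse_box: (7,) tensor; lidar_points: (N, 3) tensor;
    denoiser(box_t, points, t) -> predicted noise (placeholder interface)."""
    betas = torch.linspace(1e-4, 0.02, num_steps)
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)

    box = coarse_box.clone()
    for t in reversed(range(num_steps)):
        eps = denoiser(box, lidar_points, t)                  # predicted noise at step t
        box = (box - betas[t] / torch.sqrt(1 - alpha_bar[t]) * eps) / torch.sqrt(alphas[t])
        if t > 0:
            box = box + torch.sqrt(betas[t]) * torch.randn_like(box)   # ancestral sampling noise
    return box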