 Robots


Towards Reliable Code-as-Policies: A Neuro-Symbolic Framework for Embodied Task Planning

Neural Information Processing Systems

Recent advances in large language models (LLMs) have enabled the automatic generation of executable code for task planning and control in embodied agents such as robots, demonstrating the potential of LLM-based embodied intelligence. However, these LLM-based code-as-policies approaches often suffer from limited environmental grounding, particularly in dynamic or partially observable settings, leading to suboptimal task success rates due to incorrect or incomplete code generation. In this work, we propose a neuro-symbolic embodied task planning framework that incorporates explicit symbolic verification and interactive validation processes during code generation. In the validation phase, the framework generates exploratory code that actively interacts with the environment to acquire missing observations while preserving task-relevant states. This integrated process enhances the grounding of generated code, resulting in improved task reliability and success rates in complex environments. We evaluate our framework on RLBench and in real-world settings across dynamic, partially observable scenarios. Experimental results demonstrate that our framework improves task success rates by 46.2% over Code as Policies baselines and attains over 86.8% executability of task-relevant actions, thereby enhancing the reliability of task planning in dynamic environments.


Adversarial Locomotion and Motion Imitation for Humanoid Policy Learning

Neural Information Processing Systems

Attaining human-like whole-body coordination in humanoid robots remains challenging, as conventional approaches that mimic whole-body motions often neglect the distinct roles of the upper and lower body. This oversight leads to computationally intensive policy learning and frequently causes robot instability and falls during real-world execution. To address these issues, we propose Adversarial Locomotion and Motion Imitation (ALMI), a novel framework that enables adversarial policy learning between the upper and lower body. Specifically, the lower body aims to provide robust locomotion capabilities to follow velocity commands while the upper body tracks various motions. Conversely, the upper-body policy ensures effective motion tracking when the robot executes velocity-based movements. Through iterative updates, these policies achieve coordinated whole-body control, which can be extended to loco-manipulation tasks with teleoperation systems. Extensive experiments demonstrate that our method achieves robust locomotion and precise motion tracking both in simulation and on the full-size Unitree H1-2 robot. Additionally, we release a large-scale whole-body motion control dataset featuring high-quality episodic trajectories from MuJoCo simulations. The project page is https://almi-humanoid.github.io.


Non-Line-of-Sight 3D Reconstruction with Radar

Neural Information Processing Systems

Seeing hidden structures and objects around corners is critical for robots operating in complex, cluttered environments. Existing methods, however, are limited to detecting and tracking hidden objects rather than reconstructing the occluded full scene.


Provable Ordering and Continuity in Vision-Language Pretraining for Generalizable Embodied Agents

Neural Information Processing Systems

Pre-training vision-language representations on human action videos has emerged as a promising approach to reduce reliance on large-scale expert demonstrations for training embodied agents. However, prior methods often employ time contrastive learning based on goal-reaching heuristics, progressively aligning language instructions from the initial to the final frame. This overemphasis on future frames can result in erroneous vision-language associations, as actions may terminate early or include irrelevant moments at the end. To address this issue, we propose Action Temporal Coherence Learning (AcTOL) to learn ordered and continuous vision-language representations without rigid goal-based constraints. AcTOL treats a video as a continuous trajectory where it (1) contrasts semantic differences between frames to reflect their natural ordering, and (2) imposes a local Brownian bridge constraint to ensure smooth transitions across intermediate frames. Extensive imitation learning experiments on both simulated and real robots show that the pretrained features significantly enhance downstream manipulation tasks with high robustness to different linguistic styles of instructions, offering a viable pathway toward generalized embodied agents. Our project page is at https://actol-pretrain.github.io/.
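
To make the "local Brownian bridge constraint" more concrete, here is a minimal sketch of the standard Brownian bridge statistics it refers to. For a bridge pinned at embedding z_start at time s and z_end at time e, the value at an intermediate time t has mean z_start + (t - s)/(e - s) * (z_end - z_start) and variance sigma^2 * (t - s)(e - t)/(e - s). The function names, the `sigma` parameter, and the squared-error penalty are illustrative assumptions, not the AcTOL loss as published.

```python
def bridge_mean_var(z_start, z_end, s, t, e, sigma=1.0):
    """Mean vector and scalar variance of a Brownian bridge at time t,
    pinned at z_start (time s) and z_end (time e)."""
    if not s < t < e:
        raise ValueError("t must lie strictly between s and e")
    alpha = (t - s) / (e - s)
    mean = [a + alpha * (b - a) for a, b in zip(z_start, z_end)]
    var = sigma ** 2 * (t - s) * (e - t) / (e - s)
    return mean, var


def bridge_penalty(z_start, z_mid, z_end, s, t, e, sigma=1.0):
    """Hypothetical smoothness penalty: squared distance of an intermediate
    frame embedding from the bridge mean, scaled by the bridge variance
    (a Gaussian negative log-likelihood up to an additive constant)."""
    mean, var = bridge_mean_var(z_start, z_end, s, t, e, sigma)
    sq_dist = sum((m - z) ** 2 for m, z in zip(mean, z_mid))
    return sq_dist / (2.0 * var)
```

An intermediate frame whose embedding sits exactly on the straight line between the two endpoint embeddings incurs zero penalty; the allowed deviation is largest midway between the endpoints and shrinks toward them, which is what makes the constraint "local".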


GeForce Now's best tier just got a $70 price cut, but the clock is ticking

PCWorld

Nvidia GeForce Now is offering significant discounts on yearly subscriptions, with the Ultimate tier reduced to $130 annually, saving $70. PCWorld highlights this limited-time promotion runs until July 8th, making cloud gaming more accessible for budget-conscious users. The service enables streaming PC games from existing libraries on various devices without requiring powerful hardware. Nvidia's GeForce Now streaming service is a great way to make use of a big Steam library without needing a beefy gaming PC. That's becoming a much more appealing option, as prices for RAM and storage become untenable (thanks, in no small part, to Nvidia). If you're thinking about signing up, Nvidia is offering up to $70 off a yearly subscription, but only for the next month or so. The "Summer Sale" brings the price of the Ultimate tier down to $130 for a year, and the Performance tier down to $65.


Beatbot Sora 10 review: The affordable pool robot most people need

PCWorld

A budget pool robot that handles basic cleaning well enough, but it stands out most for how affordable it is. Beatbot's Sora line, introduced earlier this year, marked the robot producer's aggressive foray into lower-cost pool cleaning systems, with three models on sale at stair-stepped price points. The Sora 10 stands at the bottom of that price band, typically available for under $500, which is pretty much the bare minimum you can get away with paying for a pool robot that has any real value. So, what does $500 get you?


Robot Talk Episode 160 – Robotic blacksmiths, with Edward Mehr

Robohub

Claire chatted to Edward Mehr from Machina Labs about their RoboCraftsman that shapes complex metal parts for the aerospace, defence, and automotive industries. Edward Mehr is an entrepreneur and engineer specializing in advanced manufacturing, robotics, and artificial intelligence. As the Co-Founder and CEO of Machina Labs, he leads efforts to integrate AI-driven robotics into flexible, on-demand production systems. Under his leadership, Machina Labs is reshaping how industries such as aerospace, defence, and automotive approach metal forming and modern manufacturing. Before founding Machina Labs, Ed worked at leading technology companies, including Relativity Space, Averon, SpaceX, Google, and Microsoft.


Tree-Guided Diffusion Planner

Neural Information Processing Systems

Planning with pretrained diffusion models has emerged as a promising approach for solving test-time guided control problems. Standard gradient guidance typically performs optimally under convex, differentiable reward landscapes. However, it shows substantially reduced effectiveness in real-world scenarios with non-convex objectives, non-differentiable constraints, and multi-reward structures. Furthermore, recent supervised planning approaches require task-specific training or value estimators, which limits test-time flexibility and zero-shot generalization. We propose a Tree-guided Diffusion Planner (TDP), a zero-shot test-time planning framework that balances exploration and exploitation through structured trajectory generation. We frame test-time planning as a tree search problem using a bi-level sampling process: (1) diverse parent trajectories are produced via training-free particle guidance to encourage broad exploration, and (2) sub-trajectories are refined through fast conditional denoising guided by task objectives. TDP addresses the limitations of gradient guidance by exploring diverse trajectory regions and harnessing gradient information across this expanded solution space using only pretrained models and test-time reward signals. We evaluate TDP on three diverse tasks: maze gold-picking, robot arm block manipulation, and AntMaze multi-goal exploration. TDP consistently outperforms state-of-the-art approaches on all tasks.
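
The bi-level sampling idea can be illustrated with a toy sketch. Everything here is a stand-in chosen for brevity: a 1D random walk replaces the pretrained diffusion model, a spread of drift values replaces particle guidance, and best-of-n suffix resampling replaces conditional denoising. The names `sample_parent`, `refine_child`, `tree_plan`, and `goal_reward` are hypothetical and not taken from the TDP paper.

```python
import random


def sample_parent(rng, horizon, drift):
    """Stand-in for a pretrained sampler: a 1D random walk with a fixed drift."""
    x, traj = 0.0, [0.0]
    for _ in range(horizon):
        x += drift + rng.gauss(0.0, 0.3)
        traj.append(x)
    return traj


def refine_child(rng, parent, branch_at, reward, n_candidates=8):
    """Keep the parent's prefix up to branch_at, resample the suffix several
    times, and return the best trajectory by reward (never worse than parent)."""
    prefix = parent[:branch_at + 1]
    best = list(parent)
    for _ in range(n_candidates):
        x, cand = prefix[-1], list(prefix)
        for _ in range(len(parent) - len(prefix)):
            x += rng.gauss(0.0, 0.3)
            cand.append(x)
        if reward(cand) > reward(best):
            best = cand
    return best


def tree_plan(reward, horizon=20, n_parents=6, seed=0):
    """Level 1: diverse parents (exploration). Level 2: refined children
    branching from each parent (exploitation). Requires n_parents >= 2."""
    rng = random.Random(seed)
    drifts = [-0.5 + i / (n_parents - 1) for i in range(n_parents)]
    parents = [sample_parent(rng, horizon, d) for d in drifts]
    children = [refine_child(rng, p, horizon // 2, reward) for p in parents]
    return max(parents + children, key=reward)


def goal_reward(traj, goal=3.0, tol=0.5):
    """Non-differentiable reward: 1 if the final state is near the goal, else 0."""
    return 1.0 if abs(traj[-1] - goal) <= tol else 0.0
```

Calling `tree_plan(goal_reward)` returns the highest-reward trajectory among all parents and children. The reward is only ever evaluated, never differentiated, which is why a search of this shape can cope with the non-differentiable objectives the abstract mentions.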


3D-Agent: A Tri-Modal Multi-Agent Responsive Framework for Comprehensive 3D Object Annotation

Neural Information Processing Systems

Driven by applications in autonomous driving, robotics, and augmented reality, 3D object annotation is a critical task that poses challenges beyond those of 2D annotation, such as spatial complexity, occlusion, and viewpoint inconsistency.


AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation

Neural Information Processing Systems

Recently, mobile manipulation has attracted increasing attention for enabling language-conditioned robotic control in household tasks. However, existing methods still face challenges in coordinating mobile base and manipulator, primarily due to two limitations. On the one hand, they fail to explicitly model the influence of the mobile base on manipulator control, which easily leads to error accumulation under high degrees of freedom. On the other hand, they treat the entire mobile manipulation process with the same visual observation modality (e.g., either all 2D or all 3D), overlooking the distinct multimodal perception requirements at different stages during mobile manipulation. To address this, we propose the Adaptive Coordination Diffusion Transformer (AC-DiT), which enhances mobile base and manipulator coordination for end-to-end mobile manipulation.