
 Robotics & Automation


Are Humanoid Robots Ready to Be Deployed?

The New Yorker

Neo and a dozen other robots with human forms are scheduled to hit the market. "The same robot that can land a backflip might not be able to walk up a flight of stairs," a researcher said. On a recent sunny day in Silicon Valley, I visited the industrial headquarters of 1X Technologies. Security was tight, so I had to put a sticker over my cellphone's camera and talk my way out of signing an N.D.A. before I was brought into an enormous space to meet Neo, the company's home robot. Neo stands five feet six and has no facial features except for two black cameras in place of eyes. The robot is a humanoid (its design is inspired by the human form), and its proportions are a blend of those of the median American male and those of the median American female. But Neo has no skin. Instead, it wears a beige nylon turtleneck bodysuit, gloves, and padded shoes over a see-through carapace. Under that is a skeleton made up of more than a hundred whizzing motors and cordlike artificial tendons that control Neo's limbs. Neo's cozy, minimalist aesthetic allows it to blend into the background. If it served me an espresso at a café, I'm not certain I would look up from my phone. The robot weighs just sixty-six pounds, and I was able to pick it up in a bridal carry. It communicates through a speaker in its chest, using several different voices; the default one is in a calm but authoritative masculine register, an A.I.-modulated mixture of several voice actors. Neo can talk, listen, and respond to commands.


Amazon Zoox's latest robotaxi looks (marginally) less like a toy car

Engadget

The company said it will soon begin large-scale production of its autonomous vehicle. Zoox, the self-driving startup that Amazon purchased in 2020, has shown off a new version of its autonomous vehicle that it says was designed for large-scale production. While it still looks like the old version the company introduced in 2020, the new vehicle comes with changes that improve its comfort for riders and make it easier to interact with. The company relocated the vehicle's bidirectional reflectors for better visibility and made them rotate colors to better distinguish its front from its rear, seeing as the robotaxi has a boxy form factor. It also gave the speaker and microphone on the door two-way audio capabilities to enable communication between riders and road users, as well as between first responders and Zoox support.


First global rules adopted for self-driving cars, U.N. says

The Japan Times

Safety concerns and costs have long slowed progress on autonomous vehicles. The first global regulations for fully autonomous vehicles were adopted Wednesday, a U.N. agency said, establishing uniform international safety requirements that could pave the way for larger-scale rollouts of self-driving cars. Safety concerns and the cost of developing next-level systems have long slowed progress on autonomous vehicles. As self-driving cars have begun to hit the road in a growing number of cities, the fragmented national approaches to regulation have spurred manufacturer fears that vehicles developed for one market could be blocked from others. In a bid to address that issue, a meeting of the World Forum for Harmonization of Vehicle Regulations at the United Nations in Geneva decided to introduce a global regulatory framework for vehicles equipped with fully autonomous driving systems (ADS).


At least three killed in drone strikes in Russian-controlled Horlivka

Al Jazeera

Is the war entering a new phase? A multi-storey apartment has been hit with what Russian-installed authorities called a Ukrainian drone strike in the Donetsk region. The area is in a Russian-controlled part of Ukraine.


Multimodal Causal Reasoning for UAV Object Detection

Neural Information Processing Systems

Unmanned Aerial Vehicle (UAV) object detection faces significant challenges due to complex environmental conditions and varying imaging conditions. These factors introduce significant changes in scale and appearance, particularly for small objects that occupy few pixels and carry limited information, complicating detection tasks. To address these challenges, we propose a Multimodal Causal Reasoning framework based on a YOLO backbone for UAV Object Detection (MCR-UOD). The key idea is to use the backdoor adjustment to discover a condition-invariant object representation that is easier to detect. Specifically, the YOLO backbone is first adjusted to incorporate a pre-trained vision-language model.
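For readers unfamiliar with the term, the standard backdoor adjustment from causal inference can be written as below. The symbols are assumptions chosen for illustration and do not come from the abstract: X stands for the input image features, Y for the detection output, and Z for a confounder such as the imaging condition. How MCR-UOD instantiates these quantities is not stated here.

```latex
% Generic backdoor adjustment (textbook form, not the paper's specific formulation).
% Assumed symbols: X = image features, Y = detection output, Z = confounder.
P\bigl(Y \mid \mathrm{do}(X)\bigr) \;=\; \sum_{z} P\bigl(Y \mid X,\, Z = z\bigr)\, P(Z = z)
```

Averaging over the confounder with its marginal P(Z = z), rather than its distribution conditioned on X, is what removes the spurious dependence of the prediction on the condition.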


DrivingRecon: Large 4D Gaussian Reconstruction Model For Autonomous Driving

Neural Information Processing Systems

Large reconstruction models have made remarkable progress and can directly predict 3D or 4D representations of unseen scenes and objects. However, current work has not systematically explored the potential of large reconstruction models in the field of autonomous driving.


Reliable World Simulation for Autonomous Driving

Neural Information Processing Systems

How can we reliably simulate future driving scenarios under a wide range of ego driving behaviors? Recent driving world models, developed exclusively on real-world driving data with expert trajectories, struggle to represent hazardous or non-expert behaviors that are rare in the training corpus. This limitation restricts their applicability to tasks such as policy evaluation. In this work, we address this challenge by enriching real-world human demonstrations with diverse non-expert data collected from a driving simulator (e.g., CARLA), and building a controllable world model trained on this heterogeneous corpus. Starting with a video generator featuring a diffusion transformer architecture, we devise several strategies to effectively integrate conditioning signals and improve prediction controllability and fidelity. The resulting model, ReSim, enables Reliable Simulation of diverse open-world driving scenarios under various actions, including hazardous non-expert ones. To close the gap between high-fidelity simulation and applications that require reward signals to judge different actions, we introduce a Video2Reward module that estimates a reward from ReSim's simulated future. Our ReSim paradigm achieves up to 44% higher visual fidelity, improves controllability for both expert and non-expert actions by over 50%, and boosts planning and policy selection performance on NAVSIM by 2% and 25%, respectively.


DINO-Foresight: Looking into the Future with DINO

Neural Information Processing Systems

Predicting future dynamics is crucial for applications like autonomous driving and robotics, where understanding the environment is key. Existing pixel-level methods are computationally expensive and often focus on irrelevant details. To address these challenges, we introduce DINO-Foresight, a novel framework that operates in the semantic feature space of pretrained Vision Foundation Models (VFMs). Our approach trains a masked feature transformer in a self-supervised manner to predict the evolution of VFM features over time. By forecasting these features, we can apply off-the-shelf, task-specific heads for various scene understanding tasks. In this framework, VFM features are treated as a latent space, to which different heads attach to perform specific tasks for future-frame analysis. Extensive experiments show the strong performance, robustness, and scalability of our framework.


Towards Reasoning Centric Benchmark for Aerial Anomaly Understanding

Neural Information Processing Systems

While unmanned aerial vehicles (UAVs) offer wide-area, high-altitude coverage for anomaly detection, they face challenges such as dynamic viewpoints, scale variations, and complex scenes. Existing datasets and methods, mainly designed for fixed ground-level views, struggle to adapt to these conditions, leading to significant performance drops in drone-view scenarios. To bridge this gap, we introduce A2Seek (Aerial Anomaly Seek), a large-scale, reasoning-centric benchmark dataset for aerial anomaly understanding. This dataset covers various scenarios and environmental conditions, providing high-resolution real-world aerial videos with detailed annotations, including anomaly categories, frame-level timestamps, region-level bounding boxes, and natural language explanations for causal reasoning. Building on this dataset, we propose A2Seek-R1, a novel reasoning framework that generalizes R1-style strategies to aerial anomaly understanding, enabling a deeper understanding of "Where" anomalies occur and "Why" they happen in aerial frames.


STSBench: A Spatio-temporal Scenario Benchmark for Multi-modal Large Language Models in Autonomous Driving

Neural Information Processing Systems

We introduce STSBench, a scenario-based framework to benchmark the holistic understanding of vision-language models (VLMs) for autonomous driving. The framework automatically mines predefined traffic scenarios from any dataset using ground-truth annotations, provides an intuitive user interface for efficient human verification, and generates multiple-choice questions for model evaluation. Applied to the nuScenes dataset, we present STSnu, the first benchmark that evaluates the spatio-temporal reasoning capabilities of VLMs based on comprehensive 3D perception. Existing benchmarks typically target off-the-shelf or fine-tuned VLMs for images or videos from a single viewpoint, focusing on semantic tasks such as object recognition, dense captioning, risk assessment, or scene understanding. In contrast, STSnu evaluates driving expert VLMs for end-to-end driving, operating on videos from multi-view cameras or LiDAR. It specifically assesses their ability to reason about both ego-vehicle actions and complex interactions among traffic participants, a crucial capability for autonomous vehicles.