LANGTRAJ: Diffusion Model and Dataset for Language-Conditioned Trajectory Simulation

Chang, Wei-Jer, Zhan, Wei, Tomizuka, Masayoshi, Chandraker, Manmohan, Pittaluga, Francesco

arXiv.org Artificial Intelligence

Evaluating autonomous vehicles with controllability enables scalable testing in counterfactual or structured settings, enhancing both efficiency and safety. We introduce LangTraj, a language-conditioned scene-diffusion model that simulates the joint behavior of all agents in traffic scenarios. By conditioning on natural language inputs, LangTraj provides flexible and intuitive control over interactive behaviors, generating nuanced and realistic scenarios. Unlike prior approaches that depend on domain-specific guidance functions, LangTraj incorporates language conditioning during training, facilitating more intuitive traffic simulation control. We propose a novel closed-loop training strategy for diffusion models, explicitly tailored to enhance stability and realism during closed-loop simulation. To support language-conditioned simulation, we develop Inter-Drive, a large-scale dataset with diverse and interactive labels for training language-conditioned diffusion models. Our dataset is built upon a scalable pipeline for annotating agent-agent interactions and single-agent behaviors, ensuring rich and varied supervision. Validated on the Waymo Open Motion Dataset, LangTraj demonstrates strong performance in realism, language controllability, and language-conditioned safety-critical simulation, establishing a new paradigm for flexible and scalable autonomous vehicle testing. Project Website: https://langtraj.github.io/


DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving

Yang, Xuemeng, Wen, Licheng, Ma, Yukai, Mei, Jianbiao, Li, Xin, Wei, Tiantian, Lei, Wenjie, Fu, Daocheng, Cai, Pinlong, Dou, Min, Shi, Botian, He, Liang, Liu, Yong, Qiao, Yu

arXiv.org Artificial Intelligence

This paper presents DriveArena, the first high-fidelity closed-loop simulation system designed for driving agents navigating real scenarios. DriveArena features a flexible, modular architecture that allows its core components to be interchanged seamlessly: Traffic Manager, a traffic simulator capable of generating realistic traffic flow on any street map worldwide, and World Dreamer, a high-fidelity conditional generative model with infinite autoregression. Together, these components allow any driving agent capable of processing real-world images to navigate in DriveArena's simulated environment. The agent perceives its surroundings through images generated by World Dreamer and outputs trajectories. These trajectories are fed into Traffic Manager, which produces realistic interactions with other vehicles and a new scene layout. Finally, the latest scene layout is relayed back to World Dreamer, continuing the simulation cycle. This iterative process enables closed-loop exploration within a highly realistic environment, providing a platform for developing and evaluating driving agents across diverse and challenging scenarios. DriveArena marks a substantial step forward in using generative image data for driving simulation and offers new insights for closed-loop autonomous driving. Code will be available soon on GitHub: https://github.com/PJLab-ADG/DriveArena
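
The perceive, plan, advance, re-render cycle described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class names (WorldDreamerStub, AgentStub, TrafficManagerStub), their methods, and the toy SceneLayout type are hypothetical stand-ins and are not DriveArena's actual API. The stubs exchange placeholder data rather than real images or traffic states.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Trajectory = List[Tuple[float, float]]


@dataclass
class SceneLayout:
    """Toy scene state (hypothetical): a step counter and the ego position."""
    step: int
    ego_xy: Tuple[float, float]


class WorldDreamerStub:
    """Stands in for the generative model: scene layout -> observation."""

    def render(self, layout: SceneLayout) -> Dict[str, object]:
        # A real model would generate camera images; this returns a placeholder.
        return {"step": layout.step, "ego_xy": layout.ego_xy}


class AgentStub:
    """Stands in for a driving agent: observation -> planned trajectory."""

    def plan(self, observation: Dict[str, object]) -> Trajectory:
        x, y = observation["ego_xy"]
        # Trivial plan: drive straight ahead along x in 1-unit waypoints.
        return [(x + 1.0, y), (x + 2.0, y)]


class TrafficManagerStub:
    """Stands in for the traffic simulator: trajectory -> next scene layout."""

    def advance(self, layout: SceneLayout, trajectory: Trajectory) -> SceneLayout:
        # Move the ego to the first waypoint; other vehicles are omitted here.
        return SceneLayout(step=layout.step + 1, ego_xy=trajectory[0])


def run_closed_loop(num_steps: int) -> List[SceneLayout]:
    """Run the loop: render -> plan -> advance -> feed the new layout back."""
    dreamer, agent, traffic = WorldDreamerStub(), AgentStub(), TrafficManagerStub()
    layout = SceneLayout(step=0, ego_xy=(0.0, 0.0))
    history = [layout]
    for _ in range(num_steps):
        observation = dreamer.render(layout)          # agent's only view of the world
        trajectory = agent.plan(observation)          # agent outputs a trajectory
        layout = traffic.advance(layout, trajectory)  # simulator yields a new layout
        history.append(layout)
    return history
```

The point of the sketch is the data flow: the agent never reads the scene layout directly, only the rendered observation, which is what makes the loop closed through the generative model.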


MagicDrive: Street View Generation with Diverse 3D Geometry Control

Gao, Ruiyuan, Chen, Kai, Xie, Enze, Hong, Lanqing, Li, Zhenguo, Yeung, Dit-Yan, Xu, Qiang

arXiv.org Artificial Intelligence

Recent advancements in diffusion models have significantly enhanced data synthesis with 2D control. Yet, precise 3D control in street view generation, crucial for 3D perception tasks, remains elusive. Specifically, utilizing Bird's-Eye View (BEV) as the primary condition often leads to challenges in geometry control (e.g., height), affecting the representation of object shapes, occlusion patterns, and road surface elevations, all of which are essential to perception data synthesis, especially for 3D object detection tasks. In this paper, we introduce MagicDrive, a novel street view generation framework offering diverse 3D geometry controls, including camera poses, road maps, and 3D bounding boxes, together with textual descriptions, achieved through tailored encoding strategies. In addition, our design incorporates a cross-view attention module, ensuring consistency across multiple camera views. With MagicDrive, we achieve high-fidelity street-view synthesis that captures nuanced 3D geometry and various scene descriptions, enhancing tasks like BEV segmentation and 3D object detection.


AlphaFold's new rival? Meta AI predicts shape of 600 million proteins

#artificialintelligence

The ESM Metagenomic Atlas database contains structure predictions for 617 million proteins. Credit: ESM Metagenomic Atlas (CC BY 4.0)

When London-based DeepMind unveiled predicted structures for some 220 million proteins this year, it covered nearly every protein from known organisms in DNA databases. Now, another tech giant is filling in the dark matter of our protein universe. Researchers at Meta (formerly Facebook, headquartered in Menlo Park, California) have used artificial intelligence (AI) to predict the structures of some 600 million proteins from bacteria, viruses and other microbes that haven't been characterized. "These are the structures we know the least about. These are incredibly mysterious proteins. I think they offer the potential for great insight into biology," says Alexander Rives, the research lead for Meta AI's protein team.