Any6D: Model-free 6D Pose Estimation of Novel Objects
Lee, Taeyeop, Wen, Bowen, Kang, Minjun, Kang, Gyuree, Kweon, In So, Yoon, Kuk-Jin
We introduce Any6D, a model-free framework for 6D object pose estimation that requires only a single RGB-D anchor image to estimate both the 6D pose and size of unknown objects in novel scenes. Unlike existing methods that rely on textured 3D models or multiple viewpoints, Any6D leverages a joint object alignment process to enhance 2D-3D alignment and metric scale estimation for improved pose accuracy. Our approach integrates a render-and-compare strategy to generate and refine pose hypotheses, enabling robust performance in scenarios with occlusions, non-overlapping views, diverse lighting conditions, and large cross-environment variations. We evaluate our method on five challenging datasets: REAL275, Toyota-Light, HO3D, YCBINEOAT, and LM-O, where it significantly outperforms state-of-the-art methods for novel object pose estimation. Project page: https://taeyeop.com/any6d
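The render-and-compare strategy reduces to a simple loop. Below is a minimal, illustrative sketch (not Any6D's actual code): it assumes a hypothetical render function that produces a synthetic RGB-D view of the object at a candidate pose, and scores each hypothesis by its discrepancy against the observation.

```python
import numpy as np

def render(mesh, pose, cam_K):
    """Hypothetical renderer: returns an (H, W, 4) RGB-D image of `mesh`
    posed at the 4x4 transform `pose` under intrinsics `cam_K` (stubbed)."""
    return np.zeros((64, 64, 4))

def score_hypothesis(observed_rgbd, rendered_rgbd):
    # Lower photometric + depth discrepancy -> better hypothesis.
    return -np.mean((observed_rgbd - rendered_rgbd) ** 2)

def render_and_compare(mesh, observed_rgbd, cam_K, pose_hypotheses):
    """Return the pose whose rendering best matches the observed RGB-D crop."""
    scores = [score_hypothesis(observed_rgbd, render(mesh, pose, cam_K))
              for pose in pose_hypotheses]
    return pose_hypotheses[int(np.argmax(scores))]
```

In practice such pipelines typically learn the scoring function and iteratively refine the top hypotheses rather than selecting once, but the control flow is the same.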
FoundationStereo: Zero-Shot Stereo Matching
Wen, Bowen, Trepte, Matthew, Aribido, Joseph, Kautz, Jan, Gallo, Orazio, Birchfield, Stan
Deep stereo matching has made tremendous progress on benchmark datasets through per-domain fine-tuning. However, achieving strong zero-shot generalization - a hallmark of foundation models in other computer vision tasks - remains challenging for stereo matching. We introduce FoundationStereo, a foundation model for stereo depth estimation designed to achieve strong zero-shot generalization. To this end, we first construct a large-scale (1M stereo pairs) synthetic training dataset featuring large diversity and high photorealism, followed by an automatic self-curation pipeline to remove ambiguous samples. We then design a number of network architecture components to enhance scalability, including a side-tuning feature backbone that adapts rich monocular priors from vision foundation models to mitigate the sim-to-real gap, and long-range context reasoning for effective cost volume filtering. Together, these components lead to strong robustness and accuracy across domains, establishing a new standard in zero-shot stereo depth estimation. Project page: https://nvlabs.github.io/FoundationStereo/
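For readers unfamiliar with cost volumes, the sketch below shows the standard correlation cost volume that such filtering operates on. It is a generic stereo building block, not FoundationStereo's specific architecture, and all names are illustrative.

```python
import numpy as np

def correlation_cost_volume(feat_left, feat_right, max_disp):
    """Build a (max_disp, H, W) correlation cost volume from (C, H, W) features.

    Generic stereo construction, not FoundationStereo's architecture: for each
    candidate disparity d, correlate left features with right features
    shifted by d pixels.
    """
    C, H, W = feat_left.shape
    volume = np.zeros((max_disp, H, W), dtype=feat_left.dtype)
    volume[0] = (feat_left * feat_right).mean(axis=0)
    for d in range(1, max_disp):
        volume[d, :, d:] = (feat_left[:, :, d:] * feat_right[:, :, :-d]).mean(axis=0)
    return volume

# After filtering, disparity can be read off the volume, e.g. volume.argmax(axis=0).
```

Turning these raw correlations into clean disparities is exactly what the paper's long-range context reasoning for cost volume filtering is designed to do.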
SPOT: SE(3) Pose Trajectory Diffusion for Object-Centric Manipulation
Hsu, Cheng-Chun, Wen, Bowen, Xu, Jie, Narang, Yashraj, Wang, Xiaolong, Zhu, Yuke, Biswas, Joydeep, Birchfield, Stan
We introduce SPOT, an object-centric imitation learning framework. The key idea is to capture each task with an object-centric representation: the SE(3) object pose trajectory relative to the target. This approach decouples embodiment actions from sensory inputs, facilitating learning from various demonstration types, including both action-based and action-less human hand demonstrations, as well as cross-embodiment generalization. Additionally, object pose trajectories inherently capture planning constraints from demonstrations without the need for manually crafted rules. To guide the robot in executing the task, the object trajectory is used to condition a diffusion policy. We show improved performance over prior work on simulated RLBench tasks. In real-world evaluation, using only eight demonstrations captured with an iPhone, our approach completed all tasks while fully complying with task constraints. Project page: https://nvlabs.github.io/object_centric_diffusion
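Concretely, the object-centric representation amounts to expressing each object pose in the target's frame. A minimal sketch, assuming 4x4 homogeneous matrices; the function name is illustrative:

```python
import numpy as np

def relative_pose_trajectory(object_poses, target_pose):
    """Express each object pose in the target's frame: T_rel = T_target^-1 T_obj.

    Minimal sketch with 4x4 homogeneous matrices; this relative SE(3)
    trajectory is the kind of object-centric signal a diffusion policy
    can be conditioned on.
    """
    target_inv = np.linalg.inv(target_pose)
    return [target_inv @ T for T in object_poses]
```

Because the trajectory mentions only object and target frames, it is agnostic to whichever embodiment (robot arm or human hand) produced the demonstration.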
SkillMimicGen: Automated Demonstration Generation for Efficient Skill Learning and Deployment
Garrett, Caelan, Mandlekar, Ajay, Wen, Bowen, Fox, Dieter
Imitation learning from human demonstrations is an effective paradigm for robot manipulation, but acquiring large datasets is costly and resource-intensive, especially for long-horizon tasks. To address this issue, we propose SkillMimicGen (SkillGen), an automated system for generating demonstration datasets from a few human demos. SkillGen segments human demos into manipulation skills, adapts these skills to new contexts, and stitches them together through free-space transit and transfer motion. We also propose a Hybrid Skill Policy (HSP) framework for learning skill initiation, control, and termination components from SkillGen datasets, enabling skills to be sequenced using motion planning at test time. We demonstrate that SkillGen greatly improves data generation and policy learning performance over a state-of-the-art data generation framework, resulting in the capability to produce data for large scene variations, including clutter, and agents that are on average 24% more successful. We demonstrate the efficacy of SkillGen by generating over 24K demonstrations across 18 task variants in simulation from just 60 human demonstrations, and training proficient, often near-perfect, HSP agents. Finally, we apply SkillGen to 3 real-world manipulation tasks and also demonstrate zero-shot sim-to-real transfer on a long-horizon assembly task. Videos and more at https://skillgen.github.io.
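The segment-then-stitch idea can be illustrated with a toy segmentation pass. The heuristic below, which splits on an assumed per-frame contact flag, is a stand-in for SkillGen's actual segmentation; the `in_contact` key is hypothetical.

```python
def segment_demo(frames, contact_key="in_contact"):
    """Split a demo into alternating "transit" / "skill" segments.

    Toy heuristic only (not SkillGen's segmentation): contiguous runs of
    frames where the assumed boolean `frame[contact_key]` is True become
    manipulation-skill segments; the rest are free-space transit, which a
    SkillGen-style system replaces with planned motion.
    """
    segments, current, label = [], [], None
    for frame in frames:
        kind = "skill" if frame[contact_key] else "transit"
        if kind != label and current:
            segments.append((label, current))
            current = []
        current.append(frame)
        label = kind
    if current:
        segments.append((label, current))
    return segments
```

Transit segments are then replaced with planned free-space motion, while skill segments are adapted to the new scene and replayed.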
AutoMate: Specialist and Generalist Assembly Policies over Diverse Geometries
Tang, Bingjie, Akinola, Iretiayo, Xu, Jie, Wen, Bowen, Handa, Ankur, Van Wyk, Karl, Fox, Dieter, Sukhatme, Gaurav S., Ramos, Fabio, Narang, Yashraj
Robotic assembly for high-mixture settings requires adaptivity to diverse parts and poses, which is an open challenge. Meanwhile, in other areas of robotics, large models and sim-to-real have led to tremendous progress. Inspired by such work, we present AutoMate, a learning framework and system that consists of 4 parts: 1) a dataset of 100 assemblies compatible with simulation and the real world, along with parallelized simulation environments for policy learning, 2) a novel simulation-based approach for learning specialist (i.e., part-specific) policies and generalist (i.e., unified) assembly policies, 3) demonstrations of specialist policies that individually solve 80 assemblies with 80% or higher success rates in simulation, as well as a generalist policy that jointly solves 20 assemblies with an 80%+ success rate, and 4) zero-shot sim-to-real transfer that achieves performance similar to (or better than) simulation, including on perception-initialized assembly. The key methodological takeaway is that a union of diverse algorithms from manufacturing engineering, character animation, and time-series analysis provides a generic and robust solution for a diverse range of robotic assembly problems. To our knowledge, AutoMate provides the first simulation-based framework for learning specialist and generalist policies over a wide range of assemblies, as well as the first system demonstrating zero-shot sim-to-real transfer over such a range.
NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows
Tang, Zhenggang, Ren, Zhongzheng, Zhao, Xiaoming, Wen, Bowen, Tremblay, Jonathan, Birchfield, Stan, Schwing, Alexander
We present a method for automatically modifying a NeRF representation based on a single observation of a non-rigid transformed version of the original scene. Our method defines the transformation as a 3D flow, specifically as a weighted linear blending of rigid transformations of 3D anchor points that are defined on the surface of the scene. In order to identify anchor points, we introduce a novel correspondence algorithm that first matches RGB-based pairs, then leverages multi-view information and 3D reprojection to robustly filter false positives in two steps. We also introduce a new dataset for exploring the problem of modifying a NeRF scene through a single observation. Our dataset (https://github.com/nerfdeformer/nerfdeformer) contains 113 synthetic scenes leveraging 47 3D assets. We show that our proposed method outperforms NeRF editing methods as well as diffusion-based methods, and we also explore different methods for filtering correspondences.
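The transformation model - a weighted linear blending of rigid anchor transformations - is essentially linear blend skinning applied to the scene surface. A minimal sketch follows; the Gaussian distance weights are an assumption for illustration, not necessarily the paper's weighting scheme.

```python
import numpy as np

def blended_flow(points, anchor_points, anchor_transforms, sigma=0.1):
    """Deform (N, 3) `points` by a weighted blend of per-anchor rigid transforms.

    Linear-blend-skinning-style sketch of "weighted linear blending of rigid
    transformations of 3D anchor points"; the Gaussian distance weights below
    are an assumption. `anchor_transforms` is a list of (R, t) pairs, one per
    row of the (K, 3) `anchor_points`.
    """
    d2 = ((points[:, None, :] - anchor_points[None, :, :]) ** 2).sum(-1)  # (N, K)
    w = np.exp(-d2 / (2.0 * sigma**2))
    w = w / w.sum(axis=1, keepdims=True)  # normalized blend weights
    out = np.zeros_like(points)
    for k, (R, t) in enumerate(anchor_transforms):
        out += w[:, [k]] * (points @ R.T + t)  # blend each rigidly moved copy
    return out
```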
Neural Implicit Representation for Building Digital Twins of Unknown Articulated Objects
Weng, Yijia, Wen, Bowen, Tremblay, Jonathan, Blukis, Valts, Fox, Dieter, Guibas, Leonidas, Birchfield, Stan
We address the problem of building digital twins of unknown articulated objects from two RGBD scans of the object at different articulation states. We decompose the problem into two stages, each addressing distinct aspects. Our method first reconstructs object-level shape at each state, then recovers the underlying articulation model including part segmentation and joint articulations that associate the two states. By explicitly modeling point-level correspondences and exploiting cues from images, 3D reconstructions, and kinematics, our method yields more accurate and stable results compared to prior work. It also handles more than one movable part and does not rely on any object shape or structure priors. Project page: https://github.com/NVlabs/DigitalTwinArt
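Once part-level correspondences between the two states are available, joint parameters for a revolute part follow from a standard screw decomposition of the part's relative rigid motion. A textbook sketch (not the paper's optimization), assuming the relative rotation R and translation t have already been estimated:

```python
import numpy as np

def revolute_joint_from_relative_pose(R, t):
    """Recover a revolute joint's axis, angle, and a point on the axis from a
    part's relative rigid motion (R, t) between the two articulation states.

    Textbook screw decomposition, assuming a non-degenerate rotation
    (angle not near 0 or pi); not the paper's actual optimization.
    """
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    axis = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    axis = axis / (2.0 * np.sin(angle))
    # A point p on the axis satisfies (I - R) p = t with t's component along
    # the axis removed; (I - R) is singular along the axis, so use least squares.
    t_perp = t - axis * np.dot(axis, t)
    p = np.linalg.lstsq(np.eye(3) - R, t_perp, rcond=None)[0]
    return axis, angle, p
```

Prismatic joints are simpler still: R is near identity and t itself gives the sliding direction.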
Localization and Perception for Control of a Low Speed Autonomous Shuttle in a Campus Pilot Deployment
Wen, Bowen
Future SAE Level 4 and Level 5 autonomous vehicles will require novel applications of localization, perception, control, and artificial intelligence technology in order to offer innovative and disruptive solutions to current mobility problems. Accurate localization is essential for self-driving vehicle navigation in GPS-inaccessible environments. This thesis concentrates on low-speed autonomous shuttles mainly utilized for university campus intelligent transportation systems, and presents initial results of ongoing work on solutions to the localization and perception challenges of a planned university pilot deployment. It treats autonomous driving with real-time kinematic (RTK) GPS (Global Positioning System) combined with an inertial measurement unit (IMU), alongside simultaneous localization and mapping (SLAM) with a three-dimensional light detection and ranging (LIDAR) sensor, which provides solutions for scenarios where GPS is unavailable or where a lower-cost, and hence lower-accuracy, GPS is desirable. The in-house automated low-speed electric vehicle from the Automated Driving Lab is used for experimental evaluation and verification. An improved version of Hector SLAM was implemented on ROS and compared with a high-resolution GPS-aided localization framework on the same hardware architecture. The overall configuration, which combines ROS with a dSPACE controller, is an easily transplantable prototype for similar future research on other hardware architectures. The real-world experiments reported here were conducted in a small test area close to the Ohio State University AV pilot test route.
FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects
Wen, Bowen, Yang, Wei, Kautz, Jan, Birchfield, Stan
We present FoundationPose, a unified foundation model for 6D object pose estimation and tracking, supporting both model-based and model-free setups. Our approach can be instantly applied at test time to a novel object without fine-tuning, as long as its CAD model is given or a small number of reference images are captured. We bridge the gap between these two setups with a neural implicit representation that allows for effective novel view synthesis, keeping the downstream pose estimation modules invariant under the same unified framework. Strong generalizability is achieved via large-scale synthetic training, aided by a large language model (LLM), a novel transformer-based architecture, and a contrastive learning formulation. Extensive evaluation on multiple public datasets involving challenging scenarios and objects indicates that our unified approach outperforms existing methods specialized for each task by a large margin. In addition, it even achieves results comparable to instance-level methods despite the reduced assumptions. Project page: https://nvlabs.github.io/FoundationPose/
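Conceptually, the unified setup exposes a single estimate-then-track interface regardless of whether a CAD model or a few reference images are supplied. The class below is an interface sketch only, with method bodies elided; it is not the released API, which is linked from the project page.

```python
# Illustrative interface sketch: these class and method names are hypothetical,
# not the API released at https://nvlabs.github.io/FoundationPose/.

class UnifiedPoseEstimator:
    """One estimator covering model-based and model-free setups."""

    def __init__(self, cad_model=None, reference_images=None):
        if cad_model is None and reference_images is None:
            raise ValueError("need a CAD model or a few reference images")
        # Model-free setup: a neural implicit representation built from the
        # reference images stands in for the CAD model via novel view
        # synthesis, so the downstream pose modules stay unchanged.
        self.model = cad_model if cad_model is not None else self._reconstruct(reference_images)

    def _reconstruct(self, reference_images):
        ...  # build neural implicit field (elided)

    def register(self, rgbd, mask):
        ...  # global pose estimation on the first frame (elided)

    def track(self, rgbd, prev_pose):
        ...  # frame-to-frame pose refinement (elided)

def track_video(estimator, frames, first_mask):
    """Register on the first frame, then track the rest."""
    poses = [estimator.register(frames[0], first_mask)]
    for rgbd in frames[1:]:
        poses.append(estimator.track(rgbd, poses[-1]))
    return poses
```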
MimicGen: A Data Generation System for Scalable Robot Learning using Human Demonstrations
Mandlekar, Ajay, Nasiriany, Soroush, Wen, Bowen, Akinola, Iretiayo, Narang, Yashraj, Fan, Linxi, Zhu, Yuke, Fox, Dieter
Imitation learning from a large set of human demonstrations has proved to be an effective paradigm for building capable robot agents. However, the demonstrations can be extremely costly and time-consuming to collect. We introduce MimicGen, a system for automatically synthesizing large-scale, rich datasets from only a small number of human demonstrations by adapting them to new contexts. We use MimicGen to generate over 50K demonstrations across 18 tasks with diverse scene configurations, object instances, and robot arms from just ~200 human demonstrations. We show that robot agents can be effectively trained on this generated dataset by imitation learning to achieve strong performance in long-horizon and high-precision tasks, such as multi-part assembly and coffee preparation, across broad initial state distributions. We further demonstrate that the effectiveness and utility of MimicGen data compare favorably to collecting additional human demonstrations, making it a powerful and economical approach towards scaling up robot learning. Datasets, simulation environments, videos, and more at https://mimicgen.github.io.
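The core adaptation step keeps each end-effector pose fixed relative to the manipulated object and re-expresses it under the object's new pose. A minimal sketch with 4x4 homogeneous matrices; the names are illustrative rather than MimicGen's released code.

```python
import numpy as np

def adapt_segment(ee_poses, obj_pose_src, obj_pose_new):
    """Re-target an object-centric demo segment to a new object pose.

    Sketch of the MimicGen-style adaptation: each end-effector pose is kept
    fixed relative to the object, so
        T_ee_new = T_obj_new @ inv(T_obj_src) @ T_ee_src.
    All arguments are 4x4 homogeneous matrices (lists thereof for `ee_poses`).
    """
    T = obj_pose_new @ np.linalg.inv(obj_pose_src)
    return [T @ T_ee for T_ee in ee_poses]
```

The system then connects the robot's current configuration to the start of the transformed segment (e.g., by interpolation) to produce a complete new demonstration.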