AITopics | Fox, Dieter

DiMSam: Diffusion Models as Samplers for Task and Motion Planning under Partial Observability

Fang, Xiaolin, Garrett, Caelan Reed, Eppner, Clemens, Lozano-Pérez, Tomás, Kaelbling, Leslie Pack, Fox, Dieter

arXiv.org Artificial Intelligence, Oct-3-2023

Task and Motion Planning (TAMP) approaches are effective at planning long-horizon autonomous robot manipulation. However, it can be difficult to apply them to domains where the environment and its dynamics are not fully known. We propose to overcome these limitations by leveraging deep generative modeling, specifically diffusion models, to learn constraints and samplers that capture these difficult-to-engineer aspects of the planning model. These learned samplers are composed and combined within a TAMP solver in order to find action parameter values jointly that satisfy the constraints along a plan. To tractably make predictions for unseen objects in the environment, we define these samplers on low-dimensional learned latent embeddings of changing object state. We evaluate our approach in an articulated object manipulation domain and show how the combination of classical TAMP, generative learning, and latent embeddings enables long-horizon constraint-based reasoning. We also apply the learned sampler in the real world. More details are available at https://sites.google.com/view/dimsam-tamp

artificial intelligence, constraint-based reasoning, task and motion planning, (4 more...)

arXiv.org Artificial Intelligence

2306.13196

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)

Constrained Generative Sampling of 6-DoF Grasps

Lundell, Jens, Verdoja, Francesco, Le, Tran Nguyen, Mousavian, Arsalan, Fox, Dieter, Kyrki, Ville

arXiv.org Artificial Intelligence, Aug-17-2023

Most state-of-the-art data-driven grasp sampling methods propose stable and collision-free grasps uniformly on the target object. For bin-picking, executing any of those reachable grasps is sufficient. However, for completing specific tasks, such as squeezing out liquid from a bottle, we want the grasp to be on a specific part of the object's body while avoiding other locations, such as the cap. This work presents a generative grasp sampling network, VCGS, capable of constrained 6 Degrees of Freedom (DoF) grasp sampling. In addition, we also curate a new dataset designed to train and evaluate methods for constrained grasping. The new dataset, called CONG, consists of over 14 million training samples of synthetically rendered point clouds and grasps at random target areas on 2889 objects. VCGS is benchmarked against GraspNet, a state-of-the-art unconstrained grasp sampler, in simulation and on a real robot. The results demonstrate that VCGS achieves a 10-15% higher grasp success rate than the baseline while being 2-3 times as sample efficient. Supplementary material is available on our project website.

artificial intelligence, machine learning, target area, (13 more...)

arXiv.org Artificial Intelligence

2302.10745

Country:

Europe (0.28)
Asia (0.28)
North America > United States > Washington > King County > Seattle (0.14)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

One-Shot Neural Fields for 3D Object Understanding

Blukis, Valts, Lee, Taeyeop, Tremblay, Jonathan, Wen, Bowen, Kweon, In So, Yoon, Kuk-Jin, Fox, Dieter, Birchfield, Stan

arXiv.org Artificial Intelligence, Aug-8-2023

We present a unified and compact scene representation for robotics, where each object in the scene is depicted by a latent code capturing geometry and appearance. This representation can be decoded for various tasks such as novel view rendering, 3D reconstruction (e.g. recovering depth, point clouds, or voxel maps), collision checking, and stable grasp prediction. We build our representation from a single RGB input image at test time by leveraging recent advances in Neural Radiance Fields (NeRF) that learn category-level priors on large multiview datasets, then fine-tune on novel objects from one or few views. We expand the NeRF model for additional grasp outputs and explore ways to leverage this representation for robotics. At test time, we build the representation from a single RGB input image observing the scene from only one viewpoint. We find that the recovered representation allows rendering from novel views, including of occluded object parts, and also predicting successful stable grasps. Grasp poses can be directly decoded from our latent representation with an implicit grasp decoder. We experimented in both simulation and the real world and demonstrated robust robotic grasping using this compact representation. Website: https://nerfgrasp.github.io

artificial intelligence, machine learning, representation, (17 more...)

arXiv.org Artificial Intelligence

2210.12126

Country: Asia > Japan > Honshū > Chūbu (0.14)

Genre: Research Report (0.64)

Industry: Energy > Oil & Gas > Upstream (0.54)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

AnyTeleop: A General Vision-Based Dexterous Robot Arm-Hand Teleoperation System

Qin, Yuzhe, Yang, Wei, Huang, Binghao, Van Wyk, Karl, Su, Hao, Wang, Xiaolong, Chao, Yu-Wei, Fox, Dieter

arXiv.org Artificial Intelligence, Aug-2-2023

Figure 1: We present AnyTeleop, a vision-based teleoperation system for a variety of scenarios to solve a wide range of manipulation tasks. AnyTeleop can be used for various robot arms with different robot hands. It also supports teleoperation within different realities, such as IsaacGym (top row), SAPIEN simulator (middle row), and the real world (bottom rows).

Abstract: Vision-based teleoperation offers the possibility to endow robots with human-level intelligence to physically interact with the environment, while only requiring low-cost [...] However, current vision-based teleoperation systems are designed and engineered towards a particular robot model and deploy environment, which scales poorly as the pool of the robot models expands and the variety of the operating environment increases. [...] experiments, AnyTeleop can outperform a previous system that was designed for a specific robot hardware with a higher success rate, using the same robot. [...] AnyTeleop leads to better imitation learning performance, compared with a previous system that is particularly designed for that simulator.

[...] They can adapt to new robots given only the kinematic model, i.e., URDF files. Second, we develop a web-based viewer compatible with standard browsers, to achieve simulator-agnostic visualization and enable remote teleoperation across the internet.

[...] Reality (VR) devices [4, 17, 15], wearable gloves [29, 30], handheld controller [47, 48, 20], haptic sensors [12, 23, 52, 55], or motion capture trackers [68]. Fortunately, recent developments in vision-based teleoperation [2, 24, 16, 26, [...]

artificial intelligence, human computer interaction, teleoperation, (16 more...)

arXiv.org Artificial Intelligence

2307.04577

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.90)
Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (0.66)

Shelving, Stacking, Hanging: Relational Pose Diffusion for Multi-modal Rearrangement

Simeonov, Anthony, Goyal, Ankit, Manuelli, Lucas, Yen-Chen, Lin, Sarmiento, Alina, Rodriguez, Alberto, Agrawal, Pulkit, Fox, Dieter

arXiv.org Artificial Intelligence, Jul-10-2023

We propose a system for rearranging objects in a scene to achieve a desired object-scene placing relationship, such as a book inserted in an open slot of a bookshelf. The pipeline generalizes to novel geometries, poses, and layouts of both scenes and objects, and is trained from demonstrations to operate directly on 3D point clouds. Our system overcomes challenges associated with the existence of many geometrically-similar rearrangement solutions for a given scene. By leveraging an iterative pose de-noising training procedure, we can fit multi-modal demonstration data and produce multi-modal outputs while remaining precise and accurate. We also show the advantages of conditioning on relevant local geometric features while ignoring irrelevant global structure that harms both generalization and precision. We demonstrate our approach on three distinct rearrangement tasks that require handling multi-modality and generalization over object shape and pose in both simulation and the real world. Project website, code, and videos: https://anthonysimeonov.github.io/rpdiff-multi-modal/

artificial intelligence, machine learning, point cloud, (17 more...)

arXiv.org Artificial Intelligence

2307.04751

Country:

North America > United States > New York (0.14)
North America > United States > Massachusetts (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Meta-Policy Learning over Plan Ensembles for Robust Articulated Object Manipulation

Chamzas, Constantinos, Garrett, Caelan, Sundaralingam, Balakumar, Kavraki, Lydia E., Fox, Dieter

arXiv.org Artificial Intelligence, Jul-8-2023

Recent work has shown that complex manipulation skills, such as pushing or pouring, can be learned through state-of-the-art learning based techniques, such as Reinforcement Learning (RL). However, these methods often have high sample-complexity, are susceptible to domain changes, and produce unsafe motions that a robot should not perform. On the other hand, purely geometric model-based planning can produce complex behaviors that satisfy all the geometric constraints of the robot but might not be dynamically feasible for a given environment. In this work, we leverage a geometric model-based planner to build a mixture of path-policies on which a task-specific meta-policy can be learned to complete the task. In our results, we demonstrate that a successful meta-policy can be learned to push a door, while requiring little data and being robust to model uncertainty of the environment. We tested our method on a 7-DOF Franka-Emika Robot pushing a cabinet door in simulation.

artificial intelligence, machine learning, world model, (16 more...)

arXiv.org Artificial Intelligence

2307.0404

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

RVT: Robotic View Transformer for 3D Object Manipulation

Goyal, Ankit, Xu, Jie, Guo, Yijie, Blukis, Valts, Chao, Yu-Wei, Fox, Dieter

arXiv.org Artificial Intelligence, Jun-26-2023

For 3D object manipulation, methods that build an explicit 3D representation perform better than those relying only on camera images. But using explicit 3D representations like voxels comes at large computing cost, adversely affecting scalability. In this work, we propose RVT, a multi-view transformer for 3D manipulation that is both scalable and accurate. Some key features of RVT are an attention mechanism to aggregate information across views and re-rendering of the camera input from virtual views around the robot workspace. In simulations, we find that a single RVT model works well across 18 RLBench tasks with 249 task variations, achieving 26% higher relative success than the existing state-of-the-art method (PerAct). It also trains 36X faster than PerAct for achieving the same performance and achieves 2.3X the inference speed of PerAct. Further, RVT can perform a variety of manipulation tasks in the real world with just a few (~10) demonstrations per task. Visual results, code, and trained model are provided at https://robotic-view-transformer.github.io/.
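The "26% higher relative success" figure in the abstract above is a relative improvement, not a gain in percentage points. A minimal sketch of the difference follows; the two success rates are made-up placeholders chosen only to illustrate the arithmetic and are not numbers reported for RVT or PerAct.

```python
def relative_gain(new_rate: float, baseline_rate: float) -> float:
    """Relative improvement of new_rate over baseline_rate, as a fraction."""
    return (new_rate - baseline_rate) / baseline_rate


def absolute_gain(new_rate: float, baseline_rate: float) -> float:
    """Improvement in raw success rate (i.e. percentage points / 100)."""
    return new_rate - baseline_rate


# Hypothetical success rates, for illustration only.
baseline = 0.50
improved = 0.63

print(f"relative gain: {relative_gain(improved, baseline):.0%}")
print(f"absolute gain: {absolute_gain(improved, baseline) * 100:.0f} points")
```

With these placeholder rates, a 26% relative gain corresponds to only 13 percentage points, which is why the two figures should not be read interchangeably.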

artificial intelligence, machine learning, orth, (13 more...)

arXiv.org Artificial Intelligence

2306.14896

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

AR2-D2: Training a Robot Without a Robot

Duan, Jiafei, Wang, Yi Ru, Shridhar, Mohit, Fox, Dieter, Krishna, Ranjay

arXiv.org Artificial Intelligence, Jun-23-2023

Diligently gathered human demonstrations serve as the unsung heroes empowering the progression of robot learning. Today, demonstrations are collected by training people to use specialized controllers, which (tele-)operate robots to manipulate a small number of objects. By contrast, we introduce AR2-D2: a system for collecting demonstrations which (1) does not require people with specialized training, (2) does not require any real robots during data collection, and therefore, (3) enables manipulation of diverse objects with a real robot. AR2-D2 is a framework in the form of an iOS app that people can use to record a video of themselves manipulating any object while simultaneously capturing essential data modalities for training a real robot. We show that data collected via our system enables the training of behavior cloning agents in manipulating real objects. Our experiments further show that training with our AR data is as effective as training with real-world robot demonstrations. Moreover, our user study indicates that users find AR2-D2 intuitive to use and require no training in contrast to four other frequently employed methods for collecting robot demonstrations.

artificial intelligence, demonstration, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2306.13818

Genre: Research Report > Experimental Study (0.47)

Industry: Leisure & Entertainment > Games > Computer Games (0.94)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (0.47)

TerrainNet: Visual Modeling of Complex Terrain for High-speed, Off-road Navigation

Meng, Xiangyun, Hatch, Nathan, Lambert, Alexander, Li, Anqi, Wagener, Nolan, Schmittle, Matthew, Lee, JoonHo, Yuan, Wentao, Chen, Zoey, Deng, Samuel, Okopal, Greg, Fox, Dieter, Boots, Byron, Shaban, Amirreza

arXiv.org Artificial Intelligence, May-29-2023

Effective use of camera-based vision systems is essential for robust performance in autonomous off-road driving, particularly in the high-speed regime. Despite success in structured, on-road settings, current end-to-end approaches for scene prediction have yet to be successfully adapted for complex outdoor terrain. To this end, we present TerrainNet, a vision-based terrain perception system for semantic and geometric terrain prediction for aggressive, off-road navigation. The approach relies on several key insights and practical considerations for achieving reliable terrain modeling. The network includes a multi-headed output representation to capture fine- and coarse-grained terrain features necessary for estimating traversability. Accurate depth estimation is achieved using self-supervised depth completion with multi-view RGB and stereo inputs. Requirements for real-time performance and fast inference speeds are met using efficient, learned image feature projections. Furthermore, the model is trained on a large-scale, real-world off-road dataset collected across a variety of diverse outdoor environments. We show how TerrainNet can also be used for costmap prediction and provide a detailed framework for integration into a planning module. We demonstrate the performance of TerrainNet through extensive comparison to current state-of-the-art baselines for camera-only scene prediction. Finally, we showcase the effectiveness of integrating TerrainNet within a complete autonomous-driving stack by conducting a real-world vehicle test in a challenging off-road scenario.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2303.15771

Genre: Research Report (0.64)

Industry:

Transportation > Ground > Road (0.49)
Information Technology (0.48)
Automobiles & Trucks (0.48)
Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
(2 more...)

IndustReal: Transferring Contact-Rich Assembly Tasks from Simulation to Reality

Tang, Bingjie, Lin, Michael A., Akinola, Iretiayo, Handa, Ankur, Sukhatme, Gaurav S., Ramos, Fabio, Fox, Dieter, Narang, Yashraj

arXiv.org Artificial Intelligence, May-26-2023

Robotic assembly is a longstanding challenge, requiring contact-rich interaction and high precision and accuracy. Many applications also require adaptivity to diverse parts, poses, and environments, as well as low cycle times. In other areas of robotics, simulation is a powerful tool to develop algorithms, generate datasets, and train agents. However, simulation has had a more limited impact on assembly. We present IndustReal, a set of algorithms, systems, and tools that solve assembly tasks in simulation with reinforcement learning (RL) and successfully achieve policy transfer to the real world. Specifically, we propose 1) simulation-aware policy updates, 2) signed-distance-field rewards, and 3) sampling-based curricula for robotic RL agents. We use these algorithms to enable robots to solve contact-rich pick, place, and insertion tasks in simulation. We then propose 4) a policy-level action integrator to minimize error at policy deployment time. We build and demonstrate a real-world robotic assembly system that uses the trained policies and action integrator to achieve repeatable performance in the real world. Finally, we present hardware and software tools that allow other researchers to reproduce our system and results. For videos and additional details, please see http://sites.google.com/nvidia.com/industreal .
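One plausible reading of the "policy-level action integrator" mentioned in the abstract above is that each incremental policy action is accumulated onto the previously commanded target rather than onto the currently measured state, so that small residual errors build up into a command large enough to be executed. The toy 1-D sketch below illustrates that idea only; it is not the authors' implementation, and the function name, the proportional "policy", and the deadband controller model are all assumptions made for illustration.

```python
def simulate(use_integrator: bool, steps: int = 200, goal: float = 1.0,
             gain: float = 0.5, deadband: float = 0.02) -> float:
    """Return the final absolute position error of a 1-D toy system.

    Hypothetical setup: a proportional "policy" emits incremental actions,
    and an imperfect low-level controller ignores commanded displacements
    smaller than `deadband` (a crude stand-in for stiction).
    """
    pos = 0.0      # measured position
    target = pos   # last commanded target
    for _ in range(steps):
        action = gain * (goal - pos)  # incremental action from the policy
        if use_integrator:
            target = target + action  # accumulate onto the previous target
        else:
            target = pos + action     # apply to the current measured state
        # Imperfect controller: only moves when the command is large enough.
        if abs(target - pos) > deadband:
            pos = target
    return abs(goal - pos)


error_nominal = simulate(use_integrator=False)
error_integrated = simulate(use_integrator=True)
print(f"without integrator: {error_nominal:.5f}")
print(f"with integrator:    {error_integrated:.5f}")
```

In this toy model the non-integrated variant stalls once its commands fall inside the deadband, while the integrated variant keeps accumulating the command until the controller moves again. Real systems would also need to bound the accumulated target, which the sketch omits.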

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2305.1711

Country: North America > United States > California (0.14)

Genre: Research Report (0.63)

Industry: Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
