Collaborating Authors

 Schwager, Mac


GRaD-Nav: Efficiently Learning Visual Drone Navigation with Gaussian Radiance Fields and Differentiable Dynamics

arXiv.org Artificial Intelligence

Autonomous visual navigation is an essential element in robot autonomy. Reinforcement learning (RL) offers a promising policy training paradigm. However, existing RL methods suffer from high sample complexity, poor sim-to-real transfer, and limited runtime adaptability to navigation scenarios not seen during training. These problems are particularly challenging for drones, with complex nonlinear and unstable dynamics, and strong dynamic coupling between control and perception. In this paper, we propose a novel framework that integrates 3D Gaussian Splatting (3DGS) with differentiable deep reinforcement learning (DDRL) to train vision-based drone navigation policies. By leveraging high-fidelity 3D scene representations and differentiable simulation, our method improves sample efficiency and sim-to-real transfer. Additionally, we incorporate a Context-aided Estimator Network (CENet) to adapt to environmental variations at runtime. Moreover, by curriculum training in a mixture of different surrounding environments, we achieve in-task generalization, the ability to solve new instances of a task not seen during training. Drone hardware experiments demonstrate our method's high training efficiency compared to state-of-the-art RL methods, zero-shot sim-to-real transfer for real robot deployment without fine-tuning, and ability to adapt to new instances within the same task class (e.g., to fly through a gate at different locations with different distractors in the environment).
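
To make the differentiable-dynamics idea concrete, the sketch below trains a toy policy by backpropagating a navigation cost through a differentiable point-mass rollout in PyTorch. This is a minimal illustration of the general technique, not the GRaD-Nav pipeline; the dynamics, network, and hyperparameters are assumptions.

```python
# Minimal sketch (not the GRaD-Nav implementation): backpropagating a
# navigation cost through differentiable point-mass dynamics in PyTorch.
# Dynamics, network, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

dt, horizon, goal = 0.05, 40, torch.tensor([2.0, 1.0])

policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for step in range(200):
    pos = torch.zeros(2)                # start at the origin
    vel = torch.zeros(2)
    loss = torch.tensor(0.0)
    for t in range(horizon):
        obs = torch.cat([pos, vel])
        acc = policy(obs)               # action = commanded acceleration
        vel = vel + dt * acc            # differentiable dynamics step
        pos = pos + dt * vel
        loss = loss + (pos - goal).pow(2).sum()   # distance-to-goal cost
    opt.zero_grad()
    loss.backward()                     # analytic gradient through the rollout
    opt.step()
```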


SIREN: Semantic, Initialization-Free Registration of Multi-Robot Gaussian Splatting Maps

arXiv.org Artificial Intelligence

We present SIREN for registration of multi-robot Gaussian Splatting (GSplat) maps, with zero access to camera poses, images, and inter-map transforms for initialization or fusion of local submaps. To realize these capabilities, SIREN harnesses the versatility and robustness of semantics in three critical ways to derive a rigorous registration pipeline for multi-robot GSplat maps. First, SIREN utilizes semantics to identify feature-rich regions of the local maps where the registration problem is better posed, eliminating the need for any initialization which is generally required in prior work. Second, SIREN identifies candidate correspondences between Gaussians in the local maps using robust semantic features, constituting the foundation for robust geometric optimization, coarsely aligning 3D Gaussian primitives extracted from the local maps. Third, this key step enables subsequent photometric refinement of the transformation between the submaps, where SIREN leverages novel-view synthesis in GSplat maps along with a semantics-based image filter to compute a high-accuracy non-rigid transformation for the generation of a high-fidelity fused map. We demonstrate the superior performance of SIREN compared to competing baselines across a range of real-world datasets, and in particular, across the most widely-used robot hardware platforms, including a manipulator, drone, and quadruped. In our experiments, SIREN achieves about 90x smaller rotation errors, 300x smaller translation errors, and 44x smaller scale errors in the most challenging scenes, where competing methods struggle. We will release the code and provide a link to the project page after the review process.
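
The coarse geometric alignment step, fitting a similarity transform to candidate correspondences between two maps, can be illustrated with the standard Umeyama closed-form solution. The sketch below assumes matched 3D points are already given; SIREN's semantic correspondence selection and photometric refinement are not shown.

```python
# Generic similarity-transform fit between matched 3D points (Umeyama),
# shown only to illustrate the coarse-alignment step; SIREN's correspondence
# selection and non-rigid refinement are more involved.
import numpy as np

def fit_similarity(src, dst):
    """Return scale s, rotation R, translation t with dst ~= s * R @ src + t."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    X, Y = src - mu_s, dst - mu_d
    cov = Y.T @ X / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                          # keep a proper rotation
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / X.var(axis=0).sum()
    t = mu_d - s * R @ mu_s
    return s, R, t

# Toy usage: recover a known transform from noiseless correspondences.
rng = np.random.default_rng(0)
src = rng.normal(size=(100, 3))
R_true, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(R_true) < 0:
    R_true[:, 0] *= -1
dst = 2.0 * src @ R_true.T + np.array([0.5, -1.0, 0.3])
s, R, t = fit_similarity(src, dst)
print(np.allclose(s, 2.0), np.allclose(R, R_true), np.allclose(t, [0.5, -1.0, 0.3]))
```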


HAMMER: Heterogeneous, Multi-Robot Semantic Gaussian Splatting

arXiv.org Artificial Intelligence

3D Gaussian Splatting offers expressive scene reconstruction, modeling a broad range of visual, geometric, and semantic information. However, efficient real-time map reconstruction with data streamed from multiple robots and devices remains a challenge. To that end, we propose HAMMER, a server-based collaborative Gaussian Splatting method that leverages widely available ROS communication infrastructure to generate 3D, metric-semantic maps from asynchronous robot data-streams with no prior knowledge of initial robot positions and varying on-device pose estimators. HAMMER consists of (i) a frame alignment module that transforms local SLAM poses and image data into a global frame and requires no prior relative pose knowledge, and (ii) an online module for training semantic 3DGS maps from streaming data. HAMMER handles mixed perception modes, adjusts automatically for variations in image pre-processing among different devices, and distills CLIP semantic codes into the 3D scene for open-vocabulary language queries. In our real-world experiments, HAMMER creates higher-fidelity maps (2x) compared to competing baselines and is useful for downstream tasks, such as semantic goal-conditioned navigation (e.g., "go to the couch"). Accompanying content available at hammer-project.github.io.
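
As a minimal sketch of the frame-alignment idea (placeholder transforms, not HAMMER's module or its ROS interface): once a global-from-local transform has been estimated for a robot, each local SLAM pose it streams can be re-expressed in the shared global frame by composing homogeneous transforms.

```python
# Hedged sketch: re-expressing streamed local SLAM poses in a shared global
# frame given an estimated global-from-local transform T_gl per robot.
# The transforms and poses below are placeholders, not HAMMER's data format.
import numpy as np

def to_global(T_gl, T_local_pose):
    """Compose 4x4 homogeneous transforms: pose in the global map frame."""
    return T_gl @ T_local_pose

# Example: robot A's local frame is rotated 90 degrees about z and shifted in x.
theta = np.pi / 2
T_gl = np.array([[np.cos(theta), -np.sin(theta), 0, 1.0],
                 [np.sin(theta),  np.cos(theta), 0, 0.0],
                 [0.0,            0.0,           1, 0.0],
                 [0.0,            0.0,           0, 1.0]])

T_local = np.eye(4)
T_local[:3, 3] = [0.5, 0.0, 0.2]        # a pose reported by the on-device SLAM

print(to_global(T_gl, T_local)[:3, 3])  # the same pose in the global map frame
```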


Learning Robot Safety from Sparse Human Feedback using Conformal Prediction

arXiv.org Artificial Intelligence

Ensuring robot safety can be challenging; user-defined constraints can miss edge cases, policies can become unsafe even when trained from safe data, and safety can be subjective. Thus, we learn about robot safety by showing policy trajectories to a human who flags unsafe behavior. From this binary feedback, we use the statistical method of conformal prediction to identify a region of states, potentially in learned latent space, guaranteed to contain a user-specified fraction of future policy errors. Our method is sample-efficient, as it builds on nearest neighbor classification and avoids withholding data as is common with conformal prediction. By alerting if the robot reaches the suspected unsafe region, we obtain a warning system that mimics the human's safety preferences with a guaranteed miss rate. From video labeling, our system can detect when a quadcopter visuomotor policy will fail to steer through a designated gate. We present an approach for policy improvement by avoiding the suspected unsafe region. With it, we improve a model predictive controller's safety, as shown in experimental testing with 30 quadcopter flights across 6 navigation tasks. Code and videos are provided.
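
The quantile-calibration logic behind such a warning region can be sketched with a simplified split-conformal variant. Note that the paper's method builds on nearest-neighbor classification without withholding calibration data, so the held-out set below is purely illustrative, as are the toy 2-D states.

```python
# Simplified split-conformal sketch of a safety warning region. The paper's
# approach avoids holding out data; here a calibration split is used instead,
# only to illustrate the quantile step behind the miss-rate guarantee.
import numpy as np

rng = np.random.default_rng(1)
alpha = 0.1                                     # target miss rate

# States flagged unsafe by the human (toy 2-D latent states around (3, 3)).
unsafe = rng.normal(loc=3.0, size=(80, 2))
train_unsafe, calib_unsafe = unsafe[:50], unsafe[50:]

def nn_dist(x, refs):
    """Distance from state x to the nearest known unsafe state."""
    return np.min(np.linalg.norm(refs - x, axis=1))

# Conformal quantile: with exchangeable data, at least (1 - alpha) of future
# unsafe states land inside the region defined by this threshold.
scores = np.array([nn_dist(x, train_unsafe) for x in calib_unsafe])
n = len(scores)
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

def warn(x):
    """Alert if x lies inside the suspected unsafe region."""
    return nn_dist(x, train_unsafe) <= q

print(warn(np.array([3.1, 2.9])), warn(np.array([-2.0, 0.0])))
```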


SOUS VIDE: Cooking Visual Drone Navigation Policies in a Gaussian Splatting Vacuum

arXiv.org Artificial Intelligence

We propose a new simulator, training approach, and policy architecture, collectively called SOUS VIDE, for end-to-end visual drone navigation. Our trained policies exhibit zero-shot sim-to-real transfer with robust real-world performance using only on-board perception and computation. Our simulator, called FiGS, couples a computationally simple drone dynamics model with a high visual fidelity Gaussian Splatting scene reconstruction. FiGS can quickly simulate drone flights producing photorealistic images at up to 130 fps. We use FiGS to collect 100k-300k observation-action pairs from an expert MPC with privileged state and dynamics information, randomized over dynamics parameters and spatial disturbances. We then distill this expert MPC into an end-to-end visuomotor policy with a lightweight neural architecture, called SV-Net. SV-Net processes color image, optical flow, and IMU data streams into low-level body rate and thrust commands at 20 Hz onboard a drone. Crucially, SV-Net includes a Rapid Motor Adaptation (RMA) module that adapts at runtime to variations in drone dynamics. In a campaign of 105 hardware experiments, we show SOUS VIDE policies to be robust to 30% mass variations, 40 m/s wind gusts, 60% changes in ambient brightness, shifting or removing objects from the scene, and people moving aggressively through the drone's visual field. Code, data, and experiment videos can be found on our project page: https://stanfordmsl.github.io/SousVide/.
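
The distillation step amounts to behavior cloning of the expert MPC. Below is a minimal sketch with placeholder observation/action tensors; it does not reproduce SV-Net's image, optical-flow, and IMU processing or its RMA module.

```python
# Minimal behavior-cloning sketch of distilling expert actions into a policy.
# Observation/action dimensions and the data are placeholders, not SV-Net.
import torch
import torch.nn as nn

obs_dim, act_dim = 64, 4               # act: body rates + thrust (assumed)
obs = torch.randn(10_000, obs_dim)     # stand-ins for logged observations
act = torch.randn(10_000, act_dim)     # stand-ins for expert MPC actions

policy = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                       nn.Linear(128, act_dim))
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(obs, act), batch_size=256, shuffle=True)

for epoch in range(5):
    for o, a in loader:
        loss = nn.functional.mse_loss(policy(o), a)   # imitation loss
        opt.zero_grad()
        loss.backward()
        opt.step()
```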


Get a Grip: Multi-Finger Grasp Evaluation at Scale Enables Robust Sim-to-Real Transfer

arXiv.org Artificial Intelligence

This work explores conditions under which multi-finger grasping algorithms can attain robust sim-to-real transfer. While numerous large datasets facilitate learning generative models for multi-finger grasping at scale, reliable real-world dexterous grasping remains challenging, with most methods degrading when deployed on hardware. An alternate strategy is to use discriminative grasp evaluation models for grasp selection and refinement, conditioned on real-world sensor measurements. This paradigm has produced state-of-the-art results for vision-based parallel-jaw grasping, but remains unproven in the multi-finger setting. In this work, we find that existing datasets and methods have been insufficient for training discriminative models for multi-finger grasping. To train grasp evaluators at scale, datasets must provide on the order of millions of grasps, including both positive and negative examples, with corresponding visual data resembling measurements at inference time. To that end, we release a new, open-source dataset of 3.5M grasps on 4.3K objects annotated with RGB images, point clouds, and trained NeRFs. Leveraging this dataset, we train vision-based grasp evaluators that outperform both analytic and generative modeling-based baselines on extensive simulated and real-world trials across a diverse range of objects. We show via numerous ablations that the key factor for performance is indeed the evaluator, and that its quality degrades as the dataset shrinks, demonstrating the importance of our new dataset. Project website at: https://sites.google.com/view/get-a-grip-dataset.
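
The evaluate-then-select paradigm can be sketched generically: score sampled grasp candidates with a learned evaluator conditioned on the scene observation, then execute the highest-scoring one. The shapes, scene encoding, and grasp parameterization below are assumptions for illustration, not the released models.

```python
# Generic sketch of evaluator-based grasp selection (assumed shapes and
# interfaces, not the Get a Grip models): score candidates, keep the best.
import torch
import torch.nn as nn

class GraspEvaluator(nn.Module):
    """Predicts a success probability from a scene code and a grasp vector."""
    def __init__(self, scene_dim=256, grasp_dim=23):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(scene_dim + grasp_dim, 256),
                                 nn.ReLU(), nn.Linear(256, 1))

    def forward(self, scene, grasps):
        x = torch.cat([scene.expand(len(grasps), -1), grasps], dim=-1)
        return torch.sigmoid(self.net(x)).squeeze(-1)

evaluator = GraspEvaluator()
scene_code = torch.randn(1, 256)       # stand-in for encoded sensor data
candidates = torch.randn(512, 23)      # stand-in sampled multi-finger grasps
scores = evaluator(scene_code, candidates)
best = candidates[scores.argmax()]     # grasp handed to the planner/controller
```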


DisCo: Distributed Contact-Rich Trajectory Optimization for Forceful Multi-Robot Collaboration

arXiv.org Artificial Intelligence

We present DisCo, a distributed algorithm for contact-rich, multi-robot tasks. DisCo is a distributed contact-implicit trajectory optimization algorithm, which allows a group of robots to optimize a time sequence of forces to objects and to their environment to accomplish tasks such as collaborative manipulation, robot team sports, and modular robot locomotion. We build our algorithm on a variant of the Alternating Direction Method of Multipliers (ADMM), where each robot computes its own contact forces and contact-switching events from a smaller single-robot, contact-implicit trajectory optimization problem, while cooperating with other robots through dual variables, enforcing constraints between robots. Each robot iterates between solving its local problem and communicating over a wireless mesh network to enforce these consistency constraints with its neighbors, ultimately converging to a coordinated plan for the group. The local problems solved by each robot are significantly less challenging than a centralized problem with all robots' contact forces and switching events, improving the computational efficiency, while also preserving the privacy of some aspects of each robot's operation. We demonstrate the effectiveness of our algorithm in simulations of collaborative manipulation, multi-robot team sports scenarios, and in modular robot locomotion, where DisCo achieves 3x higher success rates with a 2.5x to 5x faster computation time. Further, we provide results of hardware experiments on a modular truss robot, with three collaborating truss nodes planning individually while working together to produce a punctuated rolling-gate motion of the composite structure. Videos are available on the project page: https://disco-opt.github.io.
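
The consensus-ADMM structure that DisCo builds on can be illustrated on a toy problem in which each agent holds a private quadratic cost and the dual updates enforce agreement on a shared variable. This is a generic sketch, not the contact-implicit trajectory optimizer.

```python
# Generic consensus-ADMM sketch (not DisCo's contact-implicit solver):
# each agent i solves min_x 0.5*(x - a_i)^2 locally, and the dual updates
# drive all local copies to agree on a shared variable z.
import numpy as np

a = np.array([1.0, 3.0, 8.0])      # per-agent targets (toy local objectives)
rho = 1.0                          # ADMM penalty weight
x = np.zeros(3)                    # local copies of the shared variable
z = 0.0                            # consensus variable
u = np.zeros(3)                    # scaled dual variables

for it in range(50):
    # Local step (closed form for the quadratic toy objective):
    # argmin_x 0.5*(x - a_i)^2 + (rho/2)*(x - z + u_i)^2
    x = (a + rho * (z - u)) / (1.0 + rho)
    # Consensus step: average of local copies plus duals.
    z = np.mean(x + u)
    # Dual update enforcing the consistency constraints x_i = z.
    u = u + x - z

print(x, z)   # all local copies converge to the consensus minimizer, mean(a)
```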


Hierarchical Hybrid Learning for Long-Horizon Contact-Rich Robotic Assembly

arXiv.org Artificial Intelligence

Generalizable long-horizon robotic assembly requires reasoning at multiple levels of abstraction. End-to-end imitation learning (IL) has proven to be a promising approach, but it requires a large amount of demonstration data for training and often fails to meet the high-precision requirement of assembly tasks. Reinforcement Learning (RL) approaches have succeeded in high-precision assembly tasks, but suffer from sample inefficiency and are hence less competent at long-horizon tasks. To address these challenges, we propose a hierarchical modular approach, named ARCH (Adaptive Robotic Composition Hierarchy), which enables long-horizon high-precision assembly in contact-rich settings. ARCH employs a hierarchical planning framework, including a low-level primitive library of continuously parameterized skills and a high-level policy. The low-level primitive library includes essential skills for assembly tasks, such as grasping and inserting. These primitives consist of both RL and model-based controllers. The high-level policy, learned via imitation learning from a handful of demonstrations, selects the appropriate primitive skills and instantiates them with continuous input parameters. We extensively evaluate our approach on a real robot manipulation platform. We show that while trained on a single task, ARCH generalizes well to unseen tasks and outperforms baseline methods in terms of success rate and data efficiency. Videos can be found at https://long-horizon-assembly.github.io.
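
The high-level/low-level interface can be sketched as a policy with a discrete head that selects a primitive and a continuous head that instantiates its parameters. The primitive names, dimensions, and architecture below are illustrative assumptions, not ARCH's implementation.

```python
# Hedged sketch of a hierarchical policy interface: a discrete head picks a
# primitive skill and a continuous head produces its input parameters.
import torch
import torch.nn as nn

PRIMITIVES = ["grasp", "insert", "move"]        # hypothetical skill library

class HighLevelPolicy(nn.Module):
    def __init__(self, obs_dim=32, param_dim=6):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        self.skill_head = nn.Linear(64, len(PRIMITIVES))   # which primitive
        self.param_head = nn.Linear(64, param_dim)         # its parameters

    def forward(self, obs):
        h = self.trunk(obs)
        skill = torch.argmax(self.skill_head(h), dim=-1)
        return skill, self.param_head(h)

policy = HighLevelPolicy()
skill_idx, params = policy(torch.randn(1, 32))
print(PRIMITIVES[skill_idx.item()], params.shape)   # dispatch to that primitive
```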


Splat-MOVER: Multi-Stage, Open-Vocabulary Robotic Manipulation via Editable Gaussian Splatting

arXiv.org Artificial Intelligence

We present Splat-MOVER, a modular robotics stack for open-vocabulary robotic manipulation, which leverages the editability of Gaussian Splatting (GSplat) scene representations to enable multi-stage manipulation tasks. Splat-MOVER consists of: (i) ASK-Splat, a GSplat representation that distills semantic and grasp affordance features into the 3D scene. ASK-Splat enables geometric, semantic, and affordance understanding of 3D scenes, which is critical in many robotics tasks; (ii) SEE-Splat, a real-time scene-editing module using 3D semantic masking and infilling to visualize the motions of objects that result from robot interactions in the real world. SEE-Splat creates a "digital twin" of the evolving environment throughout the manipulation task; and (iii) Grasp-Splat, a grasp generation module that uses ASK-Splat and SEE-Splat to propose affordance-aligned candidate grasps for open-world objects. ASK-Splat is trained in real time from RGB images in a brief scanning phase prior to operation, while SEE-Splat and Grasp-Splat run in real time during operation. We demonstrate the superior performance of Splat-MOVER in hardware experiments on a Kinova robot compared to two recent baselines in four single-stage, open-vocabulary manipulation tasks and in four multi-stage manipulation tasks, using the edited scene to reflect changes due to prior manipulation stages, which is not possible with existing baselines. The project page is available at https://splatmover.github.io, and the code for the project will be made available after review.
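
The scene-editing idea can be illustrated abstractly: select the Gaussians whose distilled semantic feature matches a query and rigidly transform their means to mirror the planned object motion. The features, query embedding, and threshold below are random placeholders, not the ASK-Splat/SEE-Splat code.

```python
# Toy illustration of semantic masking + editing on Gaussian centers (not the
# Splat-MOVER code): features, query, and threshold are placeholders.
import numpy as np

rng = np.random.default_rng(0)
means = rng.uniform(-1, 1, size=(1000, 3))         # Gaussian centers
feats = rng.normal(size=(1000, 16))                # stand-in semantic features
feats /= np.linalg.norm(feats, axis=1, keepdims=True)

query = rng.normal(size=16)                        # stand-in text embedding
query /= np.linalg.norm(query)

mask = feats @ query > 0.3                         # semantic mask for the object
shift = np.array([0.3, 0.0, 0.1])                  # planned object motion
means[mask] += shift                               # "edit" the digital twin
print(mask.sum(), "Gaussians moved")
```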


Out-of-Distribution Runtime Adaptation with Conformalized Neural Network Ensembles

arXiv.org Artificial Intelligence

We present a method to integrate real-time out-of-distribution (OOD) detection for neural network trajectory predictors, and to adapt the control strategy of a robot (e.g., a self-driving car or drone) to preserve safety while operating in OOD regimes. Specifically, we use a neural network ensemble to predict the trajectory for a dynamic obstacle (such as a pedestrian), and use the maximum singular value of the empirical covariance among the ensemble as a signal for OOD detection. We calibrate this signal with a small fraction of held-out training data using the methodology of conformal prediction, to derive an OOD detector with probabilistic guarantees on the false-positive rate of the detector, given a user-specified confidence level. During in-distribution operation, we use an MPC controller to avoid collisions with the obstacle based on the trajectory predicted by the neural network ensemble. When OOD conditions are detected, we switch to a reachability-based controller to guarantee safety under the worst-case actions of the obstacle. We verify our method in extensive autonomous driving simulations in a pedestrian crossing scenario, showing that our OOD detector obtains the desired accuracy rate within a theoretically predicted range. We also demonstrate the effectiveness of our method with real pedestrian data. We show improved safety and less conservatism in comparison with two state-of-the-art methods that also use conformal prediction, but without OOD adaptation.
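
The detection signal and its calibration can be sketched directly from the description above: take the maximum singular value of the empirical covariance across ensemble predictions as the OOD score, and set the alarm threshold at a conformal quantile of scores from held-out in-distribution data. The ensemble outputs below are random placeholders, not the paper's trained predictors.

```python
# Hedged sketch of the ensemble-covariance OOD signal and its conformal
# calibration; only the score and quantile logic follow the abstract.
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.05                      # target false-positive rate

def ood_score(preds):
    """Max singular value of the empirical covariance across ensemble members.

    preds: (n_members, horizon * 2) flattened trajectory predictions."""
    cov = np.cov(preds, rowvar=False)
    return np.linalg.svd(cov, compute_uv=False)[0]

# Calibration: scores on held-out in-distribution ensemble predictions.
calib = np.array([ood_score(rng.normal(size=(5, 20))) for _ in range(200)])
n = len(calib)
tau = np.quantile(calib, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

# Runtime: flag OOD (and switch to the reachability-based controller)
# whenever ensemble disagreement exceeds the calibrated threshold.
test_preds = rng.normal(scale=3.0, size=(5, 20))   # unusually high disagreement
print(ood_score(test_preds) > tau)                 # likely flagged as OOD
```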