
Collaborating Authors

 Shkurti, Florian


SICNav-Diffusion: Safe and Interactive Crowd Navigation with Diffusion Trajectory Predictions

arXiv.org Artificial Intelligence

To navigate crowds without collisions, robots must interact with humans by forecasting their future motion and reacting accordingly. While learning-based prediction models have shown success in generating likely human trajectory predictions, integrating these stochastic models into a robot controller presents several challenges. The controller needs to account for interactive coupling between planned robot motion and human predictions while ensuring both predictions and robot actions are safe (i.e., collision-free). To address these challenges, we present a receding horizon crowd navigation method for single-robot multi-human environments. We first propose a diffusion model to generate joint trajectory predictions for all humans in the scene. We then incorporate these multi-modal predictions into a SICNav Bilevel MPC problem that simultaneously solves for a robot plan (upper-level) and acts as a safety filter to refine the predictions for non-collision (lower-level). Combining planning and prediction refinement into one bilevel problem ensures that the robot plan and human predictions are coupled. We validate the open-loop trajectory prediction performance of our diffusion model on the commonly used ETH/UCY benchmark and evaluate the closed-loop performance of our robot navigation method in simulation and in extensive real-robot experiments, demonstrating safe, efficient, and reactive robot motion.
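
A minimal sketch of the interaction pattern described above: sample several joint human trajectory predictions (a placeholder stands in for the learned diffusion model), refine those predictions to be collision-free around a candidate robot plan (lower level), and pick the candidate plan with the lowest cost against the refined predictions (upper level). This is an illustrative stand-in under assumed shapes and costs, not the paper's actual bilevel MPC formulation or solver.

```python
import numpy as np

HORIZON, N_HUMANS, N_SAMPLES, R_SAFE = 8, 3, 5, 0.6
rng = np.random.default_rng(0)

def sample_human_predictions(human_pos):
    """Placeholder for the diffusion model: K joint trajectory samples."""
    steps = rng.normal(0.0, 0.1, size=(N_SAMPLES, HORIZON, N_HUMANS, 2))
    return human_pos[None, None] + np.cumsum(steps, axis=1)

def refine_predictions(preds, robot_traj):
    """Lower level (safety filter): push predicted waypoints out of collision."""
    refined = preds.copy()
    for k in range(N_SAMPLES):
        for t in range(HORIZON):
            for h in range(N_HUMANS):
                d = refined[k, t, h] - robot_traj[t]
                dist = np.linalg.norm(d)
                if dist < R_SAFE:
                    refined[k, t, h] = robot_traj[t] + d / max(dist, 1e-6) * R_SAFE
    return refined

def plan_robot(goal, start, preds):
    """Upper level: pick the candidate robot trajectory with the lowest cost."""
    best_traj, best_cost = None, np.inf
    for _ in range(64):
        vels = rng.uniform(-0.3, 0.3, size=(HORIZON, 2))
        traj = start[None] + np.cumsum(vels, axis=0)
        refined = refine_predictions(preds, traj)
        clearance = np.linalg.norm(refined - traj[None, :, None], axis=-1).min()
        cost = np.linalg.norm(traj[-1] - goal) + 10.0 * max(0.0, R_SAFE - clearance)
        if cost < best_cost:
            best_traj, best_cost = traj, cost
    return best_traj

humans = rng.uniform(-2, 2, size=(N_HUMANS, 2))
preds = sample_human_predictions(humans)
plan = plan_robot(goal=np.array([3.0, 0.0]), start=np.zeros(2), preds=preds)
print("first planned waypoint:", plan[0])
```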


AnyPlace: Learning Generalized Object Placement for Robot Manipulation

arXiv.org Artificial Intelligence

Object placement in robotic tasks is inherently challenging due to the diversity of object geometries and placement configurations. To address this, we propose AnyPlace, a two-stage method trained entirely on synthetic data, capable of predicting a wide range of feasible placement poses for real-world tasks. Our key insight is that by leveraging a Vision-Language Model (VLM) to identify rough placement locations, we focus only on the relevant regions for local placement, which enables us to train the low-level placement-pose-prediction model to capture diverse placements efficiently. For training, we generate a fully synthetic dataset of randomly generated objects in different placement configurations (insertion, stacking, hanging) and train local placement-prediction models. We conduct extensive evaluations in simulation, demonstrating that our method outperforms baselines in terms of success rate, coverage of possible placement modes, and precision. In real-world experiments, we show how our approach directly transfers models trained purely on synthetic data to the real world, where it successfully performs placements in scenarios where other models struggle -- such as handling varying object geometries, covering diverse placement modes, and achieving high precision for fine placement. More at: https://any-place.github.io.
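
An illustrative sketch of the two-stage structure described above, with both stages stubbed: a rough placement location (standing in for the VLM output) is used to crop the scene point cloud, and a hypothetical local model scores candidate placement poses inside that crop. The function names and the scoring rule are assumptions for illustration, not the AnyPlace models.

```python
import numpy as np

rng = np.random.default_rng(0)
scene_points = rng.uniform(0.0, 1.0, size=(5000, 3))  # stand-in scene point cloud

def rough_placement_location(scene, instruction):
    """Placeholder for the VLM stage: return a coarse 3D placement location."""
    return scene.mean(axis=0)  # e.g. center of the relevant region

def crop_local_region(scene, center, radius=0.2):
    """Keep only points near the coarse location for the local model."""
    mask = np.linalg.norm(scene - center, axis=1) < radius
    return scene[mask]

def score_candidate_poses(local_points, candidates):
    """Placeholder local model: prefer poses close to the observed local surface."""
    dists = np.linalg.norm(
        candidates[:, None, :3] - local_points[None, :, :], axis=-1
    ).min(axis=1)
    return -dists  # higher score = closer to observed geometry

center = rough_placement_location(scene_points, "hang the mug on the rack")
local = crop_local_region(scene_points, center)
candidates = np.concatenate(
    [center + rng.normal(0, 0.05, size=(32, 3)),       # candidate positions
     rng.uniform(-np.pi, np.pi, size=(32, 3))], axis=1)  # candidate orientations (rpy)
best = candidates[np.argmax(score_candidate_poses(local, candidates))]
print("selected placement pose (xyz, rpy):", best)
```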


Accelerating Discovery in Natural Science Laboratories with AI and Robotics: Perspectives and Challenges from the 2024 IEEE ICRA Workshop, Yokohama, Japan

arXiv.org Artificial Intelligence

Fundamental breakthroughs across many scientific disciplines are becoming increasingly rare (1). At the same time, challenges related to the reproducibility and scalability of experiments, especially in the natural sciences (2,3), remain significant obstacles. For years, automating scientific experiments has been viewed as the key to solving this problem. However, existing solutions are often rigid and complex, designed to address specific experimental tasks with little adaptability to protocol changes. With advancements in robotics and artificial intelligence, new possibilities are emerging to tackle this challenge in a more flexible and human-centric manner.


Synthetica: Large Scale Synthetic Data for Robot Perception

arXiv.org Artificial Intelligence

Vision-based object detectors are a crucial basis for robotics applications as they provide valuable information about object localisation in the environment. They need to remain highly reliable under different lighting conditions, occlusions, and visual artifacts, all while running in real time. Collecting and annotating real-world data for these networks is prohibitively time-consuming and costly, especially for custom assets, such as industrial objects, making it untenable for generalization to in-the-wild scenarios. To this end, we present Synthetica, a method for large-scale synthetic data generation for training robust state estimators. This paper focuses on the task of object detection, an important problem which can serve as the front-end for most state estimation problems, such as pose estimation. Leveraging data from a photorealistic ray-tracing renderer, we scale up data generation, generating 2.7 million images, to train highly accurate real-time detection transformers. We present a collection of rendering randomization and training-time data augmentation techniques conducive to robust sim-to-real performance for vision tasks. We demonstrate state-of-the-art performance on the task of object detection while having detectors that run at 50-100 Hz, which is 9 times faster than the prior SOTA. We further demonstrate the usefulness of our training methodology for robotics applications by showcasing a pipeline for use in the real world with custom objects for which there do not exist prior datasets. Our work highlights the importance of scaling synthetic data generation for robust sim-to-real transfer while achieving the fastest real-time inference speeds. Videos and supplementary information can be found at this URL: https://sites.google.com/view/synthetica-vision.
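
A minimal sketch of the kind of training-time randomization described above: photometric jitter and sensor noise applied to a rendered image before it is fed to the detector. The specific augmentations and parameter ranges are illustrative assumptions, not the exact Synthetica pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def randomize_image(img):
    """Apply random gamma, brightness, and noise to an HxWx3 float image in [0, 1]."""
    gamma = rng.uniform(0.7, 1.5)              # random gamma (lighting variation)
    gain = rng.uniform(0.8, 1.2)               # random brightness gain
    noise = rng.normal(0.0, 0.02, img.shape)   # additive sensor noise
    out = np.clip(gain * img ** gamma + noise, 0.0, 1.0)
    if rng.random() < 0.5:                     # occasional channel permutation (color randomization)
        out = out[..., rng.permutation(3)]
    return out

rendered = rng.uniform(0.0, 1.0, size=(480, 640, 3))  # stand-in for a ray-traced render
augmented = randomize_image(rendered)
print("augmented image range:", augmented.min(), augmented.max())
```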


Automated Planning Domain Inference for Task and Motion Planning

arXiv.org Artificial Intelligence

Task and motion planning (TAMP) frameworks address long and complex planning problems by integrating high-level task planners with low-level motion planners. However, existing TAMP methods rely heavily on the manual design of planning domains that specify the preconditions and postconditions of all high-level actions. This paper proposes a method to automate planning domain inference from a handful of test-time trajectory demonstrations, reducing the reliance on human design. Our approach incorporates a deep learning-based estimator that predicts the appropriate components of a domain for a new task and a search algorithm that refines this prediction, reducing the size and ensuring the utility of the inferred domain. Our method is able to generate new domains from minimal demonstrations at test time, enabling robots to handle complex tasks more efficiently. We demonstrate that our approach outperforms behavior cloning baselines, which directly imitate planner behavior, in terms of planning performance and generalization across a variety of tasks. Additionally, our method reduces the computational cost and the amount of data required at test time to infer new planning domains.
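
A much-simplified illustration of inferring action schemas from demonstrations: for each action symbol, take preconditions as facts true before every observed use, add effects as facts that always appear afterwards, and delete effects as facts that always disappear. The paper's method uses a learned estimator plus a refinement search; this toy symbolic version only conveys the underlying idea, and the example facts are made up.

```python
from collections import defaultdict

# Each demonstration step: (action, facts_before, facts_after)
demos = [
    ("pick(a)", {"on_table(a)", "hand_empty"}, {"holding(a)"}),
    ("pick(b)", {"on_table(b)", "hand_empty"}, {"holding(b)"}),
    ("place(a)", {"holding(a)"}, {"on_table(a)", "hand_empty"}),
]

def lift(fact, args):
    """Replace an action's concrete arguments with placeholders in a fact."""
    if "(" not in fact:
        return fact
    pred, rest = fact.split("(", 1)
    fact_args = [a.strip() for a in rest.rstrip(")").split(",") if a.strip()]
    lifted = [f"?x{args.index(a)}" if a in args else a for a in fact_args]
    return f"{pred}({', '.join(lifted)})"

schemas = defaultdict(lambda: {"pre": None, "add": None, "delete": None})
for action, before, after in demos:
    name, argstr = action.split("(")
    args = [a for a in argstr.rstrip(")").split(",") if a]
    pre = {lift(f, args) for f in before}
    add = {lift(f, args) for f in after - before}
    dele = {lift(f, args) for f in before - after}
    s = schemas[name]
    # Intersect across demonstrations so only consistent conditions survive.
    s["pre"] = pre if s["pre"] is None else s["pre"] & pre
    s["add"] = add if s["add"] is None else s["add"] & add
    s["delete"] = dele if s["delete"] is None else s["delete"] & dele

for name, schema in schemas.items():
    print(name, schema)
```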


Gaussian Splatting Visual MPC for Granular Media Manipulation

arXiv.org Artificial Intelligence

Recent advancements in learned 3D representations have enabled significant progress in solving complex robotic manipulation tasks, particularly for rigid-body objects. However, manipulating granular materials such as beans, nuts, and rice remains challenging due to the intricate physics of particle interactions, the high-dimensional and partially observable state, the inability to visually track individual particles in a pile, and the computational demands of accurate dynamics prediction. Current deep latent dynamics models often struggle to generalize in granular material manipulation due to a lack of inductive biases. In this work, we propose a novel approach that learns a visual dynamics model over Gaussian splatting representations of scenes and leverages this model for manipulating granular media via Model-Predictive Control. Our method enables efficient optimization for complex manipulation tasks on piles of granular media. We evaluate our approach in both simulated and real-world settings, demonstrating its ability to solve unseen planning tasks and generalize to new environments via zero-shot transfer. We also show significant prediction and manipulation performance improvements compared to existing granular media manipulation methods.
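
A sketch of the planning loop described above: model-predictive control by sampling action sequences, rolling them out through a learned dynamics model, and executing the first action of the lowest-cost sequence. The dynamics model below is a trivial placeholder (a drifting pile center), whereas in the paper the learned model operates on Gaussian splatting scene representations; the cross-entropy-method planner is one common choice, not necessarily the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
HORIZON, N_SAMPLES, N_ITERS, N_ELITE = 5, 128, 4, 16

def predicted_dynamics(state, action):
    """Placeholder learned model: the pile's mean position drifts with the push."""
    return state + 0.5 * action

def cost(state, goal):
    return np.linalg.norm(state - goal)

def cem_plan(state, goal):
    mean = np.zeros((HORIZON, 2))
    std = np.ones((HORIZON, 2))
    for _ in range(N_ITERS):
        actions = rng.normal(mean, std, size=(N_SAMPLES, HORIZON, 2))
        costs = np.empty(N_SAMPLES)
        for i in range(N_SAMPLES):
            s = state
            for t in range(HORIZON):
                s = predicted_dynamics(s, actions[i, t])
            costs[i] = cost(s, goal)
        elite = actions[np.argsort(costs)[:N_ELITE]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3
    return mean[0]  # execute only the first action, then replan

state, goal = np.array([0.0, 0.0]), np.array([1.0, -0.5])
print("first action:", cem_plan(state, goal))
```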


Generating Transferable Adversarial Simulation Scenarios for Self-Driving via Neural Rendering

arXiv.org Artificial Intelligence

Self-driving software pipelines include components that are learned from a significant number of training examples, yet it remains challenging to evaluate the overall system's safety and generalization performance. As the real-world deployment of autonomous vehicles scales up, it is critically important to automatically find simulation scenarios where the driving policies will fail. We propose a method that efficiently generates adversarial simulation scenarios for autonomous driving by solving an optimal control problem that aims to maximally perturb the policy from its nominal trajectory. Given an image-based driving policy, we show that we can inject new objects into a neural rendering representation of the deployment scene and optimize their texture in order to generate adversarial sensor inputs to the policy. We demonstrate that adversarial scenarios discovered purely in the neural renderer (surrogate scene) can often be successfully transferred to the deployment scene without further optimization. We demonstrate that this transfer occurs in both simulated and real environments, provided the learned surrogate scene is sufficiently close to the deployment scene.
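
A toy sketch of the optimization described above: treat an inserted object's texture as the decision variable, render it into the policy's input image with a differentiable stand-in renderer, and ascend the gradient of the policy's deviation from its nominal output. The renderer and policy here are tiny placeholders, not a neural rendering model or a real driving policy.

```python
import torch

torch.manual_seed(0)

H, W = 32, 48
base_image = torch.rand(3, H, W)                    # stand-in for the surrogate scene render
texture = torch.rand(3, 8, 8, requires_grad=True)   # texture of the injected object

policy = torch.nn.Sequential(                       # stand-in image-based driving policy
    torch.nn.Flatten(), torch.nn.Linear(3 * H * W, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 2))                         # outputs e.g. [steering, throttle]

def render(tex):
    """Differentiable placeholder renderer: paste the textured object into the scene."""
    img = base_image.clone()
    img[:, 10:18, 20:28] = tex
    return img

with torch.no_grad():
    nominal = policy(base_image.unsqueeze(0))       # policy output without the object

opt = torch.optim.Adam([texture], lr=0.05)
for step in range(200):
    opt.zero_grad()
    out = policy(render(torch.sigmoid(texture)).unsqueeze(0))
    deviation = (out - nominal).pow(2).sum()
    (-deviation).backward()                         # ascend the deviation from nominal
    opt.step()

print("final policy deviation:", deviation.item())
```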


ORGANA: A Robotic Assistant for Automated Chemistry Experimentation and Characterization

arXiv.org Artificial Intelligence

Chemistry experimentation is often resource- and labor-intensive. Despite the many benefits brought by the integration of advanced and special-purpose lab equipment, many aspects of experimentation are still conducted manually by chemists, for example, polishing an electrode in electrochemistry experiments. Traditional lab automation infrastructure faces challenges when it comes to flexibly adapting to new chemistry experiments. To address this issue, we propose a human-friendly and flexible robotic system, ORGANA, that automates a diverse set of chemistry experiments. It is capable of interacting with chemists in the lab through natural language, using Large Language Models (LLMs). ORGANA keeps scientists informed by providing timely reports that incorporate statistical analyses. Additionally, it actively engages with users when necessary for disambiguation or troubleshooting. ORGANA can reason over user input to derive experiment goals, and plan long sequences of both high-level tasks and low-level robot actions while using feedback from the visual perception of the environment. It also supports scheduling and parallel execution for experiments that require resource allocation and coordination between multiple robots and experiment stations. We show that ORGANA successfully conducts a diverse set of chemistry experiments, including solubility assessment, pH measurement, recrystallization, and electrochemistry experiments. For the latter, we show that ORGANA robustly executes a long-horizon plan, comprising 19 steps executed in parallel, to characterize the electrochemical properties of quinone derivatives, a class of molecules used in rechargeable flow batteries. Our user study indicates that ORGANA significantly improves many aspects of the user experience while reducing the users' physical workload. More details about ORGANA can be found at https://ac-rad.github.io/organa/.
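
A small illustration of the scheduling aspect mentioned above: experiment steps with dependencies and resource requirements are greedily assigned to the earliest available robot or station. This is a generic list scheduler with made-up step names and durations for illustration only, not ORGANA's actual planner or scheduler.

```python
# (name, duration_minutes, dependencies, required_resource)
steps = [
    ("prepare_solution", 5, [], "robot"),
    ("polish_electrode", 8, [], "robot"),
    ("measure_pH", 3, ["prepare_solution"], "pH_station"),
    ("run_electrochemistry", 15, ["prepare_solution", "polish_electrode"], "echem_station"),
]
resource_free = {"robot": 0, "pH_station": 0, "echem_station": 0}
finish = {}
scheduled = []

while len(finish) < len(steps):
    for name, dur, deps, res in steps:
        if name in finish or any(d not in finish for d in deps):
            continue  # already scheduled, or dependencies not yet finished
        start = max([resource_free[res]] + [finish[d] for d in deps])
        finish[name] = start + dur
        resource_free[res] = finish[name]
        scheduled.append((name, start, finish[name], res))

for name, start, end, res in scheduled:
    print(f"{name}: t={start}..{end} on {res}")
```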


STAMP: Differentiable Task and Motion Planning via Stein Variational Gradient Descent

arXiv.org Artificial Intelligence

Planning for many manipulation tasks, such as using tools or assembling parts, often requires both symbolic and geometric reasoning. Task and Motion Planning (TAMP) algorithms typically solve these problems by conducting a tree search over high-level task sequences while checking for kinematic and dynamic feasibility. This can be inefficient as the width of the tree can grow exponentially with the number of possible actions and objects. In this paper, we propose a novel approach to TAMP that relaxes discrete-and-continuous TAMP problems into inference problems on a continuous domain. Our method, Stein Task and Motion Planning (STAMP), subsequently solves this new problem using a gradient-based variational inference algorithm called Stein Variational Gradient Descent, obtaining gradients from a parallelized differentiable physics simulator. By introducing relaxations to the discrete variables, leveraging parallelization, and approaching TAMP as a Bayesian inference problem, our method is able to efficiently find multiple diverse plans in a single optimization run. We demonstrate our method on two TAMP problems and benchmark it against existing TAMP baselines.
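
A minimal numpy sketch of the Stein Variational Gradient Descent update that STAMP builds on, run here on a toy 2D bimodal target standing in for a relaxed plan posterior (in the paper the gradients come from a differentiable physics simulator, not a closed-form density). Each particle corresponds to one candidate plan; the driving term moves particles toward high-probability regions while the kernel term keeps them diverse.

```python
import numpy as np

rng = np.random.default_rng(0)
modes = np.array([[-2.0, 0.0], [2.0, 0.0]])          # two "plan modes" of the toy target

def grad_log_p(x):
    """Gradient of log density of an equal-weight Gaussian mixture (unit covariance)."""
    diffs = x[:, None, :] - modes[None, :, :]        # (N, M, 2)
    w = np.exp(-0.5 * np.sum(diffs ** 2, axis=-1))   # (N, M) unnormalized responsibilities
    w = w / w.sum(axis=1, keepdims=True)
    return -(w[..., None] * diffs).sum(axis=1)       # (N, 2)

def svgd_step(x, step_size=0.1):
    n = x.shape[0]
    sq_dists = np.sum((x[:, None] - x[None, :]) ** 2, axis=-1)
    h = np.median(sq_dists) / np.log(n + 1) + 1e-8   # median bandwidth heuristic
    k = np.exp(-sq_dists / h)                        # RBF kernel matrix (N, N)
    grad_k = -2.0 / h * (k[..., None] * (x[:, None] - x[None, :])).sum(axis=0)
    phi = (k @ grad_log_p(x) + grad_k) / n           # SVGD update direction
    return x + step_size * phi

particles = rng.normal(0.0, 1.0, size=(50, 2))       # 50 candidate plans
for _ in range(300):
    particles = svgd_step(particles)
print("particles near each mode:",
      (particles[:, 0] < 0).sum(), (particles[:, 0] > 0).sum())
```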


ConceptFusion: Open-set Multimodal 3D Mapping

arXiv.org Artificial Intelligence

Building 3D maps of the environment is central to robot navigation, planning, and interaction with objects in a scene. Most existing approaches that integrate semantic concepts with 3D maps largely remain confined to the closed-set setting: they can only reason about a finite set of concepts, pre-defined at training time. Further, these maps can only be queried using class labels, or in recent work, using text prompts. We address both of these issues with ConceptFusion, a scene representation that is (1) fundamentally open-set, enabling reasoning beyond a closed set of concepts, and (2) inherently multimodal, enabling a diverse range of possible queries to the 3D map, from language, to images, to audio, to 3D geometry, all working in concert. ConceptFusion leverages the open-set capabilities of today's foundation models, pre-trained on internet-scale data, to reason about concepts across modalities such as natural language, images, and audio. We demonstrate that pixel-aligned open-set features can be fused into 3D maps via traditional SLAM and multi-view fusion approaches. This enables effective zero-shot spatial reasoning without any additional training or finetuning, and retains long-tailed concepts better than supervised approaches, outperforming them by a margin of more than 40% in 3D IoU. We extensively evaluate ConceptFusion on a number of real-world datasets, simulated home environments, a real-world tabletop manipulation task, and an autonomous driving platform. We showcase new avenues for blending foundation models with 3D open-set multimodal mapping. For more information, visit our project page https://concept-fusion.github.io or watch our 5-minute explainer video https://www.youtube.com/watch?v=rkXgws8fiDs
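
A simplified sketch of the multi-view fusion step described above: project each 3D map point into every camera view, gather the pixel-aligned feature at the projected location, and average across the views that see the point; the fused map is then queried by cosine similarity against a query embedding. The camera poses, intrinsics, and "foundation-model" features here are random stand-ins, and the pipeline omits the SLAM machinery used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, D, N_VIEWS, N_POINTS = 60, 80, 32, 4, 500

points = rng.uniform([-1, -1, 2], [1, 1, 4], size=(N_POINTS, 3))   # 3D map points
K = np.array([[100.0, 0, W / 2], [0, 100.0, H / 2], [0, 0, 1]])    # pinhole intrinsics
feature_maps = rng.normal(size=(N_VIEWS, H, W, D))                 # pixel-aligned features per view

fused = np.zeros((N_POINTS, D))
counts = np.zeros(N_POINTS)
for view in range(N_VIEWS):
    offset = np.array([0.1 * view, 0.0, 0.0])        # toy per-view translation (identity rotation)
    cam_pts = points - offset
    uvw = (K @ cam_pts.T).T
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)
    visible = (u >= 0) & (u < W) & (v >= 0) & (v < H) & (cam_pts[:, 2] > 0)
    fused[visible] += feature_maps[view, v[visible], u[visible]]
    counts[visible] += 1
fused[counts > 0] /= counts[counts > 0][:, None]     # average over observing views

query = rng.normal(size=D)                           # stand-in for a text/image/audio embedding
sims = fused @ query / (np.linalg.norm(fused, axis=1) * np.linalg.norm(query) + 1e-8)
print("best-matching map point:", points[np.argmax(sims)])
```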