AITopics | Dong, Yifei

Dong, Yifei

Grasping a Handful: Sequential Multi-Object Dexterous Grasp Generation

Lu, Haofei, Dong, Yifei, Weng, Zehang, Lundell, Jens, Kragic, Danica

arXiv.org Artificial Intelligence, Mar-31-2025

We introduce the sequential multi-object robotic grasp sampling algorithm SeqGrasp that can robustly synthesize stable grasps on diverse objects using the robotic hand's partial Degrees of Freedom (DoF). We use SeqGrasp to construct the large-scale Allegro Hand sequential grasping dataset SeqDataset and use it for training the diffusion-based sequential grasp generator SeqDiffuser. We experimentally evaluate SeqGrasp and SeqDiffuser against the state-of-the-art non-sequential multi-object grasp generation method MultiGrasp in simulation and on a real robot. Furthermore, SeqDiffuser is approximately 1000 times faster at generating grasps than SeqGrasp and MultiGrasp.

Generation of dexterous grasps has been studied for a long time, both from a technical perspective on generating grasps on robots [1]-[11] and understanding human grasping [12]-[15]. Most of these methods rely on bringing the robotic hand close to the object and then simultaneously enveloping it with all fingers. While this strategy often results in efficient and successful grasp generation, it simplifies dexterous grasping to resemble parallel-jaw grasping, thereby underutilizing the many DoF of multi-fingered robotic hands [10]. In contrast, grasping multiple objects with a robotic hand, particularly in a sequential manner that mirrors human-like dexterity, as shown in Figure 1, is still an unsolved problem. In this work, we introduce SeqGrasp, a novel hand-agnostic algorithm for generating sequential multi-object grasps.

artificial intelligence, seqdataset, seqdiffuser, (11 more...)

2503.2237

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Robots > Manipulation (1.00)

HA-VLN: A Benchmark for Human-Aware Navigation in Discrete-Continuous Environments with Dynamic Multi-Human Interactions, Real-World Validation, and an Open Leaderboard

Dong, Yifei, Wu, Fengyi, He, Qi, Li, Heng, Li, Minghan, Cheng, Zebang, Zhou, Yuxuan, Sun, Jingdong, Dai, Qi, Cheng, Zhi-Qi, Hauptmann, Alexander G

arXiv.org Artificial Intelligence, Mar-18-2025

Vision-and-Language Navigation (VLN) systems often focus on either discrete (panoramic) or continuous (free-motion) paradigms alone, overlooking the complexities of human-populated, dynamic environments. We introduce a unified Human-Aware VLN (HA-VLN) benchmark that merges these paradigms under explicit social-awareness constraints. Our contributions include: 1. A standardized task definition that balances discrete-continuous navigation with personal-space requirements; 2. An enhanced human motion dataset (HAPS 2.0) and upgraded simulators capturing realistic multi-human interactions, outdoor contexts, and refined motion-language alignment; 3. Extensive benchmarking on 16,844 human-centric instructions, revealing how multi-human dynamics and partial observability pose substantial challenges for leading VLN agents; 4. Real-world robot tests validating sim-to-real transfer in crowded indoor spaces; and 5. A public leaderboard supporting transparent comparisons across discrete and continuous tasks. Empirical results show improved navigation success and fewer collisions when social context is integrated, underscoring the need for human-centric design. By releasing all datasets, simulators, agent code, and evaluation tools, we aim to advance safer, more capable, and socially responsible VLN research.

evolutionary algorithm, large language model, machine learning, (19 more...)

2503.14229

Genre:

Workflow (0.67)
Research Report > New Finding (0.34)

Industry:

Leisure & Entertainment (1.00)
Media > Television (0.45)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(2 more...)

T-DOM: A Taxonomy for Robotic Manipulation of Deformable Objects

Blanco-Mulero, David, Dong, Yifei, Borras, Julia, Pokorny, Florian T., Torras, Carme

arXiv.org Artificial Intelligence, Dec-30-2024

Robotic grasp and manipulation taxonomies, inspired by observing human manipulation strategies, can provide key guidance for tasks ranging from robotic gripper design to the development of manipulation algorithms. The existing grasp and manipulation taxonomies, however, often assume object rigidity, which limits their ability to reason about the complex interactions in the robotic manipulation of deformable objects. Hence, to assist in tasks involving deformable objects, taxonomies need to capture more comprehensively the interactions inherent in deformable object manipulation. To this end, we introduce T-DOM, a taxonomy that analyses key aspects involved in the manipulation of deformable objects, such as robot motion, forces, prehensile and non-prehensile interactions and, for the first time, a detailed classification of object deformations. To evaluate T-DOM, we curate a dataset of ten tasks involving a variety of deformable objects, such as garments, ropes, and surgical gloves, as well as diverse types of deformations. We analyse the proposed tasks comparing the T-DOM taxonomy with previous well established manipulation taxonomies. Our analysis demonstrates that T-DOM can effectively distinguish between manipulation skills that were not identified in other taxonomies, across different deformable objects and manipulation actions, offering new categories to characterize a skill. The proposed taxonomy significantly extends past work, providing a more fine-grained classification that can be used to describe the robotic manipulation of deformable objects. This work establishes a foundation for advancing deformable object manipulation, bridging theoretical understanding and practical implementation in robotic systems.

artificial intelligence, deformation, manipulation, (17 more...)

2412.20998

Country: Europe (0.67)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Robots > Manipulation (1.00)

TactV: A Class of Hybrid Terrestrial/Aerial Coaxial Tilt-Rotor Vehicles

Dong, Yifei, Zhu, Yimin, Zhang, Lixian, Ding, Yihang

arXiv.org Artificial Intelligence, Nov-19-2024

To enhance the obstacle-crossing and endurance capabilities of vehicles operating in complex environments, this paper presents the design of a hybrid terrestrial/aerial coaxial tilt-rotor vehicle, TactV, which integrates advantages such as lightweight construction and high maneuverability. Unlike existing tandem dual-rotor vehicles, TactV employs a tiltable coaxial dual-rotor design and features a spherical cage structure that encases the body, allowing for omnidirectional movement while further reducing its overall dimensions. To enable TactV to maneuver flexibly in the air and on planar and inclined surfaces, we established corresponding dynamic and control models for each mode. Additionally, we leveraged TactV's tiltable center of gravity to design energy-saving and high-mobility modes for ground operations, thereby further enhancing its endurance. Experimental designs for both aerial and ground tests corroborated the superiority of TactV's movement capabilities and control strategies.

artificial intelligence, mechanism, tactv, (15 more...)

2411.12359

Genre: Research Report (1.00)

Industry: Aerospace & Defense > Aircraft (1.00)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

Caging in Motion: Characterizing Robustness in Manipulation through Energy Margin and Dynamic Caging Analysis

Dong, Yifei, Cheng, Xianyi, Pokorny, Florian T.

arXiv.org Artificial Intelligence, Apr-18-2024

To develop robust manipulation policies, quantifying robustness is essential. Evaluating robustness in general dexterous manipulation, nonetheless, poses significant challenges due to complex hybrid dynamics, combinatorial explosion of possible contact interactions, global geometry, etc. This paper introduces "caging in motion", an approach for analyzing manipulation robustness through energy margins and caging-based analysis. Our method assesses manipulation robustness by measuring the energy margin to failure and extends traditional caging concepts for a global analysis of dynamic manipulation. This global analysis is facilitated by a kinodynamic planning framework that naturally integrates global geometry, contact changes, and robot compliance. We validate the effectiveness of our approach in simulation and real-world experiments across multiple dynamic manipulation scenarios, highlighting its potential to predict manipulation success and robustness.

artificial intelligence, init, machine learning, (17 more...)

2404.12115

Country:

Europe (0.46)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.93)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.66)
Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.48)

AICAttack: Adversarial Image Captioning Attack with Attention-Based Optimization

Li, Jiyao, Ni, Mingze, Dong, Yifei, Zhu, Tianqing, Liu, Wei

arXiv.org Artificial Intelligence, Feb-20-2024

Recent advances in deep learning research have shown remarkable achievements across many tasks in computer vision (CV) and natural language processing (NLP). At the intersection of CV and NLP is the problem of image captioning, where the related models' robustness against adversarial attacks has not been well studied. In this paper, we present a novel adversarial attack strategy, which we call AICAttack (Attention-based Image Captioning Attack), designed to attack image captioning models through subtle perturbations on images. Operating within a black-box attack scenario, our algorithm requires no access to the target model's architecture, parameters, or gradient information. We introduce an attention-based candidate selection mechanism that identifies the optimal pixels to attack, followed by Differential Evolution (DE) for perturbing pixels' RGB values. We demonstrate AICAttack's effectiveness through extensive experiments on benchmark datasets with multiple victim models. The experimental results demonstrate that our method surpasses current leading-edge techniques by effectively distributing the alignment and semantics of words in the output.
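The two stages named in this abstract (attention-based pixel selection, then Differential Evolution over RGB values) can be illustrated with a minimal sketch. This is not the authors' implementation: the attention map is assumed to be given, `toy_caption_score` is a hypothetical stand-in for querying a black-box captioning model, and the DE variant (rand/1/bin with `f=0.5`, `cr=0.9`) is a common default chosen here for illustration.

```python
import random


def top_k_pixels(attention, k):
    """Indices of the k pixels with the highest attention weight."""
    order = sorted(range(len(attention)), key=lambda i: attention[i], reverse=True)
    return order[:k]


def apply_perturbation(image, indices, genome):
    """Copy `image` and overwrite the selected pixels with RGB triples from `genome`."""
    out = list(image)
    for n, idx in enumerate(indices):
        out[idx] = tuple(genome[3 * n:3 * n + 3])
    return out


def toy_caption_score(image):
    """Hypothetical black-box score in [0, 1]; lower means a more damaged caption."""
    return sum(r + g + b for (r, g, b) in image) / (3 * 255 * len(image))


def differential_evolution(score, image, indices, pop_size=20, generations=40,
                           f=0.5, cr=0.9, seed=0):
    """Minimise `score` over the RGB values of the selected pixels (DE/rand/1/bin)."""
    rng = random.Random(seed)
    dim = 3 * len(indices)
    pop = [[rng.randint(0, 255) for _ in range(dim)] for _ in range(pop_size)]
    fitness = [score(apply_perturbation(image, indices, g)) for g in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct population members, none of them the current one.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for d in range(dim):
                if rng.random() < cr or d == j_rand:
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    trial.append(int(min(255, max(0, round(v)))))  # clip to valid RGB
                else:
                    trial.append(pop[i][d])
            trial_fitness = score(apply_perturbation(image, indices, trial))
            if trial_fitness <= fitness[i]:  # greedy selection keeps the better genome
                pop[i], fitness[i] = trial, trial_fitness
    best = min(range(pop_size), key=lambda i: fitness[i])
    return pop[best], fitness[best]


if __name__ == "__main__":
    image = [(200, 200, 200)] * 16           # flat toy image of 16 pixels
    attention = [i / 16 for i in range(16)]  # assumed attention weights
    targets = top_k_pixels(attention, 4)
    genome, best_score = differential_evolution(toy_caption_score, image, targets)
    print(targets, best_score)
```

Only the score value is queried, never gradients or parameters, which is what makes the loop compatible with a black-box setting. A real attack would replace the toy score with a caption-similarity measure computed from the victim model's output.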

machine learning, natural language, pixel, (18 more...)

2402.1194

Genre:

Research Report > Promising Solution (0.66)
Research Report > New Finding (0.48)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

ACGAN-GNNExplainer: Auxiliary Conditional Generative Explainer for Graph Neural Networks

Li, Yiqiao, Zhou, Jianlong, Dong, Yifei, Shafiabady, Niusha, Chen, Fang

arXiv.org Artificial Intelligence, Oct-10-2023

Graph neural networks (GNNs) have proven their efficacy in a variety of real-world applications, but their underlying mechanisms remain a mystery. To address this challenge and enable reliable decision-making, many GNN explainers have been proposed in recent years. However, these methods often encounter limitations, including their dependence on specific instances, lack of generalizability to unseen graphs, producing potentially invalid explanations, and yielding inadequate fidelity. To overcome these limitations, in this paper we introduce the Auxiliary Classifier Generative Adversarial Network (ACGAN) into the field of GNN explanation and propose a new GNN explainer dubbed ACGAN-GNNExplainer. Our approach leverages a generator to produce explanations for the original input graphs while incorporating a discriminator to oversee the generation process, ensuring explanation fidelity and improving accuracy. Experimental evaluations conducted on both synthetic and real-world graph datasets demonstrate the superiority of our proposed method compared to other existing GNN explainers.

artificial intelligence, auxiliary conditional generative explainer, machine learning, (2 more...)

doi: 10.1145/3583780.3614772

2309.16918

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)

Quasi-static Soft Fixture Analysis of Rigid and Deformable Objects

Dong, Yifei, Pokorny, Florian T.

arXiv.org Artificial Intelligence, Sep-3-2023

We present a sampling-based approach to reasoning about the caging-based manipulation of rigid and a simplified class of deformable 3D objects subject to energy constraints. Towards this end, we propose the notion of soft fixtures extending earlier work on energy-bounded caging to include a broader set of energy function constraints and settings, such as gravitational and elastic potential energy of 3D deformable objects. Previous methods focused on establishing provably correct algorithms to compute lower bounds or analytically exact estimates of escape energy for a very restricted class of known objects with low-dimensional C-spaces, such as planar polygons. We instead propose a practical sampling-based approach that is applicable in higher-dimensional C-spaces but only produces a sequence of upper-bound estimates that, however, appear to converge rapidly to actual escape energy. We present 8 simulation experiments demonstrating the applicability of our approach to various complex quasi-static manipulation scenarios. Quantitative results indicate the effectiveness of our approach in providing upper-bound estimates for escape energy in quasi-static manipulation scenarios. Two real-world experiments also show that the computed normalized escape energy estimates appear to correlate strongly with the probability of escape of an object under randomized pose perturbation.
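The idea of a sequence of upper-bound estimates converging towards the escape energy can be illustrated with a toy sketch. It is not the authors' planner: the 2D potential `energy`, the way candidate escape paths are sampled, and all parameter values are assumptions made for illustration. Any feasible escape path gives an upper bound (its peak energy minus the start energy), so the running minimum over sampled paths is a non-increasing sequence of upper bounds.

```python
import math
import random


def energy(x, y):
    """Toy potential: a ridge along x = 1 whose height 1 + y^2 is lowest at y = 0."""
    return math.exp(-((x - 1.0) ** 2) / 0.1) * (1.0 + y * y)


def path_peak_energy(waypoints, steps=50):
    """Highest energy along a piecewise-linear path, sampled at steps + 1 points per segment."""
    peak = -math.inf
    for (x0, y0), (x1, y1) in zip(waypoints, waypoints[1:]):
        for s in range(steps + 1):
            t = s / steps
            peak = max(peak, energy(x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return peak


def escape_energy_upper_bounds(start, n_paths=200, seed=0):
    """Running minimum of (peak energy - start energy) over randomly sampled escape paths."""
    rng = random.Random(seed)
    e_start = energy(*start)
    best = math.inf
    bounds = []
    for _ in range(n_paths):
        # Hypothetical path family: cross the ridge at a random height, then leave.
        y_cross = rng.uniform(-2.0, 2.0)
        path = [start, (1.0, y_cross), (2.0, y_cross)]
        best = min(best, path_peak_energy(path) - e_start)
        bounds.append(best)
    return bounds


if __name__ == "__main__":
    bounds = escape_energy_upper_bounds((0.0, 0.0))
    print(bounds[0], bounds[-1])
```

In this toy landscape every escape path must cross the ridge at x = 1, where the energy is at least 1, so the true escape energy from the origin is 1 minus the start energy, and the sampled bounds approach it from above. In general the per-path peak is only as accurate as the discretisation along the path.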

artificial intelligence, escape energy, init, (18 more...)

2309.01224

Country: Europe (0.46)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.47)

Chat-PM: A Class of Composite Hybrid Aerial/Terrestrial Precise Manipulator

Ding, Yihang, Ji, Xiaoyu, Zhang, Lixian, Dong, Yifei, Wu, Tong, Han, Chengzhe

arXiv.org Artificial Intelligence, Jul-22-2023

This paper concentrates on the development of Chat-PM, a class of composite hybrid aerial/terrestrial manipulators, covering composite configuration design, dynamics modeling, motion control and force estimation. Compared with existing aerial or terrestrial mobile manipulators, Chat-PM demonstrates advantages in terms of reachability, energy efficiency and manipulation precision. To achieve precise manipulation in terrestrial mode, the dynamics are analyzed with consideration of surface contact, based on which a cascaded controller is designed with compensation for the interference force and torque from the arm. Benefiting from the kinematic constraints caused by the surface contact, the position deviation and the vehicle vibration are effectively decreased, resulting in higher control precision of the end gripper. For manipulation on surfaces with unknown inclination angles, moving horizon estimation (MHE) is exploited to obtain precise estimates of force and inclination angle, which are used in the control loop to compensate for the effect of the unknown surface. Real-world experiments are performed to evaluate the superiority of the developed manipulator and the proposed controllers.

artificial intelligence, chat-pm, international conference, (16 more...)

2307.12056

Genre: Research Report (0.64)

Industry: Energy > Oil & Gas (0.46)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

HybridFusion: LiDAR and Vision Cross-Source Point Cloud Fusion

Wang, Yu, Bu, Shuhui, Chen, Lin, Dong, Yifei, Li, Kun, Cao, Xuefeng, Li, Ke

arXiv.org Artificial Intelligence, Apr-10-2023

Recently, cross-source point cloud registration from different sensors has become a significant research focus. However, traditional methods confront challenges due to the varying density and structure of cross-source point clouds. In order to solve these problems, we propose a cross-source point cloud fusion algorithm called HybridFusion. It can register cross-source dense point clouds from different viewing angles in large outdoor scenes. The entire registration process is a coarse-to-fine procedure. First, the point cloud is divided into small patches, and a matching patch set is selected based on global descriptors and spatial distribution, which constitutes the coarse matching process. To achieve fine matching, 2D registration is performed by extracting 2D boundary points from patches, followed by 3D adjustment. Finally, the results of multiple patch pose estimates are clustered and fused to determine the final pose. The proposed approach is evaluated comprehensively through qualitative and quantitative experiments. To assess the robustness of cross-source point cloud registration, the proposed method is compared with the generalized iterative closest point method. Furthermore, a metric for describing the degree of point cloud filling is proposed. The experimental results demonstrate that our approach achieves state-of-the-art performance in cross-source point cloud registration.
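The final step described above, clustering several per-patch pose estimates and fusing them into one pose, can be sketched as follows. The abstract does not specify the clustering method, so this sketch makes simplifying assumptions: poses are planar `(x, y, yaw)` tuples rather than full 3D poses, clustering is a fixed-radius neighbourhood vote on translation only, and `radius` is a hypothetical parameter.

```python
import math


def fuse_pose_estimates(estimates, radius=0.5):
    """Average the largest group of (x, y, yaw) estimates lying within `radius` of one estimate.

    Returns the fused pose and the number of estimates that supported it.
    """
    best_cluster = []
    for cx, cy, _ in estimates:
        # Neighbourhood vote: gather every estimate whose translation is close to this one.
        cluster = [e for e in estimates if math.hypot(e[0] - cx, e[1] - cy) <= radius]
        if len(cluster) > len(best_cluster):
            best_cluster = cluster
    n = len(best_cluster)
    x = sum(e[0] for e in best_cluster) / n
    y = sum(e[1] for e in best_cluster) / n
    # Circular mean so that angles near +pi and -pi average correctly.
    yaw = math.atan2(sum(math.sin(e[2]) for e in best_cluster),
                     sum(math.cos(e[2]) for e in best_cluster))
    return (x, y, yaw), n


if __name__ == "__main__":
    patch_poses = [
        (1.0, 2.0, 0.10), (1.1, 2.1, 0.12), (0.9, 1.9, 0.08),  # consistent patches
        (5.0, -3.0, 1.50), (-4.0, 6.0, -2.00),                 # mismatched patches
    ]
    print(fuse_pose_estimates(patch_poses))
```

Voting before averaging keeps a few badly matched patches from dragging the fused pose away from the consensus, which is the usual motivation for clustering pose hypotheses.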

artificial intelligence, information fusion, machine learning, (17 more...)

2304.04508

Genre: Research Report > New Finding (0.68)

Industry: Energy > Oil & Gas > Upstream (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (0.95)
Information Technology > Sensing and Signal Processing (0.93)
(2 more...)
