Du, Yi
FastFT: Accelerating Reinforced Feature Transformation via Advanced Exploration Strategies
He, Tianqi, Huang, Xiaohan, Du, Yi, Long, Qingqing, Qiao, Ziyue, Wu, Min, Fu, Yanjie, Zhou, Yuanchun, Xiao, Meng
Feature transformation is crucial for classic machine learning: it aims to generate feature combinations that enhance the performance of downstream tasks from a data-centric perspective. Current methodologies, such as manual expert-driven processes, iterative-feedback techniques, and exploration-generative tactics, have shown promise in automating such data engineering workflows by minimizing human involvement. However, three challenges remain in those frameworks: (1) they predominantly depend on downstream task performance metrics, whose assessment is time-consuming, especially for large datasets; (2) the diversity of feature combinations can hardly be guaranteed after random exploration ends; and (3) rare significant transformations lead to sparse valuable feedback, which hinders learning or leads to less effective results. In response to these challenges, we introduce FastFT, an innovative framework that leverages a trio of advanced strategies. We first decouple feature transformation evaluation from the outcomes of the generated datasets via a performance predictor. To address reward sparsity, we develop a method to evaluate the novelty of generated transformation sequences. Incorporating this novelty into the reward function accelerates the model's exploration of effective transformations, thereby improving search productivity. Additionally, we combine novelty and performance to create a prioritized memory buffer, ensuring that essential experiences are effectively revisited during exploration. Our extensive experimental evaluations validate the performance, efficiency, and traceability of the proposed framework, showcasing its superiority in handling complex feature transformation tasks.
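A minimal sketch of the novelty-shaped reward and prioritized buffer ideas described in the abstract; all names (novelty_score, shaped_reward, PrioritizedBuffer) and the specific scoring choices are our assumptions for illustration, not the paper's actual implementation.

```python
import heapq
import numpy as np

def novelty_score(seq_embedding, memory_embeddings, k=5):
    """Hypothetical novelty measure: mean distance to the k nearest
    previously seen transformation-sequence embeddings."""
    if len(memory_embeddings) == 0:
        return 1.0
    dists = np.linalg.norm(np.asarray(memory_embeddings) - seq_embedding, axis=1)
    return float(np.sort(dists)[:k].mean())

def shaped_reward(predicted_perf, novelty, beta=0.1):
    """Reward = predicted downstream performance (from a learned
    performance predictor, standing in for slow full evaluation)
    plus a novelty bonus that densifies otherwise sparse feedback."""
    return predicted_perf + beta * novelty

class PrioritizedBuffer:
    """Memory buffer keyed by a combined performance + novelty priority."""
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.heap = []     # min-heap of (priority, counter, experience)
        self.counter = 0

    def push(self, experience, perf, novelty, alpha=0.5):
        priority = alpha * perf + (1 - alpha) * novelty
        self.counter += 1  # tie-breaker so experiences are never compared
        heapq.heappush(self.heap, (priority, self.counter, experience))
        if len(self.heap) > self.capacity:
            heapq.heappop(self.heap)  # evict the lowest-priority item
```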
SuperPC: A Single Diffusion Model for Point Cloud Completion, Upsampling, Denoising, and Colorization
Du, Yi, Zhao, Zhipeng, Su, Shaoshu, Golluri, Sharath, Zheng, Haoze, Yao, Runmao, Wang, Chen
Point cloud (PC) processing tasks, such as completion, upsampling, denoising, and colorization, are crucial in applications like autonomous driving and 3D reconstruction. Despite substantial advancements, prior approaches often address each of these tasks independently, with separate models focused on individual issues. This isolated approach, however, fails to account for the fact that defects like incompleteness, low resolution, noise, and lack of color frequently coexist, with each defect influencing and correlating with the others. Simply applying such models sequentially compounds each model's errors and increases computational cost. To address these challenges, we introduce SuperPC, the first unified diffusion model capable of concurrently handling all four tasks. Our approach employs a three-level-conditioned diffusion framework, enhanced by a novel spatial-mix-fusion strategy, to leverage the correlations among these four defects for simultaneous, efficient processing. We show that SuperPC outperforms state-of-the-art specialized models, as well as their combinations, on all four individual tasks.
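To make the conditional-diffusion framing concrete, here is a generic, simplified reverse-diffusion step for a conditioned point-cloud denoiser. The function names, the single fused conditioning tensor, and the deterministic (DDIM-style) update are our assumptions; SuperPC's actual three-level conditioning and spatial-mix-fusion are richer than this sketch.

```python
import torch

def ddim_reverse_step(model, x_t, t, cond, alphas_cumprod):
    """One simplified (eta=0, DDIM-style) reverse step for a conditional
    point-cloud denoiser. `model` predicts the noise eps given the noisy
    points x_t, the timestep t, and a fused conditioning tensor `cond`
    (e.g., partial, noisy, or colorless observations). All names here
    are illustrative, not SuperPC's actual interface."""
    a_bar = alphas_cumprod[t]
    a_bar_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
    eps_hat = model(x_t, t, cond)                                # predicted noise
    x0_hat = (x_t - (1 - a_bar).sqrt() * eps_hat) / a_bar.sqrt() # estimate of clean points
    # Deterministic update toward the previous diffusion step.
    return a_bar_prev.sqrt() * x0_hat + (1 - a_bar_prev).sqrt() * eps_hat
```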
VL-Nav: Real-time Vision-Language Navigation with Spatial Reasoning
Du, Yi, Fu, Taimeng, Chen, Zhuoqun, Li, Bowen, Su, Shaoshu, Zhao, Zhipeng, Wang, Chen
Vision-language navigation in unknown environments is crucial for mobile robots. In scenarios such as household assistance and rescue, a mobile robot needs to understand a human command such as "find a person wearing black". We present VL-Nav, a novel vision-language navigation system that integrates efficient spatial reasoning on low-power robots. Unlike prior methods that rely on a single image-level feature-similarity score to guide the robot, our method integrates pixel-wise vision-language features with curiosity-driven exploration. This approach enables robust navigation to human-instructed instances across diverse environments. We deploy VL-Nav on a four-wheel mobile robot and evaluate its performance through comprehensive navigation tasks in both indoor and outdoor environments, spanning different scales and semantic complexities. Remarkably, VL-Nav operates at a real-time frequency of 30 Hz on a Jetson Orin NX, highlighting its ability to conduct efficient vision-language navigation. Results show that VL-Nav achieves an overall success rate of 86.3%, outperforming previous methods by 44.15%.
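One way to read the combination of pixel-wise vision-language similarity and curiosity-driven exploration is as a scored goal selection over candidate frontiers. The sketch below is our illustration of that idea; the names, the visit-count curiosity term, and the additive weighting are assumptions, not VL-Nav's actual formulation.

```python
import numpy as np

def score_frontier(vl_sim_map, visit_counts, frontier_px, w_curiosity=0.5):
    """Score one candidate goal pixel: pixel-wise vision-language
    similarity plus a curiosity bonus that prefers rarely visited
    regions (hypothetical formulation)."""
    r, c = frontier_px
    semantic = vl_sim_map[r, c]                   # pixel-wise VL similarity
    curiosity = 1.0 / (1.0 + visit_counts[r, c])  # decays with visitation
    return semantic + w_curiosity * curiosity

def select_goal(vl_sim_map, visit_counts, frontiers):
    """Pick the best-scoring frontier among candidates."""
    return max(frontiers,
               key=lambda p: score_frontier(vl_sim_map, visit_counts, p))
```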
Imperative Learning: A Self-supervised Neural-Symbolic Learning Framework for Robot Autonomy
Wang, Chen, Ji, Kaiyi, Geng, Junyi, Ren, Zhongqiang, Fu, Taimeng, Yang, Fan, Guo, Yifan, He, Haonan, Chen, Xiangyu, Zhan, Zitong, Du, Qiwei, Su, Shaoshu, Li, Bowen, Qiu, Yuheng, Du, Yi, Li, Qihang, Yang, Yifan, Lin, Xiao, Zhao, Zhipeng
Data-driven methods such as reinforcement and imitation learning have achieved remarkable success in robot autonomy. However, their data-centric nature still hinders them from generalizing well to ever-changing environments. Moreover, collecting large datasets for robotic tasks is often impractical and expensive. To overcome these challenges, we introduce imperative learning (IL), a new self-supervised neural-symbolic (NeSy) computational framework for robot autonomy that leverages the generalization abilities of symbolic reasoning. The IL framework consists of three primary components: a neural module, a reasoning engine, and a memory system. We formulate IL as a special bilevel optimization (BLO) problem, which enables reciprocal learning over the three modules. This overcomes the label-intensive obstacles associated with data-driven approaches and exploits symbolic reasoning over logical rules, physical principles, geometric analysis, and more. We discuss several optimization techniques for IL and verify their effectiveness in five distinct robot autonomy tasks: path planning, rule induction, optimal control, visual odometry, and multi-robot routing. Through various experiments, we show that IL can significantly enhance robot autonomy capabilities, and we anticipate that it will catalyze further research across diverse domains.
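For readers unfamiliar with bilevel optimization, a generic form of the structure the abstract describes is shown below: an upper-level neural objective constrained by a lower-level symbolic-reasoning problem. The notation is ours, not necessarily the paper's.

```latex
% Upper level: train the neural module f_theta against an objective U
% that depends on the lower-level (symbolic reasoning) solution mu*.
\begin{aligned}
\min_{\theta}\;& U\big(f_\theta(x),\, \mu^*(\theta)\big) \\
\text{s.t. }\;& \mu^*(\theta) \in \arg\min_{\mu}\; L\big(f_\theta(x),\, \mu\big),
\end{aligned}
```

Here $f_\theta$ stands for the neural module and $\mu$ for the variables resolved by the reasoning engine (and memory); the "reciprocal learning" in the abstract corresponds to the coupling between the two levels.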
M3GIA: A Cognition Inspired Multilingual and Multimodal General Intelligence Ability Benchmark
Song, Wei, Li, Yadong, Xu, Jianhua, Wu, Guowei, Ming, Lingfeng, Yi, Kexin, Luo, Weihua, Li, Houyi, Du, Yi, Guo, Fangda, Yu, Kaicheng
As recent multi-modality large language models (MLLMs) have shown formidable proficiency on various complex tasks, there has been increasing debate over whether these models could eventually mirror human intelligence. However, existing benchmarks mainly focus on evaluating task performance alone, such as the accuracy of identifying an object's attributes. Using well-developed cognitive science to understand the intelligence of MLLMs beyond superficial achievements remains largely unexplored. To this end, we introduce M3GIA, the first cognition-driven multilingual and multimodal benchmark for evaluating the general intelligence ability of MLLMs. Specifically, we identify five key cognitive factors based on the well-recognized Cattell-Horn-Carroll (CHC) model of intelligence and propose a novel evaluation metric. In addition, since most MLLMs are trained to perform in different languages, a natural question arises: is language a key factor influencing the cognitive ability of MLLMs? We therefore go beyond English to encompass other languages based on their popularity, including Chinese, French, Spanish, Portuguese, and Korean, to construct M3GIA. We make sure all data relevant to cultural backgrounds are collected from their native contexts to avoid English-centric bias. We collected a significant corpus of data from human participants, revealing that the most advanced MLLM reaches the lower boundary of human intelligence in English, while a pronounced disparity remains in the other five languages assessed. We also reveal an interesting winner-takes-all phenomenon that aligns with findings in cognitive studies. Our benchmark will be open-sourced, with the aspiration of facilitating the enhancement of cognitive capabilities in MLLMs.
CataLM: Empowering Catalyst Design Through Large Language Models
Wang, Ludi, Chen, Xueqing, Du, Yi, Zhou, Yuanchun, Gao, Yang, Cui, Wenjuan
The field of catalysis holds paramount importance in shaping the trajectory of sustainable development, prompting intensive research efforts to leverage artificial intelligence (AI) in catalyst design. Presently, the fine-tuning of open-source large language models (LLMs) has yielded significant breakthroughs across various domains such as biology and healthcare. Drawing inspiration from these advancements, we introduce CataLM (Catalytic Language Model), a large language model tailored to the domain of electrocatalytic materials. Our findings demonstrate that CataLM exhibits remarkable potential for facilitating human-AI collaboration in catalyst knowledge exploration and design. To the best of our knowledge, CataLM stands as the pioneering LLM dedicated to the catalyst domain, offering novel avenues for catalyst discovery and development.
PhysORD: A Neuro-Symbolic Approach for Physics-infused Motion Prediction in Off-road Driving
Zhao, Zhipeng, Li, Bowen, Du, Yi, Fu, Taimeng, Wang, Chen
Motion prediction is critical for autonomous off-road driving; it is, however, significantly more challenging than on-road driving because of the complex interaction between the vehicle and the terrain. Traditional physics-based approaches have difficulty accurately modeling dynamic systems and external disturbances. In contrast, data-driven neural networks require extensive datasets and struggle to explicitly capture fundamental physical laws, which can easily lead to poor generalization. By merging the advantages of both, neuro-symbolic approaches present a promising direction: they embed physical laws into neural models, potentially improving generalization capabilities significantly. However, no prior work has been evaluated in real-world off-road driving settings. To bridge this gap, we present PhysORD, a neuro-symbolic approach that integrates a conservation law, i.e., the Euler-Lagrange equation, into data-driven neural models for motion prediction in off-road driving. Our experiments show that PhysORD accurately predicts vehicle motion and tolerates external disturbances by modeling uncertainties. It outperforms existing methods in both accuracy and efficiency and demonstrates data-efficient learning and generalization in long-term prediction.
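For reference, the Euler-Lagrange equation the abstract refers to, in its standard form with external forces; how the paper parameterizes the learned terms around it is beyond this sketch.

```latex
% Euler-Lagrange equation for a Lagrangian L(q, \dot{q}) = T - V with
% generalized coordinates q; non-conservative external forces F are
% collected on the right-hand side.
\frac{d}{dt}\,\frac{\partial L}{\partial \dot{q}} \;-\; \frac{\partial L}{\partial q} \;=\; F
```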
DP-CRE: Continual Relation Extraction via Decoupled Contrastive Learning and Memory Structure Preservation
Huang, Mengyi, Xiao, Meng, Wang, Ludi, Du, Yi
Continual Relation Extraction (CRE) aims to incrementally learn relation knowledge from a non-stationary stream of data. Since the introduction of new relational tasks can overshadow previously learned information, catastrophic forgetting becomes a significant challenge in this domain. Current replay-based training paradigms prioritize all data uniformly and train on memory samples over multiple rounds, which results in overfitting to old tasks and a pronounced bias toward new tasks due to the imbalance of the replay set. To address this problem, we introduce the DecouPled CRE (DP-CRE) framework, which decouples prior-information preservation from new-knowledge acquisition. The framework examines alterations in the embedding space as new relation classes emerge, managing the preservation and acquisition of knowledge distinctly. Extensive experiments show that DP-CRE significantly outperforms other CRE baselines across two datasets.
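The "decoupling" can be pictured as two separate loss terms: one that learns the new relations and one that preserves the structure of the old embedding space. The sketch below is our reading of that idea; the specific losses, names, and the pairwise-distance preservation term are assumptions, not DP-CRE's exact objectives.

```python
import torch
import torch.nn.functional as F

def decoupled_cre_loss(new_logits, new_labels, old_emb, old_emb_frozen, lam=1.0):
    """Illustrative decoupled objective for continual relation extraction:
    an acquisition term for new relations plus a separate preservation
    term over replayed old samples (hypothetical formulation)."""
    acquire = F.cross_entropy(new_logits, new_labels)  # new-knowledge term
    # Preservation: keep pairwise distances among replayed old samples
    # close to those under the frozen, pre-update encoder.
    preserve = F.mse_loss(torch.cdist(old_emb, old_emb),
                          torch.cdist(old_emb_frozen, old_emb_frozen))
    return acquire + lam * preserve
```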
SubT-MRS Dataset: Pushing SLAM Towards All-weather Environments
Zhao, Shibo, Gao, Yuanjun, Wu, Tianhao, Singh, Damanpreet, Jiang, Rushan, Sun, Haoxiang, Sarawata, Mansi, Qiu, Yuheng, Whittaker, Warren, Higgins, Ian, Du, Yi, Su, Shaoshu, Xu, Can, Keller, John, Karhade, Jay, Nogueira, Lucas, Saha, Sourojit, Zhang, Ji, Wang, Wenshan, Wang, Chen, Scherer, Sebastian
Simultaneous localization and mapping (SLAM) is a fundamental task for numerous applications such as autonomous navigation and exploration. Although many SLAM datasets have been released, current SLAM solutions still struggle to deliver sustained and resilient performance. One major issue is the absence of high-quality datasets covering diverse all-weather conditions, as well as of a reliable metric for assessing robustness. This limitation significantly restricts the scalability and generalizability of SLAM technologies, impacting their development, validation, and deployment. To address this problem, we present SubT-MRS, an extremely challenging real-world dataset designed to push SLAM towards all-weather environments in pursuit of the most robust SLAM performance. It contains multi-degraded environments, including over 30 diverse scenes such as structureless corridors, varying lighting conditions, and perceptual obscurants like smoke and dust; multimodal sensors such as LiDAR, fisheye camera, IMU, and thermal camera; and multiple locomotion modes, including aerial, legged, and wheeled robots. We develop accuracy and robustness evaluation tracks for SLAM and introduce novel robustness metrics. Comprehensive studies are performed, revealing new observations, challenges, and opportunities for future research.
PyPose v0.6: The Imperative Programming Interface for Robotics
Zhan, Zitong, Li, Xiangfu, Li, Qihang, He, Haonan, Pandey, Abhinav, Xiao, Haitao, Xu, Yangmengfei, Chen, Xiangyu, Xu, Kuan, Cao, Kun, Zhao, Zhipeng, Wang, Zihan, Xu, Huan, Fang, Zihang, Chen, Yutian, Wang, Wentao, Fang, Xu, Du, Yi, Wu, Tianhao, Lin, Xiao, Qiu, Yuheng, Yang, Fan, Shi, Jingnan, Su, Shaoshu, Lu, Yiren, Fu, Taimeng, Dantu, Karthik, Wu, Jiajun, Xie, Lihua, Hutter, Marco, Carlone, Luca, Scherer, Sebastian, Huang, Daning, Hu, Yaoyu, Geng, Junyi, Wang, Chen
PyPose is an open-source library for robot learning. It combines a learning-based approach with physics-based optimization, enabling seamless end-to-end robot learning. It has been used in many tasks thanks to its meticulously designed application programming interface (API) and efficient implementation. Since its initial launch in early 2022, PyPose has experienced significant enhancements, incorporating a wide variety of new features into its platform. To satisfy the growing demand for understanding and utilizing the library and to reduce the learning curve for new users, we present the fundamental design principle of the imperative programming interface and showcase the flexible usage of diverse functionalities and modules using an extremely simple Dubins car example. We also demonstrate that PyPose can be easily used to navigate a real quadruped robot with a few lines of code.
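In the spirit of the Dubins car example mentioned above, here is a minimal differentiable Dubins-car rollout written in plain PyTorch. It illustrates the kind of kinematic model that can sit inside an end-to-end learning loop; it deliberately does not use PyPose's actual API, and the function names are ours.

```python
import torch

def dubins_step(state, v, omega, dt=0.1):
    """One kinematic step of a Dubins car with state = (x, y, heading).
    Fully differentiable, so gradients can flow through a rollout;
    this is an illustration, not PyPose's interface."""
    x, y, theta = state
    return torch.stack([
        x + v * torch.cos(theta) * dt,   # forward motion along heading
        y + v * torch.sin(theta) * dt,
        theta + omega * dt,              # constant turn rate
    ])

# Roll out a short trajectory under constant controls.
state = torch.zeros(3, requires_grad=True)
traj = [state]
for _ in range(10):
    traj.append(dubins_step(traj[-1], v=torch.tensor(1.0), omega=torch.tensor(0.2)))
```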