Collaborating Authors

Li, Zhongyu


Decentralized Navigation of a Cable-Towed Load using Quadrupedal Robot Team via MARL

arXiv.org Artificial Intelligence

This work addresses the challenge of enabling a team of quadrupedal robots to collaboratively tow a cable-connected load through cluttered and unstructured environments while avoiding obstacles. Leveraging cables allows the multi-robot system to navigate narrow spaces by maintaining slack when necessary. However, this introduces hybrid physical interactions due to alternating taut and slack states, with computational complexity that scales exponentially as the number of agents increases. To tackle these challenges, we developed a scalable and decentralized system capable of dynamically coordinating a variable number of quadrupedal robots while managing the hybrid physical interactions inherent in the load-towing task. At the core of this system is a novel multi-agent reinforcement learning (MARL)-based planner, designed for decentralized coordination. The MARL-based planner is trained using a centralized training with decentralized execution (CTDE) framework, enabling each robot to make decisions autonomously using only local (ego) observations. To accelerate learning and ensure effective collaboration across varying team sizes, we introduce a tailored training curriculum for MARL. Experimental results highlight the flexibility and scalability of the framework, demonstrating successful deployment with one to four robots in real-world scenarios and up to twelve robots in simulation. The decentralized planner maintains consistent inference times, regardless of the team size. Additionally, the proposed system demonstrates robustness to environment perturbations and adaptability to varying load weights. This work represents a step forward in achieving flexible and efficient multi-legged robotic collaboration in complex and real-world environments.
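
The abstract gives no implementation details; as a rough illustration of the CTDE pattern it describes (decentralized actors acting on ego observations, a centralized critic used only during training), here is a minimal PyTorch sketch. The module names, network sizes, and four-robot team are assumptions, not the paper's code.

```python
# Minimal CTDE sketch: decentralized actors act on ego observations only,
# while a centralized critic (used only during training) sees all agents.
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Decentralized policy: each robot uses only its own (ego) observation."""
    def __init__(self, ego_obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(ego_obs_dim, 128), nn.Tanh(),
                                 nn.Linear(128, act_dim))
    def forward(self, ego_obs):
        return self.net(ego_obs)

class CentralCritic(nn.Module):
    """Centralized value function over the joint observation, training only."""
    def __init__(self, ego_obs_dim, n_agents):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(ego_obs_dim * n_agents, 256),
                                 nn.Tanh(), nn.Linear(256, 1))
    def forward(self, joint_obs):
        return self.net(joint_obs)

n_agents, ego_obs_dim, act_dim = 4, 32, 2
actor = Actor(ego_obs_dim, act_dim)          # weights shared across robots
critic = CentralCritic(ego_obs_dim, n_agents)

ego_obs = torch.randn(n_agents, ego_obs_dim)
actions = actor(ego_obs)                     # per-robot decentralized actions
value = critic(ego_obs.reshape(1, -1))       # centralized value for training
```

Because only the actor is needed at deployment, inference cost stays constant as the team grows, which is consistent with the constant inference times reported above.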


Variable-frame CNNLSTM for Breast Nodule Classification using Ultrasound Videos

arXiv.org Artificial Intelligence

The intersection of medical imaging and artificial intelligence has become an important research direction in intelligent medical treatment, particularly in the analysis of medical images using deep learning for clinical diagnosis. Despite these advances, existing keyframe classification methods do not extract time-series features, while ultrasound video classification based on three-dimensional convolution requires a uniform frame count across patients, resulting in poor feature-extraction efficiency and poor model classification performance. This study proposes a novel video classification method based on CNN and LSTM, introducing NLP's scheme for processing sentences of varying length into video classification for the first time. The method reduces CNN-extracted image features to a 1x512 vector, followed by sorting and compressing the feature vectors for LSTM training. Specifically, feature vectors are sorted by each patient's video frame count and zero-padded to form variable-length batches, with the invalid padding values compressed away before LSTM training to conserve computing resources. Experimental results demonstrate that our variable-frame CNNLSTM method outperforms other approaches across all metrics, showing improvements of 3-6% in F1 score and 1.5% in specificity compared to keyframe methods. The variable-frame CNNLSTM also achieves better accuracy and precision than the equal-frame CNNLSTM. These findings validate the effectiveness of our approach in classifying variable-frame ultrasound videos and suggest potential applications in other medical imaging modalities.
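
The pad-then-compress step described above corresponds closely to sequence packing as implemented in PyTorch; the sketch below assumes that toolchain. The 512-dim features match the abstract, while the frame counts, hidden size, and two-class head are illustrative assumptions.

```python
# Sketch of the pad-then-pack step for variable-frame video features.
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_sequence

# Per-patient sequences of CNN features: one 1x512 vector per video frame.
seqs = [torch.randn(n, 512) for n in (37, 21, 54)]   # variable frame counts
lengths = torch.tensor([s.size(0) for s in seqs])

# Sort by frame count (descending) and zero-pad into a rectangular batch.
order = torch.argsort(lengths, descending=True).tolist()
padded = pad_sequence([seqs[i] for i in order], batch_first=True)  # pads with 0

# Compress away the padding so the LSTM never computes on invalid timesteps.
packed = pack_padded_sequence(padded, lengths[order], batch_first=True)

lstm = torch.nn.LSTM(input_size=512, hidden_size=256, batch_first=True)
_, (h_n, _) = lstm(packed)                    # final hidden state per video
logits = torch.nn.Linear(256, 2)(h_n[-1])     # e.g. benign vs. malignant
```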


IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models

arXiv.org Artificial Intelligence

With the rapid development of diffusion models, text-to-image (T2I) models have made significant progress, showcasing impressive abilities in prompt following and image generation. Recently launched models such as FLUX.1 and Ideogram2.0, along with others like Dall-E3 and Stable Diffusion 3, have demonstrated exceptional performance across various complex tasks, raising questions about whether T2I models are moving towards general-purpose applicability. Beyond traditional image generation, these models exhibit capabilities across a range of fields, including controllable generation, image editing, video, audio, 3D, and motion generation, as well as computer vision tasks like semantic segmentation and depth estimation. However, current evaluation frameworks are insufficient to comprehensively assess these models' performance across such expanding domains. To thoroughly evaluate these models, we developed IMAGINE-E and tested six prominent models: FLUX.1, Ideogram2.0, Midjourney, Dall-E3, Stable Diffusion 3, and Jimeng. Our evaluation covers five key domains: structured output generation; realism and physical consistency; specific-domain generation; challenging scenario generation; and multi-style creation tasks. This comprehensive assessment highlights each model's strengths and limitations, particularly the outstanding performance of FLUX.1 and Ideogram2.0 in structured and specific-domain tasks, underscoring the expanding applications and potential of T2I models as foundational AI tools. This study provides valuable insights into the current state and future trajectory of T2I models as they evolve towards general-purpose usability. Evaluation scripts will be released at https://github.com/jylei16/Imagine-e.


Traversability-Aware Legged Navigation by Learning from Real-World Visual Data

arXiv.org Artificial Intelligence

The enhanced mobility brought by legged locomotion empowers quadrupedal robots to navigate through complex and unstructured environments. However, optimizing agile locomotion while accounting for the varying energy costs of traversing different terrains remains an open challenge. Most previous work focuses on planning trajectories with traversability cost estimation based on human-labeled environmental features. However, this human-centric approach is insufficient because it does not account for the varying capabilities of the robot's locomotion controllers over challenging terrains. To address this, we develop a novel traversability estimator in a robot-centric manner, based on the value function of the robot's locomotion controller. This estimator is integrated into a new learning-based RGBD navigation framework. The framework employs multiple training stages to develop a planner that guides the robot in avoiding obstacles and hard-to-traverse terrains while reaching its goals. The training of the navigation planner is performed directly in the real world using a sample-efficient reinforcement learning method that utilizes both online data and offline datasets. Through extensive benchmarking, we demonstrate that the proposed framework achieves the best performance in accurate traversability cost estimation and efficient learning from multi-modal data (including the robot's color and depth vision, as well as proprioceptive feedback) for real-world training. Using the proposed method, a quadrupedal robot learns to perform traversability-aware navigation through trial and error in various real-world environments with challenging terrains that are difficult to classify using depth vision alone. Moreover, the robot demonstrates the ability to generalize the learned navigation skills to unseen scenarios. A video can be found at https://youtu.be/RSqnIWZ1qks.
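
The estimator is built from the locomotion controller's value function; as a minimal sketch of that idea, the snippet below maps a critic's value estimate to a normalized traversability cost. The critic interface, input sizes, and normalization bounds are assumptions for illustration.

```python
# Sketch: a robot-centric traversability cost from the locomotion critic's
# value estimate. High value = the controller expects to traverse well.
import torch

def traversability_cost(critic, proprio, terrain_patch,
                        v_min=0.0, v_max=100.0):
    """Map the controller's value estimate to a cost in [0, 1]."""
    with torch.no_grad():
        v = critic(torch.cat([proprio, terrain_patch], dim=-1))
    v = v.clamp(v_min, v_max)
    return 1.0 - (v - v_min) / (v_max - v_min)   # low cost where value is high

# Dummy critic standing in for the trained locomotion value function.
critic = torch.nn.Sequential(torch.nn.Linear(16 + 64, 64), torch.nn.ReLU(),
                             torch.nn.Linear(64, 1))
cost = traversability_cost(critic, torch.randn(1, 16), torch.randn(1, 64))
```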


Online Omnidirectional Jumping Trajectory Planning for Quadrupedal Robots on Uneven Terrains

arXiv.org Artificial Intelligence

Natural terrain complexity often necessitates agile movements like jumping in animals to improve traversal efficiency. To enable similar capabilities in quadruped robots, complex real-time jumping maneuvers are required. Current research does not adequately address the problem of online omnidirectional jumping and neglects the robot's kinodynamic constraints during trajectory generation. This paper proposes a general and complete cascaded online optimization framework for omnidirectional jumping with quadruped robots. Our solution systematically encompasses jumping trajectory generation, a trajectory tracking controller, and a landing controller. It also incorporates environmental perception to handle obstacles that standard locomotion cannot bypass, for example by jumping from high platforms. We introduce a novel jumping plane to parameterize omnidirectional jumping motion and formulate a tightly coupled optimization problem accounting for the kinodynamic constraints, simultaneously optimizing the CoM trajectory, Ground Reaction Forces (GRFs), and joint states. To meet the online requirements, we propose an accelerated evolutionary algorithm as the trajectory optimizer to address the complexity of the kinodynamic constraints. To ensure stability and accuracy in environmental perception post-landing, we introduce a coarse-to-fine relocalization method that combines global Branch and Bound (BnB) search with Maximum a Posteriori (MAP) estimation for precise positioning during navigation and jumping. The proposed framework achieves jump trajectory generation in approximately 0.1 seconds with a warm start and has been successfully validated on two quadruped robots on uneven terrains. Additionally, we extend the framework's versatility to humanoid robots.
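
The paper's accelerated evolutionary optimizer is not spelled out in the abstract; the sketch below shows a generic evolution-strategy loop over a handful of jump parameters, with penalty terms standing in for kinodynamic constraints. The decision variables, cost function, and limits are all illustrative assumptions.

```python
# Generic evolution-strategy sketch for a jump-parameter search.
import numpy as np

def cost(x: np.ndarray) -> float:
    """Placeholder cost: effort plus penalties on violated limits."""
    takeoff_vel, pitch, grf_scale = x
    effort = grf_scale**2 + 0.1 * takeoff_vel**2
    # Penalize exceeding assumed velocity/orientation limits.
    penalty = 1e3 * max(0.0, takeoff_vel - 3.0) + 1e3 * max(0.0, abs(pitch) - 0.6)
    return effort + penalty

rng = np.random.default_rng(0)
mean, sigma = np.array([1.5, 0.0, 1.0]), 0.3   # warm start and initial spread
n_samples, n_elite = 64, 8

for _ in range(50):                            # small, fixed budget for online use
    pop = mean + sigma * rng.standard_normal((n_samples, 3))
    elite = pop[np.argsort([cost(x) for x in pop])[:n_elite]]
    mean, sigma = elite.mean(axis=0), 0.9 * sigma   # shrink the search over time

print("optimized jump parameters:", mean)
```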


Learning Smooth Humanoid Locomotion through Lipschitz-Constrained Policies

arXiv.org Artificial Intelligence

Reinforcement learning combined with sim-to-real transfer offers a general framework for developing locomotion controllers for legged robots. To facilitate successful deployment in the real world, smoothing techniques, such as low-pass filters and smoothness rewards, are often employed to develop policies with smooth behaviors. However, because these techniques are non-differentiable and introduce a large set of hyperparameters, they tend to require extensive manual tuning for each robotic platform. To address this challenge and establish a general technique for enforcing smooth behaviors, we propose a simple and effective method that imposes a Lipschitz constraint on a learned policy, which we refer to as Lipschitz-Constrained Policies (LCP). We show that the Lipschitz constraint can be implemented in the form of a gradient penalty, which provides a differentiable objective that can be easily incorporated with automatic differentiation frameworks. We demonstrate that LCP effectively replaces the need for smoothing rewards or low-pass filters and can be easily integrated into training frameworks for many distinct humanoid robots. We extensively evaluate LCP in both simulation and on real-world humanoid robots, producing smooth and robust locomotion controllers. All simulation and deployment code, along with complete checkpoints, is available on our project page: https://lipschitz-constrained-policy.github.io.
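
The gradient-penalty form of the Lipschitz constraint translates directly into a few lines of autodiff code; a minimal PyTorch sketch follows, with the network sizes and penalty weight chosen arbitrarily rather than taken from the paper.

```python
# Minimal sketch of a Lipschitz constraint implemented as a gradient penalty.
import torch

policy = torch.nn.Sequential(
    torch.nn.Linear(48, 256), torch.nn.ELU(),
    torch.nn.Linear(256, 12),
)

def gradient_penalty(obs: torch.Tensor, weight: float = 1e-3) -> torch.Tensor:
    """Penalize ||d(action)/d(obs)||^2, softly bounding the policy's
    Lipschitz constant and hence the smoothness of its outputs."""
    obs = obs.clone().requires_grad_(True)
    act = policy(obs)
    grad, = torch.autograd.grad(act.sum(), obs, create_graph=True)
    return weight * grad.pow(2).sum(dim=-1).mean()

obs = torch.randn(1024, 48)
loss = gradient_penalty(obs)     # added to the usual RL training objective
loss.backward()
```

Because the penalty is an ordinary differentiable loss term, it slots into any policy-gradient training loop without the per-platform filter tuning the abstract describes.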


Interactive Navigation with Adaptive Non-prehensile Mobile Manipulation

arXiv.org Artificial Intelligence

This paper introduces a framework for interactive navigation through adaptive non-prehensile mobile manipulation. A key challenge in this process is handling objects with unknown dynamics, which are difficult to infer from visual observation. To address this, we propose an adaptive dynamics model for common movable indoor objects via learned SE(2) dynamics representations. This model is integrated into Model Predictive Path Integral (MPPI) control to guide the robot's interactions. Additionally, the learned dynamics help inform decision-making when navigating around objects that cannot be manipulated. Our approach is validated in both simulation and real-world scenarios, demonstrating its ability to accurately represent object dynamics and effectively manipulate various objects. We further highlight its success in the Navigation Among Movable Objects (NAMO) task by deploying the proposed framework on a dynamically balancing mobile robot, Shmoobot. Project website: https://cmushmoobot.github.io/AdaptivePushing/.
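
As a rough sketch of how a learned SE(2) dynamics model plugs into MPPI, the snippet below samples action sequences, rolls them through the model, and softmax-weights them by cost. The dynamics network, horizon, and cost function are assumptions, not the paper's implementation.

```python
# Minimal MPPI sketch around a learned SE(2) dynamics model.
import torch

dynamics = torch.nn.Sequential(   # maps (x, y, theta, action) -> SE(2) delta
    torch.nn.Linear(3 + 2, 64), torch.nn.Tanh(), torch.nn.Linear(64, 3),
)

def mppi(state, goal, horizon=20, n_samples=256, lam=1.0, noise=0.2):
    """Sample noisy action sequences, roll them through the learned model,
    and return the exponentially weighted average of the first actions."""
    u = torch.zeros(n_samples, horizon, 2)      # nominal action sequence
    eps = noise * torch.randn_like(u)
    cost = torch.zeros(n_samples)
    s = state.expand(n_samples, 3).clone()
    for t in range(horizon):
        a = u[:, t] + eps[:, t]
        s = s + dynamics(torch.cat([s, a], dim=-1))   # residual SE(2) step
        cost += (s[:, :2] - goal).norm(dim=-1)        # distance-to-goal cost
    w = torch.softmax(-cost / lam, dim=0)
    return (w[:, None] * (u[:, 0] + eps[:, 0])).sum(dim=0)

action = mppi(torch.zeros(3), goal=torch.tensor([2.0, 1.0]))
```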


CurricuLLM: Automatic Task Curricula Design for Learning Complex Robot Skills using Large Language Models

arXiv.org Artificial Intelligence

Curriculum learning is a training mechanism in reinforcement learning (RL) that facilitates the learning of complex policies by progressively increasing the task difficulty during training. However, designing effective curricula for a specific task often requires extensive domain knowledge and human intervention, which limits its applicability across various domains. Our core idea is that large language models (LLMs), with their extensive training on diverse language data and ability to encapsulate world knowledge, present significant potential for efficiently breaking down tasks and decomposing skills across various robotics environments. Additionally, the demonstrated success of LLMs in translating natural language into executable code for RL agents strengthens their role in generating task curricula. In this work, we propose CurricuLLM, which leverages the high-level planning and programming capabilities of LLMs for curriculum design, thereby enhancing the efficient learning of complex target tasks. CurricuLLM consists of: (Step 1) generating a sequence of subtasks, in natural language form, that aid target task learning; (Step 2) translating the natural language descriptions of the subtasks into executable task code, including the reward code and goal distribution code; and (Step 3) evaluating trained policies based on trajectory rollouts and the subtask descriptions. We evaluate CurricuLLM in various robotics simulation environments, spanning manipulation, navigation, and locomotion, and show that CurricuLLM can aid the learning of complex robot control tasks. In addition, we validate the humanoid locomotion policy learned through CurricuLLM in the real world. The code is available at https://github.com/labicon/CurricuLLM
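
The three steps above suggest a simple outer loop; the skeleton below is a hypothetical rendering of it, where llm() and train_and_rollout() are placeholder hooks rather than the released CurricuLLM code, and the prompts are invented for illustration.

```python
# Hypothetical skeleton of the three-step CurricuLLM loop.
def llm(prompt: str) -> str:
    """Placeholder for any chat-completion client."""
    raise NotImplementedError

def train_and_rollout(reward_code: str, init=None):
    """Placeholder: train an RL policy with the generated reward/goal code
    and return rollout statistics plus the trained policy."""
    raise NotImplementedError

def curricullm(target_task: str, env_description: str, n_candidates: int = 3):
    # Step 1: propose a subtask curriculum in natural language.
    subtasks = llm(
        f"Environment: {env_description}\nTarget task: {target_task}\n"
        "List a curriculum of subtasks, easiest first, one per line."
    ).splitlines()

    policy = None
    for subtask in subtasks:
        # Step 2: translate each subtask into executable reward and
        # goal-distribution code (several candidates per subtask).
        candidates = [
            llm(f"Write reward and goal-distribution code for: {subtask}")
            for _ in range(n_candidates)
        ]
        # Step 3: train on each candidate, then pick the best policy from
        # the rollout statistics and the subtask description.
        results = [train_and_rollout(code, init=policy) for code in candidates]
        best = llm(f"Given rollout stats {results} for '{subtask}', "
                   "reply with the index of the best candidate.")
        policy = results[int(best)].policy
    return policy
```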


HiLMa-Res: A General Hierarchical Framework via Residual RL for Combining Quadrupedal Locomotion and Manipulation

arXiv.org Artificial Intelligence

This work presents HiLMa-Res, a hierarchical framework leveraging reinforcement learning to tackle manipulation tasks while performing continuous locomotion using quadrupedal robots. Unlike most previous efforts that focus on solving a specific task, HiLMa-Res is designed to be general across loco-manipulation tasks that require quadrupedal robots to maintain sustained mobility. The novel design of this framework tackles the challenges of integrating continuous locomotion control and manipulation using the legs. It develops an operational-space locomotion controller that can track arbitrary robot end-effector (toe) trajectories while walking at different velocities. This controller is designed to be general across downstream tasks and can therefore be used by a high-level manipulation planning policy to address specific tasks. To demonstrate the versatility of this framework, we use HiLMa-Res to tackle several challenging loco-manipulation tasks with a quadrupedal robot in the real world. These tasks range from state-based to vision-based policies, and from training purely on simulation data to learning from real-world data. In these tasks, HiLMa-Res shows better performance than other methods.
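
The interface between the two levels can be sketched as a high-level policy emitting toe-trajectory parameters that the general low-level controller consumes alongside proprioception. The dimensions and module structure below are assumptions, and the paper's residual-learning details are omitted.

```python
# Sketch of the two-level interface: a task-specific high-level policy emits
# toe-trajectory parameters tracked by a general low-level controller.
import torch

low_level = torch.nn.Sequential(    # pretrained, task-agnostic controller
    torch.nn.Linear(48 + 6, 256), torch.nn.ELU(), torch.nn.Linear(256, 12),
)
high_level = torch.nn.Sequential(   # task-specific manipulation planner
    torch.nn.Linear(32, 128), torch.nn.ELU(), torch.nn.Linear(128, 6),
)

def step(task_obs: torch.Tensor, proprio: torch.Tensor) -> torch.Tensor:
    toe_traj = high_level(task_obs)     # desired toe-trajectory parameters
    return low_level(torch.cat([proprio, toe_traj], dim=-1))  # joint commands

cmd = step(torch.randn(1, 32), torch.randn(1, 48))
```

Keeping the low-level controller fixed across tasks is what lets a new task swap in only a new high-level policy, as the abstract emphasizes.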


DiffuseLoco: Real-Time Legged Locomotion Control with Diffusion from Offline Datasets

arXiv.org Artificial Intelligence

This work introduces DiffuseLoco, a framework for training multi-skill diffusion-based policies for dynamic legged locomotion from offline datasets, enabling real-time control of diverse skills on robots in the real world. Offline learning at scale has led to breakthroughs in the computer vision, natural language processing, and robotic manipulation domains. However, scaling up learning for legged robot locomotion, especially with multiple skills in a single policy, presents significant challenges for prior online reinforcement learning methods. To address this challenge, we propose a novel, scalable framework that leverages diffusion models to directly learn from offline multimodal datasets with a diverse set of locomotion skills. With design choices tailored for real-time control in dynamical systems, including receding horizon control and delayed inputs, DiffuseLoco is capable of reproducing multimodality across various locomotion skills, transferring zero-shot to real quadrupedal robots, and running on edge computing devices. Furthermore, DiffuseLoco demonstrates free transitions between skills and robustness against environmental variations. Through extensive benchmarking in real-world experiments, DiffuseLoco exhibits better stability and velocity-tracking performance than prior reinforcement learning and non-diffusion-based behavior cloning baselines. The design choices are validated via comprehensive ablation studies. This work opens new possibilities for scaling up learning-based legged locomotion controllers through the scaling of large, expressive models and diverse offline datasets.
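
Two of the named design choices, receding-horizon execution and delayed inputs, can be illustrated with a short control loop. The diffusion policy below is a stand-in stub (its denoising loop is omitted) and all dimensions, delays, and chunk sizes are assumptions.

```python
# Sketch of receding-horizon execution with delayed observations.
import torch

class DiffusionPolicy(torch.nn.Module):
    """Stand-in for a trained diffusion policy over action chunks."""
    def __init__(self, obs_dim=45, act_dim=12, horizon=8):
        super().__init__()
        self.horizon, self.act_dim = horizon, act_dim
        self.net = torch.nn.Linear(obs_dim, horizon * act_dim)

    @torch.no_grad()
    def sample(self, obs: torch.Tensor) -> torch.Tensor:
        # A real diffusion policy would iteratively denoise here.
        return self.net(obs).reshape(self.horizon, self.act_dim)

policy = DiffusionPolicy()
obs_buffer = [torch.zeros(45)] * 3        # delayed inputs: act on obs[t-k]
executed = []

for t in range(10):
    obs = obs_buffer[0].unsqueeze(0)      # oldest (k-step-delayed) observation
    chunk = policy.sample(obs)            # predict a whole action chunk
    executed.extend(chunk[:2])            # receding horizon: run only the head
    obs_buffer = obs_buffer[1:] + [torch.randn(45)]   # fresh obs from robot
```

Training with the same delay the robot will see at deployment keeps the learned policy consistent with real actuation and sensing latency, which is the stated motivation for these choices.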