AITopics | Zhao, Shiyu

Zhao, Shiyu

LED: LLM Enhanced Open-Vocabulary Object Detection without Human Curated Data Generation

Zhou, Yang, Zhao, Shiyu, Chen, Yuxiao, Wang, Zhenting, Metaxas, Dimitris N.

arXiv.org Artificial Intelligence, Mar-17-2025

Large foundation models trained on large-scale visual-text data can significantly enhance Open Vocabulary Object Detection (OVD) through data generation. However, this may lead to biased synthetic data and overfitting to specific configurations. Directly leveraging the hidden states of Large Language Models (LLMs) can sidestep the biases of manually curated data generation, yet this option is surprisingly rarely explored. This paper presents a systematic method to enhance visual grounding by utilizing decoder layers of the LLM of an MLLM. We introduce a zero-initialized cross-attention adapter to enable efficient knowledge transfer from LLMs to object detectors, a new approach called LED (LLM Enhanced Open-Vocabulary Object Detection). We demonstrate that intermediate hidden states from early LLM layers retain strong spatial-semantic correlations that are beneficial to grounding tasks. Experiments show that our adaptation strategy significantly enhances the performance on complex free-form text queries while remaining the same on plain categories. With our adaptation, Qwen2-0.5B with Swin-T as the vision encoder improves GroundingDINO by 2.33% on Omnilabel, at the overhead of 8.7% more GFLOPs. Qwen2-0.5B with a larger vision encoder can further boost the performance by 6.22%. We further validate our design by ablating on varied adapter architectures, sizes of LLMs, and the layers to which adaptation is added.
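To illustrate why a zero-initialized adapter is a safe starting point, here is a minimal single-head sketch in plain Python. It is not the LED architecture: the abstract does not say where the zeros are placed, so this sketch assumes the output projection `w_out` is the zero-initialized part, and all names, shapes, and weight values are hypothetical. With `w_out` at zero, the residual update vanishes and the detector features pass through unchanged.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention_adapter(det_feats, llm_states, w_q, w_k, w_v, w_out):
    """Residual cross-attention: detector features query LLM hidden states."""
    q = matmul(det_feats, w_q)    # queries come from the detector side
    k = matmul(llm_states, w_k)   # keys come from the LLM hidden states
    v = matmul(llm_states, w_v)   # values come from the LLM hidden states
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    attn = [softmax([s / scale for s in row]) for row in matmul(q, k_t)]
    update = matmul(matmul(attn, v), w_out)
    # Residual connection: the adapter only adds to the original features.
    return [[f + u for f, u in zip(f_row, u_row)]
            for f_row, u_row in zip(det_feats, update)]

# Hypothetical toy inputs: 2 detector tokens (dim 2), 2 LLM tokens (dim 3).
det_feats = [[1.0, 2.0], [0.5, -1.0]]
llm_states = [[0.3, 0.1, -0.2], [1.0, 0.0, 0.5]]
w_q = [[1.0, 0.0], [0.0, 1.0]]
w_k = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
w_v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

w_out_zero = [[0.0, 0.0], [0.0, 0.0]]      # assumed zero initialization
w_out_trained = [[0.1, 0.0], [0.0, 0.1]]   # stand-in for weights after training

out_init = cross_attention_adapter(det_feats, llm_states, w_q, w_k, w_v, w_out_zero)
out_trained = cross_attention_adapter(det_feats, llm_states, w_q, w_k, w_v, w_out_trained)
```

At initialization `out_init` equals `det_feats`, so the pretrained detector's behavior is preserved until training moves `w_out` away from zero.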

artificial intelligence, large language model, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.13794

Country: Europe > Netherlands (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Cooperative Bearing-Only Target Pursuit via Multiagent Reinforcement Learning: Design and Experiment

Li, Jianan, Wang, Zhikun, Ding, Susheng, Guo, Shiliang, Zhao, Shiyu

arXiv.org Artificial Intelligence, Mar-11-2025

This paper addresses the multi-robot pursuit problem for an unknown target, encompassing both target state estimation and pursuit control. First, in state estimation, we focus on using only bearing information, as it is readily available from vision sensors and effective for small, distant targets. Challenges such as instability due to the nonlinearity of bearing measurements and singularities in the two-angle representation are addressed through a proposed uniform bearing-only information filter. This filter integrates multiple 3D bearing measurements, provides a concise formulation, and enhances stability and resilience to target loss caused by limited field of view (FoV). Second, in target pursuit control within complex environments, where challenges such as heterogeneity and limited FoV arise, conventional methods like differential games or Voronoi partitioning often prove inadequate. To address these limitations, we propose a novel multiagent reinforcement learning (MARL) framework, enabling multiple heterogeneous vehicles to search, localize, and follow a target while effectively handling those challenges. Third, to bridge the sim-to-real gap, we propose two key techniques: incorporating adjustable low-level control gains in training to replicate the dynamics of real-world autonomous ground vehicles (AGVs), and proposing spectral-normalized RL algorithms to enhance policy smoothness and robustness. Finally, we demonstrate the successful zero-shot transfer of the MARL controllers to AGVs, validating the effectiveness and practical feasibility of our approach. The accompanying video is available at https://youtu.be/HO7FJyZiJ3E.

artificial intelligence, machine learning, vehicle, (17 more...)

arXiv.org Artificial Intelligence

2503.0874

Country: Asia > China (0.29)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Non-Equilibrium MAV-Capture-MAV via Time-Optimal Planning and Reinforcement Learning

Zheng, Canlun, Guo, Zhanyu, Yin, Zikang, Wang, Chunyu, Wang, Zhikun, Zhao, Shiyu

arXiv.org Artificial Intelligence, Mar-9-2025

The capture of flying MAVs (micro aerial vehicles) has garnered increasing research attention due to its intriguing challenges and promising applications. Despite recent advancements, a key limitation of existing work is that capture strategies are often relatively simple and constrained by platform performance. This paper addresses control strategies capable of capturing high-maneuverability targets. The unique challenge of achieving target capture under unstable conditions distinguishes this task from traditional pursuit-evasion and guidance problems. In this study, we transition from larger MAV platforms to a specially designed, compact capture MAV equipped with a custom launching device while maintaining high maneuverability. We explore both time-optimal planning (TOP) and reinforcement learning (RL) methods. Simulations demonstrate that TOP offers highly maneuverable and shorter trajectories, while RL excels in real-time adaptability and stability. Moreover, the RL method has been tested in real-world scenarios, successfully achieving target capture even in unstable states.

machine learning, mav, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2503.06578

Country:

Asia > China (0.14)
North America > United States (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.72)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.47)

Collective Behavior Clone with Visual Attention via Neural Interaction Graph Prediction

Li, Kai, Ma, Zhao, Li, Liang, Zhao, Shiyu

arXiv.org Artificial Intelligence, Mar-9-2025

In this paper, we propose a framework, collective behavioral cloning (CBC), to learn the underlying interaction mechanism and control policy of a swarm system. Given the trajectory data of a swarm system, we propose a graph variational autoencoder (GVAE) to learn the local interaction graph. Based on the interaction graph and swarm trajectory, we use behavioral cloning to learn the control policy of the swarm system. To demonstrate the practicality of CBC, we deploy it on a real-world decentralized vision-based robot swarm system. A visual attention network is trained based on the learned interaction graph for online neighbor selection. Experimental results show that our method predicts both the interaction graph and swarm actions more accurately than previous approaches. This work offers a promising approach for understanding interaction mechanisms and swarm dynamics in future swarm robotics research. Code and data are available.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2503.06869

Country:

Asia > China (0.14)
North America > United States > Montana > Roosevelt County (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Vision-Based Cooperative MAV-Capturing-MAV

Zheng, Canlun, Mi, Yize, Guo, Hanqing, Chen, Huaben, Zhao, Shiyu

arXiv.org Artificial Intelligence, Mar-8-2025

MAV-capturing-MAV (MCM) is one of the few effective methods for physically countering misused or malicious MAVs. This paper presents a vision-based cooperative MCM system, where multiple pursuer MAVs equipped with onboard vision systems detect, localize, and pursue a target MAV. To enhance robustness, a distributed state estimation and control framework enables the pursuer MAVs to autonomously coordinate their actions. Pursuer trajectories are optimized using Model Predictive Control (MPC) and executed via a low-level SO(3) controller, ensuring smooth and stable pursuit. Once the capture conditions are satisfied, the pursuer MAVs automatically deploy a flying net to intercept the target. These capture conditions are determined based on the predicted motion of the net. To enable real-time decision-making, we propose a lightweight computational method to approximate the net motion, avoiding the prohibitive cost of solving the full net dynamics. The effectiveness of the proposed system is validated through simulations and real-world experiments. In real-world tests, our approach successfully captures a moving target traveling at 4 meters per second with an acceleration of 1 meter per second squared, achieving a success rate of 64.7 percent.

artificial intelligence, mav, real time system, (16 more...)

arXiv.org Artificial Intelligence

2503.06412

Country:

Asia > China (0.14)
North America > Canada (0.14)

Genre: Research Report (0.50)

Industry:

Energy > Oil & Gas (0.55)
Information Technology (0.46)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

TACO: General Acrobatic Flight Control via Target-and-Command-Oriented Reinforcement Learning

Yin, Zikang, Zheng, Canlun, Guo, Shiliang, Wang, Zhikun, Zhao, Shiyu

arXiv.org Artificial Intelligence, Mar-7-2025

Although acrobatic flight control has been studied extensively, one key limitation of the existing methods is that they are usually restricted to specific maneuver tasks and cannot change flight pattern parameters online. In this work, we propose a target-and-command-oriented reinforcement learning (TACO) framework, which can handle different maneuver tasks in a unified way and allows online parameter changes. Additionally, we propose a spectral normalization method with input-output rescaling to enhance the policy's temporal and spatial smoothness, independence, and symmetry, thereby overcoming the sim-to-real gap. We validate the TACO approach through extensive simulation and real-world experiments, demonstrating its capability to achieve high-speed circular flights and continuous multi-flips.
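For readers unfamiliar with spectral normalization, the sketch below shows the basic idea in plain Python: estimate a weight matrix's largest singular value by power iteration, then divide the matrix by it so the linear map is 1-Lipschitz in the Euclidean norm. This is the generic technique only. The input-output rescaling described in the abstract is not reproduced here, and the function names are hypothetical.

```python
import math

def spectral_norm(w, iters=100):
    """Estimate the largest singular value of w by power iteration on w^T w.

    Assumes the starting vector is not orthogonal to the top singular vector.
    """
    n_rows, n_cols = len(w), len(w[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iters):
        u = [sum(wij * vj for wij, vj in zip(row, v)) for row in w]            # u = W v
        v = [sum(w[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]  # v = W^T u
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            return 0.0  # W v vanished, e.g. for the zero matrix
        v = [x / norm for x in v]
    u = [sum(wij * vj for wij, vj in zip(row, v)) for row in w]
    return math.sqrt(sum(x * x for x in u))  # ||W v|| for the unit vector v

def spectral_normalize(w, iters=100):
    """Rescale w so that its largest singular value is 1 (w must be nonzero)."""
    sigma = spectral_norm(w, iters)
    return [[x / sigma for x in row] for row in w]

weights = [[3.0, 0.0], [0.0, 1.0]]
normalized = spectral_normalize(weights)
```

Bounding each layer's spectral norm limits how fast the policy output can change with its input, which is one common route to smoother control commands.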

machine learning, reinforcement learning, trajectory, (16 more...)

arXiv.org Artificial Intelligence

2503.01125

Country:

Asia (0.68)
North America > United States (0.28)

Genre: Research Report (1.00)

Industry:

Transportation > Air (0.48)
Leisure & Entertainment > Games > Computer Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

A Cooperative Bearing-Rate Approach for Observability-Enhanced Target Motion Estimation

Zheng, Canlun, Guo, Hanqing, Zhao, Shiyu

arXiv.org Artificial Intelligence, Feb-11-2025

Vision-based target motion estimation is a fundamental problem in many robotic tasks. The existing methods have the limitation of low observability and, hence, face challenges in tracking highly maneuverable targets. Motivated by the aerial target pursuit task where a target may maneuver in 3D space, this paper studies how to further enhance observability by incorporating the bearing rate information that has not been well explored in the literature. The main contribution of this paper is to propose a new cooperative estimator called STT-R (Spatial-Temporal Triangulation with bearing Rate), which is designed under the framework of distributed recursive least squares. This theoretical result is further verified by numerical simulation and real-world experiments. It is shown that the proposed STT-R algorithm can effectively generate more accurate estimations and effectively reduce the lag in velocity estimation, enabling tracking of more maneuverable targets.
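As background for bearing-based triangulation, the following plain-Python sketch solves the static least-squares problem in 2D: each observer at position p_i with unit bearing g_i contributes the orthogonal projector P_i = I - g_i g_i^T, and the estimate x solves (sum of P_i) x = sum of P_i p_i. This is a single-snapshot, centralized simplification, not STT-R: it uses no bearing rate, no temporal model, and no distributed recursion, and the function name is hypothetical.

```python
import math

def triangulate(observers, bearings):
    """Least-squares target position from 2D observer positions and unit bearings."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (px, py), (gx, gy) in zip(observers, bearings):
        # P = I - g g^T keeps only the component perpendicular to the bearing.
        p11, p12, p22 = 1.0 - gx * gx, -gx * gy, 1.0 - gy * gy
        a11 += p11
        a12 += p12
        a22 += p22
        b1 += p11 * px + p12 * py
        b2 += p12 * px + p22 * py
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("bearings are parallel; the position is unobservable")
    # Solve the symmetric 2x2 system by Cramer's rule.
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)

def unit_bearing(observer, target):
    """Unit vector pointing from an observer to a target (noise-free)."""
    dx, dy = target[0] - observer[0], target[1] - observer[1]
    r = math.hypot(dx, dy)
    return (dx / r, dy / r)

true_target = (2.0, 3.0)
observers = [(0.0, 0.0), (4.0, 0.0)]
bearings = [unit_bearing(p, true_target) for p in observers]
estimate = triangulate(observers, bearings)
```

With a single observer, or with parallel bearings, the summed projector is singular, which is the low-observability situation that motivates using several observers and additional measurements.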

artificial intelligence, estimation, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2502.08089

Country:

Asia > China (0.14)
Europe > Germany (0.14)

Genre: Research Report (0.84)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
(2 more...)

Token-Budget-Aware LLM Reasoning

Han, Tingxu, Wang, Zhenting, Fang, Chunrong, Zhao, Shiyu, Ma, Shiqing, Chen, Zhenyu

arXiv.org Artificial Intelligence, Dec-31-2024

Reasoning is critical for large language models (LLMs) to excel in a wide range of tasks. While methods like Chain-of-Thought (CoT) reasoning enhance LLM performance by decomposing problems into intermediate steps, they also incur significant overhead in token usage, leading to increased costs. We find that the reasoning process of current LLMs is unnecessarily lengthy and it can be compressed by including a reasonable token budget in the prompt, but the choice of token budget plays a crucial role in the actual compression effectiveness. We then propose a token-budget-aware LLM reasoning framework, which dynamically estimates token budgets for different problems based on reasoning complexity and uses the estimated token budgets to guide the reasoning process. Experiments show that our method effectively reduces token costs in CoT reasoning with only a slight performance reduction, offering a practical solution to balance efficiency and accuracy in LLM reasoning. Code: https://github.com/GeniusHTX/TALE.
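A minimal sketch of what "including a token budget in the prompt" might look like in Python is shown below. The prompt wording and the function name are assumptions made for illustration, and the budget is passed in by the caller. The framework's own budget estimation step is not reproduced here.

```python
def budgeted_prompt(question: str, token_budget: int) -> str:
    """Append a token-budget instruction to a chain-of-thought prompt.

    The instruction wording is a hypothetical example, not the paper's exact prompt.
    """
    if token_budget <= 0:
        raise ValueError("token_budget must be positive")
    return (
        f"{question}\n"
        f"Let's think step by step and use less than {token_budget} tokens."
    )

prompt = budgeted_prompt("A train travels 60 km in 1.5 hours. What is its average speed?", 50)
```

The resulting string would be sent to the model in place of a plain chain-of-thought prompt. As the abstract notes, the choice of budget matters, so a fixed value like the 50 used here is only a placeholder.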

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2412.18547

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)

MLLM-as-a-Judge for Image Safety without Human Labeling

Wang, Zhenting, Hu, Shuming, Zhao, Shiyu, Lin, Xiaowen, Juefei-Xu, Felix, Li, Zhuowei, Han, Ligong, Subramanyam, Harihar, Chen, Li, Chen, Jianfa, Jiang, Nan, Lyu, Lingjuan, Ma, Shiqing, Metaxas, Dimitris N., Jain, Ankit

arXiv.org Artificial Intelligence, Dec-30-2024

Image content safety has become a significant challenge with the rise of visual media on online platforms. Meanwhile, in the age of AI-generated content (AIGC), many image generation models are capable of producing harmful content, such as images containing sexual or violent material. Thus, it becomes crucial to identify such unsafe images based on established safety rules. Pre-trained Multimodal Large Language Models (MLLMs) offer potential in this regard, given their strong pattern recognition abilities. Existing approaches typically fine-tune MLLMs with human-labeled datasets, which, however, brings a series of drawbacks. First, relying on human annotators to label data following intricate and detailed guidelines is both expensive and labor-intensive. Furthermore, users of safety judgment systems may need to frequently update safety rules, making fine-tuning on human-based annotation more challenging. This raises the research question: Can we detect unsafe images by querying MLLMs in a zero-shot setting using a predefined safety constitution (a set of safety rules)? Our research showed that simply querying pre-trained MLLMs does not yield satisfactory results. This lack of effectiveness stems from factors such as the subjectivity of safety rules, the complexity of lengthy constitutions, and the inherent biases in the models. To address these challenges, we propose an MLLM-based method that includes objectifying safety rules, assessing the relevance between rules and images, making quick judgments based on debiased token probabilities with logically complete yet simplified precondition chains for safety rules, and conducting more in-depth reasoning with cascaded chain-of-thought processes if necessary. Experiment results demonstrate that our method is highly effective for zero-shot image safety judgment tasks.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2501.00192

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Avian-Inspired High-Precision Tracking Control for Aerial Manipulators

Ji, Mengyu, Shen, Jiahao, Cao, Huazi, Zhao, Shiyu

arXiv.org Artificial Intelligence, Nov-17-2024

Aerial manipulators, composed of multirotors and robotic arms, have a structure and function highly reminiscent of avian species. This paper studies the tracking control problem for aerial manipulators. We propose an avian-inspired aerial manipulation system, which includes an avian-inspired robotic arm design, a Recursive Newton-Euler (RNE) method-based nonlinear flight controller, and a coordinated controller with two modes. Compared to existing methods, our proposed approach offers several attractive features. First, the morphological characteristics of avian species are used to determine the size proportion of the multirotor and the robotic arm in the aerial manipulator. Second, the dynamic coupling of the aerial manipulator is addressed by the RNE-based flight controller and a dual-mode coordinated controller. Specifically, under our proposed algorithm, the aerial manipulator can stabilize the end-effector's pose, similar to avian head stabilization. The proposed approach is verified through three numerical experiments. The results show that even when the quadcopter is disturbed by different forces, the position error of the end-effector achieves millimeter-level accuracy, and the attitude error remains within 1 degree. A limitation of this work is that it does not consider aggressive manipulation like that seen in birds. Addressing this through future studies that explore real-world experiments will be a key direction for research.

artificial intelligence, quadcopter, robotic arm, (17 more...)

arXiv.org Artificial Intelligence

2411.10966

Country:

Asia (1.00)
North America > United States (0.93)

Genre: Research Report > New Finding (0.34)

Industry:

Transportation > Infrastructure & Services (0.54)
Transportation > Air (0.54)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)
