AITopics | Ma, Lei

Collaborating Authors

Ma, Lei

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Behavioral Conflict Avoidance Between Humans and Quadruped Robots in Shared Environments

Wei, Shuang, Zhang, Muhua, Gan, Yun, Huang, Deqing, Ma, Lei, Yang, Chenguang

arXiv.org Artificial IntelligenceMar-21-2025

Nowadays, robots are increasingly operated in environments shared with humans, where conflicts between human and robot behaviors may compromise safety. This paper presents a proactive behavioral conflict avoidance framework based on the principle of adaptation to trends for quadruped robots that not only ensures the robot's safety but also minimizes interference with human activities. It can proactively avoid potential conflicts with approaching humans or other dynamic objects, whether the robot is stationary or in motion, then swiftly resume its tasks once the conflict subsides. An enhanced approach is proposed to achieve precise human detection and tracking on vibratory robot platform equipped with low-cost hybrid solid-state LiDAR. When potential conflict detected, the robot selects an avoidance point and executes an evasion maneuver before resuming its task. This approach contrasts with conventional methods that remain goal-driven, often resulting in aggressive behaviors, such as forcibly bypassing obstacles and causing conflicts or becoming stuck in deadlock scenarios. The selection of avoidance points is achieved by integrating static and dynamic obstacle to generate a potential field map. The robot then searches for feasible regions within this map and determines the optimal avoidance point using an evaluation function. Experimental results demonstrate that the framework significantly reduces interference with human activities, enhances the safety of both robots and persons.

artificial intelligence, obstacle, robot, (16 more...)

arXiv.org Artificial Intelligence

2503.17014

Country:

Asia > Middle East > UAE (0.14)
Asia > China > Sichuan Province (0.14)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Robots > Locomotion (0.86)

Add feedback

Autonomous Exploration-Based Precise Mapping for Mobile Robots through Stepwise and Consistent Motions

Zhang, Muhua, Ma, Lei, Wu, Ying, Shen, Kai, Sun, Yongkui, Leung, Henry

arXiv.org Artificial IntelligenceMar-21-2025

This paper presents an autonomous exploration framework. It is designed for indoor ground mobile robots that utilize laser Simultaneous Localization and Mapping (SLAM), ensuring process completeness and precise mapping results. For frontier search, the local-global sampling architecture based on multiple Rapidly Exploring Random Trees (RRTs) is employed. Traversability checks during RRT expansion and global RRT pruning upon map updates eliminate unreachable frontiers, reducing potential collisions and deadlocks. Adaptive sampling density adjustments, informed by obstacle distribution, enhance exploration coverage potential. For frontier point navigation, a stepwise consistent motion strategy is adopted, wherein the robot strictly drives straight on approximately equidistant line segments in the polyline path and rotates in place at segment junctions. This simplified, decoupled motion pattern improves scan-matching stability and mitigates map drift. For process control, the framework serializes frontier point selection and navigation, avoiding oscillation caused by frequent goal changes in conventional parallelized processes. The waypoint retracing mechanism is introduced to generate repeated observations, triggering loop closure detection and backend optimization in graph-based SLAM, thereby improving map consistency and precision. Experiments in both simulation and real-world scenarios validate the effectiveness of the framework. It achieves improved mapping coverage and precision in more challenging environments compared to baseline 2D exploration algorithms. It also shows robustness in supporting resource-constrained robot platforms and maintaining mapping consistency across various LiDAR field-of-view (FoV) configurations.

artificial intelligence, exploration, frontier point, (14 more...)

arXiv.org Artificial Intelligence

2503.17005

Country:

Europe (0.93)
Asia > China (0.47)
North America > United States (0.46)
(2 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Robots > Locomotion (0.61)

Add feedback

Internal Activation Revision: Safeguarding Vision Language Models Without Parameter Update

Li, Qing, Geng, Jiahui, Chen, Zongxiong, Song, Kun, Ma, Lei, Karray, Fakhri

arXiv.org Artificial IntelligenceJan-24-2025

Vision-language models (VLMs) demonstrate strong multimodal capabilities but have been found to be more susceptible to generating harmful content compared to their backbone large language models (LLMs). Our investigation reveals that the integration of images significantly shifts the model's internal activations during the forward pass, diverging from those triggered by textual input. Moreover, the safety alignments of LLMs embedded within VLMs are not sufficiently robust to handle the activations discrepancies, making the models vulnerable to even the simplest jailbreaking attacks. To address this issue, we propose an \textbf{internal activation revision} approach that efficiently revises activations during generation, steering the model toward safer outputs. Our framework incorporates revisions at both the layer and head levels, offering control over the model's generation at varying levels of granularity. In addition, we explore three strategies for constructing positive and negative samples and two approaches for extracting revision vectors, resulting in different variants of our method. Comprehensive experiments demonstrate that the internal activation revision method significantly improves the safety of widely used VLMs, reducing attack success rates by an average of 48.94\%, 34.34\%, 43.92\%, and 52.98\% on SafeBench, Safe-Unsafe, Unsafe, and MM-SafetyBench, respectively, while minimally impacting model helpfulness.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2501.16378

Country:

Europe (0.93)
Asia (0.68)
North America > United States (0.28)
(2 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Aligning Instruction Tuning with Pre-training

Liang, Yiming, Zheng, Tianyu, Du, Xinrun, Zhang, Ge, Qu, Xingwei, Yue, Xiang, Zheng, Chujie, Liu, Jiaheng, Ma, Lei, Chen, Wenhu, Wang, Guoyin, Zhang, Zhaoxiang, Huang, Wenhao, Zhang, Jiajun

arXiv.org Artificial IntelligenceJan-16-2025

Instruction tuning enhances large language models (LLMs) to follow human instructions across diverse tasks, relying on high-quality datasets to guide behavior. However, these datasets, whether manually curated or synthetically generated, are often narrowly focused and misaligned with the broad distributions captured during pre-training, limiting LLM generalization and effective use of pre-trained knowledge. We propose *Aligning Instruction Tuning with Pre-training* (AITP), a method that bridges this gap by identifying coverage shortfalls in instruction-tuning datasets and rewriting underrepresented pre-training data into high-quality instruction-response pairs. This approach enriches dataset diversity while preserving task-specific objectives. Evaluations on three fully open LLMs across eight benchmarks demonstrate consistent performance improvements with AITP. Ablations highlight the benefits of adaptive data selection, controlled rewriting, and balanced integration, emphasizing the importance of aligning instruction tuning with pre-training distributions to unlock the full potential of LLMs.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2501.09368

Country: Asia > Thailand (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Takin-VC: Expressive Zero-Shot Voice Conversion via Adaptive Hybrid Content Encoding and Enhanced Timbre Modeling

Yang, Yuguang, Pan, Yu, Yao, Jixun, Zhang, Xiang, Ye, Jianhao, Zhou, Hongbin, Xie, Lei, Ma, Lei, Zhao, Jianjun

arXiv.org Artificial IntelligenceJan-10-2025

Expressive zero-shot voice conversion (VC) is a critical and challenging task that aims to transform the source timbre into an arbitrary unseen speaker while preserving the original content and expressive qualities. Despite recent progress in zero-shot VC, there remains considerable potential for improvements in speaker similarity and speech naturalness. Moreover, existing zero-shot VC systems struggle to fully reproduce paralinguistic information in highly expressive speech, such as breathing, crying, and emotional nuances, limiting their practical applicability. To address these issues, we propose Takin-VC, a novel expressive zero-shot VC framework via adaptive hybrid content encoding and memory-augmented context-aware timbre modeling. Specifically, we introduce an innovative hybrid content encoder that incorporates an adaptive fusion module, capable of effectively integrating quantized features of the pre-trained WavLM and HybridFormer in an implicit manner, so as to extract precise linguistic features while enriching paralinguistic elements. For timbre modeling, we propose advanced memory-augmented and context-aware modules to generate high-quality target timbre features and fused representations that seamlessly align source content with target timbre. To enhance real-time performance, we advocate a conditional flow matching model to reconstruct the Mel-spectrogram of the source speech. Experimental results show that our Takin-VC consistently surpasses state-of-the-art VC systems, achieving notable improvements in terms of speech naturalness, speech expressiveness, and speaker similarity, while offering enhanced inference speed.

arxiv preprint arxiv, large language model, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2410.0135

Country:

Europe (0.68)
Asia > Japan (0.28)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

MAGIC: Mastering Physical Adversarial Generation in Context through Collaborative LLM Agents

Xing, Yun, Chung, Nhat, Zhang, Jie, Cao, Yue, Tsang, Ivor, Liu, Yang, Ma, Lei, Guo, Qing

arXiv.org Artificial IntelligenceDec-10-2024

Physical adversarial attacks in driving scenarios can expose critical vulnerabilities in visual perception models. However, developing such attacks remains challenging due to diverse real-world backgrounds and the requirement for maintaining visual naturality. Building upon this challenge, we reformulate physical adversarial attacks as a one-shot patch-generation problem. Our approach generates adversarial patches through a deep generative model that considers the specific scene context, enabling direct physical deployment in matching environments. The primary challenge lies in simultaneously achieving two objectives: generating adversarial patches that effectively mislead object detection systems while determining contextually appropriate placement within the scene. We propose MAGIC (Mastering Physical Adversarial Generation In Context), a novel framework powered by multi-modal LLM agents to address these challenges. MAGIC automatically understands scene context and orchestrates adversarial patch generation through the synergistic interaction of language and vision capabilities. MAGIC orchestrates three specialized LLM agents: The adv-patch generation agent (GAgent) masters the creation of deceptive patches through strategic prompt engineering for text-to-image models. The adv-patch deployment agent (DAgent) ensures contextual coherence by determining optimal placement strategies based on scene understanding. The self-examination agent (EAgent) completes this trilogy by providing critical oversight and iterative refinement of both processes. We validate our method on both digital and physical level, \ie, nuImage and manually captured real scenes, where both statistical and visual results prove that our MAGIC is powerful and effectively for attacking wide-used object detection systems.

detector, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2412.08014

Country:

Asia (0.93)
North America > Canada > Alberta (0.14)

Genre: Research Report > New Finding (0.66)

Industry:

Information Technology > Security & Privacy (0.88)
Government > Military (0.88)
Transportation > Ground > Road (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

Towards Understanding Retrieval Accuracy and Prompt Quality in RAG Systems

Zhao, Shengming, Huang, Yuheng, Song, Jiayang, Wang, Zhijie, Wan, Chengcheng, Ma, Lei

arXiv.org Artificial IntelligenceNov-28-2024

Retrieval-Augmented Generation (RAG) is a pivotal technique for enhancing the capability of large language models (LLMs) and has demonstrated promising efficacy across a diverse spectrum of tasks. While LLM-driven RAG systems show superior performance, they face unique challenges in stability and reliability. Their complexity hinders developers' efforts to design, maintain, and optimize effective RAG systems. Therefore, it is crucial to understand how RAG's performance is impacted by its design. In this work, we conduct an early exploratory study toward a better understanding of the mechanism of RAG systems, covering three code datasets, three QA datasets, and two LLMs. We focus on four design factors: retrieval document type, retrieval recall, document selection, and prompt techniques. Our study uncovers how each factor impacts system correctness and confidence, providing valuable insights for developing an accurate and reliable RAG system. Based on these findings, we present nine actionable guidelines for detecting defects and optimizing the performance of RAG systems. We hope our early exploration can inspire further advancements in engineering, improving and maintaining LLM-driven intelligent software systems for greater efficiency and reliability.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2411.19463

Country:

Asia (0.67)
North America > Canada > Alberta (0.14)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.68)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

RedTest: Towards Measuring Redundancy in Deep Neural Networks Effectively

Lu, Yao, Zhang, Peixin, Wang, Jingyi, Ma, Lei, Yang, Xiaoniu, Xuan, Qi

arXiv.org Artificial IntelligenceNov-15-2024

Deep learning has revolutionized computing in many real-world applications, arguably due to its remarkable performance and extreme convenience as an end-to-end solution. However, deep learning models can be costly to train and to use, especially for those large-scale models, making it necessary to optimize the original overly complicated models into smaller ones in scenarios with limited resources such as mobile applications or simply for resource saving. The key question in such model optimization is, how can we effectively identify and measure the redundancy in a deep learning model structure. While several common metrics exist in the popular model optimization techniques to measure the performance of models after optimization, they are not able to quantitatively inform the degree of remaining redundancy. To address the problem, we present a novel testing approach, i.e., RedTest, which proposes a novel testing metric called Model Structural Redundancy Score (MSRS) to quantitatively measure the degree of redundancy in a deep learning model structure. We first show that MSRS is effective in both revealing and assessing the redundancy issues in many state-of-the-art models, which urgently calls for model optimization. Then, we utilize MSRS to assist deep learning model developers in two practical application scenarios: 1) in Neural Architecture Search, we design a novel redundancy-aware algorithm to guide the search for the optimal model structure and demonstrate its effectiveness by comparing it to existing standard NAS practice; 2) in the pruning of large-scale pre-trained models, we prune the redundant layers of pre-trained models with the guidance of layer similarity to derive less redundant ones of much smaller size. Extensive experimental results demonstrate that removing such redundancy has a negligible effect on the model utility.

artificial intelligence, machine learning, redundancy, (20 more...)

arXiv.org Artificial Intelligence

2411.10507

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Hawaii (0.14)
North America > Canada > Alberta (0.14)
Asia > Middle East > Israel (0.14)

Genre:

Research Report > Promising Solution (0.34)
Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Learning from Pattern Completion: Self-supervised Controllable Generation

Chen, Zhiqiang, Fan, Guofan, Gao, Jinying, Ma, Lei, Lei, Bo, Huang, Tiejun, Yu, Shan

arXiv.org Artificial IntelligenceNov-7-2024

The human brain exhibits a strong ability to spontaneously associate different visual attributes of the same or similar visual scene, such as associating sketches and graffiti with real-world visual objects, usually without supervising information. In contrast, in the field of artificial intelligence, controllable generation methods like ControlNet heavily rely on annotated training datasets such as depth maps, semantic segmentation maps, and poses, which limits the method's scalability. Inspired by the neural mechanisms that may contribute to the brain's associative power, specifically the cortical modularization and hippocampal pattern completion, here we propose a self-supervised controllable generation (SCG) framework. Firstly, we introduce an equivariant constraint to promote inter-module independence and intra-module correlation in a modular autoencoder network, thereby achieving functional specialization. Subsequently, based on these specialized modules, we employ a self-supervised pattern completion approach for controllable generation training. Experimental results demonstrate that the proposed modular autoencoder effectively achieves functional specialization, including the modular processing of color, brightness, and edge detection, and exhibits brain-like features including orientation selectivity, color antagonism, and center-surround receptive fields. Through self-supervised training, associative generation capabilities spontaneously emerge in SCG, demonstrating excellent generalization ability to various tasks such as associative generation on painting, sketches, and ancient graffiti. Compared to the previous representative method ControlNet, our proposed approach not only demonstrates superior robustness in more challenging high-noise scenarios but also possesses more promising scalability potential due to its self-supervised manner.Codes are released on Github and Gitee.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2409.18694

Country:

Asia > China (0.28)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.66)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

CTEFM-VC: Zero-Shot Voice Conversion Based on Content-Aware Timbre Ensemble Modeling and Flow Matching

Pan, Yu, Yang, Yuguang, Yao, Jixun, Ye, Jianhao, Zhou, Hongbin, Ma, Lei, Zhao, Jianjun

arXiv.org Artificial IntelligenceNov-4-2024

Zero-shot voice conversion (VC) aims to transform the timbre of a source speaker into any previously unseen target speaker, while preserving the original linguistic content. Despite notable progress, attaining a degree of speaker similarity and naturalness on par with ground truth recordings continues to pose great challenge. In this paper, we propose CTEFM-VC, a zero-shot VC framework that leverages Content-aware Timbre Ensemble modeling and Flow Matching. Specifically, CTEFM-VC disentangles utterances into linguistic content and timbre representations, subsequently utilizing a conditional flow matching model and a vocoder to reconstruct the mel-spectrogram and waveform. To enhance its timbre modeling capability and the naturalness of generated speech, we propose a context-aware timbre ensemble modeling approach that adaptively integrates diverse speaker verification embeddings and enables the joint utilization of linguistic and timbre features through a cross-attention module. Experiments show that our CTEFM-VC system surpasses state-of-the-art VC methods in both speaker similarity and naturalness by at least 18.5% and 7.0%.

large language model, natural language, voice conversion, (18 more...)

arXiv.org Artificial Intelligence

2411.02026

Country: Asia > Japan (0.28)

Genre: Research Report (1.00)

Industry: Media (0.49)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)

Add feedback