AITopics | Zhang, Yuhao

Collaborating Authors

Zhang, Yuhao

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

D4orm: Multi-Robot Trajectories with Dynamics-aware Diffusion Denoised Deformations

Zhang, Yuhao, Okumura, Keisuke, Woo, Heedo, Shankar, Ajay, Prorok, Amanda

arXiv.org Artificial IntelligenceMar-15-2025

This work presents an optimization method for generating kinodynamically feasible and collision-free multi-robot trajectories that exploits an incremental denoising scheme in diffusion models. Our key insight is that high-quality trajectories can be discovered merely by denoising noisy trajectories sampled from a distribution. This approach has no learning component, relying instead on only two ingredients: a dynamical model of the robots to obtain feasible trajectories via rollout, and a score function to guide denoising with Monte Carlo gradient approximation. The proposed framework iteratively optimizes the deformation from the previous round with this denoising process, allows \textit{anytime} refinement as time permits, supports different dynamics, and benefits from GPU acceleration. Our evaluations for differential-drive and holonomic teams with up to 16 robots in 2D and 3D worlds show its ability to discover high-quality solutions faster than other black-box optimization methods such as MPPI, approximately three times faster in a 3D holonomic case with 16 robots. As evidence for feasibility, we demonstrate zero-shot deployment of the planned trajectories on eight multirotors.

artificial intelligence, optimization problem, trajectory, (15 more...)

arXiv.org Artificial Intelligence

2503.12204

Genre: Research Report (0.64)

Industry: Transportation (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Add feedback

Efficient Reachability Analysis for Convolutional Neural Networks Using Hybrid Zonotopes

Zhang, Yuhao, Xu, Xiangru

arXiv.org Artificial IntelligenceMar-13-2025

Feedforward neural networks are widely used in autonomous systems, particularly for control and perception tasks within the system loop. However, their vulnerability to adversarial attacks necessitates formal verification before deployment in safety-critical applications. Existing set propagation-based reachability analysis methods for feedforward neural networks often struggle to achieve both scalability and accuracy. This work presents a novel set-based approach for computing the reachable sets of convolutional neural networks. The proposed method leverages a hybrid zonotope representation and an efficient neural network reduction technique, providing a flexible trade-off between computational complexity and approximation accuracy. Numerical examples are presented to demonstrate the effectiveness of the proposed approach.

artificial intelligence, ffnn, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2503.1084

Country: North America > United States > Wisconsin > Dane County > Madison (0.14)

Genre: Research Report (0.64)

Industry:

Government (0.34)
Information Technology > Security & Privacy (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.84)

Add feedback

SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Safe Reinforcement Learning

Zhang, Borong, Zhang, Yuhao, Ji, Jiaming, Lei, Yingshan, Dai, Josef, Chen, Yuanpei, Yang, Yaodong

arXiv.org Artificial IntelligenceMar-5-2025

Vision-language-action models (VLAs) have shown great potential as generalist robot policies. However, these models pose urgent safety challenges during deployment, including the risk of physical harm to the environment, the robot itself, and humans. How can safety be explicitly incorporated into VLAs? In this work, we propose SafeVLA, a novel algorithm designed to integrate safety into VLAs, ensuring the protection of the environment, robot hardware and humans in real-world settings. SafeVLA effectively balances safety and task performance by employing large-scale constrained learning within simulated environments. We demonstrate that SafeVLA outperforms the current state-of-the-art method in both safety and task performance, achieving average improvements of 83.58% and 3.85%, respectively, in simulation. By prioritizing safety, our approach eliminates high-risk behaviors and reduces the upper bound of unsafe behaviors to 1/35 of that in the current state-of-the-art, thereby significantly mitigating long-tail risks. Furthermore, the learned safety constraints generalize to diverse, unseen scenarios, including multiple out-of-distribution perturbations and tasks. Our data, models and newly proposed benchmark environment are available at https://sites.google.com/view/pku-safevla.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2503.0348

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Add feedback

Soundwave: Less is More for Speech-Text Alignment in LLMs

Zhang, Yuhao, Liu, Zhiheng, Bu, Fan, Zhang, Ruiyu, Wang, Benyou, Li, Haizhou

arXiv.org Artificial IntelligenceFeb-18-2025

Existing end-to-end speech large language models (LLMs) usually rely on large-scale annotated data for training, while data-efficient training has not been discussed in depth. We focus on two fundamental problems between speech and text: the representation space gap and sequence length inconsistency. We propose Soundwave, which utilizes an efficient training strategy and a novel architecture to address these issues. Results show that Soundwave outperforms the advanced Qwen2-Audio in speech translation and AIR-Bench speech tasks, using only one-fiftieth of the training data. Further analysis shows that Soundwave still retains its intelligence during conversation. The project is available at https://github.com/FreedomIntelligence/Soundwave.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2502.129

Country:

Europe (1.00)
Asia > China (0.46)
Oceania > Australia (0.28)
North America > United States (0.28)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Optimizing Speech Multi-View Feature Fusion through Conditional Computation

Shan, Weiqiao, Zhang, Yuhao, Han, Yuchen, Li, Bei, Zhao, Xiaofeng, Li, Yuang, Zhang, Min, Yang, Hao, Xiao, Tong, Zhu, Jingbo

arXiv.org Artificial IntelligenceJan-14-2025

Recent advancements have highlighted the efficacy of self-supervised learning (SSL) features in various speech-related tasks, providing lightweight and versatile multi-view speech representations. However, our study reveals that while SSL features expedite model convergence, they conflict with traditional spectral features like FBanks in terms of update directions. In response, we propose a novel generalized feature fusion framework grounded in conditional computation, featuring a gradient-sensitive gating network and a multi-stage dropout strategy. This framework mitigates feature conflicts and bolsters model robustness to multi-view input features. By integrating SSL and spectral features, our approach accelerates convergence and maintains performance on par with spectral models across multiple speech translation tasks on the MUSTC dataset.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2501.08057

Country: Asia > China > Liaoning Province (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.68)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.67)

Add feedback

Roadmap towards Superhuman Speech Understanding using Large Language Models

Bu, Fan, Zhang, Yuhao, Wang, Xidong, Wang, Benyou, Liu, Qun, Li, Haizhou

arXiv.org Artificial IntelligenceOct-17-2024

The success of large language models (LLMs) has prompted efforts to integrate speech and audio data, aiming to create general foundation models capable of processing both textual and non-textual inputs. Recent advances, such as GPT-4o, highlight the potential for end-to-end speech LLMs, which preserves non-semantic information and world knowledge for deeper speech understanding. To guide the development of speech LLMs, we propose a five-level roadmap, ranging from basic automatic speech recognition (ASR) to advanced superhuman models capable of integrating non-semantic information with abstract acoustic knowledge for complex tasks. Moreover, we design a benchmark, SAGI Bechmark, that standardizes critical aspects across various tasks in these five levels, uncovering challenges in using abstract acoustic knowledge and completeness of capability. Our findings reveal gaps in handling paralinguistic cues and abstract acoustic knowledge, and we offer future directions. This paper outlines a roadmap for advancing speech LLMs, introduces a benchmark for evaluation, and provides key insights into their current limitations and potential.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2410.13268

Country:

Europe (1.00)
Asia > China (0.68)
North America > United States (0.46)

Genre: Research Report > New Finding (0.66)

Industry:

Health & Medicine (0.70)
Education > Educational Setting > Higher Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Add feedback

A Framework for Reproducible Benchmarking and Performance Diagnosis of SLAM Systems

Radulov, Nikola, Zhang, Yuhao, Bujanca, Mihai, Ye, Ruiqi, Luján, Mikel

arXiv.org Artificial IntelligenceOct-5-2024

We propose SLAMFuse, an open-source SLAM benchmarking framework that provides consistent crossplatform environments for evaluating multi-modal SLAM algorithms, along with tools for data fuzzing, failure detection, and diagnosis across different datasets. Our framework introduces a fuzzing mechanism to test the resilience of SLAM algorithms against dataset perturbations. This enables the assessment of pose estimation accuracy under varying conditions and identifies critical perturbation thresholds. SLAMFuse improves diagnostics with failure detection and analysis tools, examining algorithm behaviour against dataset characteristics. SLAMFuse uses Docker to ensure reproducible testing conditions across diverse datasets and systems by streamlining dependency management. Emphasizing the importance of reproducibility and introducing advanced tools for algorithm evaluation and performance diagnosis, our work sets a new precedent for reliable benchmarking of SLAM systems. We provide ready-to-use docker compatible versions of the algorithms and datasets used in the experiments, together with guidelines for integrating and benchmarking new algorithms. Code is available at https://github.com/nikolaradulov/slamfuse

algorithm, artificial intelligence, dataset, (17 more...)

arXiv.org Artificial Intelligence

2410.04242

Country: North America > United States (0.29)

Genre: Research Report (0.64)

Industry: Information Technology (0.69)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)

Add feedback

A One-Layer Decoder-Only Transformer is a Two-Layer RNN: With an Application to Certified Robustness

Zhang, Yuhao, Albarghouthi, Aws, D'Antoni, Loris

arXiv.org Artificial IntelligenceMay-27-2024

This paper reveals a key insight that a one-layer decoder-only Transformer is equivalent to a two-layer Recurrent Neural Network (RNN). Building on this insight, we propose ARC-Tran, a novel approach for verifying the robustness of decoder-only Transformers against arbitrary perturbation spaces. Compared to ARC-Tran, current robustness verification techniques are limited either to specific and length-preserving perturbations like word substitutions or to recursive models like LSTMs. ARC-Tran addresses these limitations by meticulously managing position encoding to prevent mismatches and by utilizing our key insight to achieve precise and scalable verification. Our evaluation shows that ARC-Tran (1) trains models more robust to arbitrary perturbation spaces than those produced by existing techniques and (2) shows high certification accuracy of the resulting models.

artificial intelligence, machine learning, transformer, (17 more...)

arXiv.org Artificial Intelligence

2405.17361

Country:

North America > United States > Wisconsin (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California (0.14)

Genre:

Research Report (0.70)
Overview (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

NGD-SLAM: Towards Real-Time SLAM for Dynamic Environments without GPU

Zhang, Yuhao

arXiv.org Artificial IntelligenceMay-12-2024

Accurate and robust camera tracking in dynamic environments presents a significant challenge for visual SLAM (Simultaneous Localization and Mapping). Recent progress in this field often involves the use of deep learning techniques to generate mask for dynamic objects, which usually require GPUs to operate in real-time (30 fps). Therefore, this paper proposes a novel visual SLAM system for dynamic environments that obtains real-time performance on CPU by incorporating a mask prediction mechanism, which allows the deep learning method and the camera tracking to run entirely in parallel at different frequencies such that neither waits for the result from the other. Based on this, it further introduces a dual-stage optical flow tracking approach and employs a hybrid usage of optical flow and ORB features, which significantly enhance the efficiency and robustness of the system. Compared with state-of-the-art methods, this system maintains high localization accuracy in dynamic environments while achieving a tracking frame rate of 56 fps on a single laptop CPU without any hardware acceleration, thus proving that deep learning methods are still feasible for dynamic SLAM even without GPU support. Based on the available information, this is the first SLAM system to achieve this.

artificial intelligence, dynamic environment, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2405.07392

Country: Asia > Japan (0.14)

Genre: Research Report (0.70)

Industry: Semiconductors & Electronics (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

CodeFort: Robust Training for Code Generation Models

Zhang, Yuhao, Wang, Shiqi, Qian, Haifeng, Wang, Zijian, Shang, Mingyue, Liu, Linbo, Gouda, Sanjay Krishna, Ray, Baishakhi, Ramanathan, Murali Krishna, Ma, Xiaofei, Deoras, Anoop

arXiv.org Artificial IntelligenceApr-11-2024

Code generation models are not robust to small perturbations, which often lead to inconsistent and incorrect generations and significantly degrade the performance of these models. Improving the robustness of code generation models is crucial to better user experience when these models are deployed in real-world applications. However, existing efforts have not addressed this issue for code generation models. To fill this gap, we propose CodeFort, a framework to improve the robustness of code generation models, generalizing a large variety of code perturbations to enrich the training data and enabling various robust training strategies, mixing data augmentation, batch augmentation, adversarial logits pairing, and contrastive learning, all carefully designed to support high-throughput training. Extensive evaluations show that we improve the average robust pass rates of baseline CodeGen models from 14.79 to 21.74. Notably, the improvement in robustness against code-syntax perturbations is evidenced by a significant decrease in pass rate drop from 95.04% to 53.35%

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2405.01567

Country:

North America > United States > Wisconsin (0.14)
North America > United States > New York (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback