AITopics | Wu, Sihao

Collaborating Authors

Wu, Sihao

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Preference Alignment on Diffusion Model: A Comprehensive Survey for Image Generation and Editing

Wu, Sihao, Si, Xiaonan, Xing, Chi, Wang, Jianhong, Jin, Gaojie, Cheng, Guangliang, Zhang, Lijun, Huang, Xiaowei

arXiv.org Artificial IntelligenceFeb-10-2025

The integration of preference alignment with diffusion models (DMs) has emerged as a transformative approach to enhance image generation and editing capabilities. Although integrating diffusion models with preference alignment strategies poses significant challenges for novices at this intersection, comprehensive and systematic reviews of this subject are still notably lacking. To bridge this gap, this paper extensively surveys preference alignment with diffusion models in image generation and editing. First, we systematically review cutting-edge optimization techniques such as reinforcement learning with human feedback (RLHF), direct preference optimization (DPO), and others, highlighting their pivotal role in aligning preferences with DMs. Then, we thoroughly explore the applications of aligning preferences with DMs in autonomous driving, medical imaging, robotics, and more. Finally, we comprehensively discuss the challenges of preference alignment with DMs. To our knowledge, this is the first survey centered on preference alignment with DMs, providing insights to drive future innovation in this dynamic area.

artificial intelligence, machine learning, survey article, (19 more...)

arXiv.org Artificial Intelligence

2502.07829

Genre:

Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Industry:

Information Technology (0.67)
Health & Medicine > Diagnostic Medicine > Imaging (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Add feedback

Enhancing Robust Fairness via Confusional Spectral Regularization

Jin, Gaojie, Wu, Sihao, Liu, Jiaxu, Huang, Tianjin, Mu, Ronghui

arXiv.org Artificial IntelligenceJan-22-2025

Recent research has highlighted a critical issue known as "robust fairness", where robust accuracy varies significantly across different classes, undermining the reliability of deep neural networks (DNNs). A common approach to address this has been to dynamically reweight classes during training, giving more weight to those with lower empirical robust performance. However, we find there is a divergence of class-wise robust performance between training set and testing set, which limits the effectiveness of these explicit reweighting methods, indicating the need for a principled alternative. In this work, we derive a robust generalization bound for the worst-class robust error within the PAC-Bayesian framework, accounting for unknown data distributions. Our analysis shows that the worst-class robust error is influenced by two main factors: the spectral norm of the empirical robust confusion matrix and the information embedded in the model and training set. While the latter has been extensively studied, we propose a novel regularization technique targeting the spectral norm of the robust confusion matrix to improve worst-class robust accuracy and enhance robust fairness. Deep neural networks, spanning a diverse array of domains and applications, have shown impressive abilities to learn from training data and generalize effectively to new, unseen data. However, recent studies have uncovered a notable weakness in these DNNs - their vulnerability to subtle, often undetectable "adversarial attacks" (Biggio et al., 2013; Szegedy et al., 2013). It has been discovered that even slight perturbations to the input, typically imperceptible to humans, can drastically mislead the networks, resulting in significant prediction errors (Goodfellow et al., 2015; Wu et al., 2020a).

artificial intelligence, bayesian inference, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2501.13273

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.66)

Industry: Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)

Add feedback

Integrating Object Detection Modality into Visual Language Model for Enhanced Autonomous Driving Agent

He, Linfeng, Sun, Yiming, Wu, Sihao, Liu, Jiaxu, Huang, Xiaowei

arXiv.org Artificial IntelligenceNov-8-2024

In this paper, we propose a novel framework for enhancing visual comprehension in autonomous driving systems by integrating visual language models (VLMs) with additional visual perception module specialised in object detection. We extend the Llama-Adapter architecture by incorporating a YOLOS-based detection network alongside the CLIP perception network, addressing limitations in object detection and localisation. Our approach introduces camera ID-separators to improve multi-view processing, crucial for comprehensive environmental awareness. Experiments on the DriveLM visual question answering challenge demonstrate significant improvements over baseline models, with enhanced performance in ChatGPT scores, BLEU scores, and CIDEr metrics, indicating closeness of model answer to ground truth. Our method represents a promising step towards more capable and interpretable autonomous driving systems. Possible safety enhancement enabled by detection modality is also discussed.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2411.05898

Country: Europe > United Kingdom > England (0.28)

Genre: Research Report (0.64)

Industry:

Automobiles & Trucks (0.96)
Transportation > Ground > Road (0.86)
Information Technology > Robotics & Automation (0.86)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.96)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Add feedback

Robust RL with LLM-Driven Data Synthesis and Policy Adaptation for Autonomous Driving

Wu, Sihao, Liu, Jiaxu, Yin, Xiangyu, Cheng, Guangliang, Zhao, Xingyu, Fang, Meng, Yi, Xinping, Huang, Xiaowei

arXiv.org Artificial IntelligenceOct-20-2024

The integration of Large Language Models (LLMs) into autonomous driving systems demonstrates strong common sense and reasoning abilities, effectively addressing the pitfalls of purely data-driven methods. Current LLM-based agents require lengthy inference times and face challenges in interacting with real-time autonomous driving environments. A key open question is whether we can effectively leverage the knowledge from LLMs to train an efficient and robust Reinforcement Learning (RL) agent. This paper introduces RAPID, a novel \underline{\textbf{R}}obust \underline{\textbf{A}}daptive \underline{\textbf{P}}olicy \underline{\textbf{I}}nfusion and \underline{\textbf{D}}istillation framework, which trains specialized mix-of-policy RL agents using data synthesized by an LLM-based driving agent and online adaptation. RAPID features three key designs: 1) utilization of offline data collected from an LLM agent to distil expert knowledge into RL policies for faster real-time inference; 2) introduction of robust distillation in RL to inherit both performance and robustness from LLM-based teacher; and 3) employment of a mix-of-policy approach for joint decision decoding with a policy adapter. Through fine-tuning via online environment interaction, RAPID reduces the forgetting of LLM knowledge while maintaining adaptability to different tasks. Extensive experiments demonstrate RAPID's capability to effectively integrate LLM knowledge into scaled-down RL policies in an efficient, adaptable, and robust way. Code and checkpoints will be made publicly available upon acceptance.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2410.12568

Country: North America (0.28)

Genre: Research Report (1.00)

Industry:

Transportation > Ground > Road (0.92)
Automobiles & Trucks (0.92)
Information Technology > Robotics & Automation (0.83)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Tiny Refinements Elicit Resilience: Toward Efficient Prefix-Model Against LLM Red-Teaming

Liu, Jiaxu, Yin, Xiangyu, Wu, Sihao, Wang, Jianhong, Fang, Meng, Yi, Xinping, Huang, Xiaowei

arXiv.org Artificial IntelligenceJun-17-2024

With the proliferation of red-teaming strategies for Large Language Models (LLMs), the deficiency in the literature about improving the safety and robustness of LLM defense strategies is becoming increasingly pronounced. This paper introduces the LLM-based \textbf{sentinel} model as a plug-and-play prefix module designed to reconstruct the input prompt with just a few ($<30$) additional tokens, effectively reducing toxicity in responses from target LLMs. The sentinel model naturally overcomes the \textit{parameter inefficiency} and \textit{limited model accessibility} for fine-tuning large target models. We employ an interleaved training regimen using Proximal Policy Optimization (PPO) to optimize both red team and sentinel models dynamically, incorporating a value head-sharing mechanism inspired by the multi-agent centralized critic to manage the complex interplay between agents. Our extensive experiments across text-to-text and text-to-image demonstrate the effectiveness of our approach in mitigating toxic outputs, even when dealing with larger models like \texttt{Llama-2}, \texttt{GPT-3.5} and \texttt{Stable-Diffusion}, highlighting the potential of our framework in enhancing safety and robustness in various applications.

large language model, machine learning, target model, (16 more...)

arXiv.org Artificial Intelligence

2405.12604

Country: Asia (0.14)

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Continuous Geometry-Aware Graph Diffusion via Hyperbolic Neural PDE

Liu, Jiaxu, Yi, Xinping, Wu, Sihao, Yin, Xiangyu, Zhang, Tianle, Huang, Xiaowei, Jin, Shi

arXiv.org Artificial IntelligenceJun-7-2024

While Hyperbolic Graph Neural Network (HGNN) has recently emerged as a powerful tool dealing with hierarchical graph data, the limitations of scalability and efficiency hinder itself from generalizing to deep models. In this paper, by envisioning depth as a continuous-time embedding evolution, we decouple the HGNN and reframe the information propagation as a partial differential equation, letting node-wise attention undertake the role of diffusivity within the Hyperbolic Neural PDE (HPDE). By introducing theoretical principles \textit{e.g.,} field and flow, gradient, divergence, and diffusivity on a non-Euclidean manifold for HPDE integration, we discuss both implicit and explicit discretization schemes to formulate numerical HPDE solvers. Further, we propose the Hyperbolic Graph Diffusion Equation (HGDE) -- a flexible vector flow function that can be integrated to obtain expressive hyperbolic node embeddings. By analyzing potential energy decay of embeddings, we demonstrate that HGDE is capable of modeling both low- and high-order proximity with the benefit of local-global diffusivity functions. Experiments on node classification and link prediction and image-text classification tasks verify the superiority of the proposed method, which consistently outperforms various competitive models by a significant margin.

artificial intelligence, continuous geometry-aware graph diffusion, machine learning, (12 more...)

arXiv.org Artificial Intelligence

2406.01282

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

ReRoGCRL: Representation-based Robustness in Goal-Conditioned Reinforcement Learning

Yin, Xiangyu, Wu, Sihao, Liu, Jiaxu, Fang, Meng, Zhao, Xingyu, Huang, Xiaowei, Ruan, Wenjie

arXiv.org Artificial IntelligenceDec-19-2023

While Goal-Conditioned Reinforcement Learning (GCRL) has gained attention, its algorithmic robustness against adversarial perturbations remains unexplored. The attacks and robust representation training methods that are designed for traditional RL become less effective when applied to GCRL. To address this challenge, we first propose the Semi-Contrastive Representation attack, a novel approach inspired by the adversarial contrastive attack. Unlike existing attacks in RL, it only necessitates information from the policy function and can be seamlessly implemented during deployment. Then, to mitigate the vulnerability of existing GCRL algorithms, we introduce Adversarial Representation Tactics, which combines Semi-Contrastive Adversarial Augmentation with Sensitivity-Aware Regularizer to improve the adversarial robustness of the underlying RL agent against various types of perturbations. Extensive experiments validate the superior performance of our attack and defence methods across multiple state-of-the-art GCRL algorithms. Our tool ReRoGCRL is available at https://github.com/TrustAI/ReRoGCRL.

machine learning, reinforcement learning, simsr, (15 more...)

arXiv.org Artificial Intelligence

2312.07392

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation

Huang, Xiaowei, Ruan, Wenjie, Huang, Wei, Jin, Gaojie, Dong, Yi, Wu, Changshun, Bensalem, Saddek, Mu, Ronghui, Qi, Yi, Zhao, Xingyu, Cai, Kaiwen, Zhang, Yanghao, Wu, Sihao, Xu, Peipei, Wu, Dengyu, Freitas, Andre, Mustafa, Mustafa A.

arXiv.org Artificial IntelligenceAug-27-2023

Large Language Models (LLMs) have exploded a new heatwave of AI for their ability to engage end-users in human-level conversations with detailed and articulate answers across many knowledge domains. In response to their fast adoption in many industrial applications, this survey concerns their safety and trustworthiness. First, we review known vulnerabilities and limitations of the LLMs, categorising them into inherent issues, attacks, and unintended bugs. Then, we consider if and how the Verification and Validation (V&V) techniques, which have been widely developed for traditional software and deep learning models such as convolutional neural networks as independent processes to check the alignment of their implementations against the specifications, can be integrated and further extended throughout the lifecycle of the LLMs to provide rigorous analysis to the safety and trustworthiness of LLMs and their applications. Specifically, we consider four complementary techniques: falsification and evaluation, verification, runtime monitoring, and regulations and ethical use. In total, 370+ references are considered to support the quick understanding of the safety and trustworthiness issues from the perspective of V&V. While intensive research has been conducted to identify the safety and trustworthiness issues, rigorous yet practical methods are called for to ensure the alignment of LLMs with safety and trustworthiness requirements.

arxiv preprint arxiv, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2305.11391

Country:

Asia (0.67)
Europe > United Kingdom (0.67)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre:

Overview (1.00)
Research Report > New Finding (0.67)
Research Report > Promising Solution (0.67)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Education (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback