AITopics | Zhao, Xingyu

Collaborating Authors

Zhao, Xingyu

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Scalable Trajectory-User Linking with Dual-Stream Representation Networks

Zhang, Hao, Chen, Wei, Zhao, Xingyu, Qi, Jianpeng, Jiang, Guiyuan, Yu, Yanwei

arXiv.org Artificial IntelligenceMar-19-2025

Trajectory-user linking (TUL) aims to match anonymous trajectories to the most likely users who generated them, offering benefits for a wide range of real-world spatio-temporal applications. However, existing TUL methods are limited by high model complexity and poor learning of the effective representations of trajectories, rendering them ineffective in handling large-scale user trajectory data. In this work, we propose a novel $\underline{Scal}$abl$\underline{e}$ Trajectory-User Linking with dual-stream representation networks for large-scale $\underline{TUL}$ problem, named ScaleTUL. Specifically, ScaleTUL generates two views using temporal and spatial augmentations to exploit supervised contrastive learning framework to effectively capture the irregularities of trajectories. In each view, a dual-stream trajectory encoder, consisting of a long-term encoder and a short-term encoder, is designed to learn unified trajectory representations that fuse different temporal-spatial dependencies. Then, a TUL layer is used to associate the trajectories with the corresponding users in the representation space using a two-stage training model. Experimental results on check-in mobility datasets from three real-world cities and the nationwide U.S. demonstrate the superiority of ScaleTUL over state-of-the-art baselines for large-scale TUL tasks.

data mining, machine learning, trajectory, (19 more...)

arXiv.org Artificial Intelligence

2503.15002

Country:

Asia > China (0.46)
North America > United States (0.29)

Genre: Research Report (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Add feedback

TAIJI: Textual Anchoring for Immunizing Jailbreak Images in Vision Language Models

Yin, Xiangyu, Qi, Yi, Hu, Jinwei, Chen, Zhen, Dong, Yi, Zhao, Xingyu, Huang, Xiaowei, Ruan, Wenjie

arXiv.org Artificial IntelligenceMar-13-2025

Vision Language Models (VLMs) have demonstrated impressive inference capabilities, but remain vulnerable to jailbreak attacks that can induce harmful or unethical responses. Existing defence methods are predominantly white-box approaches that require access to model parameters and extensive modifications, making them costly and impractical for many real-world scenarios. Although some black-box defences have been proposed, they often impose input constraints or require multiple queries, limiting their effectiveness in safety-critical tasks such as autonomous driving. To address these challenges, we propose a novel black-box defence framework called \textbf{T}extual \textbf{A}nchoring for \textbf{I}mmunizing \textbf{J}ailbreak \textbf{I}mages (\textbf{TAIJI}). TAIJI leverages key phrase-based textual anchoring to enhance the model's ability to assess and mitigate the harmful content embedded within both visual and textual prompts. Unlike existing methods, TAIJI operates effectively with a single query during inference, while preserving the VLM's performance on benign tasks. Extensive experiments demonstrate that TAIJI significantly enhances the safety and reliability of VLMs, providing a practical and efficient solution for real-world deployment.

artificial intelligence, natural language, vlm, (18 more...)

arXiv.org Artificial Intelligence

2503.10872

Genre: Research Report (0.64)

Industry:

Transportation (0.75)
Information Technology > Security & Privacy (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback

Probabilistic Robustness in Deep Learning: A Concise yet Comprehensive Guide

Zhao, Xingyu

arXiv.org Artificial IntelligenceMar-8-2025

Deep learning (DL) has demonstrated significant potential across various safety-critical applications, yet ensuring its robustness remains a key challenge. While adversarial robustness has been extensively studied in worst-case scenarios, probabilistic robustness (PR) offers a more practical perspective by quantifying the likelihood of failures under stochastic perturbations. This paper provides a concise yet comprehensive overview of PR, covering its formal definitions, evaluation and enhancement methods. We introduce a reformulated ''min-max'' optimisation framework for adversarial training specifically designed to improve PR. Furthermore, we explore the integration of PR verification evidence into system-level safety assurance, addressing challenges in translating DL model-level robustness to system-level claims. Finally, we highlight open research questions, including benchmarking PR evaluation methods, extending PR to generative AI tasks, and developing rigorous methodologies and case studies for system-level integration.

artificial intelligence, machine learning, survey article, (19 more...)

arXiv.org Artificial Intelligence

2502.14833

Country: Europe > United Kingdom (0.14)

Genre:

Research Report (1.00)
Overview (0.88)

Industry: Information Technology (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model

Huang, Zhenglin, Hu, Jinwei, Li, Xiangtai, He, Yiwei, Zhao, Xingyu, Peng, Bei, Wu, Baoyuan, Huang, Xiaowei, Cheng, Guangliang

arXiv.org Artificial IntelligenceDec-5-2024

The rapid advancement of generative models in creating highly realistic images poses substantial risks for misinformation dissemination. For instance, a synthetic image, when shared on social media, can mislead extensive audiences and erode trust in digital content, resulting in severe repercussions. Despite some progress, academia has not yet created a large and diversified deepfake detection dataset for social media, nor has it devised an effective solution to address this issue. In this paper, we introduce the Social media Image Detection dataSet (SID-Set), which offers three key advantages: (1) extensive volume, featuring 300K AI-generated/tampered and authentic images with comprehensive annotations, (2) broad diversity, encompassing fully synthetic and tampered images across various classes, and (3) elevated realism, with images that are predominantly indistinguishable from genuine ones through mere visual inspection. Furthermore, leveraging the exceptional capabilities of large multimodal models, we propose a new image deepfake detection, localization, and explanation framework, named SIDA (Social media Image Detection, localization, and explanation Assistant). SIDA not only discerns the authenticity of images, but also delineates tampered regions through mask prediction and provides textual explanations of the model's judgment criteria. Compared with state-of-the-art deepfake detection models on SID-Set and other benchmarks, extensive experiments demonstrate that SIDA achieves superior performance among diversified settings. The code, model, and dataset will be released.

detection, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2412.04292

Country: Asia > China > Guangdong Province (0.14)

Genre: Research Report (0.50)

Industry:

Transportation > Passenger (1.00)
Information Technology > Security & Privacy (1.00)
Leisure & Entertainment > Sports (0.93)
Transportation > Ground > Road (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(2 more...)

Add feedback

Robust RL with LLM-Driven Data Synthesis and Policy Adaptation for Autonomous Driving

Wu, Sihao, Liu, Jiaxu, Yin, Xiangyu, Cheng, Guangliang, Zhao, Xingyu, Fang, Meng, Yi, Xinping, Huang, Xiaowei

arXiv.org Artificial IntelligenceOct-20-2024

The integration of Large Language Models (LLMs) into autonomous driving systems demonstrates strong common sense and reasoning abilities, effectively addressing the pitfalls of purely data-driven methods. Current LLM-based agents require lengthy inference times and face challenges in interacting with real-time autonomous driving environments. A key open question is whether we can effectively leverage the knowledge from LLMs to train an efficient and robust Reinforcement Learning (RL) agent. This paper introduces RAPID, a novel \underline{\textbf{R}}obust \underline{\textbf{A}}daptive \underline{\textbf{P}}olicy \underline{\textbf{I}}nfusion and \underline{\textbf{D}}istillation framework, which trains specialized mix-of-policy RL agents using data synthesized by an LLM-based driving agent and online adaptation. RAPID features three key designs: 1) utilization of offline data collected from an LLM agent to distil expert knowledge into RL policies for faster real-time inference; 2) introduction of robust distillation in RL to inherit both performance and robustness from LLM-based teacher; and 3) employment of a mix-of-policy approach for joint decision decoding with a policy adapter. Through fine-tuning via online environment interaction, RAPID reduces the forgetting of LLM knowledge while maintaining adaptability to different tasks. Extensive experiments demonstrate RAPID's capability to effectively integrate LLM knowledge into scaled-down RL policies in an efficient, adaptable, and robust way. Code and checkpoints will be made publicly available upon acceptance.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2410.12568

Country: North America (0.28)

Genre: Research Report (1.00)

Industry:

Transportation > Ground > Road (0.92)
Automobiles & Trucks (0.92)
Information Technology > Robotics & Automation (0.83)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Trustworthy Text-to-Image Diffusion Models: A Timely and Focused Survey

Zhang, Yi, Chen, Zhen, Cheng, Chih-Hong, Ruan, Wenjie, Huang, Xiaowei, Zhao, Dezong, Flynn, David, Khastgir, Siddartha, Zhao, Xingyu

arXiv.org Artificial IntelligenceSep-26-2024

Text-to-Image (T2I) Diffusion Models (DMs) have garnered widespread attention for their impressive advancements in image generation. However, their growing popularity has raised ethical and social concerns related to key non-functional properties of trustworthiness, such as robustness, fairness, security, privacy, factuality, and explainability, similar to those in traditional deep learning (DL) tasks. Conventional approaches for studying trustworthiness in DL tasks often fall short due to the unique characteristics of T2I DMs, e.g., the multi-modal nature. Given the challenge, recent efforts have been made to develop new methods for investigating trustworthiness in T2I DMs via various means, including falsification, enhancement, verification \& validation and assessment. However, there is a notable lack of in-depth analysis concerning those non-functional properties and means. In this survey, we provide a timely and focused review of the literature on trustworthy T2I DMs, covering a concise-structured taxonomy from the perspectives of property, means, benchmarks and applications. Our review begins with an introduction to essential preliminaries of T2I DMs, and then we summarise key definitions/metrics specific to T2I tasks and analyses the means proposed in recent literature based on these definitions/metrics. Additionally, we review benchmarks and domain applications of T2I DMs. Finally, we highlight the gaps in current research, discuss the limitations of existing methods, and propose future research directions to advance the development of trustworthy T2I DMs. Furthermore, we keep up-to-date updates in this field to track the latest developments and maintain our GitHub repository at: https://github.com/wellzline/Trustworthy_T2I_DMs

artificial intelligence, dms, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2409.18214

Country: Europe (0.92)

Genre:

Overview (1.00)
Research Report > New Finding (0.66)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.93)
Media (0.93)

Technology:

Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.68)

Add feedback

Formal Specification, Assessment, and Enforcement of Fairness for Generative AIs

Cheng, Chih-Hong, Wu, Changshun, Ruess, Harald, Zhao, Xingyu, Bensalem, Saddek

arXiv.org Artificial IntelligenceMay-6-2024

Reinforcing or even exacerbating societal biases and inequalities will increase significantly as generative AI increasingly produces useful artifacts, from text to images and beyond, for the real world. We address these issues by formally characterizing the notion of fairness for generative AI as a basis for monitoring and enforcing fairness. We define two levels of fairness using the notion of infinite sequences of abstractions of AI-generated artifacts such as text or images. The first is the fairness demonstrated on the generated sequences, which is evaluated only on the outputs while agnostic to the prompts and models used. The second is the inherent fairness of the generative AI model, which requires that fairness be manifested when input prompts are neutral, that is, they do not explicitly instruct the generative AI to produce a particular type of output. We also study relative intersectional fairness to counteract the combinatorial explosion of fairness when considering multiple categories together with lazy fairness enforcement. Finally, fairness monitoring and enforcement are tested against some current generative AI models.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2404.16663

Country:

North America > United States (0.14)
Europe > France (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Add feedback

ProTIP: Probabilistic Robustness Verification on Text-to-Image Diffusion Models against Stochastic Perturbation

Zhang, Yi, Tang, Yun, Ruan, Wenjie, Huang, Xiaowei, Khastgir, Siddartha, Jennings, Paul, Zhao, Xingyu

arXiv.org Artificial IntelligenceFeb-23-2024

Text-to-Image (T2I) Diffusion Models (DMs) have shown impressive abilities in generating high-quality images based on simple text descriptions. However, as is common with many Deep Learning (DL) models, DMs are subject to a lack of robustness. While there are attempts to evaluate the robustness of T2I DMs as a binary or worst-case problem, they cannot answer how robust in general the model is whenever an adversarial example (AE) can be found. In this study, we first introduce a probabilistic notion of T2I DMs' robustness; and then establish an efficient framework, ProTIP, to evaluate it with statistical guarantees. The main challenges stem from: i) the high computational cost of the generation process; and ii) determining if a perturbed input is an AE involves comparing two output distributions, which is fundamentally harder compared to other DL tasks like classification where an AE is identified upon misprediction of labels. To tackle the challenges, we employ sequential analysis with efficacy and futility early stopping rules in the statistical testing for identifying AEs, and adaptive concentration inequalities to dynamically determine the "just-right" number of stochastic perturbations whenever the verification target is met. Empirical experiments validate the effectiveness and efficiency of ProTIP over common T2I DMs. Finally, we demonstrate an application of ProTIP to rank commonly used defence methods.

artificial intelligence, machine learning, uz 4, (15 more...)

arXiv.org Artificial Intelligence

2402.15429

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Instance-Level Safety-Aware Fidelity of Synthetic Data and Its Calibration

Cheng, Chih-Hong, Stöckel, Paul, Zhao, Xingyu

arXiv.org Artificial IntelligenceFeb-10-2024

Modeling and calibrating the fidelity of synthetic data is paramount in shaping the future of safe and reliable self-driving technology by offering a cost-effective and scalable alternative to real-world data collection. We focus on its role in safety-critical applications, introducing four types of instance-level fidelity that go beyond mere visual input characteristics. The aim is to align synthetic data with real-world safety issues. We suggest an optimization method to refine the synthetic data generator, reducing fidelity gaps identified by the DNN-based component. Our findings show this tuning enhances the correlation between safety-critical errors in synthetic and real images.

artificial intelligence, machine learning, synthetic data generator, (14 more...)

arXiv.org Artificial Intelligence

2402.07031

Country: Europe (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Automobiles & Trucks (0.95)
Information Technology > Robotics & Automation (0.67)
Transportation > Ground > Road (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.90)

Add feedback

Building Guardrails for Large Language Models

Dong, Yi, Mu, Ronghui, Jin, Gaojie, Qi, Yi, Hu, Jinwei, Zhao, Xingyu, Meng, Jie, Ruan, Wenjie, Huang, Xiaowei

arXiv.org Artificial IntelligenceFeb-2-2024

As Large Language Models (LLMs) become more integrated into our daily lives, it is crucial to identify and mitigate their risks, especially when the risks can have profound impacts on human users and societies. Guardrails, which filter the inputs or outputs of LLMs, have emerged as a core safeguarding technology. This position paper takes a deep look at current open-source solutions (Llama Guard, Nvidia NeMo, Guardrails AI), and discusses the challenges and the road towards building more complete solutions. Drawing on robust evidence from previous research, we advocate for a systematic approach to construct guardrails for LLMs, based on comprehensive consideration of diverse contexts across various LLMs applications. We propose employing socio-technical methods through collaboration with a multi-disciplinary team to pinpoint precise technical requirements, exploring advanced neural-symbolic implementations to embrace the complexity of the requirements, and developing verification and testing to ensure the utmost quality of the final product.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2402.01822

Country:

Asia (0.92)
North America > United States (0.46)
Europe > United Kingdom > England (0.28)
Europe > Austria (0.28)

Genre: Research Report (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback