AITopics | Tao, Xiaofeng

Collaborating Authors

Tao, Xiaofeng

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

SplitFrozen: Split Learning with Device-side Model Frozen for Fine-Tuning LLM on Heterogeneous Resource-Constrained Devices

Ma, Jian, Lyu, Xinchen, Jiang, Jun, Cui, Qimei, Yao, Haipeng, Tao, Xiaofeng

arXiv.org Artificial IntelligenceMar-23-2025

Fine-tuning large language models (LLMs) on private, on-device data can empower tailored personalized AI agents. However, fine-tuning LLMs on resource-constrained edge devices faces significant challenges, including excessive computation overhead, device heterogeneity, and data imbalance. This paper proposes SplitFrozen, a split learning framework that enables efficient LLM fine-tuning by strategically freezing device-side model layers while centralizing parameter-efficient fine-tuning on the server. Our framework partitions LLMs into device-side frozen layers and server-side fine-tuning layers, where heterogeneous resource-constrained devices execute only forward propagation. To minimize server-side training costs, we integrate Low-Rank Adaptation (LoRA) into the server-side layers. A pipeline parallelism strategy further optimizes training efficiency by decoupling device-server computations and leveraging decomposed backward propagation. Experiments on GPT-2 with the MRPC, MNLI-matched, and SST-2 datasets demonstrate that SplitFrozen outperforms FedLoRA and SplitLoRA by 69.4\% model accuracy under extremely imbalanced data, while reducing up to 86.8\% device-side computations and 50.2\% total training time. Experiments also validate the scalability of SplitFrozen on content generation task using Llama-3.2 model on GSM8K dataset.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2503.18986

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Refining Positive and Toxic Samples for Dual Safety Self-Alignment of LLMs with Minimal Human Interventions

Xu, Jingxin, Nan, Guoshun, Guan, Sheng, Leng, Sicong, Liu, Yilian, Wang, Zixiao, Ma, Yuyang, Zhou, Zhili, Hou, Yanzhao, Tao, Xiaofeng

arXiv.org Artificial IntelligenceFeb-8-2025

Recent AI agents, such as ChatGPT and LLaMA, primarily rely on instruction tuning and reinforcement learning to calibrate the output of large language models (LLMs) with human intentions, ensuring the outputs are harmless and helpful. Existing methods heavily depend on the manual annotation of high-quality positive samples, while contending with issues such as noisy labels and minimal distinctions between preferred and dispreferred response data. However, readily available toxic samples with clear safety distinctions are often filtered out, removing valuable negative references that could aid LLMs in safety alignment. In response, we propose PT-ALIGN, a novel safety self-alignment approach that minimizes human supervision by automatically refining positive and toxic samples and performing fine-grained dual instruction tuning. Positive samples are harmless responses, while toxic samples deliberately contain extremely harmful content, serving as a new supervisory signals. Specifically, we utilize LLM itself to iteratively generate and refine training instances by only exploring fewer than 50 human annotations. We then employ two losses, i.e., maximum likelihood estimation (MLE) and fine-grained unlikelihood training (UT), to jointly learn to enhance the LLM's safety. The MLE loss encourages an LLM to maximize the generation of harmless content based on positive samples. Conversely, the fine-grained UT loss guides the LLM to minimize the output of harmful words based on negative samples at the token-level, thereby guiding the model to decouple safety from effectiveness, directing it toward safer fine-tuning objectives, and increasing the likelihood of generating helpful and reliable content. Experiments on 9 popular open-source LLMs demonstrate the effectiveness of our PT-ALIGN for safety alignment, while maintaining comparable levels of helpfulness and usefulness.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2502.08657

Country: Asia > China (0.28)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)

Add feedback

Overview of AI and Communication for 6G Network: Fundamentals, Challenges, and Future Research Opportunities

Cui, Qimei, You, Xiaohu, Ni, Wei, Nan, Guoshun, Zhang, Xuefei, Zhang, Jianhua, Lyu, Xinchen, Ai, Ming, Tao, Xiaofeng, Feng, Zhiyong, Zhang, Ping, Wu, Qingqing, Tao, Meixia, Huang, Yongming, Huang, Chongwen, Liu, Guangyi, Peng, Chenghui, Pan, Zhiwen, Sun, Tao, Niyato, Dusit, Chen, Tao, Khan, Muhammad Khurram, Jamalipour, Abbas, Guizani, Mohsen, Yuen, Chau

arXiv.org Artificial IntelligenceJan-5-2025

With the growing demand for seamless connectivity and intelligent communication, the integration of artificial intelligence (AI) and sixth-generation (6G) communication networks has emerged as a transformative paradigm. By embedding AI capabilities across various network layers, this integration enables optimized resource allocation, improved efficiency, and enhanced system robust performance, particularly in intricate and dynamic environments. This paper presents a comprehensive overview of AI and communication for 6G networks, with a focus on emphasizing their foundational principles, inherent challenges, and future research opportunities. We first review the integration of AI and communications in the context of 6G, exploring the driving factors behind incorporating AI into wireless communications, as well as the vision for the convergence of AI and 6G. The discourse then transitions to a detailed exposition of the envisioned integration of AI within 6G networks, delineated across three progressive developmental stages. The first stage, AI for Network, focuses on employing AI to augment network performance, optimize efficiency, and enhance user service experiences. The second stage, Network for AI, highlights the role of the network in facilitating and buttressing AI operations and presents key enabling technologies, such as digital twins for AI and semantic communication. In the final stage, AI as a Service, it is anticipated that future 6G networks will innately provide AI functions as services, supporting application scenarios like immersive communication and intelligent industrial robots. In addition, we conduct an in-depth analysis of the critical challenges faced by the integration of AI and communications in 6G. Finally, we outline promising future research opportunities that are expected to drive the development and refinement of AI and 6G communications.

large language model, machine learning, real time system, (28 more...)

arXiv.org Artificial Intelligence

2412.14538

Country:

Asia > China (1.00)
North America > United States > New York > New York County > New York City (0.13)
North America > United States > California > San Francisco County > San Francisco (0.13)
(2 more...)

Genre:

Overview > Innovation (0.92)
Research Report > Promising Solution (0.92)

Industry:

Transportation > Ground > Road (1.00)
Telecommunications > Networks (1.00)
Law (1.00)
(11 more...)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
(11 more...)

Add feedback

Exploring What Why and How: A Multifaceted Benchmark for Causation Understanding of Video Anomaly

Du, Hang, Nan, Guoshun, Qian, Jiawen, Wu, Wangchenhui, Deng, Wendi, Mu, Hanqing, Chen, Zhenyan, Mao, Pengxuan, Tao, Xiaofeng, Liu, Jun

arXiv.org Artificial IntelligenceDec-9-2024

Recent advancements in video anomaly understanding (VAU) have opened the door to groundbreaking applications in various fields, such as traffic monitoring and industrial automation. While the current benchmarks in VAU predominantly emphasize the detection and localization of anomalies. Here, we endeavor to delve deeper into the practical aspects of VAU by addressing the essential questions: "what anomaly occurred?", "why did it happen?", and "how severe is this abnormal event?". In pursuit of these answers, we introduce a comprehensive benchmark for Exploring the Causation of Video Anomalies (ECVA). Our benchmark is meticulously designed, with each video accompanied by detailed human annotations. Specifically, each instance of our ECVA involves three sets of human annotations to indicate "what", "why" and "how" of an anomaly, including 1) anomaly type, start and end times, and event descriptions, 2) natural language explanations for the cause of an anomaly, and 3) free text reflecting the effect of the abnormality. Building upon this foundation, we propose a novel prompt-based methodology that serves as a baseline for tackling the intricate challenges posed by ECVA. We utilize "hard prompt" to guide the model to focus on the critical parts related to video anomaly segments, and "soft prompt" to establish temporal and spatial relationships within these anomaly segments. Furthermore, we propose AnomEval, a specialized evaluation metric crafted to align closely with human judgment criteria for ECVA. This metric leverages the unique features of the ECVA dataset to provide a more comprehensive and reliable assessment of various video large language models. We demonstrate the efficacy of our approach through rigorous experimental analysis and delineate possible avenues for further investigation into the comprehension of video anomaly causation.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2412.07183

Country: Asia > China (0.31)

Genre:

Overview (0.66)
Research Report (0.64)

Industry: Law Enforcement & Public Safety > Fire & Emergency Services (0.46)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(2 more...)

Add feedback

Adaptive Federated Learning in Heterogeneous Wireless Networks with Independent Sampling

Geng, Jiaxiang, Hou, Yanzhao, Tao, Xiaofeng, Wang, Juncheng, Luo, Bing

arXiv.org Artificial IntelligenceMay-13-2024

Federated Learning (FL) algorithms commonly sample a random subset of clients to address the straggler issue and improve communication efficiency. While recent works have proposed various client sampling methods, they have limitations in joint system and data heterogeneity design, which may not align with practical heterogeneous wireless networks. In this work, we advocate a new independent client sampling strategy to minimize the wall-clock training time of FL, while considering data heterogeneity and system heterogeneity in both communication and computation. We first derive a new convergence bound for non-convex loss functions with independent client sampling and then propose an adaptive bandwidth allocation scheme. Furthermore, we propose an efficient independent client sampling algorithm based on the upper bounds on the convergence rounds and the expected per-round training time, to minimize the wall-clock time of FL, while considering both the data and system heterogeneity. Experimental results under practical wireless network settings with real-world prototype demonstrate that the proposed independent sampling scheme substantially outperforms the current best sampling schemes under various training models and datasets.

artificial intelligence, heterogeneity, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2402.10097

Country: Asia > China (0.47)

Genre: Research Report (0.50)

Industry: Information Technology (0.46)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Uncovering What, Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly

Du, Hang, Zhang, Sicheng, Xie, Binzhu, Nan, Guoshun, Zhang, Jiayang, Xu, Junrui, Liu, Hangyu, Leng, Sicong, Liu, Jiangming, Fan, Hehe, Huang, Dajiu, Feng, Jing, Chen, Linli, Zhang, Can, Li, Xuhuan, Zhang, Hao, Chen, Jianhang, Cui, Qimei, Tao, Xiaofeng

arXiv.org Artificial IntelligenceMay-6-2024

Video anomaly understanding (VAU) aims to automatically comprehend unusual occurrences in videos, thereby enabling various applications such as traffic surveillance and industrial manufacturing. While existing VAU benchmarks primarily concentrate on anomaly detection and localization, our focus is on more practicality, prompting us to raise the following crucial questions: "what anomaly occurred?", "why did it happen?", and "how severe is this abnormal event?". In pursuit of these answers, we present a comprehensive benchmark for Causation Understanding of Video Anomaly (CUVA). Specifically, each instance of the proposed benchmark involves three sets of human annotations to indicate the "what", "why" and "how" of an anomaly, including 1) anomaly type, start and end times, and event descriptions, 2) natural language explanations for the cause of an anomaly, and 3) free text reflecting the effect of the abnormality. In addition, we also introduce MMEval, a novel evaluation metric designed to better align with human preferences for CUVA, facilitating the measurement of existing LLMs in comprehending the underlying cause and corresponding effect of video anomalies. Finally, we propose a novel prompt-based method that can serve as a baseline approach for the challenging CUVA. We conduct extensive experiments to show the superiority of our evaluation metric and the prompt-based approach. Our code and dataset are available at https://github.com/fesvhtr/CUVA.

data mining, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2405.00181

Country:

Asia > China (0.28)
Asia > Middle East > UAE (0.14)

Genre: Research Report (0.82)

Industry:

Law (0.93)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.68)
Law Enforcement & Public Safety > Fire & Emergency Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.91)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Refining Latent Homophilic Structures over Heterophilic Graphs for Robust Graph Convolution Networks

Qiu, Chenyang, Nan, Guoshun, Xiong, Tianyu, Deng, Wendi, Wang, Di, Teng, Zhiyang, Sun, Lijuan, Cui, Qimei, Tao, Xiaofeng

arXiv.org Artificial IntelligenceDec-27-2023

Graph convolution networks (GCNs) are extensively utilized in various graph tasks to mine knowledge from spatial data. Our study marks the pioneering attempt to quantitatively investigate the GCN robustness over omnipresent heterophilic graphs for node classification. We uncover that the predominant vulnerability is caused by the structural out-of-distribution (OOD) issue. This finding motivates us to present a novel method that aims to harden GCNs by automatically learning Latent Homophilic Structures over heterophilic graphs. We term such a methodology as LHS. To elaborate, our initial step involves learning a latent structure by employing a novel self-expressive technique based on multi-node interactions. Subsequently, the structure is refined using a pairwisely constrained dual-view contrastive learning approach. We iteratively perform the above procedure, enabling a GCN model to aggregate information in a homophilic way on heterophilic graphs. Armed with such an adaptable structure, we can properly mitigate the structural OOD threats over heterophilic graphs. Experiments on various benchmarks show the effectiveness of the proposed LHS approach for robust GCNs.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2312.16418

Country:

North America > United States (0.15)
Asia > China (0.14)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Add feedback

DocMSU: A Comprehensive Benchmark for Document-level Multimodal Sarcasm Understanding

Du, Hang, Nan, Guoshun, Zhang, Sicheng, Xie, Binzhu, Xu, Junrui, Fan, Hehe, Cui, Qimei, Tao, Xiaofeng, Jiang, Xudong

arXiv.org Artificial IntelligenceDec-26-2023

Multimodal Sarcasm Understanding (MSU) has a wide range of applications in the news field such as public opinion analysis and forgery detection. However, existing MSU benchmarks and approaches usually focus on sentence-level MSU. In document-level news, sarcasm clues are sparse or small and are often concealed in long text. Moreover, compared to sentence-level comments like tweets, which mainly focus on only a few trends or hot topics (e.g., sports events), content in the news is considerably diverse. Models created for sentence-level MSU may fail to capture sarcasm clues in document-level news. To fill this gap, we present a comprehensive benchmark for Document-level Multimodal Sarcasm Understanding (DocMSU). Our dataset contains 102,588 pieces of news with text-image pairs, covering 9 diverse topics such as health, business, etc. The proposed large-scale and diverse DocMSU significantly facilitates the research of document-level MSU in real-world scenarios. To take on the new challenges posed by DocMSU, we introduce a fine-grained sarcasm comprehension method to properly align the pixel-level image features with word-level textual features in documents. Experiments demonstrate the effectiveness of our method, showing that it can serve as a baseline approach to the challenging DocMSU. Our code and dataset are available at https://github.com/Dulpy/DocMSU.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2312.16023

Country: Asia > Japan (0.28)

Genre: Research Report (1.00)

Industry:

Media (0.47)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

FedCompetitors: Harmonious Collaboration in Federated Learning with Competing Participants

Tan, Shanli, Cheng, Hao, Wu, Xiaohu, Yu, Han, He, Tiantian, Ong, Yew-Soon, Wang, Chongjun, Tao, Xiaofeng

arXiv.org Artificial IntelligenceDec-18-2023

Federated learning (FL) provides a privacy-preserving approach for collaborative training of machine learning models. Given the potential data heterogeneity, it is crucial to select appropriate collaborators for each FL participant (FL-PT) based on data complementarity. Recent studies have addressed this challenge. Similarly, it is imperative to consider the inter-individual relationships among FL-PTs where some FL-PTs engage in competition. Although FL literature has acknowledged the significance of this scenario, practical methods for establishing FL ecosystems remain largely unexplored. In this paper, we extend a principle from the balance theory, namely ``the friend of my enemy is my enemy'', to ensure the absence of conflicting interests within an FL ecosystem. The extended principle and the resulting problem are formulated via graph theory and integer linear programming. A polynomial-time algorithm is proposed to determine the collaborators of each FL-PT. The solution guarantees high scalability, allowing even competing FL-PTs to smoothly join the ecosystem without conflict of interest. The proposed framework jointly considers competition and data heterogeneity. Extensive experiments on real-world and synthetic data demonstrate its efficacy compared to five alternative approaches, and its ability to establish efficient collaboration networks among FL-PTs.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2312.11391

Country: Asia > China (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)
Information Technology > Data Science > Data Mining > Big Data (0.34)

Add feedback

3D-IDS: Doubly Disentangled Dynamic Intrusion Detection

Qiu, Chenyang, Geng, Yingsheng, Lu, Junrui, Chen, Kaida, Zhu, Shitong, Su, Ya, Nan, Guoshun, Zhang, Can, Fu, Junsong, Cui, Qimei, Tao, Xiaofeng

arXiv.org Artificial IntelligenceJul-1-2023

Network-based intrusion detection system (NIDS) monitors network traffic for malicious activities, forming the frontline defense against increasing attacks over information infrastructures. Although promising, our quantitative analysis shows that existing methods perform inconsistently in declaring various unknown attacks (e.g., 9% and 35% F1 respectively for two distinct unknown threats for an SVM-based method) or detecting diverse known attacks (e.g., 31% F1 for the Backdoor and 93% F1 for DDoS by a GCN-based state-of-the-art method), and reveals that the underlying cause is entangled distributions of flow features. This motivates us to propose 3D-IDS, a novel method that aims to tackle the above issues through two-step feature disentanglements and a dynamic graph diffusion scheme. Specifically, we first disentangle traffic features by a non-parameterized optimization based on mutual information, automatically differentiating tens and hundreds of complex features of various attacks. Such differentiated features will be fed into a memory model to generate representations, which are further disentangled to highlight the attack-specific features. Finally, we use a novel graph diffusion method that dynamically fuses the network topology for spatial-temporal aggregation in evolving data streams. By doing so, we can effectively identify various attacks in encrypted traffics, including unknown threats and known ones that are not easily detected. Experiments show the superiority of our 3D-IDS. We also demonstrate that our two-step feature disentanglements benefit the explainability of NIDS.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3580305.3599238

2307.11079

Country: North America > United States > California (0.15)

Genre: Research Report > Promising Solution (0.54)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (1.00)
(5 more...)

Add feedback