Zhu, Guangxu
AI-in-the-Loop Sensing and Communication Joint Design for Edge Intelligence
Cai, Zhijie, Cao, Xiaowen, Chen, Xu, Cui, Yuanhao, Zhu, Guangxu, Huang, Kaibin, Cui, Shuguang
Recent breakthroughs in artificial intelligence (AI), wireless communications, and sensing technologies have accelerated the evolution of edge intelligence. However, conventional systems still grapple with issues such as low communication efficiency, redundant data acquisition, and poor model generalization. To overcome these challenges, we propose an innovative framework that enhances edge intelligence through AI-in-the-loop joint sensing and communication (JSAC). This framework features an AI-driven closed-loop control architecture that jointly optimizes system resources, thereby delivering superior system-level performance. A key contribution of our work is establishing an explicit relationship between validation loss and the system's tunable parameters. This insight enables dynamic reduction of the generalization error through AI-driven closed-loop control. Specifically, for sensing control, we introduce an adaptive data collection strategy based on gradient importance sampling, allowing edge devices to autonomously decide when to terminate data acquisition and how to allocate sample weights based on real-time model feedback. For communication control, drawing inspiration from stochastic gradient Langevin dynamics (SGLD), our joint optimization of transmission power and batch size converts channel and data noise into gradient perturbations that help mitigate overfitting. Experimental evaluations demonstrate that our framework reduces communication energy consumption by up to 77 percent and sensing costs, measured by the number of collected samples, by up to 52 percent, while significantly improving model generalization, with up to a 58 percent reduction in the final validation loss. These results validate that the proposed scheme can harvest the mutual benefits of AI and JSAC systems by incorporating the model itself into the control loop of the system.
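As a rough illustration of the SGLD-inspired communication control described above, the following Python sketch (a minimal toy, not the paper's implementation; the function names, toy loss, and noise model are assumptions) shows how channel-induced noise on the stochastic gradient can be combined with a Langevin perturbation in a single update step.

    # Minimal sketch: an SGLD-style update in which the effective gradient noise is
    # shaped by both the mini-batch and a receiver-side channel-noise term, so that
    # both act as a regularizing perturbation (toy loss, hypothetical parameters).
    import numpy as np

    def sgld_step(theta, grad_fn, batch, lr, temperature, channel_noise_std, rng):
        """One SGLD-style update; channel_noise_std models the over-the-air perturbation."""
        g = grad_fn(theta, batch)                                        # stochastic gradient on the batch
        g_noisy = g + rng.normal(0.0, channel_noise_std, size=g.shape)   # channel-induced gradient noise
        langevin = rng.normal(0.0, np.sqrt(2.0 * lr * temperature), size=theta.shape)
        return theta - lr * g_noisy + langevin

    rng = np.random.default_rng(0)
    grad_fn = lambda th, b: 2.0 * (th - b.mean(axis=0))                  # gradient of a quadratic toy loss
    theta = np.zeros(3)
    for _ in range(100):
        batch = rng.normal(1.0, 0.5, size=(16, 3))                       # batch size is one of the tunable knobs
        theta = sgld_step(theta, grad_fn, batch, lr=0.05, temperature=0.01,
                          channel_noise_std=0.1, rng=rng)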
Label Anything: An Interpretable, High-Fidelity and Prompt-Free Annotator
Kou, Wei-Bin, Zhu, Guangxu, Ye, Rongguang, Wang, Shuai, Tang, Ming, Wu, Yik-Chung
Learning-based street scene semantic understanding in autonomous driving (AD) has advanced significantly in recent years, but the performance of AD models remains heavily dependent on the quantity and quality of annotated training data. Traditional manual labeling incurs a high cost to annotate the vast amount of data required to train a robust model. To mitigate this cost, we propose the Label Anything Model (denoted as LAM), which serves as an interpretable, high-fidelity, and prompt-free data annotator. Specifically, we first employ a pretrained Vision Transformer (ViT) to extract latent features. On top of the ViT, we propose a semantic class adapter (SCA) and an optimization-oriented unrolling algorithm (OptOU), both with a small number of trainable parameters. The SCA fuses the ViT-extracted features to form the basis for the subsequent automatic annotation. OptOU consists of multiple cascading layers, each containing an optimization formulation that aligns its output with the ground truth as closely as possible, which makes OptOU interpretable by design rather than a learning-based black box. In addition, training SCA and OptOU requires only a single pre-annotated RGB seed image, owing to their small number of learnable parameters. Extensive experiments clearly demonstrate that the proposed LAM generates high-fidelity annotations (almost 100% in mIoU) for multiple real-world datasets (i.e., Camvid, Cityscapes, and Apolloscapes) as well as the CARLA simulation dataset.
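To make the unrolling idea concrete, the sketch below shows a generic deep-unfolding cascade in Python (a toy stand-in, not OptOU's actual formulation; the layer structure, step sizes, and tensor shapes are assumptions): each layer applies one update step with its own learnable step size, and the cascade is trained so that the final output matches the ground truth.

    import torch
    import torch.nn as nn

    class UnrolledRefiner(nn.Module):
        """Generic algorithm-unrolling cascade: n_layers update steps with learnable step sizes."""
        def __init__(self, n_layers=5, n_classes=19):
            super().__init__()
            self.steps = nn.Parameter(torch.full((n_layers,), 0.5))   # per-layer step sizes
            self.mix = nn.Parameter(torch.eye(n_classes))             # shared linear refinement

        def forward(self, logits):                                    # logits: (..., n_classes)
            x = logits
            for alpha in self.steps:
                x = x - alpha * (x - logits @ self.mix)               # one unrolled update step
            return x

    refiner = UnrolledRefiner()
    coarse = torch.randn(2, 8, 8, 19)                                 # toy per-pixel class scores
    out = refiner(coarse)
    target = torch.randint(0, 19, (2, 8, 8))
    loss = nn.CrossEntropyLoss()(out.permute(0, 3, 1, 2), target)     # align output with ground truth
    loss.backward()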
First Token Probability Guided RAG for Telecom Question Answering
Chen, Tingwei, Chen, Jiayi, Zhao, Zijian, Chen, Haolong, Zhang, Liang, Zhu, Guangxu
Large Language Models (LLMs) have garnered significant attention for their impressive general-purpose capabilities. For applications requiring intricate domain knowledge, Retrieval-Augmented Generation (RAG) has shown a distinct advantage in incorporating domain-specific information into LLMs. However, existing RAG research has not fully addressed the challenges of Multiple Choice Question Answering (MCQA) in telecommunications, particularly in terms of retrieval quality and mitigating hallucinations. To tackle these challenges, we propose a novel first token probability guided RAG framework. This framework leverages confidence scores to optimize key hyperparameters, such as chunk number and chunk window size, while dynamically adjusting the context. Our method starts by retrieving the most relevant chunks and generates a single token as the potential answer. The probabilities of all options are then normalized to serve as confidence scores, which guide the dynamic adjustment of the context. By iteratively optimizing the hyperparameters based on these confidence scores, we can continuously improve RAG performance. We conducted experiments to validate the effectiveness of our framework, demonstrating its potential to enhance accuracy in domain-specific MCQA tasks.
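The core confidence signal can be sketched in a few lines of Python (hypothetical interface: real use would take the language model's next-token logits over its vocabulary; here the logits and option token ids are made up): the probability mass the model places on each option letter as its first generated token is renormalized into a per-option confidence, which then drives whether to answer or to adjust the retrieval context.

    import numpy as np

    def option_confidences(first_token_logits, option_token_ids):
        """Normalize the first-token probabilities of the answer options into confidences."""
        logits = np.asarray(first_token_logits, dtype=float)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                               # softmax over the full vocabulary
        option_probs = probs[option_token_ids]
        return option_probs / option_probs.sum()           # renormalize over the options only

    def answer_or_adjust(conf, threshold=0.6):
        """Answer if confident enough; otherwise signal that chunk number / window size should change."""
        best = int(np.argmax(conf))
        return ("answer", best) if conf[best] >= threshold else ("adjust_context", best)

    logits = np.random.default_rng(1).normal(size=32000)                         # stand-in for model logits
    conf = option_confidences(logits, option_token_ids=[318, 347, 356, 360])     # ids for A-D (made up)
    print(answer_or_adjust(conf))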
Generalizable Autonomous Driving System across Diverse Adverse Weather Conditions
Kou, Wei-Bin, Zhu, Guangxu, Ye, Rongguang, Lin, Qingfeng, Ren, Zeyi, Tang, Ming, Wu, Yik-Chung
Various adverse weather conditions pose a significant challenge to autonomous driving (AD) street scene semantic understanding (segmentation). A common strategy is to minimize the disparity between images captured in clear and adverse weather conditions. However, this technique typically relies on a clear image as a reference, which is challenging to obtain in practice. Furthermore, such methods typically target a single adverse condition and thus perform poorly when confronted with a mixture of multiple adverse weather conditions. To address these issues, we introduce a reference-free and Adverse weather-Immune scheme (called AdvImmu) that leverages the invariance of weather conditions over short periods (seconds). Specifically, AdvImmu comprises three components: a Locally Sequential Mechanism (LSM), a Globally Shuffled Mechanism (GSM), and Unfolded Regularizers (URs). LSM leverages temporal correlations between adjacent frames to enhance model performance. GSM shuffles LSM segments to prevent overfitting to temporal patterns. URs are the deep unfolding implementation of two proposed regularizers that penalize model complexity to enhance across-weather generalization. In addition, to overcome the reliance on consecutive frame-wise annotations when training AdvImmu (typically unavailable in AD scenarios), we incorporate the Segment Anything Model (SAM) foundation model to assist in annotating frames, and further propose a clustering algorithm (denoted as SBICAC) to overcome SAM's category-agnostic nature and generate pseudo-labels. Extensive experiments demonstrate that the proposed AdvImmu outperforms existing state-of-the-art methods by 88.56% in mean Intersection over Union (mIoU).
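The LSM/GSM sampling idea can be illustrated with a short Python sketch (a hypothetical data-pipeline fragment, not the authors' code; clip length and file names are assumptions): adjacent frames are kept in order inside short clips, and the clips themselves are shuffled globally.

    import random

    def lsm_gsm_clips(frame_paths, clip_len=4, seed=0):
        """frame_paths: time-ordered frame identifiers from one driving sequence."""
        clips = [frame_paths[i:i + clip_len]                                       # LSM: locally sequential clips
                 for i in range(0, len(frame_paths) - clip_len + 1, clip_len)]
        random.Random(seed).shuffle(clips)                                         # GSM: globally shuffled clip order
        return clips

    frames = [f"frame_{t:04d}.png" for t in range(20)]
    print(lsm_gsm_clips(frames)[0])   # one clip: adjacent frames kept in temporal order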
KNN-MMD: Cross Domain Wireless Sensing via Local Distribution Alignment
Zhao, Zijian, Cai, Zhijie, Chen, Tingwei, Li, Xiaoyang, Li, Hang, Chen, Qimei, Zhu, Guangxu
Wireless sensing has recently found widespread applications in diverse environments, including homes, offices, and public spaces. By analyzing patterns in channel state information (CSI), it is possible to infer human actions for tasks such as person identification, gesture recognition, and fall detection. However, CSI is highly sensitive to environmental changes, where even minor alterations can significantly distort the CSI patterns. This sensitivity often leads to performance degradation or outright failure when applying wireless sensing models trained in one environment to another. To address this challenge, Domain Alignment (DAL) has been widely adopted for cross-domain classification tasks, as it focuses on aligning the global distributions of the source and target domains in feature space. Despite its popularity, DAL often neglects inter-category relationships, which can lead to misalignment between categories across domains, even when global alignment is achieved. To overcome these limitations, we propose K-Nearest Neighbors Maximum Mean Discrepancy (KNN-MMD), a novel few-shot method for cross-domain wireless sensing. Our approach begins by constructing a help set using KNN from the target domain, enabling local alignment between the source and target domains within each category using MMD. Additionally, we address a key instability issue commonly observed in cross-domain methods, where model performance fluctuates sharply between epochs. Further, most existing methods struggle to determine an optimal stopping point during training due to the absence of labeled data from the target domain. Our method resolves this by excluding the support set from the target domain during training and employing it as a validation set to determine the stopping criterion.
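A minimal sketch of the local-alignment step, under the assumption of generic feature vectors (the kernel bandwidth, feature dimension, and helper names are illustrative): a few labeled target shots pseudo-label the rest of the target data via KNN, and an RBF-kernel MMD is then computed between source and target features within each category.

    import numpy as np

    def rbf_mmd2(x, y, gamma=1.0):
        """Squared MMD between feature sets x (n, d) and y (m, d) with an RBF kernel."""
        def k(a, b):
            d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
            return np.exp(-gamma * d)
        return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

    def knn_label(query, support_x, support_y, k=3):
        """Assign each query feature the majority label of its k nearest labeled target shots."""
        d = ((query[:, None, :] - support_x[None, :, :]) ** 2).sum(-1)
        idx = np.argsort(d, axis=1)[:, :k]
        return np.array([np.bincount(support_y[i]).argmax() for i in idx])

    def per_class_mmd(src_x, src_y, tgt_x, tgt_y, n_classes, gamma=1.0):
        """Average MMD over categories, i.e., local rather than global alignment."""
        losses = [rbf_mmd2(src_x[src_y == c], tgt_x[tgt_y == c], gamma)
                  for c in range(n_classes) if (src_y == c).any() and (tgt_y == c).any()]
        return float(np.mean(losses)) if losses else 0.0

    rng = np.random.default_rng(0)
    src_x, src_y = rng.normal(size=(60, 8)), rng.integers(0, 3, 60)
    few_x, few_y = rng.normal(size=(9, 8)), np.repeat(np.arange(3), 3)   # labeled target shots
    tgt_x = rng.normal(size=(40, 8))
    tgt_y = knn_label(tgt_x, few_x, few_y)                               # pseudo-labeled help set
    print(per_class_mmd(src_x, src_y, tgt_x, tgt_y, n_classes=3))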
Enhancing Large Vision Model in Street Scene Semantic Understanding through Leveraging Posterior Optimization Trajectory
Kou, Wei-Bin, Lin, Qingfeng, Tang, Ming, Wang, Shuai, Ye, Rongguang, Zhu, Guangxu, Wu, Yik-Chung
To improve the generalization of the autonomous driving (AD) perception model, vehicles need to update the model over time based on continuously collected data. As time progresses, the volume of data fitted by the AD model expands, which helps to improve the AD model's generalization substantially. However, such ever-expanding data is a double-edged sword for the AD model. Specifically, as the fitted data volume grows to exceed the AD model's fitting capacity, the AD model becomes prone to under-fitting. To address this issue, we propose using a pretrained Large Vision Model (LVM) as the backbone, coupled with a downstream perception head, to understand AD semantic information. This design not only surmounts the aforementioned under-fitting problem thanks to the LVM's powerful fitting capability, but also enhances perception generalization owing to the LVM's vast and diverse training data. On the other hand, to mitigate vehicles' computational burden of training the perception head while running the LVM backbone, we introduce a Posterior Optimization Trajectory (POT)-Guided optimization scheme (POTGui) to accelerate convergence. Concretely, we propose a POT Generator (POTGen) to generate the posterior (future) optimization direction in advance to guide the current optimization iteration, through which the model generally converges within 10 epochs. Extensive experiments demonstrate that the proposed method improves performance by over 66.48% and converges over 6 times faster compared to the existing state-of-the-art approach.
Mining Limited Data Sufficiently: A BERT-inspired Approach for CSI Time Series Application in Wireless Communication and Sensing
Zhao, Zijian, Meng, Fanyi, Li, Hang, Li, Xiaoyang, Zhu, Guangxu
Channel State Information (CSI) is the cornerstone in both wireless communication and sensing systems. In wireless communication systems, CSI provides essential insights into channel conditions, enabling system optimizations like channel compensation and dynamic resource allocation. However, the high computational complexity of CSI estimation algorithms necessitates the development of fast deep learning methods for CSI prediction. In wireless sensing systems, CSI can be leveraged to infer environmental changes, facilitating various functions, including gesture recognition and people identification. Deep learning methods have demonstrated significant advantages over model-based approaches in these fine-grained CSI classification tasks, particularly when classes vary across different scenarios. However, a major challenge in training deep learning networks for wireless systems is the limited availability of data, further complicated by the diverse formats of many public datasets, which hinder integration. Additionally, collecting CSI data can be resource-intensive, requiring considerable time and manpower. To address these challenges, we propose CSI-BERT2 for CSI prediction and classification tasks, effectively utilizing limited data through a pre-training and fine-tuning approach. Building on CSI-BERT1, we enhance the model architecture by introducing an Adaptive Re-Weighting Layer (ARL) and a Multi-Layer Perceptron (MLP) to better capture sub-carrier and timestamp information, effectively addressing the permutation-invariance problem. Furthermore, we propose a Mask Prediction Model (MPM) fine-tuning method to improve the model's adaptability for CSI prediction tasks. Experimental results demonstrate that CSI-BERT2 achieves state-of-the-art performance across all tasks.
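The mask-prediction pre-training idea can be sketched as follows (a toy model, not CSI-BERT2 itself; the dimensions, masking ratio, and architecture are assumptions): random timestamps of a CSI sequence are replaced by a learnable mask token, and a small Transformer encoder is trained to reconstruct the masked frames.

    import torch
    import torch.nn as nn

    class TinyCSIEncoder(nn.Module):
        def __init__(self, n_subcarriers=64, d_model=128):
            super().__init__()
            self.proj = nn.Linear(n_subcarriers, d_model)
            layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.head = nn.Linear(d_model, n_subcarriers)
            self.mask_token = nn.Parameter(torch.zeros(1, 1, d_model))

        def forward(self, csi, mask):                       # csi: (B, T, S), mask: (B, T) bool
            h = self.proj(csi)
            h = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(h), h)
            return self.head(self.encoder(h))

    csi = torch.randn(8, 50, 64)                            # toy CSI amplitude sequences
    mask = torch.rand(8, 50) < 0.15                         # mask roughly 15% of timestamps
    model = TinyCSIEncoder()
    recon = model(csi, mask)
    loss = (recon - csi)[mask].pow(2).mean()                # reconstruct the masked frames only
    loss.backward()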
Personalizing Low-Rank Bayesian Neural Networks Via Federated Learning
Zhang, Boning, Liu, Dongzhu, Simeone, Osvaldo, Wang, Guanchu, Pezaros, Dimitrios, Zhu, Guangxu
To support real-world decision-making, it is crucial for models to be well-calibrated, i.e., to assign reliable confidence estimates to their predictions. Uncertainty quantification is particularly important in personalized federated learning (PFL), as participating clients typically have small local datasets, making it difficult to unambiguously determine optimal model parameters. Bayesian PFL (BPFL) methods can potentially enhance calibration, but they often come with considerable computational and memory requirements due to the need to track the variances of all the individual model parameters. Furthermore, different clients may exhibit heterogeneous uncertainty levels owing to varying local dataset sizes and distributions. To address these challenges, we propose LR-BPFL, a novel BPFL method that learns a global deterministic model along with personalized low-rank Bayesian corrections. To tailor the local model to each client's inherent uncertainty level, LR-BPFL incorporates an adaptive rank selection mechanism. We evaluate LR-BPFL across a variety of datasets, demonstrating its advantages in terms of calibration, accuracy, as well as computational and memory requirements.
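The form of the personalized correction can be sketched as follows (an assumed parameterization for illustration, not the paper's exact design): the layer weight is a shared deterministic matrix plus a client-specific low-rank factorized Gaussian correction, sampled with the reparameterization trick so that only the low-rank factors carry per-parameter variances.

    import torch
    import torch.nn as nn

    class LowRankBayesLinear(nn.Module):
        def __init__(self, d_in, d_out, rank=4):
            super().__init__()
            self.W = nn.Parameter(torch.randn(d_out, d_in) * 0.02)     # shared, deterministic weight
            self.A_mu = nn.Parameter(torch.zeros(d_out, rank))         # personalized low-rank factors
            self.B_mu = nn.Parameter(torch.zeros(rank, d_in))
            self.A_logvar = nn.Parameter(torch.full((d_out, rank), -6.0))
            self.B_logvar = nn.Parameter(torch.full((rank, d_in), -6.0))

        def forward(self, x):
            A = self.A_mu + torch.randn_like(self.A_mu) * (0.5 * self.A_logvar).exp()
            B = self.B_mu + torch.randn_like(self.B_mu) * (0.5 * self.B_logvar).exp()
            return x @ (self.W + A @ B).t()                            # deterministic base + sampled correction

    layer = LowRankBayesLinear(16, 10, rank=2)
    logits = layer(torch.randn(4, 16))                                 # each forward pass draws a fresh sample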
Fast-Convergent and Communication-Alleviated Heterogeneous Hierarchical Federated Learning in Autonomous Driving
Kou, Wei-Bin, Lin, Qingfeng, Tang, Ming, Ye, Rongguang, Wang, Shuai, Zhu, Guangxu, Wu, Yik-Chung
Street Scene Semantic Understanding (denoted as TriSU) is a complex task for autonomous driving (AD). However, an inference model trained on data from a particular geographical region generalizes poorly when applied to other regions due to inter-city data domain shift. Hierarchical Federated Learning (HFL) offers a potential solution for improving TriSU model generalization through collaborative, privacy-preserving training over distributed datasets from different cities. Unfortunately, it suffers from slow convergence because data from different cities have disparate statistical properties. Going beyond existing HFL methods, we propose a Gaussian heterogeneous HFL algorithm (FedGau) to address inter-city data heterogeneity so that convergence can be accelerated. In the proposed FedGau algorithm, both individual RGB images and RGB datasets are modelled as Gaussian distributions for aggregation weight design. This approach not only differentiates each RGB image by its statistical distribution, but also exploits the statistics of each city's dataset in addition to the conventionally considered data volume. With the proposed approach, convergence is accelerated by 35.5%-40.6%. On the other hand, to reduce the communication resources involved, we further introduce a novel performance-aware adaptive resource scheduling (AdapRS) policy. Unlike the traditional static resource scheduling policy that exchanges a fixed number of models between two adjacent aggregations, AdapRS adjusts the number of model aggregations at different levels of HFL so that unnecessary communication is minimized. Extensive experiments demonstrate that AdapRS saves 29.65% communication overhead compared to the conventional static resource scheduling policy while maintaining almost the same performance.
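A rough sketch of Gaussian-statistics-based aggregation weighting (the exact FedGau rule is not reproduced here; the Bhattacharyya coefficient and the one-dimensional pixel-intensity model are stand-in assumptions): each city's dataset is summarized by a Gaussian, and a client's weight combines its data volume with the similarity of its Gaussian to the global statistics.

    import numpy as np

    def bhattacharyya_coeff(mu1, var1, mu2, var2):
        """Similarity in (0, 1] between two univariate Gaussians."""
        bd = 0.25 * (mu1 - mu2) ** 2 / (var1 + var2) \
             + 0.5 * np.log((var1 + var2) / (2.0 * np.sqrt(var1 * var2)))
        return float(np.exp(-bd))

    def aggregation_weights(client_stats, global_mu, global_var):
        """client_stats: list of (n_samples, mu, var) per city; returns normalized weights."""
        raw = np.array([n * bhattacharyya_coeff(mu, var, global_mu, global_var)
                        for n, mu, var in client_stats])
        return raw / raw.sum()

    stats = [(1200, 0.45, 0.04), (800, 0.52, 0.06), (2000, 0.48, 0.05)]   # hypothetical city statistics
    w = aggregation_weights(stats, global_mu=0.48, global_var=0.05)
    # aggregated_model = sum(w_i * client_model_i), as in weighted FedAvg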
Task-oriented Over-the-air Computation for Edge-device Co-inference with Balanced Classification Accuracy
Jiao, Xiang, Wen, Dingzhu, Zhu, Guangxu, Jiang, Wei, Luo, Wu, Shi, Yuanming
Edge-device co-inference, which concerns the cooperation between edge devices and an edge server to complete inference tasks over wireless networks, is a promising technique for enabling various intelligent services at the network edge, e.g., autonomous driving. In this paradigm, the design objective of the network shifts from the traditional communication throughput to the effective and efficient execution of the inference task underpinned by the network, measured by, e.g., the inference accuracy and latency. In this paper, a task-oriented over-the-air computation scheme is proposed for a multi-device artificial intelligence system. In particular, a novel tractable inference accuracy metric, called the minimum pair-wise discriminant gain, is proposed for classification tasks. Unlike prior work that measures the average over all class pairs in feature space, it measures the minimum distance among all class pairs. By maximizing the minimum pair-wise discriminant gain instead of its average counterpart, any pair of classes can be better separated in the feature space, leading to balanced and improved inference accuracy across all classes. Besides, this paper jointly optimizes the minimum discriminant gain of all feature elements instead of separately maximizing that of each element as in existing designs. As a result, the transmit power can be adaptively allocated to the feature elements according to their different contributions to the inference accuracy, opening an extra degree of freedom to improve inference performance. Extensive experiments are conducted using a concrete use case of human motion recognition to verify the superiority of the proposed design over the benchmark scheme.
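The metric itself is easy to prototype; below is a minimal Python sketch under a simplified surrogate definition (class centroids and per-feature noise variances are toy inputs; the paper's exact discriminant-gain expression is not reproduced): the pair-wise gain is a noise-normalized squared distance between class centroids, and the objective takes the minimum over all class pairs rather than their average.

    import numpy as np
    from itertools import combinations

    def pairwise_gain(mu_a, mu_b, noise_var):
        """Noise-normalized squared distance between two class centroids in feature space."""
        return float(((mu_a - mu_b) ** 2 / noise_var).sum())

    def min_pairwise_discriminant_gain(class_means, noise_var):
        """Minimum pair-wise gain over all class pairs (the quantity to be maximized)."""
        return min(pairwise_gain(class_means[i], class_means[j], noise_var)
                   for i, j in combinations(range(len(class_means)), 2))

    means = [np.array([0.0, 1.0]), np.array([1.2, 0.2]), np.array([0.9, 1.1])]   # toy class centroids
    print(min_pairwise_discriminant_gain(means, noise_var=np.array([0.1, 0.2])))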