AITopics | Wang, Mengzhu

Collaborating Authors

Wang, Mengzhu

In-Context Meta LoRA Generation

Shao, Yihua, Yan, Minxi, Liu, Yang, Chen, Siyu, Chen, Wenjie, Long, Xinwei, Yan, Ziyang, Li, Lei, Zhang, Chenyu, Sebe, Nicu, Tang, Hao, Wang, Yan, Zhao, Hao, Wang, Mengzhu, Guo, Jingcai

arXiv.org Artificial Intelligence, Jan-30-2025

Low-rank Adaptation (LoRA) has demonstrated remarkable capabilities for task-specific fine-tuning. However, in scenarios that involve multiple tasks, training a separate LoRA model for each one results in considerable inefficiency in terms of storage and inference. Moreover, existing parameter generation methods fail to capture the correlations among these tasks, making multi-task LoRA parameter generation challenging. To address these limitations, we propose In-Context Meta LoRA (ICM-LoRA), a novel approach that efficiently achieves task-specific customization of large language models (LLMs). Specifically, we use training data from all tasks to train a tailored generator, a Conditional Variational Autoencoder (CVAE). The CVAE takes task descriptions as inputs and produces task-aware LoRA weights as outputs. These LoRA weights are then merged with LLMs to create task-specialized models without the need for additional fine-tuning. Furthermore, we utilize in-context meta-learning for knowledge enhancement and task mapping, to capture the relationship between tasks and parameter distributions. As a result, our method achieves more accurate LoRA parameter generation for diverse tasks using the CVAE. ICM-LoRA enables more accurate LoRA parameter reconstruction than current parameter reconstruction methods and is useful for implementing task-specific enhancements of LoRA parameters. At the same time, our method occupies 283MB, only 1% of the storage required by the original LoRA.
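The merging step mentioned in the abstract, where LoRA weights are folded into a frozen base weight, can be illustrated with the standard LoRA update W' = W + (alpha / r) * B * A. The sketch below is a minimal pure-Python illustration under that assumption. The function names, the alpha / r scaling convention, and the toy matrices are hypothetical and are not taken from the ICM-LoRA paper; the CVAE generator that would produce B and A is not shown.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_b, lora_a, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a).

    weight: d_out x d_in base matrix (kept frozen during LoRA training)
    lora_b: d_out x rank, lora_a: rank x d_in low-rank factors
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy example (hypothetical values): a 2x3 base weight and a rank-1 adapter.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
B = [[1.0], [2.0]]
A = [[0.5, 0.0, -0.5]]
merged = merge_lora(W, B, A, alpha=2.0, rank=1)
print(merged)
```

Once merged, the adapted layer is a single dense matrix, so inference needs no extra adapter computation.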

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2501.17635

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

GraphCL: Graph-based Clustering for Semi-Supervised Medical Image Segmentation

Wang, Mengzhu, Li, Jiao, Su, Houcheng, Yin, Nan, Yang, Liang, Li, Shen

arXiv.org Artificial Intelligence, Nov-22-2024

Semi-supervised learning (SSL) has made notable advancements in medical image segmentation (MIS), particularly in scenarios with limited labeled data, where it significantly enhances data utilization efficiency. Previous methods primarily focus on complex training strategies to utilize unlabeled data but neglect the importance of graph structural information. Different from existing methods, we propose graph-based clustering for semi-supervised medical image segmentation (GraphCL), which jointly models graph data structure in a unified deep model. The proposed GraphCL model enjoys several advantages. First, to the best of our knowledge, this is the first work to model data structure information for semi-supervised medical image segmentation (SSMIS). Second, to obtain clustered features across different graphs, we integrate both pairwise affinities between local image features and raw features as inputs. Extensive experimental results on three standard benchmarks show that the proposed GraphCL algorithm outperforms state-of-the-art semi-supervised medical image segmentation methods.

artificial intelligence, machine learning, segmentation, (18 more...)

arXiv.org Artificial Intelligence

2411.13147

Genre: Research Report (1.00)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

AGLP: A Graph Learning Perspective for Semi-supervised Domain Adaptation

Su, Houcheng, Wang, Mengzhu, Li, Jiao, Yin, Nan, Yang, Liang, Shen, Li

arXiv.org Artificial Intelligence, Nov-22-2024

In semi-supervised domain adaptation (SSDA), the model aims to leverage partially labeled target domain data along with a large amount of labeled source domain data to enhance its generalization capability for the target domain. A key advantage of SSDA is its ability to significantly reduce reliance on labeled data, thereby lowering the costs and time associated with data preparation. Most existing SSDA methods utilize information from domain labels and class labels but overlook the structural information of the data. To address this issue, this paper proposes a graph learning perspective (AGLP) for semi-supervised domain adaptation. We apply a graph convolutional network to the instance graph, which allows structural information to propagate along the weighted graph edges. The proposed AGLP model has several advantages. First, to the best of our knowledge, this is the first work to model structural information in SSDA. Second, the proposed model can effectively learn domain-invariant and semantic representations, reducing domain discrepancies in SSDA. Extensive experimental results on multiple standard benchmarks demonstrate that the proposed AGLP algorithm outperforms state-of-the-art semi-supervised domain adaptation methods.
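The idea of structural information propagating along weighted graph edges can be illustrated with one neighborhood-averaging step of the kind used in graph convolutional layers. The sketch below is only an illustration under stated assumptions: it uses row normalization with self-loops, omits the learnable weight matrix and nonlinearity of a full GCN layer, and the function name and toy graph are hypothetical rather than taken from AGLP.

```python
def propagate(adj, feats):
    """One weighted mean-aggregation step over a graph with added self-loops.

    adj: n x n matrix of non-negative edge weights
    feats: n x d matrix of node (instance) features
    Each node's new feature is the weighted average of its own feature
    and those of its neighbors.
    """
    n = len(adj)
    dim = len(feats[0])
    out = []
    for i in range(n):
        # Self-loop weight of 1.0 keeps the node's own feature in the average.
        weights = [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        total = sum(weights)
        out.append(
            [sum(weights[j] * feats[j][k] for j in range(n)) / total for k in range(dim)]
        )
    return out


# Toy instance graph (hypothetical): nodes 0 and 1 are connected, node 2 is isolated.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
feats = [[1.0, 0.0], [0.0, 1.0], [4.0, 4.0]]
smoothed = propagate(adj, feats)
print(smoothed)
```

Connected instances are pulled toward a shared representation, while the isolated node keeps its original feature.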

adaptation, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2411.13152

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

DiM: f-Divergence Minimization Guided Sharpness-Aware Optimization for Semi-supervised Medical Image Segmentation

Wang, Bingli, Su, Houcheng, Yin, Nan, Wang, Mengzhu, Shen, Li

arXiv.org Artificial Intelligence, Nov-19-2024

As a technique to alleviate the pressure of data annotation, semi-supervised learning (SSL) has attracted widespread attention. In the specific domain of medical image segmentation, semi-supervised methods (SSMIS) have become a research hotspot due to their ability to reduce the need for large amounts of precisely annotated data. SSMIS focuses on enhancing the model's generalization performance by leveraging a small number of labeled samples and a large number of unlabeled samples. The latest sharpness-aware optimization (SAM) technique, which optimizes the model by reducing the sharpness of the loss function, has shown significant success in SSMIS. However, SAM and its variants may not fully account for the distribution differences between datasets. To address this issue, we propose a sharpness-aware optimization method based on f-divergence minimization (DiM) for semi-supervised medical image segmentation. This method enhances the model's stability by fine-tuning the sensitivity of model parameters and improves the model's adaptability to different datasets through the introduction of f-divergence. By reducing f-divergence, the DiM method not only improves the performance balance between the source and target datasets but also prevents performance degradation due to overfitting on the source dataset.
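For readers unfamiliar with the term, an f-divergence between discrete distributions P and Q is D_f(P || Q) = sum_i q_i * f(p_i / q_i) for a convex f with f(1) = 0. The sketch below computes two well-known instances, KL divergence and total variation distance. It only illustrates the general definition; how DiM combines an f-divergence with sharpness-aware optimization is not specified in the abstract, and the function names and toy distributions here are hypothetical.

```python
import math


def f_divergence(p, q, f):
    """D_f(P || Q) = sum_i q_i * f(p_i / q_i), assuming every q_i > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


def kl_generator(t):
    # f(t) = t * log(t), with the convention f(0) = 0; gives KL(P || Q).
    return t * math.log(t) if t > 0 else 0.0


def tv_generator(t):
    # f(t) = |t - 1| / 2 gives the total variation distance.
    return abs(t - 1.0) / 2.0


# Toy distributions (hypothetical values).
p = [0.5, 0.5]
q = [0.25, 0.75]
kl = f_divergence(p, q, kl_generator)
tv = f_divergence(p, q, tv_generator)
print(kl, tv)
```

Both quantities are zero when the two distributions coincide and grow as they diverge, which is what makes them usable as a penalty on distribution mismatch.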

artificial intelligence, machine learning, segmentation, (15 more...)

arXiv.org Artificial Intelligence

2411.1235

Country: Asia (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.96)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.36)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)

Degree Distribution based Spiking Graph Networks for Domain Adaptation

Wang, Yingxu, Liu, Siwei, Wang, Mengzhu, Liang, Shangsong, Yin, Nan

arXiv.org Artificial Intelligence, Oct-9-2024

Spiking Graph Networks (SGNs) have attracted significant attention from both researchers and industry due to their ability to address energy consumption challenges in graph classification. However, SGNs are only effective for in-distribution data and cannot tackle out-of-distribution data. In this paper, we first propose the domain adaptation problem in SGNs, and introduce a novel framework named Degree-aware Spiking Graph Domain Adaptation for Classification (DeSGDA). The proposed DeSGDA addresses the spiking graph domain adaptation problem through three components: node degree-aware personalized spiking representation, adversarial feature distribution alignment, and pseudo-label distillation. First, we introduce the personalized spiking representation method for generating degree-dependent spiking signals. Specifically, the threshold for triggering a spike is determined by the node degree, allowing this personalized approach to capture more expressive information for classification. Then, we propose the graph feature distribution alignment module that is adversarially trained using membrane potential against a domain discriminator. Such an alignment module can efficiently maintain high performance and low energy consumption in the case of inconsistent distribution. Additionally, we extract consistent predictions across two spaces to create reliable pseudo-labels, effectively leveraging unlabeled data to enhance graph classification performance. Extensive experiments on benchmark datasets validate the superiority of the proposed DeSGDA compared with competitive baselines.
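The notion of a degree-dependent firing threshold can be illustrated with a toy leaky integrate-and-fire simulation. The sketch below is an assumption-laden illustration, not the DeSGDA formulation: the linear rule threshold = base + scale * degree, the decay factor, the hard reset, and all names and values are hypothetical, since the abstract states only that the threshold is determined by node degree.

```python
def degree_aware_spikes(inputs, degrees, base_threshold=1.0, degree_scale=0.5, decay=0.9):
    """Simulate leaky integrate-and-fire nodes with per-node thresholds.

    inputs: list over time steps, each a list of per-node input currents
    degrees: per-node degree; the threshold rule below is an assumed linear form
    Returns a list over time steps of per-node 0/1 spike indicators.
    """
    thresholds = [base_threshold + degree_scale * d for d in degrees]
    potentials = [0.0] * len(degrees)
    spikes = []
    for currents in inputs:
        step = []
        for i, current in enumerate(currents):
            # Leak the previous membrane potential, then integrate the input.
            potentials[i] = decay * potentials[i] + current
            fired = potentials[i] >= thresholds[i]
            if fired:
                potentials[i] = 0.0  # hard reset after a spike
            step.append(1 if fired else 0)
        spikes.append(step)
    return spikes


# Two nodes receiving identical input: an isolated node and a degree-4 node.
degrees = [0, 4]
inputs = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
spikes = degree_aware_spikes(inputs, degrees)
print(spikes)
```

Under this assumed rule, the low-degree node fires at every step while the high-degree node stays silent over the same window, so identical inputs yield different spike trains depending on degree.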

artificial intelligence, domain adaptation, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2410.06883

Country: Asia (0.14)

Genre: Research Report (0.63)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Energy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

OledFL: Unleashing the Potential of Decentralized Federated Learning via Opposite Lookahead Enhancement

Li, Qinglun, Zhang, Miao, Wang, Mengzhu, Yin, Quanjun, Shen, Li

arXiv.org Artificial Intelligence, Oct-8-2024

Decentralized Federated Learning (DFL) surpasses Centralized Federated Learning (CFL) in terms of faster training, privacy preservation, and light communication, making it a promising alternative in the field of federated learning. However, DFL still lags behind CFL in generalization ability, with limited theoretical understanding and degraded empirical performance due to severe inconsistency. In this paper, we enhance the consistency of DFL by developing an opposite lookahead enhancement technique (Ole), yielding OledFL, which optimizes the initialization of each client in each communication round and thus significantly improves both generalization and convergence speed. Moreover, we rigorously establish its convergence rate in the non-convex setting and characterize its generalization bound through uniform stability, which provides concrete reasons why OledFL can achieve both fast convergence and high generalization ability. Extensive experiments conducted on the CIFAR10 and CIFAR100 datasets with Dirichlet and Pathological distributions illustrate that OledFL can achieve up to 5% performance improvement and 8× speedup compared to the most popular DFedAvg optimizer in DFL.

artificial intelligence, machine learning, oledfl, (12 more...)

arXiv.org Artificial Intelligence

2410.06482

Country: Asia > China (0.28)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Is Mamba Compatible with Trajectory Optimization in Offline Reinforcement Learning?

Dai, Yang, Ma, Oubo, Zhang, Longfei, Liang, Xingxing, Hu, Shengchao, Wang, Mengzhu, Ji, Shouling, Huang, Jincai, Shen, Li

arXiv.org Artificial Intelligence, May-20-2024

Transformer-based trajectory optimization methods have demonstrated exceptional performance in offline Reinforcement Learning (offline RL), yet they pose challenges due to substantial parameter size and limited scalability, which is particularly critical in sequential decision-making scenarios where resources are constrained, such as in robots and drones with limited computational power. Mamba, a promising new linear-time sequence model, offers performance on par with transformers while requiring substantially fewer parameters on long sequences. As it remains unclear whether Mamba is compatible with trajectory optimization, this work conducts comprehensive experiments to explore the potential of Decision Mamba in offline RL (dubbed DeMa) from the aspect of data structures and network architectures, with the following insights: (1) Long sequences impose a significant computational burden without contributing to performance improvements, because DeMa's focus on sequences diminishes approximately exponentially. Consequently, we introduce a Transformer-like DeMa as opposed to an RNN-like DeMa. (2) For the components of DeMa, we identify that the hidden attention mechanism is key to its success; it can also work well with other residual structures and does not require position embedding. Extensive evaluations on eight Atari games demonstrate that our specially designed DeMa is compatible with trajectory optimization and surpasses previous state-of-the-art methods, outdoing Decision Transformer (DT) by 80% with 30% fewer parameters, and exceeds DT in MuJoCo with only a quarter of the parameters.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

arXiv.org Artificial Intelligence

2405.12094

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Dynamic Spiking Graph Neural Networks

Yin, Nan, Wang, Mengzhu, Chen, Zhenghan, De Masi, Giulia, Gu, Bin, Xiong, Huan

arXiv.org Artificial Intelligence, Dec-15-2023

The integration of Spiking Neural Networks (SNNs) and Graph Neural Networks (GNNs) is gradually attracting attention due to the low power consumption and high efficiency in processing the non-Euclidean data represented by graphs. However, as a common problem, dynamic graph representation learning faces challenges such as high complexity and large memory overheads. Current work often uses SNNs instead of Recurrent Neural Networks (RNNs), using binary features instead of continuous ones for efficient training, which overlooks graph structure information and leads to the loss of details during propagation. Additionally, optimizing dynamic spiking models typically requires propagation of information across time steps, which increases memory requirements. To address these challenges, we present a framework named Dynamic Spiking Graph Neural Networks (Dy-SIGN). To mitigate the information loss problem, Dy-SIGN propagates early-layer information directly to the last layer for information compensation. To accommodate the memory requirements, we apply implicit differentiation on the equilibrium state, which does not rely on the exact reverse of the forward computation. While traditional implicit differentiation methods are usually used for static situations, Dy-SIGN extends them to the dynamic graph setting. Extensive experiments on three large-scale real-world dynamic graph datasets validate the effectiveness of Dy-SIGN on dynamic node classification tasks with lower computational costs.

artificial intelligence, graph, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2401.05373

Country: North America > United States (0.14)

Genre: Research Report > Promising Solution (0.68)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Singular Value Penalization and Semantic Data Augmentation for Fully Test-Time Adaptation

Su, Houcheng, Liu, Daixian, Wang, Mengzhu, Wang, Wei

arXiv.org Artificial Intelligence, Dec-9-2023

Fully test-time adaptation (FTTA) adapts a model that is trained on a source domain to a target domain during the testing phase, where the two domains follow different distributions and source data is unavailable during the training phase. Existing methods usually adopt entropy minimization to reduce the uncertainty of target prediction results, and improve the FTTA performance accordingly. However, they fail to ensure diversity in target prediction results. A recent domain adaptation study has shown that maximizing the sum of singular values of prediction results can simultaneously enhance their confidence (discriminability) and diversity. However, during the training phase, larger singular values usually dominate the loss maximization. As a result, the model is more inclined to enhance discriminability for easily distinguishable classes, and the improvement in diversity is insufficient. Furthermore, the adaptation and prediction in FTTA only use data from the current batch, which may lead to overfitting. To address these issues, we propose maximizing the sum of singular values while minimizing their variance. This shifts the model's focus toward the smaller singular values, enhancing discriminability between more challenging classes and effectively increasing the diversity of prediction results. Moreover, we incorporate data from the previous batch to realize semantic data augmentation for the current batch, reducing the risk of overfitting. Extensive experiments on benchmark datasets show that our proposed approach outperforms several compared state-of-the-art FTTA methods.
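One way to write down the objective described above, maximizing the sum of singular values while minimizing their variance, is sketched below. This is an illustrative formulation under stated assumptions rather than the paper's exact loss: P denotes the batch prediction matrix, r is the number of its singular values, and the trade-off weight \lambda is a hypothetical hyperparameter that the abstract does not mention.

```latex
\mathcal{L}(P) \;=\; -\sum_{i=1}^{r} \sigma_i(P)
\;+\; \lambda \cdot \frac{1}{r} \sum_{i=1}^{r} \bigl(\sigma_i(P) - \bar{\sigma}\bigr)^{2},
\qquad
\bar{\sigma} \;=\; \frac{1}{r} \sum_{i=1}^{r} \sigma_i(P)
```

Minimizing the first term increases the sum of singular values, while the second term penalizes their spread, so gains cannot come only from the largest singular values.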

artificial intelligence, machine learning, singular value, (17 more...)

arXiv.org Artificial Intelligence

2312.08378

Country: Asia (0.14)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

CoCo: A Coupled Contrastive Framework for Unsupervised Domain Adaptive Graph Classification

Yin, Nan, Shen, Li, Wang, Mengzhu, Lan, Long, Ma, Zeyu, Chen, Chong, Hua, Xian-Sheng, Luo, Xiao

arXiv.org Artificial Intelligence, Jun-10-2023

Although graph neural networks (GNNs) have achieved impressive results in graph classification, they often need abundant task-specific labels, which can be extremely costly to acquire. A credible solution is to explore additional labeled graphs to enhance unsupervised learning on the target domain. However, how to apply GNNs to domain adaptation remains unsolved owing to the insufficient exploration of graph topology and the significant domain discrepancy. In this paper, we propose Coupled Contrastive Graph Representation Learning (CoCo), which extracts the topological information from coupled learning branches and reduces the domain discrepancy with coupled contrastive learning. CoCo contains a graph convolutional network branch and a hierarchical graph kernel network branch, which explore graph topology in implicit and explicit manners, respectively. In addition, we incorporate the coupled branches into a holistic multi-view contrastive learning framework, which not only incorporates graph representations learned from complementary views for enhanced understanding, but also encourages similarity between cross-domain example pairs with the same semantics for domain alignment. Extensive experiments on popular datasets show that CoCo generally outperforms competing baselines across different settings.

artificial intelligence, machine learning, representation, (16 more...)

arXiv.org Artificial Intelligence

2306.04979

Country:

Asia > China (0.46)
Asia > Middle East (0.28)

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
