Lu, Wang
A Language Anchor-Guided Method for Robust Noisy Domain Generalization
Dai, Zilin, Wang, Lehong, Lin, Fangzhou, Wang, Yidong, Li, Zhigang, Yamada, Kazunori D, Zhang, Ziming, Lu, Wang
Real-world machine learning applications are often hindered by two critical challenges: distribution shift and label noise. Networks inherently tend to overfit to redundant, uninformative features present in the training distribution, which undermines their ability to generalize effectively to the target domain's distribution. The presence of noisy data further exacerbates this issue by inducing additional overfitting to noise, causing existing domain generalization methods to fail in effectively distinguishing invariant features from spurious ones. We also introduce a weighted loss function that dynamically adjusts the contribution of each sample based on its distance to the corresponding NLP anchor, thereby improving the model's resilience to noisy labels. Domain Generalization (DG) has emerged as a pivotal research area in machine learning, aiming to develop models that maintain high performance on previously unseen environments, or domains. Traditional methods often assume that training and test data share the same distribution, yet in real-world scenarios there is frequently a substantial shift between these distributions. This phenomenon, widely referred to as domain shift, can cause severe performance degradation in tasks spanning computer vision, natural language processing, and medical image analysis [1]. As shown in Figure 1(a)(b), even within the same class label, the distribution of feature representations can vary considerably. This variation may stem from differences in image acquisition conditions, such as lighting variations, changes in pose, or complex background environments, and even from more subtle domain-specific factors like sensor noise or camera calibration differences. Such intra-class variability poses a significant challenge for developing accurate and adaptable models, which must learn to extract invariant features that capture the true semantic essence of the class while ignoring irrelevant variations.
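A minimal sketch (not the authors' implementation) of the anchor-distance weighted loss described above: each sample's cross-entropy contribution is scaled by its similarity to a fixed language ("NLP") anchor for its labelled class. The anchor matrix, temperature, and softmax weighting scheme here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def anchor_weighted_ce(features, logits, labels, anchors, tau=1.0):
    """features: (B, D) image features; anchors: (C, D) per-class text anchors (assumed given)."""
    f = F.normalize(features, dim=1)
    a = F.normalize(anchors, dim=1)
    # Cosine distance between each sample and the anchor of its (possibly noisy) label.
    dist = 1.0 - (f * a[labels]).sum(dim=1)                    # (B,)
    # Samples far from their anchor are likely mislabelled -> down-weight them.
    weights = torch.softmax(-dist / tau, dim=0) * labels.numel()
    per_sample_ce = F.cross_entropy(logits, labels, reduction="none")
    return (weights.detach() * per_sample_ce).mean()
```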
Optimal Transport for Brain-Image Alignment: Unveiling Redundancy and Synergy in Neural Information Processing
Xiao, Yang, Lu, Wang, Ji, Jie, Ye, Ruimeng, Li, Gen, Ma, Xiaolong, Hui, Bo
The design of artificial neural networks (ANNs) is inspired by the structure of the human brain, and in turn, ANNs offer a potential means to interpret and understand brain signals. Existing methods primarily align brain signals with real-world signals using Mean Squared Error (MSE), which focuses solely on local point-wise alignment and ignores global matching, leading to coarse interpretations and inaccuracies in brain signal decoding. In this paper, we address these issues through optimal transport (OT) and theoretically demonstrate why OT provides a more effective alignment strategy than MSE. Specifically, we construct a transport plan between brain voxel embeddings and image embeddings, enabling more precise matching. By controlling the amount of transport, we mitigate the influence of redundant information. We apply our alignment model directly to the Brain Captioning task by feeding brain signals into a large language model (LLM) instead of images. Our approach achieves state-of-the-art performance across ten evaluation metrics, surpassing the previous best method by an average of 6.11\% in single-subject training and 3.81\% in cross-subject training. Additionally, we have uncovered several insightful conclusions that align with existing brain research. We unveil the redundancy and synergy of brain information processing through region masking and data dimensionality reduction visualization experiments. We believe our approach paves the way for a more precise understanding of brain signals in the future. The code will be made available soon.
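An illustrative sketch of the OT-based alignment idea: an entropic (Sinkhorn) transport plan between brain-voxel embeddings and image embeddings replaces point-wise MSE. The squared-distance cost, entropic regularization strength, and uniform marginals below are assumptions, not the paper's exact formulation.

```python
import torch

def ot_alignment_loss(brain_emb, img_emb, eps=0.05, n_iters=50):
    """Entropic OT alignment between (N, D) brain embeddings and (M, D) image embeddings."""
    cost = torch.cdist(brain_emb, img_emb, p=2) ** 2           # pairwise squared distances
    K = torch.exp(-cost / eps)                                 # Gibbs kernel
    n, m = cost.shape
    mu = torch.full((n,), 1.0 / n, device=cost.device)         # uniform source marginal
    nu = torch.full((m,), 1.0 / m, device=cost.device)         # uniform target marginal
    u = torch.ones_like(mu)
    for _ in range(n_iters):                                   # Sinkhorn iterations
        v = nu / (K.t() @ u)
        u = mu / (K @ v)
    plan = u.unsqueeze(1) * K * v.unsqueeze(0)                 # transport plan diag(u) K diag(v)
    return (plan.detach() * cost).sum()                        # total transport cost as loss
```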
Survey on Knowledge Distillation for Large Language Models: Methods, Evaluation, and Application
Yang, Chuanpeng, Lu, Wang, Zhu, Yao, Wang, Yidong, Chen, Qian, Gao, Chenlong, Yan, Bingjie, Chen, Yiqiang
Large Language Models (LLMs) have showcased exceptional capabilities in various domains, attracting significant interest from both academia and industry. Despite their impressive performance, the substantial size and computational demands of LLMs pose considerable challenges for practical deployment, particularly in environments with limited resources. The endeavor to compress language models while maintaining their accuracy has become a focal point of research. Among the various methods, knowledge distillation has emerged as an effective technique to enhance inference speed without greatly compromising performance. This paper presents a thorough survey from three aspects: method, evaluation, and application, exploring knowledge distillation techniques tailored specifically for LLMs. Specifically, we divide the methods into white-box KD and black-box KD to better illustrate their differences. Furthermore, we explore the evaluation tasks and the distillation effects of different distillation methods, and propose directions for future research. Through an in-depth understanding of the latest advancements and practical applications, this survey provides valuable resources for researchers, paving the way for sustained progress in this field.
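For concreteness, here is a generic white-box KD objective in the sense used above: the student matches the teacher's softened logit distribution via a temperature-scaled KL term plus the usual cross-entropy, which requires access to teacher logits (black-box KD, by contrast, only sees teacher outputs such as generated text). The temperature and mixing weight are illustrative assumptions.

```python
import torch.nn.functional as F

def white_box_kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                  # standard temperature scaling of the KD term
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```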
Towards Optimization and Model Selection for Domain Generalization: A Mixup-guided Solution
Lu, Wang, Wang, Jindong, Wang, Yidong, Xie, Xing
The distribution shifts between training and test data typically undermine the performance of models. In recent years, much work has focused on domain generalization (DG), where distribution shifts exist and the target data are unseen. Despite the progress in algorithm design, two foundational factors have long been ignored: 1) the optimization of regularization-based objectives, and 2) model selection for DG, since no knowledge about the target domain can be utilized. In this paper, we propose Mixup-guided optimization and selection techniques for DG. For optimization, we utilize an adapted Mixup to generate an out-of-distribution dataset that can guide the preference direction, and we optimize with Pareto optimization. For model selection, we generate a validation dataset that is closer to the target distribution and can thereby better represent the target data. We also present theoretical insights behind our proposals. Comprehensive experiments demonstrate that our model optimization and selection techniques can largely improve the performance of existing domain generalization algorithms and even achieve new state-of-the-art results.
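A hedged sketch of the Mixup-guided idea: interpolating samples drawn from two different source domains yields pseudo out-of-distribution data that can serve as a preference or validation signal. The Beta parameter and the pairing of domains are assumptions, not the paper's exact recipe.

```python
import torch

def cross_domain_mixup(x_a, y_a, x_b, y_b, alpha=0.2):
    """x_a/x_b come from two different source domains; y_a/y_b are one-hot (or soft) labels."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    x_mix = lam * x_a + (1.0 - lam) * x_b
    # Soft labels keep both classes' contributions for the mixed sample.
    y_mix = lam * y_a + (1.0 - lam) * y_b
    return x_mix, y_mix
```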
FIXED: Frustratingly Easy Domain Generalization with Mixup
Lu, Wang, Wang, Jindong, Yu, Han, Huang, Lei, Zhang, Xiang, Chen, Yiqiang, Xie, Xing
Domain generalization (DG) aims to learn a generalizable model from multiple training domains such that it can perform well on unseen target domains. A popular strategy is to augment training data to benefit generalization through methods such as Mixup~\cite{zhang2018mixup}. While vanilla Mixup can be directly applied, theoretical and empirical investigations uncover several shortcomings that limit its performance. Firstly, Mixup cannot effectively identify the domain and class information that can be used for learning invariant representations. Secondly, Mixup may introduce synthetic noisy data points via random interpolation, which lowers its discrimination capability. Based on this analysis, we propose a simple yet effective enhancement for Mixup-based DG, namely domain-invariant Feature mIXup (FIX), which learns domain-invariant representations for Mixup. To further enhance discrimination, we leverage existing techniques to enlarge margins among classes, yielding the domain-invariant Feature MIXup with Enhanced Discrimination (FIXED) approach. We present theoretical insights and guarantees on its effectiveness. Extensive experiments on seven public datasets across two modalities, including image classification (Digits-DG, PACS, Office-Home) and time series (DSADS, PAMAP2, UCI-HAR, and USC-HAD), demonstrate that our approach significantly outperforms nine state-of-the-art related methods, beating the best-performing baseline by 6.5\% on average in terms of test accuracy. Code is available at: https://github.com/jindongwang/transferlearning/tree/master/code/deep/fixed.
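A minimal sketch of the two ingredients described above, not the released code: Mixup applied to (assumed) domain-invariant features rather than raw inputs, and a cosine classifier with an additive margin as one standard way to enlarge class margins. The margin, scale, and classifier form are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def feature_mixup(feats, labels_onehot, alpha=0.2):
    """Mix domain-invariant features and their soft labels within a batch."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    idx = torch.randperm(feats.size(0))                 # random pairing within the batch
    mixed_feats = lam * feats + (1.0 - lam) * feats[idx]
    mixed_labels = lam * labels_onehot + (1.0 - lam) * labels_onehot[idx]
    return mixed_feats, mixed_labels

def margin_logits(feats, weight, labels, margin=0.2, scale=16.0):
    """Cosine classifier with an additive margin on the target class (enhanced discrimination)."""
    cos = F.normalize(feats, dim=1) @ F.normalize(weight, dim=1).t()   # (B, C)
    one_hot = F.one_hot(labels, num_classes=cos.size(1)).float()
    return scale * (cos - margin * one_hot)
```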
ZooPFL: Exploring Black-box Foundation Models for Personalized Federated Learning
Lu, Wang, Yu, Hao, Wang, Jindong, Teney, Damien, Wang, Haohan, Chen, Yiqiang, Yang, Qiang, Xie, Xing, Ji, Xiangyang
When personalized federated learning (FL) meets large foundation models, new challenges arise from various limitations in resources. In addition to typical limitations such as data, computation, and communication costs, access to the models is also often limited. This paper endeavors to solve both the challenges of limited resources and personalization. We propose ZooPFL, which uses Zeroth-Order Optimization for Personalized Federated Learning. ZooPFL avoids direct interference with the foundation models and instead learns to adapt its inputs through zeroth-order optimization. In addition, we employ simple yet effective linear projections to remap its predictions for personalization. To reduce the computation costs and enhance personalization, we propose input surgery to incorporate an auto-encoder with low-dimensional and client-specific embeddings. We also provide theoretical analyses for ZooPFL to study its convergence. Extensive empirical experiments on computer vision and natural language processing tasks using popular foundation models demonstrate its effectiveness for FL on black-box foundation models. In recent years, the growing emphasis on data privacy and security has led to the emergence of federated learning (FL) (Warnat-Herresthal et al., 2021; Chen & Chao, 2022; Chen et al., 2023b; Castiglia et al., 2023; Rodríguez-Barroso et al., 2023; Kuang et al., 2023). FL enables collaborative learning while safeguarding data privacy and security across distributed clients (Yang et al., 2019). However, FL faces two key challenges: limited resources and distribution shifts (Figure 1 (a, b)). The rise of large foundation models (Bommasani et al., 2021) has amplified these challenges. The computational demands and communication costs associated with such models hinder the deployment of existing FL approaches (Figure 1a).
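A sketch of the zeroth-order "input surgery" idea under simplifying assumptions: the black-box foundation model is only queried, and a two-point gradient estimate updates the client-side input transformation (the linear output remapping mentioned above is omitted). The smoothing parameter and number of sampled directions are illustrative.

```python
import torch

def zo_gradient(loss_fn, params, mu=1e-3, n_samples=8):
    """Estimate the gradient of loss_fn(params) without backprop through the black box.

    loss_fn queries the (frozen, black-box) model with inputs adapted by `params`
    and returns a scalar loss tensor.
    """
    grad = torch.zeros_like(params)
    for _ in range(n_samples):
        u = torch.randn_like(params)                     # random probing direction
        delta = (loss_fn(params + mu * u) - loss_fn(params - mu * u)) / (2 * mu)
        grad += delta * u                                # two-point finite-difference estimate
    return grad / n_samples
```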
DIVERSIFY: A General Framework for Time Series Out-of-distribution Detection and Generalization
Lu, Wang, Wang, Jindong, Sun, Xinwei, Chen, Yiqiang, Ji, Xiangyang, Yang, Qiang, Xie, Xing
Time series remains one of the most challenging modalities in machine learning research. Out-of-distribution (OOD) detection and generalization on time series tend to suffer due to its non-stationary property, i.e., the distribution changes over time. The dynamic distributions inside time series pose great challenges to existing algorithms in identifying invariant distributions, since these algorithms mainly focus on the scenario where the domain information is given as prior knowledge. In this paper, we attempt to exploit subdomains within a whole dataset to counteract the issues induced by non-stationarity for generalized representation learning. We propose DIVERSIFY, a general framework for OOD detection and generalization on the dynamic distributions of time series. DIVERSIFY takes an iterative process: it first obtains the "worst-case" latent distribution scenario via adversarial training, then reduces the gap between these latent distributions. For detection, we implement DIVERSIFY by combining it with existing OOD detection methods based on either extracted features or model outputs, while we directly utilize the outputs for classification. In addition, we provide theoretical insights showing that DIVERSIFY is theoretically supported. Extensive experiments are conducted on seven datasets with different OOD settings across gesture recognition, speech commands recognition, wearable stress and affect detection, and sensor-based human activity recognition. Qualitative and quantitative results demonstrate that DIVERSIFY learns more generalized features and significantly outperforms other baselines.
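One standard way to realize the adversarial min-max sketched above (identify "worst-case" latent distributions, then train the encoder to close the gap) is a gradient-reversal layer between the encoder and a latent-domain discriminator; this is a generic illustration, not necessarily DIVERSIFY's exact mechanism.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips the gradient sign in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # The discriminator still improves, but the encoder receives a reversed
        # gradient, pushing latent domains to become indistinguishable.
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)
```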
MetaFed: Federated Learning among Federations with Cyclic Knowledge Distillation for Personalized Healthcare
Chen, Yiqiang, Lu, Wang, Qin, Xin, Wang, Jindong, Xie, Xing
Federated learning has attracted increasing attention for building models without accessing raw user data, especially in healthcare. In real applications, different federations can seldom work together due to reasons such as data heterogeneity and distrust of, or the absence of, a central server. In this paper, we propose a novel framework called MetaFed to facilitate trustworthy FL between different federations. MetaFed obtains a personalized model for each federation without a central server via the proposed Cyclic Knowledge Distillation. Specifically, MetaFed treats each federation as a meta distribution and aggregates the knowledge of each federation in a cyclic manner. The training is split into two parts: common knowledge accumulation and personalization. Comprehensive experiments on three benchmarks demonstrate that MetaFed, without a server, achieves better accuracy compared to state-of-the-art methods (e.g., a 10%+ accuracy improvement over the baseline on PAMAP2) with lower communication costs.
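A sketch of cyclic knowledge distillation among federations as summarized above: federation i treats federation i-1's model as the teacher and distills its predictions while fitting its own data, so knowledge accumulates around the ring. The temperature, loss weighting, and training loop are assumptions, not MetaFed's exact procedure.

```python
import copy
import torch
import torch.nn.functional as F

def cyclic_kd_round(models, loaders, optimizers, T=2.0, beta=0.5):
    """models/loaders/optimizers: per-federation lists, traversed in a fixed cycle."""
    n = len(models)
    for i in range(n):
        teacher = copy.deepcopy(models[(i - 1) % n]).eval()   # previous federation's model
        student, opt = models[i], optimizers[i]
        for x, y in loaders[i]:
            s_logits = student(x)
            with torch.no_grad():
                t_logits = teacher(x)
            kd = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                          F.softmax(t_logits / T, dim=-1),
                          reduction="batchmean") * T * T
            loss = F.cross_entropy(s_logits, y) + beta * kd   # local fit + distilled knowledge
            opt.zero_grad()
            loss.backward()
            opt.step()
```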
FedCLIP: Fast Generalization and Personalization for CLIP in Federated Learning
Lu, Wang, Hu, Xixu, Wang, Jindong, Xie, Xing
Federated learning (FL) has emerged as a new paradigm for privacy-preserving computation in recent years. Unfortunately, FL faces two critical challenges that hinder its actual performance: data distribution heterogeneity and the high resource costs brought by large foundation models. Specifically, non-IID data across clients makes it hard for existing FL algorithms to converge, while the high resource costs, including computational and communication costs, increase deployment difficulty in real-world scenarios. In this paper, we propose an effective yet simple method, named FedCLIP, to achieve fast generalization and personalization for CLIP in federated learning. Concretely, we design an attention-based adapter for the large model, CLIP, and all remaining operations depend only on the adapter. Lightweight adapters can make the most of pretrained model information and ensure that models adapt to clients' specific tasks. Simultaneously, the small-scale operations mitigate the computational and communication burdens caused by large models. Extensive experiments are conducted on three datasets with distribution shifts. Qualitative and quantitative results demonstrate that FedCLIP significantly outperforms other baselines (9% overall improvement on PACS) and effectively reduces computational and communication costs (283x faster than FedAVG). Our code will be available at: https://github.com/microsoft/PersonalizedFL.
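A minimal sketch of the adapter idea described above: a small attention-style gating module sits on top of the frozen CLIP image encoder, and only these adapter weights would be trained and exchanged in FL. The exact architecture below is an illustrative assumption, not necessarily FedCLIP's module.

```python
import torch
import torch.nn as nn

class AttentionAdapter(nn.Module):
    """Lightweight gating adapter applied to frozen CLIP features."""

    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(),
                                  nn.Linear(dim, dim), nn.Softmax(dim=-1))

    def forward(self, clip_features):
        # Re-weight the frozen CLIP features; the backbone itself stays untouched,
        # so only this module's parameters need to be communicated.
        return clip_features * self.gate(clip_features)
```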
Generalizable Low-Resource Activity Recognition with Diverse and Discriminative Representation Learning
Qin, Xin, Wang, Jindong, Ma, Shuo, Lu, Wang, Zhu, Yongchun, Xie, Xing, Chen, Yiqiang
Human activity recognition (HAR) is a time series classification task that focuses on identifying motion patterns from human sensor readings. Adequate data is essential but is a major bottleneck for training a generalizable HAR model, which assists customization and optimization of online web applications. However, collecting large-scale labeled data is costly in both time and money, i.e., the low-resource challenge. Meanwhile, data collected from different persons exhibit distribution shifts due to different living habits, body shapes, age groups, etc. The low-resource and distribution shift challenges are detrimental to HAR when applying the trained model to new unseen subjects. In this paper, we propose a novel approach called Diverse and Discriminative representation Learning (DDLearn) for generalizable low-resource HAR. DDLearn simultaneously considers diversity and discrimination learning. With the constructed self-supervised learning task, DDLearn enlarges data diversity and explores the latent activity properties. Then, we propose a diversity preservation module to preserve the diversity of learned features by enlarging the distribution divergence between the original and augmented domains. Meanwhile, DDLearn also enhances semantic discrimination by learning discriminative representations with supervised contrastive learning. Extensive experiments on three public HAR datasets demonstrate that our method significantly outperforms state-of-the-art methods by an average accuracy improvement of 9.5% under low-resource distribution shift scenarios, while being a generic, explainable, and flexible framework. Code is available at: https://github.com/microsoft/robustlearn.
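A sketch of the supervised contrastive term used for the discrimination side of DDLearn (the self-supervised augmentation and diversity-preservation parts are omitted). This follows the generic supervised contrastive formulation and is not the authors' exact code; the temperature is an assumption.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive(features, labels, temperature=0.07):
    """features: (B, D) embeddings; labels: (B,) class ids. Returns the SupCon-style loss."""
    z = F.normalize(features, dim=1)
    sim = z @ z.t() / temperature                              # pairwise similarities
    logits = sim - sim.max(dim=1, keepdim=True).values.detach()  # for numerical stability
    mask = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()  # same-class pairs
    self_mask = torch.eye(len(labels), device=labels.device)
    mask = mask * (1.0 - self_mask)                            # positives, excluding self
    exp_logits = torch.exp(logits) * (1.0 - self_mask)         # exclude self from denominator
    log_prob = logits - torch.log(exp_logits.sum(dim=1, keepdim=True))
    mean_log_prob_pos = (mask * log_prob).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
    return -mean_log_prob_pos.mean()
```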