AITopics | He, Yue

He, Yue

Sample Weight Averaging for Stable Prediction

Yu, Han, He, Yue, Xu, Renzhe, Li, Dongbai, Zhang, Jiayin, Zou, Wenchao, Cui, Peng

arXiv.org Artificial Intelligence, Feb-11-2025

The challenge of Out-of-Distribution (OOD) generalization poses a foundational concern for the application of machine learning algorithms to risk-sensitive areas. Inspired by traditional importance weighting and propensity weighting methods, prior approaches employ an independence-based sample reweighting procedure. They aim at decorrelating covariates to counteract the bias introduced by spurious correlations between unstable variables and the outcome, thus enhancing generalization and fulfilling stable prediction under covariate shift. Nonetheless, these methods are prone to experiencing an inflation of variance, primarily attributable to the reduced efficacy in utilizing training samples during the reweighting process. Existing remedies necessitate either environmental labels or substantially higher time costs along with additional assumptions and supervised information. To mitigate this issue, we propose SAmple Weight Averaging (SAWA), a simple yet efficacious strategy that can be universally integrated into various sample reweighting algorithms to decrease the variance and coefficient estimation error, thus boosting the covariate-shift generalization and achieving stable prediction across different environments. We prove its rationality and benefits theoretically. Experiments across synthetic datasets and real-world datasets consistently underscore its superiority against covariate shift.
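The variance argument in this abstract can be illustrated with a minimal sketch. This is not the SAWA algorithm from the paper: it only assumes that several sample-weight vectors for the same training set are already available (the values below are hypothetical, standing in for different runs of a reweighting procedure), averages them elementwise, and compares the Kish effective sample size before and after. All function names are illustrative.

```python
def normalize(weights):
    """Rescale weights so they sum to the number of samples (mean weight 1)."""
    n = len(weights)
    total = sum(weights)
    return [w * n / total for w in weights]


def effective_sample_size(weights):
    """Kish effective sample size: (sum of w)^2 / (sum of w^2)."""
    s1 = sum(weights)
    s2 = sum(w * w for w in weights)
    return s1 * s1 / s2


def average_weights(weight_vectors):
    """Elementwise mean of several normalized weight vectors."""
    normalized = [normalize(w) for w in weight_vectors]
    k = len(normalized)
    return [sum(column) / k for column in zip(*normalized)]


# Hypothetical weight vectors for the same five samples (assumed values).
runs = [
    [4.0, 0.5, 0.2, 0.2, 0.1],
    [0.1, 3.5, 0.6, 0.5, 0.3],
    [0.3, 0.2, 0.4, 3.6, 0.5],
]

averaged = average_weights(runs)
ess_single = [effective_sample_size(normalize(w)) for w in runs]
ess_averaged = effective_sample_size(averaged)

print("ESS per run:", [round(e, 2) for e in ess_single])
print("ESS after averaging:", round(ess_averaged, 2))
```

When every vector is normalized to the same sum, the averaged vector's effective sample size cannot fall below the smallest single-run value (a consequence of Jensen's inequality on the sum of squared weights), which is one simple way to see why averaging can counteract variance inflation.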

artificial intelligence, generalization, machine learning, (17 more...)

arXiv:2502.07414

Country:

Europe (0.14)
Asia > China (0.14)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Error Slice Discovery via Manifold Compactness

Yu, Han, Liu, Jiashuo, Zou, Hao, Xu, Renzhe, He, Yue, Zhang, Xingxuan, Cui, Peng

arXiv.org Artificial Intelligence, Jan-31-2025

Despite the great performance of deep learning models in many areas, they still make mistakes and underperform on certain subsets of data, i.e. error slices. Given a trained model, it is important to identify its semantically coherent error slices that are easy to interpret, which is referred to as the error slice discovery problem. However, there is no proper metric of slice coherence without relying on extra information like predefined slice labels. Current evaluation of slice coherence requires access to predefined slices formulated by metadata like attributes or subclasses. Its validity heavily relies on the quality and abundance of metadata, where some possible patterns could be ignored. Besides, current algorithms cannot directly incorporate the constraint of coherence into their optimization objective due to the absence of an explicit coherence metric, which could potentially hinder their effectiveness. In this paper, we propose manifold compactness, a coherence metric without reliance on extra information by incorporating the data geometry property into its design, and experiments on typical datasets empirically validate the rationality of the metric. Then we develop Manifold Compactness based error Slice Discovery (MCSD), a novel algorithm that directly treats risk and coherence as the optimization objective, and is flexible to be applied to models of various tasks. Extensive experiments on the benchmark and case studies on other typical datasets demonstrate the superiority of MCSD.

artificial intelligence, error slice discovery, machine learning, (12 more...)

arXiv:2501.19032

Country: Asia > China (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
(2 more...)

Full Bayesian Significance Testing for Neural Networks

Liu, Zehua, Li, Zimeng, Wang, Jingyuan, He, Yue

arXiv.org Artificial Intelligence, Jan-24-2024

Significance testing aims to determine whether a proposition about the population distribution is true given the observations. However, traditional significance testing often needs to derive the distribution of the test statistic and therefore fails to deal with complex nonlinear relationships. In this paper, we propose to conduct Full Bayesian Significance Testing for neural networks, called nFBST, to overcome the limitation in relationship characterization of traditional approaches. A Bayesian neural network is used to fit the nonlinear and multi-dimensional relationships with small errors, and computing the evidence value avoids hard theoretical derivation. In addition, nFBST can test not only global significance but also local and instance-wise significance, which previous testing methods do not focus on. Moreover, nFBST is a general framework that can be extended based on the measures selected, such as Grad-nFBST, LRP-nFBST, DeepLIFT-nFBST, and LIME-nFBST. A range of experiments on both simulated and real data are conducted to show the advantages of our method.
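To make the "evidence value" concrete, here is a minimal sketch of the classical FBST e-value for a toy case. It is not the paper's neural-network procedure: it assumes a one-dimensional Gaussian posterior and a point null hypothesis, and the function name and inputs are hypothetical. In FBST the tangential set contains every parameter value whose posterior density exceeds the highest density attained under the null, and the evidence supporting the null is one minus the posterior probability of that set.

```python
from statistics import NormalDist


def fbst_evidence_point_null(post_mean, post_std, null_value=0.0):
    """e-value supporting H0: theta == null_value under a Gaussian posterior.

    For a Gaussian posterior, the tangential set (points with higher posterior
    density than the null value) is the interval of points that lie closer to
    the posterior mean than the null value does.
    """
    z = abs(post_mean - null_value) / post_std
    prob_tangential = 2.0 * NormalDist().cdf(z) - 1.0
    return 1.0 - prob_tangential


# Assumed toy posteriors, for illustration only.
ev_centered = fbst_evidence_point_null(0.0, 1.0)   # posterior centred on the null
ev_shifted = fbst_evidence_point_null(1.96, 1.0)   # mean 1.96 standard deviations away

print(ev_centered, round(ev_shifted, 3))
```

A small evidence value indicates that the null value sits in a low-density region of the posterior, which is the sense in which a feature would be judged significant.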

artificial intelligence, machine learning, significance, (19 more...)

arXiv:2401.13335

Country:

Asia > China (0.15)
North America > United States (0.14)

Genre:

Research Report > Experimental Study (0.96)
Research Report > New Finding (0.68)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Self-Supervised Deconfounding Against Spatio-Temporal Shifts: Theory and Modeling

Ji, Jiahao, Zhang, Wentao, Wang, Jingyuan, He, Yue, Huang, Chao

arXiv.org Artificial Intelligence, Nov-21-2023

As an important application of spatio-temporal (ST) data, ST traffic forecasting plays a crucial role in improving urban travel efficiency and promoting sustainable development. In practice, the dynamics of traffic data frequently undergo distributional shifts attributed to external factors such as time evolution and spatial differences. This requires forecasting models to handle the out-of-distribution (OOD) issue, where test data is distributed differently from training data. In this work, we first formalize the problem by constructing a causal graph of past traffic data, future traffic data, and external ST contexts. We reveal that the failure of prior methods on OOD traffic data is due to ST contexts acting as a confounder, i.e., the common cause of past and future data. Then, we propose a theoretical solution named Disentangled Contextual Adjustment (DCA) from a causal lens. It differentiates invariant causal correlations from variant spurious ones and deconfounds the effect of ST contexts. On top of that, we devise a Spatio-Temporal sElf-superVised dEconfounding (STEVE) framework. It first encodes traffic data into two disentangled representations for associating invariant and variant ST contexts. Then, we use representative ST contexts from three conceptually different perspectives (i.e., temporal, spatial, and semantic) as self-supervised signals to inject context information into both representations. In this way, we improve the generalization ability of the learned context-oriented representations to OOD ST traffic forecasting. Comprehensive experiments on four large-scale benchmark datasets demonstrate that our STEVE consistently outperforms the state-of-the-art baselines across various ST OOD scenarios.

machine learning, natural language, st context, (19 more...)

arXiv:2311.12472

Country:

Europe (0.67)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report > Experimental Study (0.46)

Industry: Transportation (0.68)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Towards Out-Of-Distribution Generalization: A Survey

Liu, Jiashuo, Shen, Zheyan, He, Yue, Zhang, Xingxuan, Xu, Renzhe, Yu, Han, Cui, Peng

arXiv.org Artificial Intelligence, Jul-27-2023

Traditional machine learning paradigms are based on the assumption that both training and test data follow the same statistical pattern, which is mathematically referred to as Independent and Identically Distributed (i.i.d.). However, in real-world applications, this i.i.d. assumption often fails to hold due to unforeseen distributional shifts, leading to considerable degradation in model performance upon deployment. This observed discrepancy indicates the significance of investigating the Out-of-Distribution (OOD) generalization problem. OOD generalization is an emerging topic of machine learning research that focuses on complex scenarios wherein the distributions of the test data differ from those of the training data. This paper represents the first comprehensive, systematic review of OOD generalization, encompassing a spectrum of aspects from problem definition, methodological development, and evaluation procedures, to the implications and future directions of the field. Our discussion begins with a precise, formal characterization of the OOD generalization problem. Following that, we categorize existing methodologies into three segments: unsupervised representation learning, supervised model learning, and optimization, according to their positions within the overarching learning process. We provide an in-depth discussion on representative methodologies for each category, further elucidating the theoretical links between them. Subsequently, we outline the prevailing benchmark datasets employed in OOD generalization studies. To conclude, we overview the existing body of work in this domain and suggest potential avenues for future research on OOD generalization. A summary of the OOD generalization methodologies surveyed in this paper can be accessed at http://out-of-distribution-generalization.com.

artificial intelligence, generalization, machine learning, (18 more...)

arXiv:2108.13624

Country:

North America > United States (1.00)
Europe (0.67)
Africa (0.67)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Health & Medicine (1.00)
Information Technology (0.67)
Transportation > Ground > Road (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)
(2 more...)

Rethinking the Evaluation Protocol of Domain Generalization

Yu, Han, Zhang, Xingxuan, Xu, Renzhe, Liu, Jiashuo, He, Yue, Cui, Peng

arXiv.org Artificial Intelligence, May-24-2023

Domain generalization aims to solve the challenge of Out-of-Distribution (OOD) generalization by leveraging common knowledge learned from multiple training domains to generalize to unseen test domains. To accurately evaluate the OOD generalization ability, it is necessary to ensure that test data information is unavailable. However, the current domain generalization protocol may still have potential test data information leakage. This paper examines the potential risks of test data information leakage in two aspects of the current protocol: pretraining on ImageNet and oracle model selection. We propose that training from scratch and using multiple test domains would result in a more precise evaluation of OOD generalization ability. We also rerun the algorithms with the modified protocol and introduce a new leaderboard to encourage future research in domain generalization with a fairer comparison.

artificial intelligence, generalization, machine learning, (15 more...)

arXiv:2305.15253

Country: Asia (0.46)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Vision (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Exploring and Exploiting Data Heterogeneity in Recommendation

Wang, Zimu, Liu, Jiashuo, Zou, Hao, Zhang, Xingxuan, He, Yue, Liang, Dongxu, Cui, Peng

arXiv.org Artificial Intelligence, May-21-2023

Massive amounts of data are the foundation of data-driven recommendation models. As an inherent property of big data, data heterogeneity widely exists in real-world recommendation systems. It reflects the differences in the properties among sub-populations. Ignoring the heterogeneity in recommendation data could limit the performance of recommendation models, hurt sub-populational robustness, and cause the models to be misled by biases. However, data heterogeneity has not attracted substantial attention in the recommendation community. This inspires us to adequately explore and exploit heterogeneity for solving the above problems and assisting data analysis. In this work, we focus on exploring two representative categories of heterogeneity in recommendation data, namely the heterogeneity of the prediction mechanism and of the covariate distribution, and propose an algorithm that explores the heterogeneity through a bilevel clustering method. Furthermore, the uncovered heterogeneity is exploited for two purposes in recommendation scenarios: prediction with multiple sub-models and support for debiasing. Extensive experiments on real-world data validate the existence of heterogeneity in recommendation data and the effectiveness of exploring and exploiting data heterogeneity in recommendation.

data mining, heterogeneity, machine learning, (15 more...)

arXiv:2305.15431

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (0.94)
Media > Film (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.88)
Information Technology > Data Science > Data Mining (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Stable Learning via Sparse Variable Independence

Yu, Han, Cui, Peng, He, Yue, Shen, Zheyan, Lin, Yong, Xu, Renzhe, Zhang, Xingxuan

arXiv.org Artificial Intelligence, Dec-2-2022

The problem of covariate-shift generalization has attracted intensive research attention. Previous stable learning algorithms employ sample reweighting schemes to decorrelate the covariates when there is no explicit domain information about training data. However, with finite samples, it is difficult to achieve the desirable weights that ensure perfect independence to get rid of the unstable variables. Besides, decorrelating within stable variables may bring about high variance of learned models because of the over-reduced effective sample size. A tremendous sample size is required for these algorithms to work. In this paper, with theoretical justification, we propose SVI (Sparse Variable Independence) for the covariate-shift generalization problem. We introduce sparsity constraint to compensate for the imperfectness of sample reweighting under the finite-sample setting in previous methods. Furthermore, we organically combine independence-based sample reweighting and sparsity-based variable selection in an iterative way to avoid decorrelating within stable variables, increasing the effective sample size to alleviate variance inflation. Experiments on both synthetic and real-world datasets demonstrate the improvement of covariate-shift generalization performance brought by SVI.
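The decorrelating effect of sample reweighting mentioned in this abstract can be shown on a toy example. The sketch below is not the SVI algorithm: it only computes a weighted Pearson correlation between two covariates and shows that a hand-picked weight vector (an assumption chosen for this tiny dataset, not a learned one) removes the correlation present under uniform weights.

```python
from math import sqrt


def weighted_corr(x, z, w):
    """Pearson correlation of x and z under non-negative sample weights w."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_z = sum(wi * zi for wi, zi in zip(w, z)) / total
    cov = sum(wi * (xi - mean_x) * (zi - mean_z)
              for wi, xi, zi in zip(w, x, z)) / total
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x)) / total
    var_z = sum(wi * (zi - mean_z) ** 2 for wi, zi in zip(w, z)) / total
    return cov / sqrt(var_x * var_z)


# Six hypothetical samples: x and z agree on four of them and disagree on two.
x = [0.0, 0.0, 1.0, 1.0, 0.0, 1.0]
z = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0]

uniform = [1.0] * 6
# Up-weight the two disagreeing samples so that all four (x, z) cells balance.
reweighted = [1.0, 1.0, 1.0, 1.0, 2.0, 2.0]

corr_uniform = weighted_corr(x, z, uniform)
corr_reweighted = weighted_corr(x, z, reweighted)

print(round(corr_uniform, 3), round(corr_reweighted, 3))
```

The price of this balancing is visible in the weights themselves: concentrating mass on a few samples lowers the effective sample size, which is the variance problem the abstract attributes to reweighting with finite samples.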

artificial intelligence, generalization, machine learning, (18 more...)

arXiv:2212.00992

Country: Asia > China (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Visual Semantics Allow for Textual Reasoning Better in Scene Text Recognition

He, Yue, Chen, Chen, Zhang, Jing, Liu, Juhua, He, Fengxiang, Wang, Chaoyue, Du, Bo

arXiv.org Artificial Intelligence, Dec-23-2021

Existing Scene Text Recognition (STR) methods typically use a language model to optimize the joint probability of the 1D character sequence predicted by a visual recognition (VR) model. This ignores the 2D spatial context of visual semantics within and between character instances, so these methods do not generalize well to arbitrarily shaped scene text. To address this issue, we make the first attempt to perform textual reasoning based on visual semantics in this paper. Technically, given the character segmentation maps predicted by a VR model, we construct a subgraph for each instance, where nodes represent the pixels in it and edges are added between nodes based on their spatial similarity. Then, these subgraphs are sequentially connected by their root nodes and merged into a complete graph. Based on this graph, we devise a graph convolutional network for textual reasoning (GTR) by supervising it with a cross-entropy loss. GTR can be easily plugged into representative STR models to improve their performance owing to better textual reasoning. Specifically, we construct our model, namely S-GTR, by paralleling GTR to the language model in a segmentation-based STR baseline, which can effectively exploit the visual-linguistic complementarity via mutual learning. S-GTR sets a new state of the art on six challenging STR benchmarks and generalizes well to multi-linguistic datasets. Code is available at https://github.com/adeline-cs/GTR.

machine learning, pattern recognition, recognition, (16 more...)

arXiv:2112.12916

Country:

Europe (1.00)
North America > United States > Hawaii (0.14)
Asia > China > Hubei Province (0.14)
North America > Canada > Quebec (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Text Recognition (0.64)

Deep Stable Learning for Out-Of-Distribution Generalization

Zhang, Xingxuan, Cui, Peng, Xu, Renzhe, Zhou, Linjun, He, Yue, Shen, Zheyan

arXiv.org Artificial Intelligence, Apr-15-2021

Approaches based on deep neural networks have achieved striking performance when testing data and training data share a similar distribution, but can significantly fail otherwise. Therefore, eliminating the impact of distribution shifts between training and testing data is crucial for building performance-promising deep models. Conventional methods assume either the known heterogeneity of training data (e.g. domain labels) or the approximately equal capacities of different domains. In this paper, we consider a more challenging case where neither of the above assumptions holds. We propose to address this problem by removing the dependencies between features via learning weights for training samples, which helps deep models get rid of spurious correlations and, in turn, concentrate more on the true connection between discriminative features and labels. Extensive experiments on distribution generalization benchmarks including PACS, VLCS, MNIST-M, and NICO demonstrate the effectiveness of our method compared with state-of-the-art counterparts.

art painting, deep learning, neural network, (19 more...)

arXiv:2104.07876

Country: Asia > China (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
