Cui, Peng
Deep Ensembles Meets Quantile Regression: Uncertainty-aware Imputation for Time Series
Liu, Ying, Cui, Peng, Hu, Wenbo, Hong, Richang
Multivariate time series are ubiquitous, yet real-world time series data often contain numerous missing values, giving rise to the time series imputation task. Although previous deep learning methods have proven effective for time series imputation, they tend to produce overconfident imputations, a potentially overlooked threat to the reliability of intelligent systems. The score-based diffusion method CSDI is effective for time series imputation but computationally expensive due to the nature of the generative diffusion framework. In this paper, we propose a non-generative time series imputation method that produces accurate imputations with inherent uncertainty while remaining computationally efficient. Specifically, we incorporate deep ensembles into quantile regression with a shared model backbone and a series of quantile discrimination functions. This framework combines the accurate uncertainty estimation of deep ensembles with the merits of quantile regression, and, above all, the shared backbone eliminates most of the computational overhead of maintaining multiple ensemble members. We evaluate the proposed method on two real-world datasets, an air-quality dataset and a health-care dataset, and conduct extensive experiments showing that our method excels at both deterministic and probabilistic prediction. Compared with the score-based diffusion method CSDI, we obtain comparable results and perform better when more data is missing. Furthermore, as a non-generative model, the proposed method incurs a much smaller computational overhead than CSDI, yielding much faster training and fewer model parameters.
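A minimal sketch of the kind of architecture the abstract describes, assuming a shared backbone with one lightweight head per target quantile trained with the pinball (quantile) loss; all class and function names here are hypothetical, not the authors' released code.

```python
# Sketch: shared backbone + per-quantile heads; the spread between the
# quantile outputs provides an uncertainty interval around each imputation.
import torch
import torch.nn as nn

class SharedBackboneQuantileNet(nn.Module):  # hypothetical name
    def __init__(self, in_dim, hidden_dim, quantiles=(0.1, 0.5, 0.9)):
        super().__init__()
        self.quantiles = quantiles
        self.backbone = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # One small head per quantile; sharing the backbone is what removes
        # most of the overhead of training a full ensemble of separate models.
        self.heads = nn.ModuleList([nn.Linear(hidden_dim, 1) for _ in quantiles])

    def forward(self, x):
        h = self.backbone(x)
        return torch.cat([head(h) for head in self.heads], dim=-1)  # (batch, n_quantiles)

def pinball_loss(pred, target, quantiles):
    """Average quantile (pinball) loss over all heads."""
    losses = []
    for i, q in enumerate(quantiles):
        err = target - pred[..., i]
        losses.append(torch.max(q * err, (q - 1) * err).mean())
    return torch.stack(losses).mean()
```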
Investigating Uncertainty Calibration of Aligned Language Models under the Multiple-Choice Setting
He, Guande, Cui, Peng, Chen, Jianfei, Hu, Wenbo, Zhu, Jun
Despite the significant progress made in practical applications of aligned language models (LMs), they tend to be overconfident in their output answers compared to the corresponding pre-trained LMs. In this work, we systematically evaluate the impact of the alignment process on logit-based uncertainty calibration of LMs under the multiple-choice setting. We first conduct a thorough empirical study of how aligned LMs differ in calibration from their pre-trained counterparts. Experimental results reveal that there are two distinct uncertainties in LMs under the multiple-choice setting, which are responsible for the answer decision and the format preference of the LMs, respectively. Then, we investigate the role of these two uncertainties in aligned LMs' calibration through fine-tuning in simple synthetic alignment schemes and conclude that one reason for aligned LMs' overconfidence is the conflation of these two types of uncertainty. Furthermore, we examine the utility of common post-hoc calibration methods for aligned LMs and propose an easy-to-implement and sample-efficient method to calibrate aligned LMs. We hope our findings can provide insights into the design of more reliable alignment processes for LMs.
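For concreteness, a sketch of one of the standard post-hoc baselines referred to above, temperature scaling applied to the logits of the candidate answer options in the multiple-choice setting; this is not the paper's proposed calibration method, and the function names are illustrative only.

```python
# Sketch: fit a single temperature on a held-out calibration set by
# minimizing the negative log-likelihood of the correct answer option.
import numpy as np
from scipy.optimize import minimize_scalar

def choice_probs(option_logits, temperature=1.0):
    """Softmax over the logits of the candidate answer options (A/B/C/D...)."""
    z = option_logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    p = np.exp(z)
    return p / p.sum(axis=-1, keepdims=True)

def fit_temperature(option_logits, labels):
    """option_logits: (n, n_options); labels: (n,) index of the correct option."""
    def nll(t):
        p = choice_probs(option_logits, temperature=t)
        return -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()
    return minimize_scalar(nll, bounds=(0.05, 10.0), method="bounded").x
```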
Geometry-Calibrated DRO: Combating Over-Pessimism with Free Energy Implications
Liu, Jiashuo, Wu, Jiayun, Wang, Tianyu, Zou, Hao, Li, Bo, Cui, Peng
Machine learning algorithms that minimize average risk are susceptible to distributional shifts. Distributionally Robust Optimization (DRO) addresses this issue by optimizing the worst-case risk within an uncertainty set. However, DRO suffers from over-pessimism, leading to low-confidence predictions, poor parameter estimation, and poor generalization. In this work, we conduct a theoretical analysis of a probable root cause of over-pessimism: excessive focus on noisy samples. To alleviate the impact of noise, we incorporate data geometry into the calibration terms of DRO, resulting in our novel Geometry-Calibrated DRO (GCDRO) for regression. We establish the connection between our risk objective and the Helmholtz free energy in statistical physics, and this free-energy-based risk can extend to standard DRO methods. Leveraging gradient flow in Wasserstein space, we develop an approximate minimax optimization algorithm with a bounded error ratio and elucidate how our approach mitigates the effects of noisy samples. Comprehensive experiments confirm GCDRO's superiority over conventional DRO methods.
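As background for the free-energy connection mentioned above, one standard identity (the Gibbs variational formula) shows how a log-partition, i.e. Helmholtz-free-energy-type, risk arises from KL-regularized DRO over the empirical distribution; this is a generic identity, not the paper's geometry-calibrated objective.

```latex
% KL-regularized worst-case risk has a closed-form free-energy expression:
\[
\sup_{Q} \Big\{ \mathbb{E}_{Q}\big[\ell(\theta; Z)\big]
      - \tfrac{1}{\beta}\, \mathrm{KL}\!\left(Q \,\|\, \hat{P}_n\right) \Big\}
  \;=\;
  \frac{1}{\beta} \log \mathbb{E}_{\hat{P}_n}\!\left[ e^{\beta\, \ell(\theta; Z)} \right].
\]
```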
Towards Accelerated Model Training via Bayesian Data Selection
Deng, Zhijie, Cui, Peng, Zhu, Jun
Mislabeled, duplicated, or biased data in real-world scenarios can lead to prolonged training and even hinder model convergence. Traditional solutions that prioritize easy or hard samples lack the flexibility to handle such varied data simultaneously. Recent work has proposed a more reasonable data selection principle by examining the data's impact on the model's generalization loss. However, its practical adoption relies on less principled approximations and additional holdout data. This work solves these problems by leveraging a lightweight Bayesian treatment and incorporating off-the-shelf zero-shot predictors built on large-scale pre-trained models. The resulting algorithm is efficient and easy to implement. We perform extensive empirical studies on challenging benchmarks with considerable data noise and imbalance in the online batch selection scenario, and observe superior training efficiency over competitive baselines. Notably, on the challenging WebVision benchmark, our method can achieve similar predictive performance with significantly fewer training iterations than leading data selection methods.
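A rough sketch of online batch selection in the spirit described above, assuming a generalization-loss-style score in which a zero-shot predictor stands in for the holdout model; this is not the paper's exact Bayesian scoring rule, and the function name is hypothetical.

```python
# Sketch: score each candidate by current-model loss minus zero-shot loss and
# keep only the top-k examples of each large batch for the gradient update.
import torch
import torch.nn.functional as F

def select_topk(model, zero_shot_logits, x, y, k):
    """Return indices of the k most 'worth learning' examples in the batch."""
    with torch.no_grad():
        model_loss = F.cross_entropy(model(x), y, reduction="none")
        zs_loss = F.cross_entropy(zero_shot_logits, y, reduction="none")
        # High model loss but low zero-shot loss: learnable, likely not noise.
        scores = model_loss - zs_loss
    return scores.topk(k).indices
```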
Learning Sample Difficulty from Pre-trained Models for Reliable Prediction
Cui, Peng, Zhang, Dan, Deng, Zhijie, Dong, Yinpeng, Zhu, Jun
Large-scale pre-trained models have achieved remarkable success in many applications, but how to leverage them to improve the prediction reliability of downstream models remains under-explored. Moreover, modern neural networks have been found to be poorly calibrated and to make overconfident predictions regardless of inherent sample difficulty and data uncertainty. To address this issue, we propose to utilize large-scale pre-trained models to guide downstream model training with sample-difficulty-aware entropy regularization. Pre-trained models that have been exposed to large-scale datasets and do not overfit the downstream training classes enable us to measure each training sample's difficulty via feature-space Gaussian modeling and relative Mahalanobis distance computation. Importantly, by adaptively penalizing overconfident predictions based on sample difficulty, we simultaneously improve accuracy and uncertainty calibration across challenging benchmarks (e.g., +0.55% ACC and -3.7% ECE on ImageNet1k using ResNet34), consistently surpassing competitive baselines for reliable prediction. The improved uncertainty estimates further benefit selective classification (abstaining from erroneous predictions) and out-of-distribution detection.
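A minimal sketch of the relative-Mahalanobis-distance difficulty measure in the feature space of a frozen pre-trained model, assuming per-class Gaussians with a shared covariance for the class term and a single class-agnostic Gaussian for the background term; exact modeling details in the paper may differ.

```python
# Sketch: a larger relative Mahalanobis distance marks a harder/more
# ambiguous training sample, which then receives stronger entropy-style
# penalization of overconfident predictions.
import numpy as np

def fit_gaussians(feats, labels):
    classes = np.unique(labels)
    means = {c: feats[labels == c].mean(0) for c in classes}
    centered = np.concatenate([feats[labels == c] - means[c] for c in classes])
    shared_cov_inv = np.linalg.pinv(np.cov(centered, rowvar=False))
    bg_mean = feats.mean(0)
    bg_cov_inv = np.linalg.pinv(np.cov(feats, rowvar=False))
    return means, shared_cov_inv, bg_mean, bg_cov_inv

def relative_mahalanobis(f, y, means, cov_inv, bg_mean, bg_cov_inv):
    d_cls = (f - means[y]) @ cov_inv @ (f - means[y])
    d_bg = (f - bg_mean) @ bg_cov_inv @ (f - bg_mean)
    return d_cls - d_bg  # larger => harder sample
```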
Heterogeneous Multi-Task Gaussian Cox Processes
Zhou, Feng, Kong, Quyu, Deng, Zhijie, He, Fengxiang, Cui, Peng, Zhu, Jun
Inhomogeneous Poisson process data defined on a continuous spatio-temporal domain have recently attracted immense attention in a wide variety of applications, including reliability analysis in manufacturing systems (Soleimani et al., 2017), event capture in sensing regions (Mutny and Krause, 2021), crime prediction in urban areas (Shirota and Gelfand, 2017), and disease diagnosis based on medical records (Lasko, 2014). The reliable training of an inhomogeneous Poisson process model critically relies on a large amount of data to avoid overfitting, especially when modeling high-dimensional point processes. However, one challenge is that the available training data are routinely sparse or even partially missing in specific applications. Taking manufacturing failure and healthcare analysis as motivating examples: modern manufacturing machines are reliable and rarely fail, and individuals with a healthy constitution seldom visit the hospital. Missing-data problems also arise, e.g., event-location capture in sensing systems is intermittent because of weather or other related barriers.
Competing for Shareable Arms in Multi-Player Multi-Armed Bandits
Xu, Renzhe, Wang, Haotian, Zhang, Xingxuan, Li, Bo, Cui, Peng
Competition for shareable and limited resources has long been studied with strategic agents. In reality, agents often have to learn about and maximize the rewards of the resources at the same time. To design an individualized competing policy, we model the competition between agents in a novel multi-player multi-armed bandit (MPMAB) setting where players are selfish and aim to maximize their own rewards. In addition, when several players pull the same arm, we assume that these players share the arm's reward equally in expectation. Under this setting, we first analyze the Nash equilibrium when the arms' rewards are known. Subsequently, we propose a novel Selfish MPMAB with Averaging Allocation (SMAA) approach based on the equilibrium. We theoretically demonstrate that SMAA achieves a good regret guarantee for each player when all players follow the algorithm. Additionally, we establish that no single selfish player can significantly increase their rewards through deviation, nor can they detrimentally affect other players' rewards without incurring substantial losses themselves. Finally, we validate the effectiveness of the method in extensive synthetic experiments.
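A toy sketch of the reward-sharing mechanic in the MPMAB setting described above (not the SMAA algorithm itself), assuming Bernoulli arms whose realized reward is split evenly among all players pulling the same arm in a round, so each player's share equals the arm mean divided by the number of co-pullers in expectation.

```python
# Sketch: one round of the averaging-allocation reward model.
import numpy as np

def play_round(arm_means, choices, rng):
    """choices[j] is the arm pulled by player j; returns each player's reward."""
    rewards = np.zeros(len(choices))
    for arm in set(choices):
        players = [j for j, a in enumerate(choices) if a == arm]
        realized = rng.binomial(1, arm_means[arm])  # Bernoulli arm reward
        rewards[players] = realized / len(players)  # shared equally
    return rewards

rng = np.random.default_rng(0)
print(play_round(arm_means=[0.9, 0.5, 0.2], choices=[0, 0, 1], rng=rng))
```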
Towards Out-Of-Distribution Generalization: A Survey
Liu, Jiashuo, Shen, Zheyan, He, Yue, Zhang, Xingxuan, Xu, Renzhe, Yu, Han, Cui, Peng
Traditional machine learning paradigms are based on the assumption that both training and test data follow the same statistical pattern, which is mathematically referred to as Independent and Identically Distributed ($i.i.d.$). However, in real-world applications, this $i.i.d.$ assumption often fails to hold due to unforeseen distributional shifts, leading to considerable degradation in model performance upon deployment. This observed discrepancy indicates the significance of investigating the Out-of-Distribution (OOD) generalization problem. OOD generalization is an emerging topic of machine learning research that focuses on complex scenarios wherein the distributions of the test data differ from those of the training data. This paper represents the first comprehensive, systematic review of OOD generalization, encompassing a spectrum of aspects from problem definition, methodological development, and evaluation procedures, to the implications and future directions of the field. Our discussion begins with a precise, formal characterization of the OOD generalization problem. Following that, we categorize existing methodologies into three segments: unsupervised representation learning, supervised model learning, and optimization, according to their positions within the overarching learning process. We provide an in-depth discussion on representative methodologies for each category, further elucidating the theoretical links between them. Subsequently, we outline the prevailing benchmark datasets employed in OOD generalization studies. To conclude, we overview the existing body of work in this domain and suggest potential avenues for future research on OOD generalization. A summary of the OOD generalization methodologies surveyed in this paper can be accessed at http://out-of-distribution-generalization.com.
On the Need for a Language Describing Distribution Shifts: Illustrations on Tabular Datasets
Liu, Jiashuo, Wang, Tianyu, Cui, Peng, Namkoong, Hongseok
The performance of predictive models has been observed to degrade under distribution shifts in a wide range of applications, such as healthcare [8, 68, 56, 67], economics [28, 18], education [5], vision [55, 47, 64, 70], and language [46, 6]. Distribution shifts vary in type and are typically defined as either a change in the marginal distribution of the covariates (X-shifts) or a change in the conditional relationship between the outcome and the covariates (Y|X-shifts). Real-world scenarios comprise both types of shifts. In computer vision [46, 37, 60, 30, 72], Y|X-shifts are less likely since Y is constructed from human labels given an input X. Due to the prevalence of X-shifts, the implicit goal of many researchers is to develop a single robust model that can generalize effectively across multiple domains, akin to humans. For tabular data, Y|X-shifts may arise because of missing variables and hidden confounders. For example, the prevalence of diseases among patients may be affected by covariates that are not recorded in medical datasets but vary among individuals, such as lifestyle factors (e.g., diet, exercise, smoking status) and socioeconomic status [31, 74, 67]. Under Y|X-shifts, there may be a fundamental trade-off between learning algorithms: to perform well on a target distribution, a model may necessarily have to perform worse on others. Algorithmically, typical methods for addressing Y|X-shifts include distributionally robust optimization (DRO) [11, 63, 21, 59, 20] and causal learning methods [54, 7, 62, 36].
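A tiny synthetic illustration of the two shift types just defined, under an assumed linear data-generating process: an X-shift moves only the covariate distribution, while a Y|X-shift changes the conditional relationship (here via a coefficient playing the role of an unrecorded confounder).

```python
# Sketch: generate a source domain, an X-shifted domain, and a Y|X-shifted domain.
import numpy as np

rng = np.random.default_rng(0)

def sample(n, x_mean=0.0, beta=1.0):
    x = rng.normal(x_mean, 1.0, n)           # covariate X
    y = beta * x + rng.normal(0.0, 0.1, n)   # outcome Y
    return x, y

x_src, y_src = sample(1000)                  # source domain
x_cov, y_cov = sample(1000, x_mean=2.0)      # X-shift: P(X) moves, Y|X unchanged
x_con, y_con = sample(1000, beta=0.5)        # Y|X-shift: same P(X), new Y|X
```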
Inductive Meta-path Learning for Schema-complex Heterogeneous Information Networks
Liu, Shixuan, Fan, Changjun, Cheng, Kewei, Wang, Yunfei, Cui, Peng, Sun, Yizhou, Liu, Zhong
Heterogeneous Information Networks (HINs) are information networks with multiple types of nodes and edges. The concept of a meta-path, i.e., a sequence of entity types and relation types connecting two entities, has been proposed to provide meta-level, explainable semantics for various HIN tasks. Traditionally, meta-paths have primarily been used for schema-simple HINs, e.g., bibliographic networks with only a few entity types, where meta-paths are often enumerated with domain knowledge. However, the adoption of meta-paths for schema-complex HINs, such as knowledge bases (KBs) with hundreds of entity and relation types, has been limited due to the computational complexity associated with meta-path enumeration. Additionally, effectively assessing meta-paths requires enumerating relevant path instances, which adds further complexity to the meta-path learning process. To address these challenges, we propose SchemaWalk, an inductive meta-path learning framework for schema-complex HINs. We represent meta-paths with schema-level representations to support learning the scores of meta-paths for varying relations, mitigating the need for exhaustive path-instance enumeration for each relation. Further, we design a reinforcement learning-based path-finding agent, which directly navigates the network schema (i.e., the schema graph) to learn policies for establishing meta-paths with high coverage and confidence for multiple relations. Extensive experiments on real-world datasets demonstrate the effectiveness of the proposed paradigm.
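A highly simplified sketch of what navigating a schema graph for meta-path candidates can look like, using random walks over a toy bibliographic schema; SchemaWalk itself learns an RL policy with schema-level scoring rather than sampling uniformly, and all names below are illustrative.

```python
# Sketch: enumerate candidate meta-paths between two entity types by walking
# the schema graph (entity types connected by relation types), not the data graph.
import random

# schema graph: entity type -> list of (relation type, next entity type)
schema = {
    "Author": [("writes", "Paper")],
    "Paper": [("published_in", "Venue"), ("cites", "Paper")],
    "Venue": [],
}

def sample_meta_paths(start, target, n_walks=100, max_len=4, seed=0):
    rng = random.Random(seed)
    found = set()
    for _ in range(n_walks):
        path, node = [start], start
        for _ in range(max_len):
            if not schema[node]:
                break
            rel, nxt = rng.choice(schema[node])
            path += [rel, nxt]
            node = nxt
            if node == target:
                found.add(tuple(path))
                break
    return found

print(sample_meta_paths("Author", "Venue"))
```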