Teng, Jiaye
Disentangling Feature Structure: A Mathematically Provable Two-Stage Training Dynamics in Transformers
Gong, Zixuan, Teng, Jiaye, Liu, Yong
Transformers may exhibit two-stage training dynamics during the real-world training process. For instance, when training GPT-2 on the CounterFact dataset, the answers progress from syntactically incorrect to syntactically correct to semantically correct. However, existing theoretical analyses hardly account for this two-stage phenomenon. In this paper, we theoretically demonstrate how such two-stage training dynamics occur in transformers. Specifically, we analyze the dynamics of transformers using feature learning techniques under in-context learning regimes, based on a disentangled two-type feature structure. Such disentanglement of feature structure is common in practice, e.g., natural languages contain syntax and semantics, and proteins contain primary and secondary structures. To the best of our knowledge, this is the first rigorous result regarding a two-stage optimization process in transformers. Additionally, a corollary indicates that such a two-stage process is closely related to the spectral properties of the attention weights, which accords well with empirical findings.
Predictive Inference With Fast Feature Conformal Prediction
Tang, Zihao, Wang, Boyuan, Wen, Chuan, Teng, Jiaye
Conformal prediction is widely adopted in uncertainty quantification, due to its post-hoc, distribution-free, and model-agnostic properties. In the realm of modern deep learning, researchers have proposed Feature Conformal Prediction (FCP), which deploys conformal prediction in a feature space, yielding reduced band lengths. However, the practical utility of FCP is limited due to the time-consuming non-linear operations required to transform confidence bands from feature space to output space. In this paper, we introduce Fast Feature Conformal Prediction (FFCP), which features a novel non-conformity score and is convenient for practical applications. FFCP serves as a fast version of FCP, in that it equivalently employs a Taylor expansion to approximate the aforementioned non-linear operations in FCP. Empirical validations showcase that FFCP performs comparably with FCP (both outperforming the vanilla version) while achieving a significant reduction in computational time by approximately 50x. The code is available at https://github.com/ElvisWang1111/FastFeatureCP
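Read literally, the Taylor-expansion trick can be sketched in a few lines of numpy. The sketch below is our illustration under a strong simplifying assumption (a linear prediction head on fixed features, so the first-order expansion is exact); all names and constants are ours, not the released implementation's:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy FFCP-style sketch: a linear head g(v) = w @ v on top of fixed features v,
# so grad_v g(v) = w and the Taylor expansion mapping feature-space bands to the
# output space is exact. Everything here is illustrative, not the paper's code.
n, d, alpha = 200, 5, 0.1
w = rng.normal(size=d)                      # linear head weights
V = rng.normal(size=(n, d))                 # calibration features
y = V @ w + rng.normal(scale=0.1, size=n)   # calibration targets

grad_norm = np.linalg.norm(w)               # ||grad_v g(v)|| for a linear head
scores = np.abs(y - V @ w) / grad_norm      # gradient-normalized residual scores
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n)

# Output-space band for a test point: g(v) +/- q * ||grad||.
v_test = rng.normal(size=d)
band = (v_test @ w - q * grad_norm, v_test @ w + q * grad_norm)
```

With a non-linear head, the gradient norm would instead come from automatic differentiation at each test point, which is where the claimed speedup over FCP's exact band transformation arises.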
Lower Generalization Bounds for GD and SGD in Smooth Stochastic Convex Optimization
Zhang, Peiyuan, Teng, Jiaye, Zhang, Jingzhao
This work studies the generalization error of gradient methods. More specifically, we focus on how the number of training steps $T$ and the step size $\eta$ affect generalization in smooth stochastic convex optimization (SCO) problems. We first provide tight excess risk lower bounds for Gradient Descent (GD) and Stochastic Gradient Descent (SGD) in the general non-realizable smooth SCO setting, suggesting that existing stability analyses are tight in their step-size and iteration dependence, and that overfitting provably happens. Next, we study the case when the loss is realizable, i.e., an optimal solution simultaneously minimizes the loss on all data points. Recent works show that better rates can be attained, but that the improvement is reduced when training time is long. Our paper examines this observation by providing excess risk lower bounds for GD and SGD in two realizable settings: (1) $\eta T = \mathcal{O}(n)$ and (2) $\eta T = \Omega(n)$, where $n$ is the size of the dataset. In the first case, $\eta T = \mathcal{O}(n)$, our lower bounds tightly match and certify the respective upper bounds. However, in the second case, $\eta T = \Omega(n)$, our analysis indicates a gap between the lower and upper bounds. We conjecture that the gap can be closed by improving the upper bounds, supported by analyses in two special scenarios.
Predictive Inference with Feature Conformal Prediction
Teng, Jiaye, Wen, Chuan, Zhang, Dinghuai, Bengio, Yoshua, Gao, Yang, Yuan, Yang
Conformal prediction is a distribution-free technique for establishing valid prediction intervals. Although conformal prediction is conventionally conducted in the output space, this is not the only possibility. In this paper, we propose feature conformal prediction, which extends the scope of conformal prediction to semantic feature spaces by leveraging the inductive bias of deep representation learning. From a theoretical perspective, we demonstrate that feature conformal prediction provably outperforms regular conformal prediction under mild assumptions. Our approach can be combined with not only vanilla conformal prediction, but also other adaptive conformal prediction methods. Apart from experiments on existing predictive inference benchmarks, we also demonstrate the state-of-the-art performance of the proposed methods on large-scale tasks such as ImageNet classification and Cityscapes image segmentation. The code is available at \url{https://github.com/AlvinWen428/FeatureCP}.
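For reference, the output-space baseline that feature conformal prediction generalizes is standard split conformal prediction with absolute residuals. A minimal sketch (the predictor below is a hypothetical stand-in, not the paper's model):

```python
import numpy as np

rng = np.random.default_rng(1)

# Standard split conformal prediction in the output space: compute absolute
# residuals on a held-out calibration set and take a finite-sample-corrected
# quantile as the band half-width.
def split_conformal_quantile(y_cal, yhat_cal, alpha=0.1):
    n = len(y_cal)
    scores = np.abs(y_cal - yhat_cal)                   # non-conformity scores
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores, level)

# Hypothetical predictor: the true value plus modest prediction error.
y_cal = rng.normal(size=500)
yhat_cal = y_cal + rng.normal(scale=0.2, size=500)
q = split_conformal_quantile(y_cal, yhat_cal)
# Prediction interval for a new point x: [yhat(x) - q, yhat(x) + q],
# covering y with probability at least 1 - alpha under exchangeability.
```

Feature conformal prediction replaces the output-space residual with a score computed in an intermediate feature space, then transports the resulting band back to the output space.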
Benign Overfitting in Classification: Provably Counter Label Noise with Larger Models
Wen, Kaiyue, Teng, Jiaye, Zhang, Jingzhao
Studies on benign overfitting provide insights for the success of overparameterized deep learning models. In this work, we examine whether overfitting is truly benign in real-world classification tasks. We start with the observation that a ResNet model overfits benignly on CIFAR-10 but not benignly on ImageNet. To understand why benign overfitting fails in the ImageNet experiment, we theoretically analyze benign overfitting under a more restrictive setup where the number of parameters is not significantly larger than the number of data points. Under this mild overparameterization setup, our analysis identifies a phase change: unlike in the previous heavy overparameterization settings, benign overfitting can now fail in the presence of label noise. Our analysis explains our empirical observations, and is validated by a set of control experiments with ResNets. Our work highlights the importance of understanding implicit bias in underfitting regimes as a future direction.
Relaxing the Feature Covariance Assumption: Time-Variant Bounds for Benign Overfitting in Linear Regression
Xu, Jing, Teng, Jiaye, Yao, Andrew Chi-Chih
Benign overfitting demonstrates that overparameterized models can perform well on test data while fitting noisy training data. However, existing analyses consider only the final min-norm solution in linear regression, ignoring the algorithm and the corresponding training procedure. In this paper, we generalize the idea of benign overfitting from the min-norm solution to the whole training trajectory and derive a time-variant bound based on trajectory analysis. Starting from the time-variant bound, we further derive a time interval that suffices to guarantee a consistent generalization error for a given feature covariance. Unlike existing approaches, the newly proposed generalization bound is characterized by a time-variant effective dimension of the feature covariance. By introducing the time factor, we relax the strict assumption on the feature covariance matrix required in previous analyses of benign overfitting in overparameterized linear regression with gradient descent. This paper extends the scope of benign overfitting, and experimental results indicate that the proposed bound accords better with empirical evidence.
Towards Understanding Generalization via Decomposing Excess Risk Dynamics
Teng, Jiaye, Ma, Jianhao, Yuan, Yang
Generalization is one of the critical issues in machine learning. However, traditional techniques like uniform convergence are not powerful enough to fully explain generalization, because they may yield vacuous bounds even in overparameterized linear regression regimes. An alternative is to analyze the generalization dynamics to derive algorithm-dependent bounds, e.g., via stability. Unfortunately, stability-based bounds are still far from explaining the remarkable generalization ability of neural networks, due to their coarse-grained treatment of signal and noise. Inspired by the observation that neural networks converge slowly when fitting noise, we propose decomposing the excess risk dynamics and applying a stability-based bound only to the variance part (which measures how the model performs on pure noise). We provide two applications of the framework: a linear case (overparameterized linear regression with gradient descent) and a non-linear case (matrix recovery with gradient flow). Under the decomposition framework, the new bound accords better with theoretical and empirical evidence than the stability-based and uniform convergence bounds.
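The motivating observation, that fitting noise is slower than fitting signal, is easy to reproduce in the linear case. The following is our own toy construction (dimensions, step size, and target scaling are illustrative), not the paper's exact setting:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy illustration: full-batch gradient descent on overparameterized linear
# regression fits clean signal targets X @ beta much faster than pure-noise
# targets, because noise has weight on small eigendirections of X X^T.
n, d, lr, steps = 100, 400, 1.0, 600
X = rng.normal(size=(n, d)) / np.sqrt(d)

def gd_train_loss(y):
    # GD on the half-MSE loss from zero initialization; return final train loss.
    w = np.zeros(d)
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / n
    return np.mean((X @ w - y) ** 2)

# Average over a few draws of signal targets vs. pure-noise targets.
loss_signal = np.mean([gd_train_loss(X @ rng.normal(size=d)) for _ in range(5)])
loss_noise = np.mean([gd_train_loss(rng.normal(size=n)) for _ in range(5)])
# loss_signal < loss_noise: the residual on noise decays slowly, mirroring the
# slow noise-fitting that motivates bounding the variance part separately.
```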
T-SCI: A Two-Stage Conformal Inference Algorithm with Guaranteed Coverage for Cox-MLP
Teng, Jiaye, Tan, Zeren, Yuan, Yang
It is challenging to deal with censored data, where we only have access to incomplete information about survival time rather than its exact value. Fortunately, under the linear predictor assumption, one can obtain guaranteed coverage for the confidence band of survival time using methods like Cox regression. However, when relaxing the linear assumption with neural networks (e.g., Cox-MLP \citep{katzman2018deepsurv,kvamme2019time}), we lose this guaranteed coverage. To recover guaranteed coverage without the linear assumption, we propose two algorithms based on conformal inference. In the first algorithm, \emph{WCCI}, we revisit weighted conformal inference and introduce a new non-conformity score based on the partial likelihood. We then propose a two-stage algorithm, \emph{T-SCI}, where we run WCCI in the first stage and apply quantile conformal inference to calibrate the results in the second stage. Theoretical analysis shows that T-SCI returns guaranteed coverage under milder assumptions than WCCI. Extensive experiments on synthetic and real data with different methods validate our analysis.
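The second-stage calibration idea can be sketched in isolation with a standard quantile-conformal widening step. This is our simplified illustration; the first-stage band below is a hypothetical placeholder rather than an actual WCCI output:

```python
import numpy as np

rng = np.random.default_rng(4)

# Quantile-conformal calibration sketch: score how far each calibration point
# falls outside the first-stage band, then widen the band by a conformal
# quantile of those scores.
def conformal_widening(lo, hi, y, alpha=0.1):
    scores = np.maximum(lo - y, y - hi)                 # > 0 iff y is outside [lo, hi]
    n = len(y)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores, level)                   # margin to add on both sides

y = rng.normal(size=300)                                # toy calibration outcomes
lo, hi = np.full(300, -1.0), np.full(300, 1.0)          # hypothetical first-stage band
q = conformal_widening(lo, hi, y)
covered = np.mean((y >= lo - q) & (y <= hi + q))        # empirical coverage after widening
```

By construction, at least a $1-\alpha$ fraction of calibration points fall inside the widened band, which is the coverage-recovery mechanism the second stage relies on.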
Can Pretext-Based Self-Supervised Learning Be Boosted by Downstream Data? A Theoretical Analysis
Teng, Jiaye, Huang, Weiran
Pretext-based self-supervised learning aims to learn a semantic representation via a handcrafted pretext task over unlabeled data and then use the learned representation for downstream prediction tasks. \citet{lee2020predicting} prove that pretext-based self-supervised learning can effectively reduce the sample complexity of downstream tasks under Conditional Independence (CI) between the components of the pretext task, conditional on the downstream label. However, the CI condition rarely holds in practice, and the downstream sample complexity can become much worse when it fails. In this paper, we explore the idea of applying a learnable function to the input so that the CI condition holds. In particular, we first rigorously formulate the criteria that the function needs to satisfy. We then design a loss function for learning such a function and prove that a function minimizing the proposed loss satisfies the above criteria. We theoretically study the number of labeled data required and give a model-free lower bound showing that limited downstream data will hurt the performance of self-supervised learning. Furthermore, we take the model structure into account and give a model-dependent lower bound, which grows as the model capacity gets larger. Moreover, we conduct several numerical experiments to verify our theoretical results.
Inject Machine Learning into Significance Test for Misspecified Linear Models
Teng, Jiaye, Yuan, Yang
Due to its strong interpretability, linear regression is widely used in social science, where significance tests provide the significance level of models or coefficients in traditional statistical inference. However, linear regression relies on the assumption that the ground-truth function is linear, which does not necessarily hold in practice. As a result, even in simple non-linear cases, linear regression may fail to report the correct significance level. In this paper, we present a simple and effective assumption-free method for linear approximation in both linear and non-linear scenarios. First, we apply a machine learning method to fit the ground-truth function on the training set and compute its linear approximation. Afterward, we obtain the estimator by adding adjustments computed on the validation set. We prove concentration inequalities and asymptotic properties of our estimator, which lead to the corresponding significance test. Experimental results show that our estimator significantly outperforms linear regression for non-linear ground-truth functions, indicating that it might be a better tool for significance testing.
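The two-split construction can be sketched as follows. This is our simplified reading of the recipe: the k-NN regressor, split sizes, and non-linear ground truth are illustrative choices, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(3)

# Two-split sketch: fit a flexible model on the training split, take its best
# linear approximation, then correct the coefficients with an adjustment
# estimated from validation residuals.
n, d = 1000, 2
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)  # non-linear truth
X_tr, X_val, y_tr, y_val = X[:500], X[500:], y[:500], y[500:]

def knn_predict(Xq, Xref, yref, k=20):
    # Local-average regressor standing in for an arbitrary ML method.
    d2 = ((Xq[:, None, :] - Xref[None, :, :]) ** 2).sum(-1)
    idx = np.argpartition(d2, k, axis=1)[:, :k]
    return yref[idx].mean(axis=1)

# Linear approximation of the fitted function on the training split ...
beta_hat = np.linalg.lstsq(X_tr, knn_predict(X_tr, X_tr, y_tr), rcond=None)[0]
# ... plus an adjustment fitted to the validation residuals.
adjust = np.linalg.lstsq(X_val, y_val - X_val @ beta_hat, rcond=None)[0]
beta_tilde = beta_hat + adjust
# beta_tilde targets the best linear approximation of the non-linear truth:
# roughly (E[x sin x], 0.5) = (e^{-1/2}, 0.5) under standard normal inputs.
```

The significance test then follows from the estimator's concentration and asymptotic normality rather than from the usual linear-model residual theory.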