
Collaborating Authors

 Li, Yufan


Optimal and Provable Calibration in High-Dimensional Binary Classification: Angular Calibration and Platt Scaling

arXiv.org Machine Learning

Department of Statistics, Harvard University, 1 Oxford Street, Cambridge, MA

Abstract: We study the fundamental problem of calibrating a linear binary classifier of the form σ(ŵ⊤x), where the feature vector x is Gaussian, σ is a link function, and ŵ is an estimator of the true linear weight w. By interpolating with a non-informative chance classifier, we construct a well-calibrated predictor whose interpolation weight depends on the angle ∠(ŵ, w) between the estimator ŵ and the true linear weight w. We establish that this angular calibration approach is provably well-calibrated in a high-dimensional regime where the numbers of samples and features both diverge at a comparable rate. The angle ∠(ŵ, w) can be consistently estimated. Furthermore, the resulting predictor is uniquely Bregman-optimal: it minimizes the Bregman divergence to the true label distribution within a suitable class of calibrated predictors. Our work is the first to provide a calibration strategy that provably satisfies both calibration and optimality properties in high dimensions. Additionally, we identify conditions under which a classical Platt-scaling predictor converges to our Bregman-optimal calibrated solution; thus Platt scaling also provably inherits these desirable properties in high dimensions.

Keywords: Calibration; Binary Classification; High Dimensions; Bregman Divergence

1. Introduction. Calibration of predictive models is a fundamental problem in statistics and machine learning, especially in applications that require reliable uncertainty quantification.
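The interpolation step described in the abstract can be sketched in a few lines. Note that the paper's interpolation weight is a specific function of the angle between the estimator and the true weight; the quadratic weight `cos_angle ** 2` below, and all function names, are purely illustrative stand-ins.

```python
import numpy as np

def angular_calibrate(scores, cos_angle):
    """Interpolate raw scores with the chance prediction 1/2.

    `cos_angle` is the cosine of the angle between w_hat and the true w.
    The quadratic interpolation weight here is a hypothetical choice,
    not the paper's exact formula.
    """
    lam = cos_angle ** 2
    return lam * scores + (1.0 - lam) * 0.5

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 50))                    # Gaussian features
w_hat = rng.standard_normal(50)                        # some estimator of w
raw = 1.0 / (1.0 + np.exp(-X @ w_hat / np.sqrt(50)))   # sigmoid link scores
calibrated = angular_calibrate(raw, cos_angle=0.8)
```

When ŵ carries no signal about w (cos_angle = 0), the predictor collapses to the chance classifier 1/2, matching the abstract's interpolation idea.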


ROTI-GCV: Generalized Cross-Validation for right-ROTationally Invariant Data

arXiv.org Machine Learning

Two key tasks in high-dimensional regularized regression are tuning the regularization strength for good predictions and estimating the out-of-sample risk. It is known that the standard approach, $k$-fold cross-validation, is inconsistent in modern high-dimensional settings. While leave-one-out and generalized cross-validation remain consistent in some high-dimensional cases, they become inconsistent when samples are dependent or contain heavy-tailed covariates. To model structured sample dependence and heavy tails, we use right-rotationally invariant covariate distributions, a concept borrowed from compressed sensing. In the common modern proportional asymptotics regime, where the numbers of features and samples grow comparably, we introduce a new framework, ROTI-GCV, for reliably performing cross-validation. Along the way, we propose new estimators for the signal-to-noise ratio and noise variance under these challenging conditions. We conduct extensive experiments that demonstrate the power of our approach and its superiority over existing methods.
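For context, the classical generalized cross-validation criterion that ROTI-GCV builds on can be written compactly for ridge regression. This sketch shows only the standard criterion; the ROTI-specific adjustment for right-rotationally invariant designs is not reproduced here, and all names are illustrative.

```python
import numpy as np

def gcv_ridge(X, y, lam):
    """Classical GCV score for ridge regression with penalty `lam`.

    ROTI-GCV replaces this criterion with one that stays consistent for
    right-rotationally invariant designs; that adjustment is not shown here.
    """
    n, p = X.shape
    s = np.linalg.svd(X, compute_uv=False)
    df = np.sum(s**2 / (s**2 + lam))           # effective degrees of freedom
    beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    resid = y - X @ beta
    return (resid @ resid / n) / (1.0 - df / n) ** 2

# Tune the regularization strength by minimizing the GCV score over a grid.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 20))
y = X @ rng.standard_normal(20) + rng.standard_normal(100)
scores = {lam: gcv_ridge(X, y, lam) for lam in [0.01, 0.1, 1.0, 10.0]}
best_lam = min(scores, key=scores.get)
```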


Spectrum-Aware Adjustment: A New Debiasing Framework with Applications to Principal Component Regression

arXiv.org Machine Learning

We introduce a new debiasing framework for high-dimensional linear regression that bypasses the restrictions on covariate distributions imposed by modern debiasing technology. We study the prevalent setting where the number of features and samples are both large and comparable. In this context, state-of-the-art debiasing technology uses a degrees-of-freedom correction to remove the shrinkage bias of regularized estimators and conduct inference. However, this method requires that the observed samples are i.i.d., the covariates follow a mean zero Gaussian distribution, and reliable covariance matrix estimates for observed features are available. This approach struggles when (i) covariates are non-Gaussian with heavy tails or asymmetric distributions, (ii) rows of the design exhibit heterogeneity or dependencies, and (iii) reliable feature covariance estimates are lacking. To address these, we develop a new strategy where the debiasing correction is a rescaled gradient descent step (suitably initialized) with step size determined by the spectrum of the sample covariance matrix. Unlike prior work, we assume that eigenvectors of this matrix are uniform draws from the orthogonal group. We show this assumption remains valid in diverse situations where traditional debiasing fails, including designs with complex row-column dependencies, heavy tails, asymmetric properties, and latent low-rank structures. We establish asymptotic normality of our proposed estimator (centered and scaled) under various convergence notions. Moreover, we develop a consistent estimator for its asymptotic variance. Lastly, we introduce a debiased Principal Components Regression (PCR) technique using our Spectrum-Aware approach. In varied simulations and real data experiments, we observe that our method outperforms degrees-of-freedom debiasing by a clear margin.
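A minimal sketch of the correction as described: a single rescaled gradient step on the squared loss applied to a regularized estimate. In the paper the step size is derived from the spectrum of the sample covariance matrix; here `step` is a free parameter and the function name is hypothetical.

```python
import numpy as np

def spectrum_aware_debias(X, y, beta_hat, step):
    """One rescaled gradient-descent step on the squared loss, applied as a
    debiasing correction to a regularized estimate `beta_hat`.

    The paper chooses `step` from the spectrum of the sample covariance;
    here it is passed in directly for illustration.
    """
    n = X.shape[0]
    grad = X.T @ (y - X @ beta_hat) / n   # negative gradient of 0.5 * MSE
    return beta_hat + step * grad

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 10))
y = X @ rng.standard_normal(10) + rng.standard_normal(200)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_shrunk = 0.5 * beta_ols                      # mimic shrinkage bias
debiased = spectrum_aware_debias(X, y, beta_shrunk, step=0.5)
```

The gradient step pushes the shrunken estimate back toward the least-squares fit, and at the least-squares solution itself the gradient vanishes, so the correction leaves an unbiased estimate unchanged.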


Balancing Risk and Reward: An Automated Phased Release Strategy

arXiv.org Machine Learning

Phased release, also known as staged rollout, is a widely used strategy in the technology industry that involves gradually releasing a new product or update to larger audiences over time [17, 30]. For example, Apple's App Store offers a phased release option where application updates are released over a 7-day period on a fixed schedule [1]. Google Play Console provides a similar feature with more flexibility in the release schedule [16]. Typically, the audiences are randomly selected at each stage from the set of all customers, and so phased releases can be thought of as a sequence of A/B tests (or randomized experiments) in which the proportion of units assigned to the treatment group changes until either the product or update is fully launched or deprecated [26, 18, 3, 33, 6]. The process of combining phased releases with A/B tests is often called controlled rollout or iterative experiments and provides companies with an important mechanism to gather feedback on early product versions [30, 20, 5]. The key advantage of phased release is its ability to mitigate risks associated with launching a new product or update directly to all users. The potential impact of faulty features is limited by releasing the update first to a small percentage of the users (i.e., the treatment group). However, this risk-averse approach introduces an opportunity cost for slowly launching beneficial features, which quickly adds up for companies that release thousands of features yearly [34].
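The ramp-up mechanics described above, a growing treatment fraction with random assignment at each stage, can be illustrated with a toy schedule. The day count and fractions below are illustrative linear choices, not Apple's or Google's actual schedules, and both function names are hypothetical.

```python
import random

def phased_release_schedule(days=7):
    """Illustrative linear ramp: fraction of users exposed on each day of a
    `days`-day staged rollout (real platform schedules differ)."""
    return [min(1.0, (d + 1) / days) for d in range(days)]

def assign_treatment(user_ids, fraction, seed=0):
    """Randomly pick a `fraction` of users as the treatment group, so each
    stage acts like one A/B test in the sequence."""
    rng = random.Random(seed)
    k = int(len(user_ids) * fraction)
    return set(rng.sample(user_ids, k))

users = list(range(10_000))
schedule = phased_release_schedule()
day1_treated = assign_treatment(users, schedule[0])
```

In a real rollout the treatment group typically grows monotonically (earlier users stay treated), which a seeded or nested sampling scheme would provide; that detail is omitted here for brevity.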