
Collaborating Authors

 Zhang, Haofeng


Dissecting the Impact of Model Misspecification in Data-driven Optimization

arXiv.org Artificial Intelligence

Data-driven optimization aims to translate a machine learning model into decision-making by optimizing decisions on estimated costs. Such a pipeline can be conducted by fitting a distributional model that is then plugged into the target optimization problem. While this fitting can use traditional methods such as maximum likelihood, a more recent approach integrates estimation and optimization by minimizing decision error instead of estimation error. Although intuitive, the statistical benefit of the latter approach is not well understood, yet it is important for guiding the prescriptive use of machine learning. In this paper, we dissect the performance comparison between these approaches in terms of the amount of model misspecification. In particular, we show how the integrated approach offers a ``universal double benefit'' on the top two dominating terms of regret when the underlying model is misspecified, while the traditional approach can be advantageous when the model is nearly well-specified. Our comparison is powered by finite-sample tail regret bounds, derived via new higher-order expansions of the regret and a recent Berry-Esseen theorem.
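
As a concrete (hypothetical) illustration of the two pipelines being compared, the sketch below contrasts estimate-then-optimize with the integrated approach on a toy newsvendor problem; the exponential demand model, cost parameters, and grid search are assumptions made for the example, not the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy newsvendor: order q, demand d; holding cost h per leftover unit, backorder cost b per shortfall.
h, b = 1.0, 4.0
demand = rng.exponential(scale=2.0, size=200)   # observed demand data (true rate 0.5)

def cost(q, d):
    return h * np.maximum(q - d, 0) + b * np.maximum(d - q, 0)

def optimal_order(theta):
    # Plug-in newsvendor solution under an Exponential(rate=theta) demand model:
    # the b/(h+b) quantile of the fitted distribution.
    return -np.log(1 - b / (h + b)) / theta

# Estimate-then-optimize (ETO): fit theta by maximum likelihood, then plug in.
theta_mle = 1 / demand.mean()
q_eto = optimal_order(theta_mle)

# Integrated estimation-optimization (IEO): choose theta whose induced decision
# minimizes the empirical decision cost, not the estimation error.
theta_grid = np.linspace(0.05, 2.0, 400)
theta_ieo = theta_grid[int(np.argmin([cost(optimal_order(t), demand).mean() for t in theta_grid]))]
q_ieo = optimal_order(theta_ieo)

print(f"ETO order: {q_eto:.2f}, IEO order: {q_ieo:.2f}")
```

With a well-specified exponential model, the two decisions should be close; the gap widens when the fitted model class is misspecified, which is the regime the paper dissects.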


Bayesian Bandit Algorithms with Approximate Inference in Stochastic Linear Bandits

arXiv.org Machine Learning

Bayesian bandit algorithms with approximate Bayesian inference have been widely used in real-world applications. Nevertheless, their theoretical justification is less investigated in the literature, especially for contextual bandit problems. To fill this gap, we propose a general theoretical framework for analyzing stochastic linear bandits in the presence of approximate inference and conduct regret analysis on two Bayesian bandit algorithms: Linear Thompson sampling (LinTS) and an extension of the Bayesian Upper Confidence Bound, namely the Linear Bayesian Upper Confidence Bound (LinBUCB). We demonstrate that both LinTS and LinBUCB preserve the rates of their original regret upper bounds, at the cost of larger constant terms, when applied with approximate inference. These results hold for general Bayesian inference approaches, under the assumption that the inference error measured by two different $\alpha$-divergences is bounded. Additionally, by introducing a new definition of well-behaved distributions, we show that LinBUCB improves the regret rate of LinTS from $\tilde{O}(d^{3/2}\sqrt{T})$ to $\tilde{O}(d\sqrt{T})$, matching the minimax optimal rate. To our knowledge, this work provides the first regret bounds for stochastic linear bandits with bounded approximate inference errors.
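
To ground the two algorithms, here is a minimal, self-contained sketch of LinTS and a Bayes-UCB-style LinBUCB on a synthetic linear bandit with an exact Gaussian posterior; the paper's focus is the approximate-inference case, and the quantile schedule and scaling used below are illustrative choices, not the paper's prescriptions.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
d, T, noise = 5, 1000, 0.1
theta_star = rng.normal(size=d)          # unknown parameter
arms = rng.normal(size=(20, d))          # fixed arm set for simplicity

def run(policy, beta=1.0):
    V, yv, regret = np.eye(d), np.zeros(d), 0.0   # Gaussian posterior N(V^{-1} yv, beta * V^{-1})
    for t in range(T):
        mu, cov = np.linalg.solve(V, yv), beta * np.linalg.inv(V)
        if policy == "LinTS":
            scores = arms @ rng.multivariate_normal(mu, cov)         # act on a posterior sample
        else:  # LinBUCB: per-arm posterior quantile as the optimistic index
            z = norm.ppf(1 - 1 / (t + 2))
            scores = arms @ mu + z * np.sqrt(np.einsum("ij,jk,ik->i", arms, cov, arms))
        a = arms[int(np.argmax(scores))]
        r = a @ theta_star + noise * rng.normal()
        V += np.outer(a, a)
        yv += r * a
        regret += (arms @ theta_star).max() - a @ theta_star
    return regret

print({p: round(run(p), 2) for p in ("LinTS", "LinBUCB")})
```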


Revealing the Proximate Long-Tail Distribution in Compositional Zero-Shot Learning

arXiv.org Artificial Intelligence

Compositional Zero-Shot Learning (CZSL) aims to transfer knowledge from seen state-object pairs to novel unseen pairs. In this process, visual bias caused by the diverse interrelationships of state-object combinations blurs their visual features, hindering the learning of distinguishable class prototypes. Prevailing methods concentrate on disentangling states and objects directly from visual features, disregarding potential enhancements from a data viewpoint. Experimentally, we show that the effects of this problem closely approximate a long-tailed distribution. As a solution, we transform CZSL into a proximate class-imbalance problem. We mathematically derive the role of the class prior within the long-tailed distribution in CZSL. Building upon this insight, we incorporate the visual bias caused by compositions into the classifier's training and inference by estimating it as a proximate class prior. This enhancement encourages the classifier to acquire more discernible class prototypes for each composition, thereby achieving more balanced predictions. Experimental results demonstrate that our approach elevates the model's performance to the state-of-the-art level without introducing additional parameters. Our code is available at \url{https://github.com/LanchJL/ProLT-CZSL}.
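
The mechanism described above amounts to a logit-adjustment-style use of an estimated composition prior; the sketch below shows that recipe on made-up logits and prior values (the paper estimates the prior from the compositional visual bias rather than assuming it).

```python
import numpy as np

rng = np.random.default_rng(0)
num_comps = 6
logits = rng.normal(size=(4, num_comps))                 # toy classifier outputs for 4 images
labels = np.array([0, 2, 5, 1])
prior = np.array([0.40, 0.25, 0.15, 0.10, 0.06, 0.04])   # estimated (long-tailed) composition prior

# Training-time use: add the log-prior to the logits so the cross-entropy loss
# accounts for the imbalance among compositions.
adj = logits + np.log(prior)
log_probs = adj - np.log(np.exp(adj).sum(axis=1, keepdims=True))
train_loss = -log_probs[np.arange(len(labels)), labels].mean()

# Inference-time (post-hoc) alternative: subtract the log-prior from a conventionally
# trained classifier's logits to debias predictions toward rare compositions.
preds = np.argmax(logits - np.log(prior), axis=1)
print(round(float(train_loss), 3), preds)
```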


Efficient Uncertainty Quantification and Reduction for Over-Parameterized Neural Networks

arXiv.org Machine Learning

Uncertainty quantification (UQ) is important for the reliability assessment and enhancement of machine learning models. In deep learning, uncertainties arise not only from the data, but also from the training procedure, which often injects substantial noise and bias. These hinder the attainment of statistical guarantees and, moreover, impose computational challenges on UQ due to the need for repeated network retraining. Building upon the recent neural tangent kernel theory, we create statistically guaranteed schemes to \emph{characterize} and \emph{remove} the uncertainty of over-parameterized neural networks in a principled manner and with very low computational effort. In particular, our approach, based on what we call a procedural-noise-correcting (PNC) predictor, removes the procedural uncertainty by using only \emph{one} auxiliary network trained on a suitably labeled dataset, instead of the many retrained networks employed in deep ensembles. Moreover, by combining our PNC predictor with suitable light-computation resampling methods, we build several approaches to construct asymptotically exact-coverage confidence intervals using as few as four trained networks without additional overhead.
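
For contrast, the snippet below sketches the deep-ensemble baseline whose repeated retraining the PNC predictor is designed to avoid: procedural variability is read off the spread of predictions from the same architecture retrained with different seeds. The toy regression task and network are illustrative assumptions, and the PNC construction itself (the auxiliary network and its labeled dataset) is not reproduced here.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=200)
x_test = np.array([[0.3]])

# Deep-ensemble baseline: retrain with different seeds and use the prediction spread
# as an estimate of procedural variability at x_test.
preds = []
for seed in range(10):
    net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=seed)
    net.fit(X, y)
    preds.append(net.predict(x_test)[0])

print(f"ensemble mean {np.mean(preds):.3f}, procedural std {np.std(preds):.3f}")
```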


Estimate-Then-Optimize versus Integrated-Estimation-Optimization versus Sample Average Approximation: A Stochastic Dominance Perspective

arXiv.org Artificial Intelligence

In data-driven stochastic optimization, model parameters of the underlying distribution need to be estimated from data in addition to the optimization task. Recent literature considers integrating the estimation and optimization processes by selecting model parameters that lead to the best empirical objective performance. This integrated approach, which we call integrated-estimation-optimization (IEO), can readily be shown to outperform simple estimate-then-optimize (ETO) when the model is misspecified. In this paper, we show that the reverse behavior appears when the model class is well-specified and there is sufficient data. Specifically, for a general class of nonlinear stochastic optimization problems, we show that simple ETO outperforms IEO asymptotically when the model class covers the ground truth, in the strong sense of stochastic dominance of the regret: the entire distribution of the regret, not only its mean or other moments, is always better for ETO than for IEO. Our results also apply to constrained, contextual optimization problems where the decision depends on observed features. Whenever applicable, we also demonstrate how standard sample average approximation (SAA) performs the worst in terms of regret when the model class is well-specified, and the best when it is misspecified. Finally, we provide experimental results to support our theoretical comparisons and to illustrate when our insights hold in finite-sample regimes and under various degrees of misspecification.
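
Alongside the ETO/IEO sketch given earlier, the following is a minimal illustration of the third pipeline, sample average approximation (SAA), on a toy newsvendor problem; the demand data and costs are again assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
h, b = 1.0, 4.0
demand = rng.exponential(scale=2.0, size=200)   # observed demand samples

# SAA: optimize the empirical cost directly over decisions, with no distributional model.
def empirical_cost(q):
    return np.mean(h * np.maximum(q - demand, 0) + b * np.maximum(demand - q, 0))

q_grid = np.linspace(0, demand.max(), 500)
q_saa = q_grid[int(np.argmin([empirical_cost(q) for q in q_grid]))]

# For the newsvendor loss, SAA has a closed form: the empirical b/(h+b) quantile.
print(f"SAA order (grid): {q_saa:.2f}, empirical quantile: {np.quantile(demand, b / (h + b)):.2f}")
```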


Deconstructed Generation-Based Zero-Shot Model

arXiv.org Artificial Intelligence

Recent research on Generalized Zero-Shot Learning (GZSL) has focused primarily on generation-based methods. However, the current literature has overlooked the fundamental principles of these methods and has made only limited progress despite increasingly complex designs. In this paper, we aim to deconstruct the generator-classifier framework and provide guidance for its improvement and extension. We begin by breaking down the generator-learned unseen-class distribution into class-level and instance-level distributions. Through our analysis of the role of these two types of distributions in solving the GZSL problem, we generalize the focus of the generation-based approach, emphasizing the importance of (i) attribute generalization in generator learning and (ii) independent classifier learning with partially biased data. We present a simple method based on this analysis that outperforms state-of-the-art methods on four public GZSL datasets, demonstrating the validity of our deconstruction. Furthermore, our proposed method remains effective even without a generative model, representing a step toward simplifying the generator-classifier structure. Our code is available at \url{https://github.com/cdb342/DGZ}.
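
As a point of reference for the framework being deconstructed, here is a schematic of the generic generator-classifier pipeline: synthesize unseen-class features from class attributes, then train a classifier on real seen data plus the synthetic unseen data. The linear "generator", feature dimensions, and Gaussian noise are placeholders, not the paper's method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
feat_dim, attr_dim = 16, 8
seen_attrs = rng.normal(size=(5, attr_dim))     # attribute vectors of 5 seen classes
unseen_attrs = rng.normal(size=(3, attr_dim))   # attribute vectors of 3 unseen classes
W = rng.normal(size=(attr_dim, feat_dim))       # stand-in for a learned generator

def generate(attrs, n_per_class, noise=0.3):
    # "Generator": map class attributes to synthetic visual features plus instance noise.
    reps = np.repeat(attrs @ W, n_per_class, axis=0)
    return reps + noise * rng.normal(size=reps.shape)

X_seen = generate(seen_attrs, 50)                # pretend these are real seen-class features
y_seen = np.repeat(np.arange(5), 50)
X_unseen_syn = generate(unseen_attrs, 50)        # synthetic unseen-class features
y_unseen = np.repeat(np.arange(5, 8), 50)

# Classifier learning on real seen + generated unseen data.
clf = LogisticRegression(max_iter=1000).fit(
    np.vstack([X_seen, X_unseen_syn]), np.concatenate([y_seen, y_unseen]))
print(clf.predict(generate(unseen_attrs, 1)))
```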


Evolutionary Generalized Zero-Shot Learning

arXiv.org Artificial Intelligence

An open problem on the path to artificial intelligence is generalization from the known to the unknown, which is instantiated as the Generalized Zero-Shot Learning (GZSL) task. In this work, we propose a novel Evolutionary Generalized Zero-Shot Learning setting, which (i) avoids the domain shift problem in inductive GZSL (IGZSL) and (ii) is more in line with the needs of real-world deployments than transductive GZSL. In the proposed setting, a zero-shot model with poor initial performance is able to evolve online during deployment. We elaborate on three challenges of this special task, i.e., catastrophic forgetting, initial prediction bias, and evolutionary data class bias. Moreover, we propose targeted solutions for each challenge, resulting in a generic method capable of continually evolving from a given initial IGZSL model. Experiments on three popular GZSL benchmark datasets show that our model can learn from the test data stream while other baselines fail.
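
The loop below is only a schematic of the evolving setting: an initial model sees a test stream, keeps confident pseudo-labels for online updates, and replays a slice of old data to limit forgetting. The Gaussian toy data, confidence threshold, and replay buffer are illustrative assumptions and do not reproduce the paper's targeted solutions.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
n_classes, dim = 6, 10
means = 3 * rng.normal(size=(n_classes, dim))

def sample(cls, n):
    return means[cls] + rng.normal(size=(n, dim))

# Initial model trained only on classes 0-3 (a stand-in for the initial IGZSL model).
X0 = np.vstack([sample(c, 50) for c in range(4)])
y0 = np.repeat(np.arange(4), 50)
clf = SGDClassifier(loss="log_loss", random_state=0)
clf.partial_fit(X0, y0, classes=np.arange(n_classes))

# Online evolution on the test stream with confident pseudo-labels plus replay.
replay_X, replay_y = X0[::10], y0[::10]
for step in range(20):
    X_batch = sample(rng.integers(0, n_classes), 20)     # stream mixes seen and unseen classes
    proba = clf.predict_proba(X_batch)
    keep = proba.max(axis=1) > 0.8                       # keep only confident pseudo-labels
    if keep.any():
        clf.partial_fit(np.vstack([X_batch[keep], replay_X]),
                        np.concatenate([proba.argmax(axis=1)[keep], replay_y]))
```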


Evaluating Aleatoric Uncertainty via Conditional Generative Models

arXiv.org Machine Learning

Aleatoric uncertainty quantification seeks distributional knowledge of random responses, which is important for reliability analysis and robustness improvement in machine learning applications. Previous research on aleatoric uncertainty estimation mainly targets closed-form conditional densities or variances, which require strong restrictions on the data distribution or dimensionality. To overcome these restrictions, we study conditional generative models for aleatoric uncertainty estimation. We introduce two metrics, suited to these models, that measure the discrepancy between two conditional distributions. Both metrics can be computed easily and without bias via Monte Carlo simulation of the conditional generative models, thus facilitating their evaluation and training. We demonstrate numerically how our metrics provide correct measurements of conditional distributional discrepancies and can be used to train conditional generative models that are competitive with existing benchmarks.
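
The two metrics themselves are not reproduced here, but the sketch below shows the general pattern they rely on: drawing Monte Carlo samples from two conditional distributions at a fixed covariate and computing an unbiased sample-based discrepancy (an unbiased squared-MMD estimate is used as a stand-in).

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(a, b, s=1.0):
    return np.exp(-np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1) / (2 * s**2))

def mmd2_unbiased(x, y):
    # Unbiased estimate of the squared MMD between two samples (diagonal terms removed).
    kxx, kyy, kxy = rbf(x, x), rbf(y, y), rbf(x, y)
    n, m = len(x), len(y)
    return ((kxx.sum() - np.trace(kxx)) / (n * (n - 1))
            + (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
            - 2 * kxy.mean())

# Two toy conditional generators for Y | X = x: same mean, different noise levels.
x = 0.7
samples_p = np.sin(x) + 0.1 * rng.normal(size=(500, 1))
samples_q = np.sin(x) + 0.3 * rng.normal(size=(500, 1))
print(f"estimated discrepancy at x={x}: {mmd2_unbiased(samples_p, samples_q):.4f}")
```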


Generalized Bayesian Upper Confidence Bound with Approximate Inference for Bandit Problems

arXiv.org Machine Learning

Bayesian bandit algorithms with approximate inference have been widely used in practice with superior performance. Yet few studies provide a fundamental understanding of their performance. In this paper, we propose a Bayesian bandit algorithm, which we call the Generalized Bayesian Upper Confidence Bound (GBUCB), for bandit problems in the presence of approximate inference. Our theoretical analysis demonstrates that in the Bernoulli multi-armed bandit, GBUCB can achieve $O(\sqrt{T}(\log T)^c)$ frequentist regret if the inference error measured by the symmetrized Kullback-Leibler divergence is controllable. This analysis relies on a novel sensitivity analysis for quantile shifts with respect to inference errors. To the best of our knowledge, our work provides the first sublinear, i.e., $o(T)$, theoretical regret bound in the setting of approximate inference. Our experimental evaluations on multiple approximate inference settings corroborate our theory, showing that GBUCB is consistently superior to BUCB and Thompson sampling.
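
For intuition, the snippet below runs a standard Bayes-UCB rule on a Bernoulli bandit with exact Beta posteriors, picking the arm with the highest posterior quantile each round; GBUCB generalizes this style of index to approximate posteriors, and the quantile schedule here is the usual Bayes-UCB choice rather than the paper's.

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)
p_true = np.array([0.3, 0.5, 0.7])      # true Bernoulli arm means
a, b = np.ones(3), np.ones(3)           # Beta(1, 1) posterior parameters per arm

for t in range(1, 2001):
    q = 1 - 1 / t                        # quantile level at round t
    ucb = beta.ppf(q, a, b)              # posterior quantile index for each arm
    arm = int(np.argmax(ucb))
    reward = rng.random() < p_true[arm]
    a[arm] += reward
    b[arm] += 1 - reward

print("posterior means:", np.round(a / (a + b), 3))
```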


Quantifying Epistemic Uncertainty in Deep Learning

arXiv.org Machine Learning

Uncertainty quantification is at the core of the reliability and robustness of machine learning. It is well known that uncertainty consists of two different types, often referred to as aleatoric and epistemic uncertainties. In this paper, we provide a systematic study of the epistemic uncertainty in deep supervised learning. We rigorously distinguish different sources of epistemic uncertainty, including in particular procedural variability (from the training procedure) and data variability (from the training data). We use our framework to explain how deep ensembles enhance prediction by reducing procedural variability. We also propose two approaches to estimate the epistemic uncertainty of a well-trained neural network in practice. One uses an influence function derived from neural tangent kernel theory, which bypasses the convexity assumption violated by modern neural networks. The other uses batching, which bypasses the time-consuming Gram matrix inversion in the influence function calculation while requiring minimal retraining effort. We discuss how both approaches overcome some of the difficulties in applying classical statistical methods to inference for deep learning.
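
To make the batching idea tangible, the sketch below splits the training data into a few batches, trains one small network per batch, and forms a t-interval from the batch predictions; the toy regression task, network size, and interval construction are generic batching assumptions and omit the paper's influence-function and procedural-noise corrections.

```python
import numpy as np
from scipy import stats
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=300)
x_test = np.array([[0.2]])

# Batching: one network per data batch; the spread of batch predictions drives the interval.
n_batches = 5
parts = np.array_split(rng.permutation(len(X)), n_batches)
preds = np.array([
    MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=i)
    .fit(X[p], y[p]).predict(x_test)[0]
    for i, p in enumerate(parts)
])
center = preds.mean()
half = stats.t.ppf(0.975, n_batches - 1) * preds.std(ddof=1) / np.sqrt(n_batches)
print(f"95% CI for the prediction at x=0.2: [{center - half:.3f}, {center + half:.3f}]")
```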