Prediction-powered Inference by Mixture of Experts

arXiv.org Machine Learning

The rapidly expanding artificial intelligence (AI) industry has produced diverse yet powerful prediction tools, each with its own network architecture, training strategy, data-processing pipeline, and domain-specific strengths. These tools create new opportunities for semi-supervised inference, in which labeled data are limited and expensive to obtain, whereas unlabeled data are abundant and widely available. Given a collection of predictors, we treat them as a mixture of experts (MOE) and introduce an MOE-powered semi-supervised inference framework built upon prediction-powered inference (PPI). Motivated by the variance reduction principle underlying PPI, the proposed framework seeks the mixture of experts that achieves the smallest possible variance. Compared with standard PPI, the MOE-powered inference framework adapts to the unknown performance of individual predictors, benefits from their collective predictive power, and enjoys a best-expert guarantee. The framework is flexible and applies to mean estimation, linear regression, quantile estimation, and general M-estimation. We develop non-asymptotic theory for the MOE-powered inference framework and establish upper bounds on the coverage error of the resulting confidence intervals. Numerical experiments demonstrate the practical effectiveness of MOE-powered inference and corroborate our theoretical findings.
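The variance-reduction idea in the abstract can be illustrated for the simplest case, mean estimation. The sketch below is not the authors' algorithm: it assumes an unconstrained linear mixture of the experts' predictions and picks the weights that minimize a plug-in estimate of the PPI variance. The function name `moe_ppi_mean` and its arguments are hypothetical, and reusing the same data to choose the weights and to form the estimate is a simplification.

```python
import numpy as np

def moe_ppi_mean(y, preds_lab, preds_unlab):
    """PPI-style mean estimate using a variance-minimizing mixture of experts.

    y           : (n,)   labels of the labeled sample
    preds_lab   : (n, K) predictions of K experts on the labeled sample
    preds_unlab : (N, K) predictions of the same experts on the unlabeled sample
    (All names are illustrative assumptions, not taken from the paper.)
    """
    n, N = len(y), preds_unlab.shape[0]

    # Covariance of the experts' predictions, estimated from all available inputs.
    cov_ff = np.atleast_2d(np.cov(np.vstack([preds_lab, preds_unlab]), rowvar=False))

    # Covariance between each expert's prediction and the label (labeled data only).
    centered = preds_lab - preds_lab.mean(axis=0)
    cov_fy = centered.T @ (y - y.mean()) / (n - 1)

    # Minimizer of  Var(F w)/N + Var(Y - F w)/n  over unconstrained weights w.
    w = (N / (n + N)) * np.linalg.lstsq(cov_ff, cov_fy, rcond=None)[0]

    # PPI estimate: mixture prediction on unlabeled data plus a labeled-data correction.
    mix_lab, mix_unlab = preds_lab @ w, preds_unlab @ w
    theta = mix_unlab.mean() + (y - mix_lab).mean()

    # Plug-in variance estimate, usable for a normal-approximation interval.
    var = np.var(mix_unlab, ddof=1) / N + np.var(y - mix_lab, ddof=1) / n
    return theta, var, w
```

Setting all weights to zero recovers the classical sample mean, so under this sketch an uninformative expert should simply receive a weight near zero rather than inflate the variance.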


A Theory-Driven Self-Labeling Refinement Method for Contrastive Representation Learning (Supplementary File)

Neural Information Processing Systems

This supplementary document contains additional experimental details and the technical proofs of the convergence results of the NeurIPS'21 submission entitled "A Theory-Driven Self-Labeling Refinement Method for Contrastive Representation Learning". It is structured as follows. In Appendix A, we provide more experimental details, including the training algorithm, network architecture, optimizer details, loss construction, and training cost of SANE. Appendix B presents the proof and details of the main results, namely, Theorem 1, in Section 2, which analyzes the generalization performance of MoCo. Next, Appendix C introduces the proof roadmap and details of the main results, i.e.


A unifying view of contrastive learning, importance sampling, and bridge sampling for energy-based models

arXiv.org Machine Learning

In the last decades, energy-based models (EBMs) have become an important class of probabilistic models in which a component of the likelihood is intractable and therefore cannot be evaluated explicitly. Consequently, parameter estimation in EBMs is challenging for conventional inference methods. In this work, we provide a unified framework that connects noise contrastive estimation (NCE), reverse logistic regression (RLR), multiple importance sampling (MIS), and bridge sampling within the context of EBMs. We further show that these methods are equivalent under specific conditions. This unified perspective clarifies relationships among existing methods and enables the development of new estimators, with the potential to improve statistical and computational efficiency. Furthermore, this study helps elucidate the success of NCE in terms of its flexibility and robustness, while also identifying scenarios in which its performance can be further improved. Hence, rather than being a purely descriptive review, this work offers a unifying perspective and additional methodological contributions. The MATLAB code used in the numerical experiments is also made freely available to support the reproducibility of the results.
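To make the intractable-normalizer setting concrete, here is a minimal sketch of noise contrastive estimation on a toy problem. It is not taken from the paper (whose experiments use MATLAB). The assumptions are: a one-dimensional unnormalized model exp(-x^2/2), a Gaussian noise distribution with scale 2, equal numbers of data and noise samples, and a single unknown parameter `c`, the log normalizing constant. All function names are illustrative.

```python
import numpy as np

def log_unnormalized(x):
    # Unnormalized log-density of the toy model; its true log normalizer is 0.5*log(2*pi).
    return -0.5 * x**2

def log_noise(x, scale=2.0):
    # Fully normalized log-density of the Gaussian noise distribution N(0, scale^2).
    return -0.5 * (x / scale) ** 2 - np.log(scale * np.sqrt(2.0 * np.pi))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nce_log_normalizer(data, noise, lo=-10.0, hi=10.0, iters=60):
    """Estimate the log normalizer c by maximizing the NCE logistic objective.

    The classifier logit is G(x) = log_unnormalized(x) - c - log_noise(x).
    The objective is concave in c, so its derivative is monotone and
    bisection on the derivative finds the maximizer.
    """
    def grad(c):
        g_data = log_unnormalized(data) - c - log_noise(data)
        g_noise = log_unnormalized(noise) - c - log_noise(noise)
        # d/dc of  sum log sigmoid(G(data)) + sum log(1 - sigmoid(G(noise)))
        return -np.sum(1.0 - sigmoid(g_data)) + np.sum(sigmoid(g_noise))

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if grad(mid) > 0:   # objective still increasing: maximizer lies to the right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With samples drawn from a standard normal as data and from N(0, 4) as noise, the returned value should approach 0.5*log(2*pi) as the sample sizes grow, which is the sense in which the logistic-regression view recovers the normalizing constant.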


A Best-of-both-worlds Algorithm for Bandits with Delayed Feedback with Robustness to Excessive Delays

Neural Information Processing Systems

Joulani et al. (2013) have studied multi-armed bandits with delayed feedback under the assumption that the rewards are stochastic and the delays are sampled from a fixed distribution.


f593c9c251d4d7cf14d4ab9861dfb7eb-Paper-Conference.pdf

Neural Information Processing Systems

However, some recent studies have recognized that most of these approaches fail to improve the performance over empirical risk minimization, especially when applied to overparameterized neural networks.



A Theory-Driven Self-Labeling Refinement Method for Contrastive Representation Learning

Neural Information Processing Systems

Although intuitive, such a native label assignment strategy cannot reveal the underlying semantic similarity between a query and its positives and negatives, and impairs performance, since some negatives are semantically similar to the query or even share the same semantic class as the query.