Statistical Learning
Fast Bayesian Coresets via Subsampling and Quasi-Newton Refinement
Any inference procedure that is too computationally expensive to be run on the full posterior can instead be run inexpensively on the coreset, with results that approximate those on the full data. However, current approaches are limited by either a significant run-time or the need for the user to specify a low-cost approximation to the full posterior. We propose a Bayesian coreset construction algorithm that first selects a uniformly random subset of data, and then optimizes the weights using a novel quasi-Newton method. Our algorithm is a simple-to-implement, black-box method that does not require the user to specify a low-cost posterior approximation. It is the first to come with a general high-probability bound on the KL divergence of the output coreset posterior. Experiments demonstrate that our method provides significant improvements in coreset quality against alternatives with comparable construction times, with far less storage cost and user input required.
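The first stage described above (uniform subsampling) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy Gaussian model, and the initial weights N/M are assumptions, and the quasi-Newton weight refinement from the abstract is not shown.

```python
import numpy as np

def uniform_coreset(n_data, coreset_size, seed=0):
    """Pick coreset indices uniformly at random, without replacement."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(n_data, size=coreset_size, replace=False)
    # Assumed starting weights N/M, so the weighted log-likelihood is an
    # unbiased estimate of the full-data log-likelihood. The abstract's
    # method would then optimize these weights; that step is omitted here.
    weights = np.full(coreset_size, n_data / coreset_size)
    return idx, weights

def gaussian_log_lik(x, mu):
    """Per-point log-likelihood under N(mu, 1); a toy model for illustration."""
    return -0.5 * (x - mu) ** 2 - 0.5 * np.log(2.0 * np.pi)

def coreset_log_lik(data, idx, weights, mu):
    """Weighted log-likelihood evaluated on the coreset points only."""
    return float(np.dot(weights, gaussian_log_lik(data[idx], mu)))
```

Any downstream sampler would call `coreset_log_lik` in place of the full-data sum, so each evaluation costs M terms instead of N.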
Geometric Order Learning for Rank Estimation
A novel approach to rank estimation, called geometric order learning (GOL), is proposed in this paper. First, we construct an embedding space, in which the direction and distance between objects represent order and metric relations between their ranks, by enforcing two geometric constraints: the order constraint compels objects to be sorted according to their ranks, while the metric constraint makes the distance between objects reflect their rank difference. Then, we perform a simple k-nearest neighbor (k-NN) search in the embedding space to estimate the rank of a test object. Moreover, to assess the quality of embedding spaces for rank estimation, we propose a metric called discriminative ratio for ranking (DRR). Extensive experiments on facial age estimation, historical color image (HCI) classification, and aesthetic score regression demonstrate that GOL constructs effective embedding spaces and thus yields excellent rank estimation performance. The source code is available at https://github.com/seon92/GOL
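The k-NN step can be sketched as below, assuming the embeddings have already been learned. The function name and the choice to average the neighbors' ranks are assumptions for illustration; the abstract does not state how GOL aggregates neighbor ranks.

```python
import numpy as np

def knn_rank_estimate(ref_embeddings, ref_ranks, query, k=3):
    """Estimate the rank of `query` from its k nearest reference embeddings."""
    # Euclidean distance from the query to every reference point.
    dists = np.linalg.norm(ref_embeddings - query, axis=1)
    # Indices of the k closest reference points.
    nearest = np.argsort(dists)[:k]
    # Assumed aggregation: the mean rank of the neighbors.
    return float(np.mean(ref_ranks[nearest]))

# Toy 1-D embedding in which position equals rank.
refs = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
ranks = np.array([0, 1, 2, 3, 4])
estimate = knn_rank_estimate(refs, ranks, np.array([2.1]), k=3)
```

In an embedding where distance reflects rank difference, nearby reference points carry nearby ranks, which is why such a simple search can suffice.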
Supplementary Materials for the Paper "Towards Free Data Selection with General-Purpose Models"
In this supplementary material, we first explain the details of the spectral clustering algorithm in Sec. B. We also analyze the sensitivity of FreeSel to the values of hyperparameters in Sec. C. Besides, FreeSel is compared with other intuitive baselines using the general-purpose model in Sec. D. Finally, implementation details of our experiments are explained in Sec. E. Our code will be made publicly available. In this section, we explain the spectral clustering algorithm [14, 18] in the semantic pattern extraction process for each image I (Sec.
Towards Free Data Selection with General-Purpose Models
A desirable data selection algorithm can efficiently choose the most informative samples to maximize the utility of limited annotation budgets. However, current approaches, represented by active learning methods, typically follow a cumbersome pipeline that iterates the time-consuming model training and batch data selection repeatedly. In this paper, we challenge this status quo by designing a distinct data selection pipeline that utilizes existing general-purpose models to select data from various datasets with a single-pass inference without the need for additional training or supervision. A novel free data selection (FreeSel) method is proposed following this new pipeline. Specifically, we define semantic patterns extracted from intermediate features of the general-purpose model to capture subtle local information in each image. We then enable the selection of all data samples in a single pass through distance-based sampling at the fine-grained semantic pattern level.
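One common form of distance-based sampling is greedy farthest-point (k-center) selection, sketched below over generic feature vectors. This is an assumption for illustration: the abstract does not specify the sampling rule, and the function name and inputs are hypothetical rather than FreeSel's actual procedure.

```python
import numpy as np

def farthest_point_sampling(features, budget, seed=0):
    """Greedily select `budget` rows that are mutually far apart.

    Assumes budget <= number of rows in `features`.
    """
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    # Start from a random point.
    selected = [int(rng.integers(n))]
    # Distance from every point to its nearest selected point so far.
    min_dist = np.linalg.norm(features - features[selected[0]], axis=1)
    while len(selected) < budget:
        # Pick the point farthest from everything already selected.
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        new_dist = np.linalg.norm(features - features[nxt], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    return selected

# Two tight clusters: a budget of 2 should cover both.
feats = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.1, 10.0]])
picked = farthest_point_sampling(feats, budget=2)
```

Because the features come from a fixed, pretrained model, a rule like this needs no training loop between selection rounds.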
047397849f63b4fcfced4ff720159f3d-Supplemental-Conference.pdf
The ε-sensitivity of distributions is defined below. Next, we provide the following lemma. Suppose that the distribution map D(θ) forms a location family (7). To show the L-Lipschitz continuity of PR(θ), it suffices to show that there exists a positive constant L such that, for any $\theta, \theta' \in \Theta$, $\|PR(\theta) - PR(\theta')\|_2 \le L \|\theta - \theta'\|_2$. Thus, there exists a constant $L \le L_\theta + L_Z \sigma_{\max}(A)$ such that $\|PR(\theta) - PR(\theta')\|_2 \le L \|\theta - \theta'\|_2$ for all $\theta, \theta' \in \Theta$, which proves the L-Lipschitz continuity of PR(θ).
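The intermediate steps of this argument are not shown in the excerpt. One way they could go is sketched below, under assumptions that are not confirmed by the text: the performative risk is scalar-valued, $PR(\theta) = \mathbb{E}[\ell(Z_0 + A\theta; \theta)]$ for a location family $Z = Z_0 + A\theta$, and the loss $\ell(z; \theta)$ is $L_\theta$-Lipschitz in $\theta$ and $L_Z$-Lipschitz in $z$.

```latex
\begin{align*}
|PR(\theta) - PR(\theta')|
  &= \bigl|\mathbb{E}[\ell(Z_0 + A\theta; \theta)]
     - \mathbb{E}[\ell(Z_0 + A\theta'; \theta')]\bigr| \\
  &\le \mathbb{E}\bigl|\ell(Z_0 + A\theta; \theta)
       - \ell(Z_0 + A\theta; \theta')\bigr|
     + \mathbb{E}\bigl|\ell(Z_0 + A\theta; \theta')
       - \ell(Z_0 + A\theta'; \theta')\bigr| \\
  &\le L_\theta \|\theta - \theta'\|_2
     + L_Z \|A(\theta - \theta')\|_2 \\
  &\le \bigl(L_\theta + L_Z \sigma_{\max}(A)\bigr) \|\theta - \theta'\|_2 .
\end{align*}
```

The last step uses $\|Av\|_2 \le \sigma_{\max}(A)\|v\|_2$, which is where the largest singular value of $A$ enters the constant.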