A Refined Generalization Analysis for Extreme Multi-class Supervised Contrastive Representation Learning

arXiv.org Machine Learning

Contrastive Representation Learning (CRL) has achieved strong empirical success in multiple machine learning disciplines, yet its theoretical sample complexity remains poorly understood. Existing analyses usually assume that input tuples are identically and independently distributed, an assumption violated in most practical settings where contrastive tuples are constructed from a finite pool of labeled data, inducing dependencies among tuples. While one recent work analyzed this learning setting using U-Statistics to estimate the population risk, the techniques used therein require the risk of each class to concentrate uniformly, making excess risk bounds scale in the order of $\rho_{\min}^{-1/2}$, where $\rho_{\min}$ denotes the probability of the rarest class. Such a dependency can be overly pessimistic in extreme multi-class settings where there are many tail classes which contribute minimally to the overall population risk. Our contributions are two-fold. Firstly, we improve upon the previous work and prove a bound with a sample complexity of the same order as the number of classes $R$, regardless of the distribution over classes. Furthermore, we formulate a different estimator that captures the concentration of the risk across classes, enabling sharper bounds in extreme multi-class learning scenarios, especially where class distributions are long-tailed. Under mild assumptions on the class distributions, the resulting sample complexity is $\mathcal{O}(k)$, where $k$ is the number of samples per tuple.


On Convergence of Polynomial Approximations to the Gaussian Mixture Entropy

Neural Information Processing Systems

Gaussian mixture models (GMMs) are fundamental to machine learning due to their flexibility as approximating densities. However, uncertainty quantification of GMMs remains a challenge as differential entropy lacks a closed form. This paper explores polynomial approximations, specifically Taylor and Legendre, to the GMM entropy from a theoretical and practical perspective. We provide new analysis of a widely used approach due to Huber et al. (2008) and show that the series diverges under simple conditions. Motivated by this divergence we provide a novel Taylor series that is provably convergent to the true entropy of any GMM.
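
Because the GMM entropy has no closed form, a common point of comparison for any approximation is a plain Monte Carlo estimate, $H \approx -\frac{1}{N}\sum_i \log p(x_i)$ with $x_i$ drawn from the mixture. The sketch below illustrates that baseline only; it is not the Taylor or Legendre approximation studied in the paper. It assumes a one-dimensional mixture, and the function names (`gmm_pdf`, `sample_gmm`, `mc_entropy`) are hypothetical.

```python
import math
import random


def gmm_pdf(x, weights, means, stds):
    """Density of a 1-D Gaussian mixture at x."""
    return sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, m, s in zip(weights, means, stds)
    )


def sample_gmm(rng, weights, means, stds):
    """Draw one sample: pick a component by weight, then sample from it."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(means[k], stds[k])


def mc_entropy(weights, means, stds, n_samples=20000, seed=0):
    """Monte Carlo estimate of the differential entropy (in nats)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = sample_gmm(rng, weights, means, stds)
        total -= math.log(gmm_pdf(x, weights, means, stds))
    return total / n_samples
```

For a single standard Gaussian the estimate should land close to the known value $\tfrac{1}{2}\log(2\pi e)$, which gives a quick sanity check before trying multi-component mixtures.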



Triple Eagle: Simple, Fast and Practical Budget-Feasible Mechanisms

Neural Information Processing Systems

We revisit the classical problem of designing Budget-Feasible Mechanisms (BFMs) for submodular valuation functions, which has been extensively studied since the seminal paper of Singer [FOCS'10] due to its wide applications in crowdsourcing and social marketing. We propose TripleEagle, a novel algorithmic framework for designing BFMs, based on which we present several simple yet effective BFMs that achieve better approximation ratios than the state-of-the-art work for both monotone and non-monotone submodular valuation functions. Moreover, our BFMs are the first in the literature to achieve linear complexities while ensuring obvious strategyproofness, making them more practical than the previous BFMs. We conduct extensive experiments to evaluate the empirical performance of our BFMs, and the experimental results strongly demonstrate the efficiency and effectiveness of our approach.



Efficient Equivariant Network Supplementary Materials: A. MNIST-rot Model Architecture

Neural Information Processing Systems

Please refer to Table 5. Table 5: Architecture of E4-Net on MNIST-rot classification; p denotes the dropout rate. The hyperparameters we use in this architecture are kernel size k = 5, reduction ratio r = 1, and number of slices s = 2. In the large model, we increase the channel dimension to 24, the number of slices to 12, and the reduction ratio to 2, and keep the other hyperparameters the same. We take ResNet-18 [2], which is composed of an initial convolution layer, followed by four stages of Res-Blocks and one final classification layer.


10 Supplementary Material for the paper "LeadCache: Regret-Optimal Caching in Networks"

Neural Information Processing Systems

Following Cohen and Hazan [2015], we derive a general expression for the regret upper bound applicable to any linear reward function under an anytime FTPL policy. This is accomplished in the following steps. First, we extend the argument of Cohen and Hazan [2015] to the anytime setting. Then, we specialize this bound to our problem setting. Recall the notation used in the paper: the aggregate file-request sequence from all users is denoted by $\{x_t\}_{t \geq 1}$ and the virtual cache configuration sequence is denoted by $\{z_t\}_{t \geq 1}$. Define the cumulative requests up to time $t$ as: $X_t =$ [...] Furthermore, since the max function is convex, we may interchange the expectation and gradient to obtain $\nabla \Phi_{\eta_t}(X_t) = \mathbb{E}(z_t)$ [Bertsekas, 1973, Proposition 2.2]. Plugging in the expression of the inner product from Eqn. (25) in expression (26), we obtain: [...] Bounding the term (a): Next, to upper bound the expected regret, we control term (a) in inequality (28).
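
To make the idea of an anytime FTPL policy with a linear reward concrete, here is a minimal sketch for a single cache, which is a simplification and not the networked LeadCache setting. The assumptions are labeled in the code: a Gaussian perturbation sampled once, a learning rate growing as $\sqrt{t}$, and a feasible set consisting of all subsets of `capacity` files, so that the linear maximisation reduces to picking the top-scoring files. The function name `ftpl_cache` and its parameters are hypothetical.

```python
import math
import random


def ftpl_cache(requests, n_files, capacity, eta0=1.0, seed=0):
    """Count cache hits of a Follow-the-Perturbed-Leader policy on one cache.

    Assumptions (not taken from the source): Gaussian perturbation drawn once,
    eta_t = eta0 * sqrt(t), and a single cache holding `capacity` files.
    """
    rng = random.Random(seed)
    gamma = [rng.gauss(0.0, 1.0) for _ in range(n_files)]  # perturbation, sampled once
    counts = [0.0] * n_files  # cumulative request counts seen so far
    hits = 0
    for t, requested in enumerate(requests, start=1):
        eta_t = eta0 * math.sqrt(t)  # time-varying rate: no horizon needed
        scores = [counts[i] + eta_t * gamma[i] for i in range(n_files)]
        # Maximising a linear reward over capacity-sized subsets = top-`capacity` scores.
        ranked = sorted(range(n_files), key=lambda i: scores[i], reverse=True)
        cache = set(ranked[:capacity])
        if requested in cache:
            hits += 1
        counts[requested] += 1.0  # reveal the request after the decision
    return hits
```

Setting `eta0=0` removes the perturbation and recovers plain follow-the-leader, which is a convenient way to check the bookkeeping before experimenting with the perturbation scale.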




5 Supplementary Material

Neural Information Processing Systems

Dendritic updates: Complete versions of the dendritic update rules (summarised in Eqns (2) & (3)) are given below. This is valid in our regime, where the environmental latent updates slowly compared to neural timescales. The notation we use admits biases as well as weights (though biases typically aren't used): a row of constant 1's can be added to the synaptic inputs, effectively absorbing the bias into the weight matrix without loss of generality, for example $w_{g_B} p(t) \to w_{g_B} p(t) + b_{g_B}$. Somatic updates: The somatic update rules (Eqns (4) & (5)) are repeated here for completeness: $p(t) = \theta(t)\, p_B(t) + (1 - \theta(t))\, p_A(t)$ and $g(t) = \theta(t)\, g_B(t) + (1 - \theta(t))\, g_A(t)$. Update ordering: For this hierarchical network of multicompartmental neurons we must specify the order in which we perform these discrete updates to the different layers and the different compartments within these layers.
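
Two of the ingredients above are easy to check numerically: absorbing a bias into a weight matrix by appending a constant 1 to the input, and forming a somatic value as a gated convex combination of two dendritic compartments. The sketch below is illustrative only; the helper names (`matvec`, `absorb_bias`, `somatic_mix`) and the scalar `gate` argument are hypothetical and not taken from the paper.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def absorb_bias(W, b):
    """Append the bias as an extra column, to be paired with a constant-1 input."""
    return [row + [b_i] for row, b_i in zip(W, b)]


def somatic_mix(gate, basal, apical):
    """Gated combination gate * basal + (1 - gate) * apical, elementwise."""
    return [gate * vb + (1.0 - gate) * va for vb, va in zip(basal, apical)]


# Bias absorption: W x + b equals [W | b] applied to [x, 1].
W = [[1.0, 2.0], [3.0, 4.0]]
b = [0.5, -1.0]
x = [1.0, 1.0]
with_bias = [y + b_i for y, b_i in zip(matvec(W, x), b)]
absorbed = matvec(absorb_bias(W, b), x + [1.0])
```

Here `with_bias` and `absorbed` hold the same values, which is the sense in which the bias can be dropped from the notation without loss of generality.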