tuple
A Refined Generalization Analysis for Extreme Multi-class Supervised Contrastive Representation Learning
Hieu, Nong Minh, Ledent, Antoine
Contrastive Representation Learning (CRL) has achieved strong empirical success in multiple machine learning disciplines, yet its theoretical sample complexity remains poorly understood. Existing analyses usually assume that input tuples are identically and independently distributed, an assumption violated in most practical settings where contrastive tuples are constructed from a finite pool of labeled data, inducing dependencies among tuples. While one recent work analyzed this learning setting using U-Statistics to estimate the population risk, the techniques used therein require the risk of each class to concentrate uniformly, making excess risk bounds scale in the order of $ฯ_{\min}^{-{1}/{2}}$ where $ฯ_{\min}$ denotes the probability of the rarest class. Such a dependency can be overly pessimistic in the extreme multiclass settings where there are many tail classes which contribute minimally to the overall population risk. Our contributions are two-fold. Firstly, we improve upon the previous work and prove a bound with a sample complexity of the same order as the number of classes $R$, regardless of the distribution over classes. Furthermore, we formulate a different estimator that captures the concentration of the risk \textit{across classes}, enabling sharper bounds in extreme multi-class learning scenarios, especially where class distributions are long-tailed. Under mild assumptions on the class distributions, the resulting sample complexity is $\mathcal{O}(k)$ where $k$ is the number of samples per tuple.
QATCH: Benchmarking SQL-centric tasks with Table Representation Learning Models on Your Data
Table Representation Learning (TRL) models are commonly pre-trained on large open-domain datasets comprising millions of tables and then used to address downstream tasks. Choosing the right TRL model to use on proprietary data can be challenging, as the best results depend on the content domain, schema, and data quality. Our purpose is to support end-users in testing TRL models on proprietary data in two established SQL-centric tasks, i.e., Question Answering (QA) and Semantic Parsing (SP). We present QATCH (Query-Aided TRLChecklist), a toolbox to highlight TRL models' strengths and weaknesses on relational tables unseen at training time. For an input table, QATCH automatically generates a testing checklist tailored to QA and SP. Checklist generation is driven by a SQL query engine that crafts tests of different complexity. This design facilitates inherent portability, allowing the checks to be used by alternative models. We also introduce a set of cross-task performance metrics evaluating the TRL model's performance over its output. Finally, we show how QATCH automatically generates tests for proprietary datasets to evaluate various state-of-the-art models including TAPAS, TAPEX, and CHATGPT.
related work
In addition to the work on noisy convex optimization, the current paper is also thematically related to works in learning theory and complexity where the goal is to reconstruct simple classes of functions under outlier noise. This includes work on reconstruction of low-degree polynomials [4, 14, 15]. In particular, [15] gave an efficient algorithm whose error tolerance matches the information theoretic limits. In addition, recently, [9] achieved similar algorithmic guarantees for functions which are sparse in the Fourier space. While similar in spirit, the model in these works differ from the current paper in one crucial way - namely, while we only put a bound on the volume of the outlier locations, they, in addition, assume that the outlier locations are also uniformly distributed in the domain. At a more technical level, the results in [4, 14, 15, 9] crucially rely on techniques originating from coding theory such as the Goldreich-Levin theorem [13] and the Berlekamp-Welch algorithm [6].
Extending the Design Space of Graph Neural Networks by Rethinking Folklore Weisfeiler-Lehman
Message passing neural networks (MPNNs) have emerged as the most popular framework of graph neural networks (GNNs) in recent years. However, their expressive power is limited by the 1-dimensional Weisfeiler-Lehman (1-WL) test. Some works are inspired by k-WL/FWL (Folklore WL) and design the corresponding neural versions. Despite the high expressive power, there are serious limitations in this line of research. In particular, (1) k-WL/FWL requires at least O(nk)space complexity, which is impractical for large graphs even when k = 3; (2) The design space of k-WL/FWL is rigid, with the only adjustable hyper-parameter being k. To tackle the first limitation, we propose an extension, (k,t)-FWL. We theoretically prove that even if we fix the space complexity to O(nk) (for any k 2) in (k,t)-FWL, we can construct an expressiveness hierarchy up to solving the graph isomorphism problem. To tackle the second problem, we propose k-FWL+, which considers any equivariant set as neighbors instead of all nodes, thereby greatly expanding the design space of k-FWL.
CNN+ RPNClassificationRegressionMaskInput ImageLong-Tailed Object DetectionPost-Processing CalibrationBulldozerSchool busTruckBulldozerSchool busTruckNORCAL
Vanilla models for object detection and instance segmentation suffer from the heavy bias toward detecting frequent objects in the long-tailed setting. Existing methods address this issue mostly during training, e.g., by re-sampling or reweighting. In this paper, we investigate a largely overlooked approach -- postprocessing calibration of confidence scores. We propose NORCAL, Normalized Calibration for long-tailed object detection and instance segmentation, a simple and straightforward recipe that reweighs the predicted scores of each class by its training sample size. We show that separately handling the background class and normalizing the scores over classes for each proposal are keys to achieving superior performance. On the LVIS dataset, NORCAL can effectively improve nearly all the baseline models not only on rare classes but also on common and frequent classes. Finally, we conduct extensive analysis and ablation studies to offer insights into various modeling choices and mechanisms of our approach. Our code is publicly available at https://github.com/tydpan/NorCal.
Power-Law Spectrum of the Random Feature Model
Paquette, Elliot, Xiao, Ke Liang, Zhu, Yizhe
Scaling laws for neural networks, in which the loss decays as a power-law in the number of parameters, data, and compute, depend fundamentally on the spectral structure of the data covariance, with power-law eigenvalue decay appearing ubiquitously in vision and language tasks. A central question is whether this spectral structure is preserved or destroyed when data passes through the basic building block of a neural network: a random linear projection followed by a nonlinear activation. We study this question for the random feature model: given data $x \sim N(0,H)\in \mathbb{R}^v$ where $H$ has $ฮฑ$-power-law spectrum ($ฮป_j(H ) \asymp j^{-ฮฑ}$, $ฮฑ> 1$), a Gaussian sketch matrix $W \in \mathbb{R}^{v\times d}$, and an entrywise monomial $f(y) = y^{p}$, we characterize the eigenvalues of the population random-feature covariance $\mathbb{E}_{x }[\frac{1}{d}f(W^\top x )^{\otimes 2}]$. We prove matching upper and lower bounds: for all $1 \leq j \leq c_1 d \log^{-(p+1)}(d)$, the $j$-th eigenvalue is of order $\left(\log^{p-1}(j+1)/j\right)^ฮฑ$. For $ c_1 d \log^{-(p+1)}(d)\leq j\leq d$, the $j$-th eigenvalue is of order $j^{-ฮฑ}$ up to a polylog factor. That is, the power-law exponent $ฮฑ$ is inherited exactly from the input covariance, modified only by a logarithmic correction that depends on the monomial degree $p$. The proof combines a dyadic head-tail decomposition with Wick chaos expansions for higher-order monomials and random matrix concentration inequalities.
Title
A common approach to create more expressive GNNs is to change the message passing function of MPNNs. If a GNN is more expressive than MPNNs by adapting the message passing function, we call this non-standard message passing . Examples of this are message passing variants that operate on subgraphs [Frasca et al., 2022, Bevilacqua