LoCo: Local Contrastive Representation Learning

Neural Information Processing Systems

Deep neural nets typically perform end-to-end backpropagation to learn the weights, a procedure that creates synchronization constraints in the weight update step across layers and is not biologically plausible. Recent advances in unsupervised contrastive representation learning invite the question of whether a learning algorithm can also be made local, that is, whether the updates of lower layers can avoid directly depending on the computation of upper layers. While Greedy InfoMax separately learns each block with a local objective, we found that it consistently hurts readout accuracy in state-of-the-art unsupervised contrastive learning algorithms, possibly due to the greedy objective as well as gradient isolation. In this work, we discover that by overlapping local blocks stacked on top of each other, we effectively increase the decoder depth and allow upper blocks to implicitly send feedback to lower blocks. This simple design closes the performance gap between local learning and end-to-end contrastive learning algorithms for the first time. Aside from standard ImageNet experiments, we also show results on complex downstream tasks such as object detection and instance segmentation directly using readout features.
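The grouping idea in the abstract can be sketched in a few lines: in a greedy scheme each local loss updates only its own block, while in an overlapping scheme each local unit spans two consecutive blocks, so every intermediate block receives gradient from two local losses. This is an illustrative sketch of the block-to-loss assignment only (the function names are ours), not the authors' implementation.

```python
def local_units(num_blocks, overlap=True):
    """Return, for each local loss, the tuple of block indices it updates.

    overlap=True (LoCo-style): unit k spans blocks {k, k+1}, so every
    intermediate block is shared by two units and receives implicit
    feedback from the unit above it.
    overlap=False (greedy, Greedy-InfoMax-style): each unit updates only
    its own block, fully isolating gradients between blocks.
    """
    if overlap:
        return [(k, k + 1) for k in range(num_blocks - 1)]
    return [(k,) for k in range(num_blocks)]


def losses_updating_block(units, block):
    """Indices of the local losses whose gradients reach a given block."""
    return [i for i, unit in enumerate(units) if block in unit]
```

With four blocks, `local_units(4)` yields `[(0, 1), (1, 2), (2, 3)]`: block 1 is trained by losses 0 and 1, whereas in the greedy variant it is trained by its own loss alone.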



Supplementary material of LoCo: Local Contrastive Representation Learning

Neural Information Processing Systems

In this section we show the block structure of each stage in Progressive ResNet-50 in Table 1. The results are shown in Fig. 2, where we can see that LoCo learns image embeddings. Last, we show qualitative results of detection and instance segmentation tasks on COCO in Figure 1.



Mathematical Theory of Collinearity Effects on Machine Learning Variable Importance Measures

Bladen, Kelvyn K., Cutler, D. Richard, Wisler, Alan

arXiv.org Machine Learning

In many machine learning problems, understanding variable importance is a central concern. Two common approaches are Permute-and-Predict (PaP), which randomly permutes a feature in a validation set, and Leave-One-Covariate-Out (LOCO), which retrains models after permuting a training feature. Both methods deem a variable important if predictions with the original data substantially outperform those with permutations. In linear regression, empirical studies have linked PaP to regression coefficients and LOCO to $t$-statistics, but a formal theory has been lacking. We derive closed-form expressions for both measures, expressed using square-root transformations. PaP is shown to be proportional to the coefficient and predictor variability: $\text{PaP}_i = \beta_i \sqrt{2\operatorname{Var}(\mathbf{x}^v_i)}$, while LOCO is proportional to the coefficient but dampened by collinearity (captured by $\Delta$): $\text{LOCO}_i = \beta_i (1 - \Delta)\sqrt{1 + c}$. These derivations explain why PaP is largely unaffected by multicollinearity, whereas LOCO is highly sensitive to it. Monte Carlo simulations confirm these findings across varying levels of collinearity. Although derived for linear regression, we also show that these results provide reasonable approximations for models like Random Forests. Overall, this work establishes a theoretical basis for two widely used importance measures, helping analysts understand how they are affected by the true coefficients, dimension, and covariance structure. This work bridges empirical evidence and theory, enhancing the interpretability and application of variable importance measures.
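The PaP expression is easy to check numerically in the simplest setting. Under one plausible reading of the square-root scale, with independent standard-normal predictors, permuting $x_i$ inflates the validation MSE by roughly $2\beta_i^2\operatorname{Var}(x_i)$, so $\sqrt{\text{MSE}_\text{perm} - \text{MSE}_\text{orig}} \approx \beta_i\sqrt{2\operatorname{Var}(x_i)}$. The sketch below is our illustrative Monte Carlo check, not the authors' code, and it ignores the collinearity term $\Delta$ entirely by construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a linear model y = X @ beta + eps with independent predictors.
n, beta, sigma = 200_000, np.array([2.0, 1.0]), 0.1
X = rng.normal(size=(n, 2))
y = X @ beta + rng.normal(scale=sigma, size=n)

# Fit OLS on a training half, evaluate on a validation half.
Xtr, Xv = X[: n // 2], X[n // 2 :]
ytr, yv = y[: n // 2], y[n // 2 :]
beta_hat = np.linalg.lstsq(Xtr, ytr, rcond=None)[0]

mse = lambda A: np.mean((yv - A @ beta_hat) ** 2)
base = mse(Xv)

# Permute-and-Predict on the square-root scale: permuting column i adds
# about 2 * beta_i^2 * Var(x_i) to the MSE, since Var(x - x_perm) = 2 Var(x).
pap = []
for i in range(2):
    Xp = Xv.copy()
    Xp[:, i] = rng.permutation(Xp[:, i])
    pap.append(np.sqrt(mse(Xp) - base))
```

With $\beta = (2, 1)$ and unit-variance predictors, `pap` lands close to $(2\sqrt{2}, \sqrt{2})$, matching $\beta_i\sqrt{2\operatorname{Var}(x_i)}$ with no collinearity dampening.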


Comparing Model-agnostic Feature Selection Methods through Relative Efficiency

Zheng, Chenghui, Raskutti, Garvesh

arXiv.org Machine Learning

Feature selection and importance estimation in a model-agnostic setting is an ongoing challenge of significant interest. Wrapper methods are commonly used because they are typically model-agnostic, even though they are computationally intensive. In this paper, we focus on feature selection methods related to the Generalized Covariance Measure (GCM) and Leave-One-Covariate-Out (LOCO) estimation, and provide a comparison based on relative efficiency. In particular, we present a theoretical comparison under three model settings: linear models, non-linear additive models, and single index models that mimic a single-layer neural network. We complement this with extensive simulations and real data examples. Our theoretical results, along with empirical findings, demonstrate that GCM-related methods generally outperform LOCO under suitable regularity conditions. Furthermore, we quantify the asymptotic relative efficiency of these approaches. Our simulations and real data analysis include widely used machine learning methods such as neural networks and gradient boosting trees.
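The two estimators being compared are both simple to write down for OLS, which is the easiest setting in which to see their different mechanics: LOCO refits the model without a covariate and measures the increase in error, while GCM forms the product of residuals from regressing $y$ on $X_{-j}$ and $x_j$ on $X_{-j}$. The sketch below is our toy linear-regression illustration of the two ideas, not the paper's estimators or regularity conditions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: y depends on x0 strongly, x1 weakly, x2 not at all.
n = 50_000
X = rng.normal(size=(n, 3))
y = 1.5 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)


def ols_mse(X, y):
    """Fit OLS and return in-sample MSE (enough for a toy illustration)."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return np.mean((y - X @ b) ** 2)


# LOCO: refit with covariate j removed; importance = increase in error.
full = ols_mse(X, y)
loco = [ols_mse(np.delete(X, j, axis=1), y) - full for j in range(3)]


def gcm_stat(X, y, j):
    """GCM-style statistic: normalized mean of the residual products from
    regressing y on X_{-j} and x_j on X_{-j} (linear regressions here)."""
    Z = np.delete(X, j, axis=1)
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    rx = X[:, j] - Z @ np.linalg.lstsq(Z, X[:, j], rcond=None)[0]
    R = ry * rx
    return np.sqrt(len(R)) * R.mean() / R.std()
```

With independent predictors, `loco` comes out near $(1.5^2, 0.5^2, 0)$, and the GCM statistic is large for the active covariates but approximately standard normal for the inactive one. The efficiency comparison in the paper concerns how these quantities behave as estimators under noise and model misspecification, which this toy does not capture.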


LoCo: Learning 3D Location-Consistent Image Features with a Memory-Efficient Ranking Loss

Neural Information Processing Systems

Image feature extractors are rendered substantially more useful if different views of the same 3D location yield similar features while still being distinct from other locations. A feature extractor that achieves this goal even under significant viewpoint changes must recognise not just semantic categories in a scene, but also understand how different objects relate to each other in three dimensions. Existing work addresses this task by posing it as a patch retrieval problem, training the extracted features to facilitate retrieval of all image patches that project from the same 3D location. However, this approach uses a loss formulation that requires substantial memory and computation resources, limiting its applicability for large-scale training. We present a method for memory-efficient learning of location-consistent features that reformulates and approximates the smooth average precision objective.
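The memory bottleneck the abstract refers to comes from the pairwise structure of the smooth average precision objective: relaxing the hard rank indicator with a sigmoid makes AP differentiable, but materializes an $O(n^2)$ matrix of score differences. The sketch below shows the standard sigmoid-relaxed AP for a single query (in the style of Brown et al.'s Smooth-AP), not the paper's memory-efficient reformulation; the function name and temperature parameter are ours.

```python
import numpy as np


def smooth_ap(scores, labels, tau=0.01):
    """Sigmoid-relaxed average precision for one query.

    scores: similarity of each candidate to the query; labels: 1 = positive.
    The pairwise difference matrix below is the O(n^2) memory cost that
    limits large-scale training with this formulation.
    """
    s = np.asarray(scores, dtype=float)
    pos = np.flatnonzero(labels)
    diff = s[None, :] - s[pos, None]           # diff[i, j] = s_j - s_{pos_i}
    sig = 1.0 / (1.0 + np.exp(-diff / tau))    # soft "j ranked above pos_i"
    sig[np.arange(len(pos)), pos] = 0.0        # exclude self-comparisons
    rank_all = 1.0 + sig.sum(axis=1)           # soft rank among all items
    rank_pos = 1.0 + sig[:, pos].sum(axis=1)   # soft rank among positives
    return np.mean(rank_pos / rank_all)        # AP; negate to use as a loss
```

For a perfect ranking the relaxation recovers an AP of 1, and as `tau` shrinks it approaches the exact (non-differentiable) average precision.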


Review for NeurIPS paper: LoCo: Local Contrastive Representation Learning

Neural Information Processing Systems

Weaknesses: The paper claims in the Abstract that "by overlapping local blocks" (i.e. the first proposed method), it "closes the performance gap between local learning and end-to-end contrastive learning algorithms for the first time." However, the presented empirical results cannot support this claim. The comparison with the SimCLR baseline in Table-1 is not fair. SimCLR can achieve accuracy of 65.7% without extra layers in the decoder and 67.1% with extra layers according to Table-4. However, Table-1 compares SimCLR without extra layers against the proposed solution with extra layers.


Review for NeurIPS paper: LoCo: Local Contrastive Representation Learning

Neural Information Processing Systems

Reviewers were satisfied by the authors' response and clarifications. The discussion phase also contributed to harmonizing their views on the relevance and usefulness of well-working local criteria. As a result, R1 and R5 increased their scores. The consensus is that the work is a novel and valuable contribution to research on local un/self-supervised learning criteria, with potential relevance for memory savings and biologically plausible alternatives to backpropagation. The AC agrees and recommends acceptance.