npn
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > New York (0.04)
- North America > United States > New Jersey > Hudson County > Secaucus (0.04)
- (3 more...)
NeuCLIP: Efficient Large-Scale CLIP Training with Neural Normalizer Optimization
Wei, Xiyuan, Lin, Chih-Jen, Yang, Tianbao
Accurately estimating the normalization term (also known as the partition function) in the contrastive loss is a central challenge for training Contrastive Language-Image Pre-training (CLIP) models. Conventional methods rely on large batches for approximation, demanding substantial computational resources. To mitigate this issue, prior works introduced per-sample normalizer estimators, which are updated at each epoch in a blockwise-coordinate manner to keep track of the updated encoders; such infrequent updates, however, can lag behind the rapidly evolving model. To overcome this limitation, we propose NeuCLIP, a novel and elegant optimization framework based on two key ideas: (i) reformulating the contrastive loss for each sample, via convex analysis, into a minimization problem with an auxiliary variable representing its log-normalizer; and (ii) transforming the resulting minimization over n auxiliary variables (where n is the dataset size), via variational analysis, into a minimization over a compact neural network that predicts the log-normalizers. We design an alternating optimization algorithm that jointly trains the CLIP model and the auxiliary network. By employing a tailored architecture and acceleration techniques for the auxiliary network, NeuCLIP achieves more accurate normalizer estimation, leading to improved performance compared with previous methods. Extensive experiments on large-scale CLIP training, spanning datasets from millions to billions of samples, demonstrate that NeuCLIP outperforms previous methods.
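As a hedged illustration of idea (i) (an editor's sketch, not taken verbatim from the paper): the log-normalizer of each row of similarity scores admits a standard variational form from convex duality, which is what lets the normalizer be traded for a learnable auxiliary variable.

```latex
% Per-anchor contrastive loss with similarity scores s_{ij}:
%   L_i = -s_{ii} + \log Z_i,  where  Z_i = \sum_j e^{s_{ij}}.
% Convex-duality identity for the log-normalizer (idea (i)):
\log Z_i \;=\; \min_{b_i \in \mathbb{R}} \Big\{\, b_i - 1 + e^{-b_i} \textstyle\sum_j e^{s_{ij}} \,\Big\}
% Setting the derivative to zero gives e^{-b_i} Z_i = 1, i.e., b_i = \log Z_i,
% so the minimum is attained exactly at the log-normalizer. Idea (ii) then
% replaces the n variables b_1, ..., b_n with the output of a compact network
% b_i \approx \phi(x_i), trained alternately with the CLIP encoders.
```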
- Europe > Switzerland (0.04)
- North America > United States > Texas > Brazos County > College Station (0.04)
- Asia > Taiwan (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
BabyLM's First Constructions: Causal probing provides a signal of learning
Rozner, Joshua, Weissweiler, Leonie, Shain, Cory
Construction grammar posits that language learners acquire constructions (form-meaning pairings) from the statistics of their environment. Recent work supports this hypothesis by showing sensitivity to constructions in pretrained language models (PLMs), including one recent study (Rozner et al., 2025) demonstrating that constructions shape RoBERTa's output distribution. However, models under study have generally been trained on developmentally implausible amounts of data, casting doubt on their relevance to human language learning. Here we use Rozner et al.'s methods to evaluate construction learning in masked language models from the 2024 BabyLM Challenge. Our results show that even when trained on developmentally plausible quantities of data, models learn diverse constructions, even hard cases that are superficially indistinguishable. We further find correlational evidence that constructional performance may be functionally relevant: models that better represent constructions perform better on the BabyLM benchmarks.
- North America > United States > Florida > Miami-Dade County > Miami (0.14)
- Europe > Austria > Vienna (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (14 more...)
Reviews: Natural-Parameter Networks: A Class of Probabilistic Neural Networks
The paper presents a novel and potentially impactful way of learning uncertainty over model parameters. The derivation of novel activation functions for which first and second moments are computable in closed form (for distributions in the exponential family) appears to be the main novel contribution, as this is what allows forward propagation of exponential-family distributions through the network and learning of their parameters via backprop. The work does bear some resemblance to earlier work on "Implicit Variance Networks" by Bayer et al., which ought to be discussed. On a technical level, the method appears to be effective, and the authors empirically verify that (1) the method is robust to overfitting, (2) predictive uncertainty is well calibrated, and (3) propagating distributions over latent states can outperform deterministic methods. The fact that these second-order representations outperform those of a VAE is somewhat more surprising and may warrant further experimentation: it would imply that the approximation used by the VAE at inference is worse than the approximation made by NPN that each layer's activation belongs to the exponential family.
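For intuition about why closed-form moments matter, here is one illustrative and well-known closed-form pair, for a ReLU under a Gaussian pre-activation; the specific activations derived in the paper may differ.

```latex
% X ~ N(mu, sigma^2), Y = max(0, X); \varphi and \Phi denote the standard
% normal pdf and cdf. Both moments are exact and differentiable in (mu, sigma),
% so distributions can be propagated forward and trained by backprop:
\mathbb{E}[Y] = \mu\,\Phi\!\big(\tfrac{\mu}{\sigma}\big) + \sigma\,\varphi\!\big(\tfrac{\mu}{\sigma}\big),
\qquad
\mathbb{E}[Y^2] = (\mu^2+\sigma^2)\,\Phi\!\big(\tfrac{\mu}{\sigma}\big) + \mu\sigma\,\varphi\!\big(\tfrac{\mu}{\sigma}\big).
```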
Natural-Parameter Networks: A Class of Probabilistic Neural Networks
Neural networks (NN) have achieved state-of-the-art performance in various applications. Unfortunately in applications where training data is insufficient, they are often prone to overfitting. One effective way to alleviate this problem is to exploit the Bayesian approach by using Bayesian neural networks (BNN). Another shortcoming of NN is the lack of flexibility to customize different distributions for the weights and neurons according to the data, as is often done in probabilistic graphical models. To address these problems, we propose a class of probabilistic neural networks, dubbed natural-parameter networks (NPN), as a novel and lightweight Bayesian treatment of NN.
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > New York (0.04)
- North America > United States > New Jersey > Hudson County > Secaucus (0.04)
- (3 more...)
Neural Plasticity Networks
Neural plasticity is an important functionality of the human brain, in which the number of neurons and synapses can shrink or expand in response to stimuli throughout the life span. We model this dynamic learning process as an $L_0$-norm regularized binary optimization problem, in which each unit of a neural network (e.g., a weight, neuron, or channel) is attached to a stochastic binary gate, whose parameters determine the level of activity of that unit in the network. At the beginning, only a small portion of the binary gates (and therefore the corresponding neurons) are activated, while the remaining neurons are in a hibernation mode. As learning proceeds, some neurons may be activated or deactivated if doing so is justified by the cost-benefit tradeoff measured by the $L_0$-norm regularized objective. As training matures, the probability of transition between activation and deactivation diminishes until a final hardening stage. We demonstrate that all of these learning dynamics can be modulated seamlessly by a single parameter $k$. Our neural plasticity network (NPN) can prune or expand a network depending on the initial capacity provided by the user; it also unifies dropout (when $k=0$) and traditional training of DNNs (when $k=\infty$), and interpolates between these two. To the best of our knowledge, this is the first learning framework that unifies network sparsification and network expansion in an end-to-end training pipeline. Extensive experiments on a synthetic dataset and multiple image classification benchmarks demonstrate the superior performance of NPN. We show that both network sparsification and network expansion can yield compact models of similar architectures and similar predictive accuracies that are close to, or sometimes even higher than, those of baseline networks. We plan to release our code to facilitate research in this area.
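The abstract does not specify how the stochastic binary gates are parameterized; below is a minimal sketch of one standard choice, the hard-concrete relaxation of Louizos et al. (2018), with a differentiable $L_0$ penalty. The constants, initialization, and penalty weight are illustrative placeholders, and the paper's $k$-modulated hardening schedule is not reproduced here.

```python
import torch

# Hard-concrete stochastic gates: a standard way to realize trainable
# binary gates under an L0 penalty (the paper's exact gate model may differ).
BETA, GAMMA, ZETA = 2 / 3, -0.1, 1.1  # temperature and stretch constants

def sample_gates(log_alpha: torch.Tensor) -> torch.Tensor:
    """Reparameterized sample of relaxed binary gates z in [0, 1]."""
    u = torch.rand_like(log_alpha).clamp(1e-6, 1 - 1e-6)
    s = torch.sigmoid((u.log() - (1 - u).log() + log_alpha) / BETA)
    return (s * (ZETA - GAMMA) + GAMMA).clamp(0.0, 1.0)

def expected_l0(log_alpha: torch.Tensor) -> torch.Tensor:
    """E[number of non-zero gates] -- the differentiable L0 penalty."""
    shift = BETA * torch.log(torch.tensor(-GAMMA / ZETA))
    return torch.sigmoid(log_alpha - shift).sum()

# Gate 256 hidden units; a negative init keeps most units dormant at first.
log_alpha = torch.full((256,), -2.0, requires_grad=True)
h = torch.randn(32, 256)                      # stand-in activations
gated = h * sample_gates(log_alpha)           # units with z = 0 "hibernate"
loss = gated.pow(2).mean() + 1e-3 * expected_l0(log_alpha)  # task loss + L0
loss.backward()
```

Units whose gates collapse to zero are effectively pruned; raising a gate's activation probability corresponds to waking a hibernating unit, which is the expand direction of the tradeoff the abstract describes.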
Natural-Parameter Networks: A Class of Probabilistic Neural Networks
Wang, Hao, Shi, Xingjian, Yeung, Dit-Yan
Neural networks (NN) have achieved state-of-the-art performance in various applications. Unfortunately in applications where training data is insufficient, they are often prone to overfitting. One effective way to alleviate this problem is to exploit the Bayesian approach by using Bayesian neural networks (BNN). Another shortcoming of NN is the lack of flexibility to customize different distributions for the weights and neurons according to the data, as is often done in probabilistic graphical models. To address these problems, we propose a class of probabilistic neural networks, dubbed natural-parameter networks (NPN), as a novel and lightweight Bayesian treatment of NN. NPN allows the usage of arbitrary exponential-family distributions to model the weights and neurons. Different from traditional NN and BNN, NPN takes distributions as input and goes through layers of transformation before producing distributions to match the target output distributions. As a Bayesian treatment, efficient backpropagation (BP) is performed to learn the natural parameters for the distributions over both the weights and neurons. The output distributions of each layer, as byproducts, may be used as second-order representations for the associated tasks such as link prediction. Experiments on real-world datasets show that NPN can achieve state-of-the-art performance.
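A hedged sketch of the core "distributions in, distributions out" mechanic for the Gaussian case: exact moment propagation through a linear layer with independent Gaussian weights and inputs. This is an editor's illustration; the paper parameterizes general exponential-family distributions by their natural parameters rather than by means and variances.

```python
import numpy as np

# Gaussian NPN-style linear layer: inputs, weights, and outputs are each
# represented by elementwise means and variances instead of point values.
rng = np.random.default_rng(0)
d_in, d_out = 4, 3
W_mean = rng.normal(size=(d_out, d_in))
W_var = np.full((d_out, d_in), 0.1)
b_mean, b_var = np.zeros(d_out), np.full(d_out, 0.1)

def linear_moments(x_mean, x_var):
    """Exact output mean/variance for independent Gaussian weights & inputs."""
    y_mean = W_mean @ x_mean + b_mean
    y_var = (W_var @ (x_var + x_mean**2)   # Var[w] * E[x^2]
             + (W_mean**2) @ x_var         # E[w]^2 * Var[x]
             + b_var)
    return y_mean, y_var

x_mean, x_var = rng.normal(size=d_in), np.full(d_in, 0.05)
m, v = linear_moments(x_mean, x_var)
print(m, v)  # a distribution over outputs, not a point estimate
```

Stacking such layers with activations whose moments are available in closed form (see the review above) yields the forward pass; the per-layer output variances are the "second-order representations" the abstract mentions.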
- North America > United States (0.67)
- North America > Canada > Ontario > Toronto (0.14)
- Europe > Spain (0.14)
Stochastic Heavy Ball
Gadat, Sébastien, Panloup, Fabien, Saadane, Sofiane
This paper deals with a natural stochastic optimization procedure derived from the so-called Heavy-ball differential equation, which was introduced by Polyak in the 1960s with his seminal contribution [Pol64]. The Heavy-ball method is a second-order dynamics that was investigated to minimize convex functions f. The family of second-order methods has received a large amount of attention since the famous contribution of Nesterov [Nes83], spurred by the explosion of large-scale optimization problems. This work provides an in-depth description of the stochastic heavy-ball method, an adaptation of the deterministic one in which only unbiased evaluations of the gradient are available and used throughout the iterations of the algorithm. We first describe some almost-sure convergence results in the case of general non-convex coercive functions f. We then examine the situation of convex and strongly convex potentials and derive some non-asymptotic results about the stochastic heavy-ball method. We end our study with limit theorems on several rescaled algorithms.
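For reference, a minimal sketch of the stochastic heavy-ball iteration the abstract describes, x_{k+1} = x_k - gamma_k * g_k + beta * (x_k - x_{k-1}) with g_k an unbiased gradient estimate. The step-size schedule, momentum value, and test objective below are hypothetical placeholders, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_grad(x):
    """Unbiased gradient of f(x) = 0.5 * ||x||^2, corrupted by noise."""
    return x + 0.1 * rng.normal(size=x.shape)

x_prev = x = np.array([5.0, -3.0])
beta = 0.9                                     # momentum ("heaviness")
for k in range(1, 2001):
    gamma = 0.1 / k**0.75                      # vanishing step size
    x_next = x - gamma * stochastic_grad(x) + beta * (x - x_prev)
    x_prev, x = x, x_next
print(x)  # approaches the minimizer 0 as the noise is damped by gamma_k -> 0
```

The friction implicit in beta < 1 plus the vanishing step sizes is what makes almost-sure convergence plausible here; the paper makes this precise for non-convex coercive f and sharpens it to non-asymptotic rates in the (strongly) convex case.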
- Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
- North America > United States > New York (0.04)
- North America > United States > Texas > Hays County > San Marcos (0.04)
- (7 more...)