
Review for NeurIPS paper: Noise-Contrastive Estimation for Multivariate Point Processes

Neural Information Processing Systems

The paper derives a new estimation method for multivariate point processes based on the 'ranking' variant of NCE. The paper is borderline: two reviewers think that the difference from previous work by Gao (who used NCE to estimate point processes) and the empirical comparison are not sufficient. Two other reviewers disagree, with one in particular arguing that the paper should be accepted. The meta-reviewer thinks that the theory in the paper is sufficiently different from Gao's work, and that the theoretical aspects of the paper are deeper and more rigorous. The results do not follow directly from previous work by Gutmann & Hyvarinen (2012) or Ma & Collins (2018). The empirical results are good and the method should be useful in practice.
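For context, the 'ranking' variant of NCE mentioned above (in the style of Ma & Collins, 2018) trains an unnormalized model by making the observed sample win a softmax over itself and K noise samples. The sketch below is illustrative only, not the paper's method; the inputs (unnormalized model scores and noise log-probabilities) are assumptions.

```python
import numpy as np

def ranking_nce_loss(scores, log_noise_probs):
    """Ranking-variant NCE loss for one observed sample plus K noise samples.

    scores[0] and log_noise_probs[0] correspond to the observed sample;
    the remaining entries to samples drawn from the noise distribution q.
    scores may be unnormalized log-densities: the shared normalizer
    cancels inside the softmax.
    """
    logits = np.asarray(scores, dtype=float) - np.asarray(log_noise_probs, dtype=float)
    # numerically stable log-sum-exp, then softmax cross-entropy with
    # the observed sample (index 0) as the target class
    m = logits.max()
    log_z = np.log(np.exp(logits - m).sum()) + m
    return -(logits[0] - log_z)
```

With uniform logits the loss is log(K+1), the chance-level value; it approaches zero as the observed sample's score dominates the noise scores.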


How to Train Your Energy-Based Model for Regression

Gustafsson, Fredrik K., Danelljan, Martin, Timofte, Radu, Schön, Thomas B.

arXiv.org Machine Learning

Energy-based models (EBMs) have become increasingly popular within computer vision in recent years. While they are commonly employed for generative image modeling, recent work has applied EBMs also for regression tasks, achieving state-of-the-art performance on object detection and visual tracking. Training EBMs is however known to be challenging. While a variety of different techniques have been explored for generative modeling, the application of EBMs to regression is not a well-studied problem. How EBMs should be trained for best possible regression performance is thus currently unclear. We therefore accept the task of providing the first detailed study of this problem. To that end, we propose a simple yet highly effective extension of noise contrastive estimation, and carefully compare its performance to six popular methods from literature on the tasks of 1D regression and object detection. The results of this comparison suggest that our training method should be considered the go-to approach. We also apply our method to the visual tracking task, achieving state-of-the-art performance on five datasets. Notably, our tracker achieves 63.7% AUC on LaSOT and 78.7% Success on TrackingNet. Code is available at https://github.com/fregu856/ebms_regression.


Estimation of Non-Normalized Mixture Models and Clustering Using Deep Representation

Matsuda, Takeru, Hyvarinen, Aapo

arXiv.org Machine Learning

We develop a general method for estimating a finite mixture of non-normalized models. Here, a non-normalized model is defined to be a parametric distribution with an intractable normalization constant. Existing methods for estimating non-normalized models without computing the normalization constant are not applicable to mixture models because they contain more than one intractable normalization constant. The proposed method is derived by extending noise contrastive estimation (NCE), which estimates non-normalized models by discriminating between the observed data and some artificially generated noise. We also propose an extension of NCE with multiple noise distributions. Then, based on the observation that conventional classification learning with neural networks is implicitly assuming an exponential family as a generative model, we introduce a method for clustering unlabeled data by estimating a finite mixture of distributions in an exponential family. Estimation of this mixture model is attained by the proposed extensions of NCE where the training data of neural networks are used as noise. Thus, the proposed method provides a probabilistically principled clustering method that is able to utilize a deep representation. Application to image clustering using a deep neural network gives promising results.
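The binary NCE objective that this abstract extends (Gutmann & Hyvarinen, 2012) reduces density estimation to logistic classification of data against noise. A minimal sketch, assuming equal numbers of data and noise samples and treating the log-normalizer as folded into the model's log-density; the function names and inputs are assumptions for illustration:

```python
import numpy as np

def nce_binary_loss(log_model_data, log_noise_data, log_model_noise, log_noise_noise):
    """Binary NCE loss for equal numbers of data and noise samples.

    log_model_* are the model's *unnormalized* log-densities (with a
    learnable log-normalizer folded in), evaluated at data samples and
    at noise samples; log_noise_* are log-densities under the known
    noise distribution q at the same points.
    """
    def log_sigmoid(z):
        # numerically stable log(sigmoid(z))
        return -np.logaddexp(0.0, -z)

    g_data = np.asarray(log_model_data, dtype=float) - np.asarray(log_noise_data, dtype=float)
    g_noise = np.asarray(log_model_noise, dtype=float) - np.asarray(log_noise_noise, dtype=float)
    # data samples should be classified as "data", noise samples as "noise"
    return -(np.mean(log_sigmoid(g_data)) + np.mean(log_sigmoid(-g_noise)))
```

When the model matches the noise distribution exactly the classifier is at chance and the loss equals 2·log 2; the mixture case handled in the paper requires the extensions described above, since each component carries its own intractable normalizer.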


Improving Language Modelling with Noise Contrastive Estimation

Liza, Farhana Ferdousi (University of Kent, UK) | Grzes, Marek (University of Kent, UK)

AAAI Conferences

Neural language models do not scale well when the vocabulary is large. Noise contrastive estimation (NCE) is a sampling-based method that allows for fast learning with large vocabularies. Although NCE has shown promising performance in neural machine translation, its full potential has not been demonstrated in the language modelling literature. A sufficient investigation of the hyperparameters in the NCE-based neural language models was clearly missing. In this paper, we showed that NCE can be a very successful approach in neural language modelling when the hyperparameters of a neural network are tuned appropriately. We introduced the `search-then-converge' learning rate schedule for NCE and designed a heuristic that specifies how to use this schedule. The impact of the other important hyperparameters, such as the dropout rate and the weight initialisation range, was also demonstrated. Using a popular benchmark, we showed that appropriate tuning of NCE in neural language models outperforms the state-of-the-art single-model methods based on standard dropout and the standard LSTM recurrent neural networks.
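A search-then-converge schedule of the kind named above is classically (Darken & Moody style) a rate that stays roughly constant early on ("search") and decays like 1/t later ("converge"). The sketch below shows that generic form only; the base rate `lr0`, the switch-over scale `tau`, and the paper's heuristic for choosing them are not reproduced here and are assumptions.

```python
def search_then_converge(lr0, tau):
    """Return a learning-rate schedule lr(t) = lr0 / (1 + t / tau).

    For t << tau the rate is approximately lr0 (search phase);
    for t >> tau it decays like lr0 * tau / t (converge phase).
    lr0 and tau are hypothetical hyperparameters for illustration.
    """
    def lr(t):
        return lr0 / (1.0 + t / tau)
    return lr
```

For example, with `tau = 100` the rate is still above 90% of `lr0` at step 10 and has halved by step 100.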