loglikelihood
Model Selection for Bayesian Autoencoders: Supplementary Material
In this section, we review some key results on the Wasserstein distance.

$$\mathrm{SW}_p^p(\pi, \rho) \approx \frac{1}{M} \sum_{i=1}^{M} W_p^p\big(\mathcal{R}\pi(t, \theta_i), \mathcal{R}\rho(t, \theta_i)\big), \qquad (4)$$

where the approximation comes from using Monte-Carlo integration by sampling $\theta_i$ uniformly on $S^{D-1}$ [2], and $M$ is the number of points used to approximate the integral. Calculating the Wasserstein distance with the empirical distribution function is computationally attractive. To do that, we first sort the $x_m$ in ascending order, such that $x_{i[m]} \le x_{i[m+1]}$, where $i[m]$ is the index of the sorted $x_m$. Hamiltonian Monte Carlo (HMC) [24] is a highly efficient Markov chain Monte Carlo (MCMC) method used to generate samples from the posterior $w \sim p(w \mid y)$.
- Europe > France (0.05)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- (8 more...)
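The supplementary excerpt above describes estimating the sliced Wasserstein distance by Monte-Carlo over random projections, with each one-dimensional distance computed from sorted samples. A minimal NumPy sketch of that recipe (function names, sample sizes, and p are illustrative, not from the paper):

```python
import numpy as np

def wasserstein_1d(x, y, p=2):
    """W_p^p between two equal-size empirical 1-D distributions via sorting."""
    xs, ys = np.sort(x), np.sort(y)        # order statistics x_{i[m]} <= x_{i[m+1]}
    return np.mean(np.abs(xs - ys) ** p)

def sliced_wasserstein(X, Y, n_projections=50, p=2, seed=None):
    """Monte-Carlo estimate of SW_p^p: project onto theta_i ~ Uniform(S^{D-1})."""
    rng = np.random.default_rng(seed)
    D = X.shape[1]
    total = 0.0
    for _ in range(n_projections):
        theta = rng.normal(size=D)
        theta /= np.linalg.norm(theta)     # uniform random direction on the sphere
        total += wasserstein_1d(X @ theta, Y @ theta, p=p)
    return total / n_projections

# toy check: two Gaussian clouds with shifted means
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(500, 3))
Y = rng.normal(0.5, 1.0, size=(500, 3))
print(sliced_wasserstein(X, Y))
```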
A Latent-Variable Formulation of the Poisson Canonical Polyadic Tensor Model: Maximum Likelihood Estimation and Fisher Information
Llosa-Vite, Carlos, Dunlavy, Daniel M., Lehoucq, Richard B., López, Oscar, Prasadan, Arvind
We establish parameter inference for the Poisson canonical polyadic (PCP) tensor model through a latent-variable formulation. Our approach exploits the observation that any random PCP tensor can be derived by marginalizing an unobservable random tensor of one dimension larger. The loglikelihood of this higher-dimensional tensor, referred to as the "complete" loglikelihood, comprises multiple rank-one PCP loglikelihoods. Using this methodology, we first derive non-iterative maximum likelihood estimators for the PCP model and demonstrate that several existing algorithms for fitting non-negative matrix and tensor factorizations are Expectation-Maximization algorithms. Next, we derive the observed and expected Fisher information matrices for the PCP model. The Fisher information provides crucial insight into the well-posedness of the tensor model, such as the role that tensor rank plays in identifiability and indeterminacy. For the special case of rank-one PCP models, we demonstrate that these results simplify greatly.
- Africa > Senegal > Kolda Region > Kolda (0.05)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (3 more...)
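The abstract above centers on the PCP loglikelihood. As a point of reference, a hedged NumPy sketch of the (incomplete-data) Poisson CP loglikelihood for a 3-way tensor, where the Poisson mean is a sum of R rank-one terms; the factor names and shapes are illustrative:

```python
import numpy as np
from scipy.special import gammaln

def pcp_loglikelihood(X, A, B, C):
    """Loglikelihood of X ~ Poisson(M), M = sum_r a_r (outer) b_r (outer) c_r.

    X : (I, J, K) count tensor; A, B, C : non-negative factor matrices of
    shapes (I, R), (J, R), (K, R).
    """
    M = np.einsum('ir,jr,kr->ijk', A, B, C)            # rank-R CP mean tensor
    return np.sum(X * np.log(M) - M - gammaln(X + 1))  # Poisson loglikelihood

# toy example with random non-negative factors
rng = np.random.default_rng(1)
I, J, K, R = 4, 5, 6, 2
A = rng.gamma(2.0, size=(I, R))
B = rng.gamma(2.0, size=(J, R))
C = rng.gamma(2.0, size=(K, R))
X = rng.poisson(np.einsum('ir,jr,kr->ijk', A, B, C))
print(pcp_loglikelihood(X, A, B, C))
```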
Lessons from the Trenches on Reproducible Evaluation of Language Models
Biderman, Stella, Schoelkopf, Hailey, Sutawika, Lintang, Gao, Leo, Tow, Jonathan, Abbasi, Baber, Aji, Alham Fikri, Ammanamanchi, Pawan Sasanka, Black, Sidney, Clive, Jordan, DiPofi, Anthony, Etxaniz, Julen, Fattori, Benjamin, Forde, Jessica Zosa, Foster, Charles, Hsu, Jeffrey, Jaiswal, Mimansa, Lee, Wilson Y., Li, Haonan, Lovering, Charles, Muennighoff, Niklas, Pavlick, Ellie, Phang, Jason, Skowron, Aviya, Tan, Samson, Tang, Xiangru, Wang, Kevin A., Winata, Genta Indra, Yvon, François, Zou, Andy
Effective evaluation of language models remains an open challenge in NLP. Researchers and engineers face methodological issues such as the sensitivity of models to evaluation setup, difficulty of proper comparisons across methods, and the lack of reproducibility and transparency. In this paper we draw on three years of experience in evaluating large language models to provide guidance and lessons for researchers. First, we provide an overview of common challenges faced in language model evaluation. Second, we delineate best practices for addressing or lessening the impact of these challenges on research. Third, we present the Language Model Evaluation Harness (lm-eval): an open source library for independent, reproducible, and extensible evaluation of language models that seeks to address these issues. We describe the features of the library as well as case studies in which the library has been used to alleviate these methodological concerns.
- Asia > Middle East > Jordan (0.04)
- South America > Colombia > Meta Department > Villavicencio (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- (13 more...)
- Overview (0.88)
- Research Report > New Finding (0.67)
- Research Report > Experimental Study (0.46)
- Health & Medicine (0.67)
- Education (0.46)
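As a usage illustration for lm-eval: a minimal sketch assuming the simple_evaluate entry point and the Hugging Face ("hf") backend described in the project's documentation; the checkpoint and task below are placeholders.

```python
import lm_eval

# Evaluate a small Hugging Face checkpoint on one benchmark task.
results = lm_eval.simple_evaluate(
    model="hf",                                      # Hugging Face backend
    model_args="pretrained=EleutherAI/pythia-160m",  # placeholder checkpoint
    tasks=["lambada_openai"],                        # placeholder task
    num_fewshot=0,
)
print(results["results"])
```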
Optimal Training of Mean Variance Estimation Neural Networks
Sluijterman, Laurens, Cator, Eric, Heskes, Tom
This paper focusses on the optimal implementation of a Mean Variance Estimation network (MVE network) (Nix and Weigend, 1994). This type of network is often used as a building block for uncertainty estimation methods in a regression setting, for instance Concrete dropout (Gal et al., 2017) and Deep Ensembles (Lakshminarayanan et al., 2017). Specifically, an MVE network assumes that the data is produced from a normal distribution with a mean function and variance function. The MVE network outputs a mean and variance estimate and optimizes the network parameters by minimizing the negative loglikelihood. In our paper, we present two significant insights. Firstly, the convergence difficulties reported in recent work can be relatively easily prevented by following the simple yet often overlooked recommendation from the original authors that a warm-up period should be used. During this period, only the mean is optimized with a fixed variance. We demonstrate the effectiveness of this step through experimentation, highlighting that it should be standard practice. As a sidenote, we examine whether, after the warm-up, it is beneficial to fix the mean while optimizing the variance or to optimize both simultaneously. Here, we do not observe a substantial difference. Secondly, we introduce a novel improvement of the MVE network: separate regularization of the mean and the variance estimate. We demonstrate, both on toy examples and on a number of benchmark UCI regression data sets, that following the original recommendations and the novel separate regularization can lead to significant improvements.
- North America > United States > Massachusetts > Middlesex County > Reading (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Netherlands > Gelderland > Nijmegen (0.04)
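The warm-up recipe highlighted in the abstract above (first optimize only the mean with a fixed variance, then minimize the full Gaussian negative loglikelihood) can be sketched in PyTorch; the architecture and hyperparameters are illustrative, not the paper's:

```python
import torch
import torch.nn as nn

class MVENet(nn.Module):
    """Two-headed network producing a mean and a positive variance estimate."""
    def __init__(self, d_in, d_hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(d_in, d_hidden), nn.Tanh())
        self.mean_head = nn.Linear(d_hidden, 1)
        self.var_head = nn.Linear(d_hidden, 1)

    def forward(self, x):
        h = self.body(x)
        return self.mean_head(h), nn.functional.softplus(self.var_head(h)) + 1e-6

def train(net, x, y, warmup_epochs=200, epochs=400, lr=1e-3):
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    nll = nn.GaussianNLLLoss()
    for epoch in range(epochs):
        mu, var = net(x)
        if epoch < warmup_epochs:
            loss = ((y - mu) ** 2).mean()  # warm-up: mean only, fixed unit variance
        else:
            loss = nll(mu, y, var)         # full Gaussian negative loglikelihood
        opt.zero_grad()
        loss.backward()
        opt.step()
    return net

x = torch.linspace(-3, 3, 256).unsqueeze(1)
y = torch.sin(x) + 0.3 * torch.randn_like(x)
train(MVENet(1), x, y)
```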
Hawkes Process Based on Controlled Differential Equations
Jo, Minju, Kook, Seungji, Park, Noseong
Hawkes processes are a popular framework for modeling the occurrence of sequential events, i.e., occurrence dynamics, in fields such as social diffusion. In real-world scenarios, the inter-arrival times between events are irregular. However, existing neural network-based Hawkes process models not only i) fail to capture such complicated irregular dynamics, but also ii) resort to heuristics to calculate the log-likelihood of events, since they are mostly based on neural networks designed for regular discrete inputs. To address this, we present a Hawkes process based on controlled differential equations (HP-CDE), adopting neural controlled differential equation (neural CDE) technology, a continuous analogue of RNNs. Since HP-CDE reads data continuously, i) irregular time-series datasets can be properly treated, preserving their uneven temporal spacing, and ii) the log-likelihood can be computed exactly. Moreover, as both Hawkes processes and neural CDEs were first developed to model complicated human behavioral dynamics, neural CDE-based Hawkes processes are successful in modeling such occurrence dynamics. In our experiments with 4 real-world datasets, our method outperforms existing methods by non-trivial margins.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
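For context on point ii) above, the classical exponential-kernel Hawkes process (not the authors' HP-CDE model) is a case where the log-likelihood is exactly computable in closed form; a NumPy sketch with illustrative parameters:

```python
import numpy as np

def hawkes_loglik(times, T, mu, alpha, beta):
    """Exact log-likelihood of a univariate Hawkes process with intensity
    lambda(t) = mu + alpha * sum_{t_i < t} exp(-beta * (t - t_i)) on [0, T]."""
    ll = 0.0
    A = 0.0          # recursive sum A_j = sum_{i < j} exp(-beta * (t_j - t_i))
    prev = None
    for t in times:
        if prev is not None:
            A = np.exp(-beta * (t - prev)) * (A + 1.0)
        ll += np.log(mu + alpha * A)
        prev = t
    # compensator: the integral of lambda over [0, T], available in closed form
    times = np.asarray(times)
    ll -= mu * T + (alpha / beta) * np.sum(1.0 - np.exp(-beta * (T - times)))
    return ll

print(hawkes_loglik([0.5, 1.2, 1.3, 2.8], T=5.0, mu=0.4, alpha=0.6, beta=1.5))
```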
Parametrization Cookbook: A set of Bijective Parametrizations for using Machine Learning methods in Statistical Inference
We present in this paper a way to transform a constrained statistical inference problem into an unconstrained one, in order to use modern computational methods such as automatic differentiation, GPU computing, and mini-batch stochastic gradients. Unlike the parametrizations classically used in Machine Learning, the parametrizations introduced here are all bijective, and even diffeomorphisms, thus preserving the properties that matter from a statistical inference point of view, first and foremost identifiability. This cookbook presents a set of recipes for transforming a constrained problem into an unconstrained one. To make the parametrizations easy to use, this paper is at the same time a cookbook and a Python package that provides the parametrizations for numpy as well as JAX and PyTorch, together with a high-level, expressive interface for describing a parametrization, so that a difficult statistical inference problem can be turned into an easier one addressable with modern optimization tools.
- North America > United States (0.14)
- Europe > France > Hauts-de-France > Oise > Compiègne (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)
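To make the abstract's idea concrete: two standard bijective parametrizations, log/exp for positive parameters and logit/sigmoid for parameters in (0, 1). These particular maps are common choices assumed here, not necessarily the package's exact recipes:

```python
import numpy as np

# R -> (0, inf): exp is a diffeomorphism with inverse log.
to_positive = np.exp
from_positive = np.log

# R -> (0, 1): the logistic sigmoid, with the logit as inverse.
def to_unit(x):
    return 1.0 / (1.0 + np.exp(-x))

def from_unit(p):
    return np.log(p) - np.log1p(-p)

# Optimize over the unconstrained x; report the constrained parameter.
x = 0.7                   # a gradient method can move this freely over all of R
sigma = to_positive(x)    # always a valid (positive) standard deviation
assert np.isclose(from_positive(sigma), x)   # bijectivity: round trip recovers x
```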
Elliptically-Contoured Tensor-variate Distributions with Application to Improved Image Learning
Llosa-Vite, Carlos, Maitra, Ranjan
Statistical analysis of tensor-valued data has largely used the tensor-variate normal (TVN) distribution that may be inadequate when data comes from distributions with heavier or lighter tails. We study a general family of elliptically contoured (EC) tensor-variate distributions and derive its characterizations, moments, marginal and conditional distributions, and the EC Wishart distribution. We describe procedures for maximum likelihood estimation from data that are (1) uncorrelated draws from an EC distribution, (2) from a scale mixture of the TVN distribution, and (3) from an underlying but unknown EC distribution, where we extend Tyler's robust estimator. A detailed simulation study highlights the benefits of choosing an EC distribution over the TVN for heavier-tailed data. We develop tensor-variate classification rules using discriminant analysis and EC errors and show that they better predict cats and dogs from images in the Animal Faces-HQ dataset than the TVN-based rules. A novel tensor-on-tensor regression and tensor-variate analysis of variance (TANOVA) framework under EC errors is also demonstrated to better characterize gender, age and ethnic origin than the usual TVN-based TANOVA in the celebrated Labeled Faces of the Wild dataset.
- North America > United States > New York (0.04)
- Africa > Senegal > Kolda Region > Kolda (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- (3 more...)
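One member of the EC family named above, a scale mixture of the tensor-variate normal, can be simulated directly; a hedged NumPy sketch that produces a tensor-variate t draw via a chi-squared mixing variable, with illustrative Kronecker-structured covariances and degrees of freedom:

```python
import numpy as np

def tvn_sample(Ls, rng):
    """Draw from a tensor-variate normal with Kronecker-structured covariance
    by mode-k multiplication of a standard normal tensor by Cholesky factors."""
    Z = rng.standard_normal([L.shape[0] for L in Ls])
    # X = Z x_1 L1 x_2 L2 x_3 L3 (mode-k products written as one einsum)
    return np.einsum('ia,jb,kc,abc->ijk', Ls[0], Ls[1], Ls[2], Z)

def tensor_t_sample(Ls, df, rng):
    """Tensor-variate t as a scale mixture: X / sqrt(g / df), g ~ chi^2(df)."""
    g = rng.chisquare(df)
    return tvn_sample(Ls, rng) / np.sqrt(g / df)

rng = np.random.default_rng(2)
# compound-symmetry covariances (diag 1.3, off-diag 0.3) per mode, via Cholesky
Ls = [np.linalg.cholesky(np.eye(n) + 0.3) for n in (3, 4, 5)]
X = tensor_t_sample(Ls, df=5, rng=rng)
print(X.shape)   # (3, 4, 5)
```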
How to Evaluate Uncertainty Estimates in Machine Learning for Regression?
Sluijterman, Laurens, Cator, Eric, Heskes, Tom
As neural networks become more popular, the need for accompanying uncertainty estimates increases. The current testing methodology focusses on how well the predictive uncertainty estimates explain the differences between predictions and observations in a previously unseen test set. Intuitively, this is a logical approach, and the current setup of benchmark data sets also allows easy comparison between the different methods. We demonstrate, however, through both theoretical arguments and simulations, that this way of evaluating the quality of uncertainty estimates has serious flaws. Firstly, it cannot disentangle the aleatoric from the epistemic uncertainty. Secondly, the current methodology considers the uncertainty averaged over all test samples, implicitly averaging out overconfident and underconfident predictions: when checking whether the correct fraction of test points falls inside prediction intervals, a good score on average gives no guarantee that the intervals are sensible for individual points. We demonstrate through practical examples that these effects can result in favoring a method, based on its predictive uncertainty, whose confidence intervals in fact behave undesirably. Finally, we propose a simulation-based testing approach that addresses these problems while still allowing easy comparison between different methods.
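The averaging pitfall described in this abstract is easy to reproduce: intervals that are too narrow in one region and too wide in another can still hit the nominal coverage on average. A small simulation with invented numbers:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
x = rng.uniform(-1, 1, n)
y = rng.normal(0.0, 1.0, n)                # true noise is N(0, 1) everywhere

# A miscalibrated method: too narrow for x < -0.4, too wide elsewhere
# (the correct 95% half-width would be 1.96 everywhere).
half_width = np.where(x < -0.4, 1.5, 2.576)
inside = np.abs(y) <= half_width

print(f"average coverage:    {inside.mean():.3f}")             # ~0.95, looks fine
print(f"coverage, x <  -0.4: {inside[x < -0.4].mean():.3f}")   # ~0.87, undercovers
print(f"coverage, x >= -0.4: {inside[x >= -0.4].mean():.3f}")  # ~0.99, overcovers
```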
Sentiment Analysis using Logistic Regression and Naive Bayes
In supervised machine learning, you usually have an input X, which goes into your prediction function to get your prediction Ŷ. You can then compare your prediction Ŷ with the true value Y. This gives you your cost, which you use to update the parameters θ. Sentiment analysis (also known as opinion mining or emotion AI) refers to the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. So, let's start sentiment analysis using Logistic Regression. We will be using the sample Twitter data set for this exercise.
- Research Report > New Finding (0.63)
- Research Report > Experimental Study (0.63)
- Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.73)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
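A minimal runnable sketch of the pipeline the passage above describes, using scikit-learn on a tiny invented stand-in for the Twitter sample (the texts and labels are placeholders):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# placeholder stand-in for the sample Twitter data set
texts = ["I love this movie", "great day", "happy and excited",
         "I hate waiting", "terrible service", "so sad and angry"]
labels = [1, 1, 1, 0, 0, 0]                  # 1 = positive, 0 = negative

# X goes into the prediction function to get Yhat; fitting compares Yhat
# with the true Y and updates the parameters theta.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["what a great movie"]))    # should lean positive
print(model.predict_proba(["so terrible"]))     # class probabilities
```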
Adaptively stacking ensembles for influenza forecasting with incomplete data
McAndrew, Thomas, Reich, Nicholas G.
Seasonal influenza infects between 10 and 50 million people in the United States every year, overburdening hospitals during weeks of peak incidence. Named by the CDC as an important tool to fight the damaging effects of these epidemics, accurate forecasts of influenza and influenza-like illness (ILI) forewarn public health officials about when, and where, seasonal influenza outbreaks will hit hardest. Multi-model ensemble forecasts---weighted combinations of component models---have shown positive results in forecasting. Ensemble forecasts of influenza outbreaks have been static, training on all past ILI data at the beginning of a season, generating a set of optimal weights for each model in the ensemble, and keeping the weights constant. We propose an adaptive ensemble forecast that (i) changes model weights week-by-week throughout the influenza season, (ii) only needs the current influenza season's data to make predictions, and (iii) by introducing a prior distribution, shrinks weights toward the reference equal weighting approach and adjusts for observed ILI percentages that are subject to future revisions. We investigate the prior's ability to impact adaptive ensemble performance and, after finding an optimal prior via a cross-validation approach, compare our adaptive ensemble's performance to equal-weighted and static ensembles. Applied to forecasts of short-term ILI incidence at the regional and national level in the US, our adaptive model outperforms a naive equal-weighted ensemble, and performs similarly to or better than the static ensemble, which requires multiple years of training data. Adaptive ensembles are able to quickly train and forecast during epidemics, and provide a practical tool to public health officials looking for forecasts that can conform to unique features of a specific season.
- North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
- North America > Canada > Ontario (0.04)
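One simple instantiation of the week-by-week reweighting idea, not the authors' exact scheme: treat each model's weekly forecast score as pseudo-counts added to a symmetric Dirichlet-style prior, so the weights shrink toward equal weighting when evidence is scarce. All numbers are illustrative:

```python
import numpy as np

def adaptive_weights(weekly_scores, prior_strength=5.0):
    """Week-by-week ensemble weights from a Dirichlet-style pseudo-count update.

    weekly_scores : (n_weeks, n_models) array of per-week forecast scores
                    (e.g., probabilities assigned to the observed ILI bin).
    Returns an (n_weeks, n_models) array of the weights used each week,
    shrunk toward equal weighting by the symmetric prior.
    """
    n_weeks, n_models = weekly_scores.shape
    counts = np.full(n_models, prior_strength)   # symmetric prior pseudo-counts
    weights = np.empty((n_weeks, n_models))
    for week in range(n_weeks):
        weights[week] = counts / counts.sum()    # weights before seeing this week
        counts = counts + weekly_scores[week]    # then fold in the new scores
    return weights

scores = np.array([[0.9, 0.1, 0.4],
                   [0.8, 0.2, 0.5],
                   [0.7, 0.3, 0.6]])
print(adaptive_weights(scores))
```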