maximum likelihood estimation
Multi-view Matrix Factorization for Linear Dynamical System Estimation
We consider maximum likelihood estimation of linear dynamical systems with generalized-linear observation models. Maximum likelihood is typically considered to be hard in this setting since latent states and transition parameters must be inferred jointly. Given that expectation-maximization does not scale and is prone to local minima, moment-matching approaches from the subspace identification literature have become standard, despite known statistical efficiency issues. In this paper, we instead reconsider likelihood maximization and develop an optimization based strategy for recovering the latent states and transition parameters. Key to the approach is a two-view reformulation of maximum likelihood estimation for linear dynamical systems that enables the use of global optimization algorithms for matrix factorization. We show that the proposed estimation strategy outperforms widely-used identification algorithms such as subspace identification methods, both in terms of accuracy and runtime.
On the Consistency of Maximum Likelihood Estimation of Probabilistic Principal Component Analysis
Probabilistic principal component analysis (PPCA) is currently one of the most used statistical tools to reduce the ambient dimension of the data. From multidimensional scaling to the imputation of missing data, PPCA has a broad spectrum of applications ranging from science and engineering to quantitative finance.\Despite
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.31)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.30)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.30)
Adjusting for Autocorrelated Errors in Neural Networks for Time Series
An increasing body of research focuses on using neural networks to model time series. A common assumption in training neural networks via maximum likelihood estimation on time series is that the errors across time steps are uncorrelated. However, errors are actually autocorrelated in many cases due to the temporality of the data, which makes such maximum likelihood estimations inaccurate. In this paper, in order to adjust for autocorrelated errors, we propose to learn the autocorrelation coefficient jointly with the model parameters. In our experiments, we verify the effectiveness of our approach on time series forecasting. Results across a wide range of real-world datasets with various state-of-the-art models show that our method enhances performance in almost all cases. Based on these results, we suggest empirical critical values to determine the severity of autocorrelated errors. We also analyze several aspects of our method to demonstrate its advantages. Finally, other time series tasks are also considered to validate that our method is not restricted to only forecasting.
Minimization of Functions on Dually Flat Spaces Using Geodesic Descent Based on Dual Connections
We propose geodesic-based optimization methods on dually flat spaces, where the geometric structure of the parameter manifold is closely related to the form of the objective function. A primary application is maximum likelihood estimation in statistical models, especially exponential families, whose model manifolds are dually flat. We show that an m-geodesic update, which directly optimizes the log-likelihood, can theoretically reach the maximum likelihood estimator in a single step. In contrast, an e-geodesic update has a practical advantage in cases where the parameter space is geodesically complete, allowing optimization without explicitly handling parameter constraints. We establish the theoretical properties of the proposed methods and validate their effectiveness through numerical experiments.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Asia > Japan > Honshū > Kantō > Saitama Prefecture > Saitama (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.91)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.77)
Multi-view Matrix Factorization for Linear Dynamical System Estimation
We consider maximum likelihood estimation of linear dynamical systems with generalized-linear observation models. Maximum likelihood is typically considered to be hard in this setting since latent states and transition parameters must be inferred jointly. Given that expectation-maximization does not scale and is prone to local minima, moment-matching approaches from the subspace identification literature have become standard, despite known statistical efficiency issues. In this paper, we instead reconsider likelihood maximization and develop an optimization based strategy for recovering the latent states and transition parameters. Key to the approach is a two-view reformulation of maximum likelihood estimation for linear dynamical systems that enables the use of global optimization algorithms for matrix factorization. We show that the proposed estimation strategy outperforms widely-used identification algorithms such as subspace identification methods, both in terms of accuracy and runtime.
- Asia > China > Hong Kong (0.04)
- North America > Canada (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.31)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.31)
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > Nevada (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.89)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.89)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)
Deriving the Scaled-Dot-Function via Maximum Likelihood Estimation and Maximum Entropy Approach
In this paper, we present a maximum likelihood estimation approach to determine the value vector in transformer models. We model the sequence of value vectors, key vectors, and the query vector as a sequence of Gaussian distributions. The variance in each Gaussian distribution depends on the time step, the corresponding key vector, and the query vector. The mean value in each Gaussian distribution depends on the time step, and the corresponding value vector. This analysis may offer a new explanation of the scaled-dot-product function or softmax function used in transformer architectures [1]. Another explanation, inspired by [4], is based on the maximum entropy approach in natural language processing [5]. In this approach, a query vector and key vectors are used to derive the feature functions for the maximum entropy model.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.82)
Appendix A Experiments In this section, we demonstrate our maximum likelihood estimation (8)
In the first experiment, we compare the performance of the aforementioned four alternatives. As we generated instances to satisfy the full-rank condition, i.e., Assumption E.1, or if random perturbations are applied to the underlying LMAB model [8]). (recall Figure 3). Multi-Armed Bandits problem that has been extensively studied in literature ( e.g., see [ When the time-horizon is sufficiently long but finite e.g., if Regime switching bandits LMAB may be also seen as a special type of adversarial or non-stationary bandits ( e.g., [ The standard objective in non-stationary bandits is to find the best stationary policy in hindsight with unlimited possible contexts. We focus on significantly more general cases where there is no obvious way of clustering observations, e.g., when Note that this could still be in H A regime with large number of actions A .
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.88)