 latent variable modeling


Latent Variable Modeling in Multi-Agent Reinforcement Learning via Expectation-Maximization for UAV-Based Wildlife Protection

arXiv.org Artificial Intelligence

INTRODUCTION

The Iranian leopard (Panthera pardus tulliana), a subspecies of the Persian leopard, is critically endangered due to illegal poaching, habitat fragmentation, and human-wildlife conflict. Conservation efforts are increasingly turning to technology for innovative monitoring and intervention methods.

Metric: 10 Agents / 20 Agents / 50 Agents
Training Time (hrs): 5.2 / 6.3 / 8.0
Memory Usage (GB): 4.5 / 5.1 / 6.8
CPU Utilization (%): 65 / 75 / 85
GPU Utilization (%): 45 / 55 / 70
Training Time Increase (%): - / 20 / 53
Memory Usage Increase (%): - / 15 / 51

Table 4. Percentage of High-Risk Zones Covered by Each Method (Mean ± std)

Figure 3. Poacher Detection Rate Across Episodes.

Higher Entropy Indicates More Diverse Exploration

Table 5. KL Divergence between Inferred q(z) and Ground Truth Task Distribution

The EM-based policy exhibits an initially high entropy, encouraging diverse action sampling, and gradually anneals as the policy becomes confident.

Metric: Number of Agents Involved / Coverage Efficiency (%) / Poacher Detection Rate (%) / Collision Incidents
Cooperative Coverage: 6 / 85.3 / - / 0
Poacher Detection Coordination: 8 / - / 92.1 / 0
Conflict Avoidance: 10 / - / - / 0

It enables conservationists and security forces to allocate limited resources more effectively and act in real time based on actionable intelligence derived from autonomous agents.


Latent Variable Modeling for Robust Causal Effect Estimation

arXiv.org Artificial Intelligence

Latent variable models provide a powerful framework for incorporating and inferring unobserved factors in observational data. In causal inference, they help account for hidden factors influencing treatment or outcome, thereby addressing challenges posed by missing or unmeasured covariates. This paper proposes a new framework that integrates latent variable modeling into the double machine learning (DML) paradigm to enable robust causal effect estimation in the presence of such hidden factors. We consider two scenarios: one where a latent variable affects only the outcome, and another where it may influence both treatment and outcome. To ensure tractability, we incorporate latent variables only in the second stage of DML, separating representation learning from latent inference. We demonstrate the robustness and effectiveness of our method through extensive experiments on both synthetic and real-world datasets.
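As background for the DML paradigm named in this abstract, here is a minimal sketch of the standard cross-fitted partialling-out estimator for a partially linear model. It is not the latent-variable extension the paper proposes. The simulated data, the linear nuisance models, and every function name (`ols_fit_predict`, `dml_partialling_out`, `simulate`) are assumptions made for illustration.

```python
import numpy as np

def ols_fit_predict(X_train, y_train, X_test):
    """Fit least squares with an intercept on the training fold, predict on the test fold."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef

def dml_partialling_out(X, t, y, n_folds=2, seed=0):
    """Cross-fitted estimate of theta in y = theta * t + g(X) + noise."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    t_res = np.empty(len(t), dtype=float)
    y_res = np.empty(len(y), dtype=float)
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        # Stage 1: residualize treatment and outcome on the covariates,
        # using nuisance models fitted on the other folds only.
        t_res[test] = t[test] - ols_fit_predict(X[train], t[train], X[test])
        y_res[test] = y[test] - ols_fit_predict(X[train], y[train], X[test])
    # Stage 2: regress outcome residuals on treatment residuals.
    return float(t_res @ y_res / (t_res @ t_res))

def simulate(n=4000, theta=2.0, seed=1):
    """Hypothetical data: covariates X confound both treatment t and outcome y."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 3))
    t = X @ np.array([1.0, -0.5, 0.3]) + rng.normal(size=n)
    y = theta * t + X @ np.array([0.7, 0.2, -1.0]) + rng.normal(size=n)
    return X, t, y

X, t, y = simulate()
theta_hat = dml_partialling_out(X, t, y)
print(theta_hat)  # should land near the simulated effect of 2.0
```

In this sketch all confounders are observed. The abstract's setting differs precisely in that some factors are hidden, which is where a second-stage latent variable would enter.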


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

High-dimensional neural spike train analysis with generalized count linear dynamical systems

This paper describes a general exponential-family model (called the "generalized count" (GC) distribution) for multi-neuron spike count data. The model accounts for both under-dispersed and over-dispersed spike count data, and has Poisson, Negative Binomial, Bernoulli, and several other classic models as special cases. The authors give a clear account of the relationship to other models, and demonstrate the need for a model to capture under-dispersed counts in primate motor cortex. They then describe an efficient method for maximum-likelihood fitting (and demonstrate concavity of the log-likelihood). They derive an efficient variational Bayesian inference method and apply the model to data from primate motor cortex, showing that it accounts more accurately for variance and cross-covariance of spike count data, compared to a model with Poisson observations.
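To make the dispersion claim concrete, the sketch below assumes the GC distribution has the form p(k) proportional to exp(theta * k + g(k)) / k!, which the review does not spell out. The truncation at `k_max`, the particular choices of `g`, and the helper names are likewise assumptions for illustration only.

```python
import math
import numpy as np

def gc_pmf(theta, g, k_max=200):
    """Assumed GC pmf on {0, ..., k_max}: p(k) proportional to exp(theta*k + g(k)) / k!."""
    k = np.arange(k_max + 1)
    log_fact = np.array([math.lgamma(int(i) + 1) for i in k])
    log_w = theta * k + np.array([g(int(i)) for i in k]) - log_fact
    log_w -= log_w.max()  # stabilize before exponentiating
    w = np.exp(log_w)
    return w / w.sum()

def mean_var(p):
    """Mean and variance of a pmf supported on 0, 1, ..., len(p) - 1."""
    k = np.arange(len(p))
    m = float(p @ k)
    return m, float(p @ (k - m) ** 2)

# g = 0 recovers a Poisson with rate exp(theta) = 3 (variance equals mean).
poisson_mv = mean_var(gc_pmf(math.log(3.0), lambda k: 0.0))
# A concave g pulls mass toward the mode: under-dispersed (variance < mean).
under_mv = mean_var(gc_pmf(math.log(3.0), lambda k: -0.1 * k * k))
# g(k) = log Gamma(k + r) with theta = log(0.6) gives a negative binomial
# with r = 2: over-dispersed (variance > mean).
over_mv = mean_var(gc_pmf(math.log(0.6), lambda k: math.lgamma(k + 2.0)))
print(poisson_mv, under_mv, over_mv)
```

Under this assumed form, a single function g controls whether the counts are under- or over-dispersed relative to Poisson, which is the flexibility the review highlights.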


Shared Latent Space by Both Languages in Non-Autoregressive Neural Machine Translation

arXiv.org Artificial Intelligence

Latent variable modeling in non-autoregressive neural machine translation (NAT) is a promising approach to mitigate the multimodality problem. Previous works added an auxiliary model to estimate the posterior distribution of the latent variable conditioned on the source and target sentences. However, this has several disadvantages, such as redundant information extraction in the latent variable, an increased number of parameters, and a tendency to ignore part of the information from the inputs. In this paper, we propose a new latent variable modeling method based on a dual reconstruction perspective and an advanced hierarchical latent modeling approach. Our proposed method, LadderNMT, shares a latent space across both languages so that it hypothetically alleviates or solves the above disadvantages. Experimental results quantitatively and qualitatively demonstrate that our proposed latent variable modeling learns an advantageous latent space and significantly improves translation quality in WMT translation tasks.


Information Theoretic Lower Bounds on Negative Log Likelihood

arXiv.org Machine Learning

In this article we use rate-distortion theory, a branch of information theory devoted to the problem of lossy compression, to shed light on an important problem in latent variable modeling of data: is there room to improve the model? One way to address this question is to find an upper bound on the probability (equivalently, a lower bound on the negative log likelihood) that the model can assign to some data as one varies the prior and/or the likelihood function in a latent variable model. The core of our contribution is to formally show that the problem of optimizing priors in latent variable models is exactly an instance of the variational optimization problem that information theorists solve when computing rate-distortion functions, and then to use this to derive a lower bound on negative log likelihood. Moreover, we will show that if changing the prior can improve the log likelihood, then there is a way to change the likelihood function instead and attain the same log likelihood, and thus rate-distortion theory is of relevance to optimizing both priors and likelihood functions. We will experimentally argue for the usefulness of quantities derived from rate-distortion theory in latent variable modeling by applying them to a problem in image modeling.
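The question "is there room to improve the model?" can be illustrated with a small numerical sketch. It assumes a discrete latent grid with a fixed Gaussian likelihood and uses a generic Jensen-type bound: for any alternative prior on the same grid, the average negative log likelihood is at least the current one minus log max_z c(z), where c(z) is the data average of p(x_i | z) / p(x_i). This may not be the exact bound derived in the paper, and the data, the grid, and the function names are all hypothetical.

```python
import numpy as np

def avg_nll(lik, prior):
    """Average negative log likelihood, with lik[i, z] = p(x_i | z)."""
    return float(-np.mean(np.log(lik @ prior)))

def ratio_stat(lik, prior):
    """c(z): data average of p(x_i | z) / p(x_i) under the current prior."""
    px = lik @ prior
    return (lik / px[:, None]).mean(axis=0)

def nll_lower_bound(lik, prior):
    """Jensen-type bound on the NLL reachable by changing only the prior."""
    return avg_nll(lik, prior) - float(np.log(ratio_stat(lik, prior).max()))

def em_prior_step(lik, prior):
    """One EM update of the prior with the likelihood held fixed."""
    return prior * ratio_stat(lik, prior)

# Hypothetical data and latent grid, for illustration only.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(1.5, 1.0, 700)])
z_grid = np.linspace(-4.0, 4.0, 9)
lik = np.exp(-0.5 * (x[:, None] - z_grid[None, :]) ** 2) / np.sqrt(2 * np.pi)

prior = np.full(len(z_grid), 1.0 / len(z_grid))
initial_nll = avg_nll(lik, prior)
initial_bound = nll_lower_bound(lik, prior)
for _ in range(200):
    prior = em_prior_step(lik, prior)
final_nll = avg_nll(lik, prior)
final_bound = nll_lower_bound(lik, prior)
print(initial_bound, initial_nll, final_nll, final_bound)
```

The gap between the current negative log likelihood and the bound is log max_z c(z). When it is close to zero, no change of prior on this grid can help much; a large gap leaves room for improvement.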


Kullback-Leibler Principal Component for Tensors is not NP-hard

arXiv.org Machine Learning

We study the problem of nonnegative rank-one approximation of a nonnegative tensor, and show that the globally optimal solution that minimizes the generalized Kullback-Leibler divergence can be efficiently obtained, i.e., it is not NP-hard. This result works for arbitrary nonnegative tensors with an arbitrary number of modes (including two, i.e., matrices). We derive a closed-form expression for the KL principal component, which is easy to compute and has an intuitive probabilistic interpretation. For generalized KL approximation with higher ranks, the problem is for the first time shown to be equivalent to multinomial latent variable modeling, and an iterative algorithm is derived that resembles the expectation-maximization algorithm. On the Iris dataset, we showcase how the derived results help us learn the model in an unsupervised manner, and obtain strikingly close performance to that from supervised methods.
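The abstract does not state the closed form. Setting the gradient of the generalized KL divergence to zero for a rank-one tensor yields the outer product of the mode-wise marginal sums divided by the total raised to the power (number of modes minus one), and the sketch below assumes this is the expression meant. The function names and the random test tensor are illustrative.

```python
import numpy as np

def generalized_kl(X, Y):
    """Generalized KL divergence D(X || Y) = sum(X log(X/Y) - X + Y), with 0 log 0 = 0."""
    mask = X > 0
    return float(np.sum(X[mask] * np.log(X[mask] / Y[mask])) - X.sum() + Y.sum())

def kl_rank_one(X):
    """Outer product of mode-wise marginal sums, scaled by total ** (ndim - 1)."""
    d = X.ndim
    Y = np.ones(())
    for mode in range(d):
        other_axes = tuple(a for a in range(d) if a != mode)
        marginal = X.sum(axis=other_axes)  # sum out every mode except this one
        Y = np.multiply.outer(Y, marginal)
    return Y / X.sum() ** (d - 1)

# Hypothetical strictly positive 3-mode tensor.
rng = np.random.default_rng(0)
X = rng.uniform(0.1, 1.0, size=(4, 5, 3))
Y = kl_rank_one(X)
best = generalized_kl(X, Y)
print(best)
```

If X is normalized to sum to one, it is a joint distribution and this approximation is the product of its marginals, which is one way to read the "intuitive probabilistic interpretation" mentioned above.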


Latent Variable Modeling with Diversity-Inducing Mutual Angular Regularization

arXiv.org Machine Learning

Latent Variable Models (LVMs) are a large family of machine learning models providing a principled and effective way to extract underlying patterns, structure and knowledge from observed data. Due to the dramatic growth of volume and complexity of data, several new challenges have emerged and cannot be effectively addressed by existing LVMs: (1) How to capture long-tail patterns that carry crucial information when the popularity of patterns is distributed in a power-law fashion? (2) How to reduce model complexity and computational cost without compromising the modeling power of LVMs? (3) How to improve the interpretability and reduce the redundancy of discovered patterns? To address the three challenges discussed above, we develop a novel regularization technique for LVMs, which controls the geometry of the latent space during learning to enable the learned latent components of LVMs to be diverse in the sense that they are favored to be mutually different from each other, to accomplish long-tail coverage, low redundancy, and better interpretability. We propose a mutual angular regularizer (MAR) to encourage the components in LVMs to have larger mutual angles. The MAR is non-convex and non-smooth, entailing great challenges for optimization. To cope with this issue, we derive a smooth lower bound of the MAR and optimize the lower bound instead. We show that the monotonicity of the lower bound is closely aligned with the MAR to qualify the lower bound as a desirable surrogate of the MAR. Using a neural network (NN) as an instance, we analyze how the MAR affects the generalization performance of NNs. On two popular latent variable models, the restricted Boltzmann machine and distance metric learning, we demonstrate that MAR can effectively capture long-tail patterns, reduce model complexity without sacrificing expressivity and improve interpretability.
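The abstract says only that the regularizer encourages larger mutual angles. The sketch below assumes one plausible form, the mean of the pairwise non-obtuse angles between component vectors minus `gamma` times their variance. That definition, the `gamma` weight, and the function names are assumptions, not the paper's stated formula.

```python
import numpy as np

def mutual_angles(A):
    """Pairwise non-obtuse angles (radians) between the rows of A."""
    U = A / np.linalg.norm(A, axis=1, keepdims=True)
    # abs() folds obtuse angles back into [0, pi/2]; clip guards against rounding.
    cos = np.clip(np.abs(U @ U.T), 0.0, 1.0)
    upper = np.triu_indices(len(A), k=1)
    return np.arccos(cos[upper])

def mutual_angular_score(A, gamma=1.0):
    """Assumed score: mean pairwise angle minus gamma times the angle variance."""
    angles = mutual_angles(A)
    return float(angles.mean() - gamma * angles.var())

# Orthogonal components are maximally diverse: every angle is pi/2.
diverse = mutual_angular_score(np.eye(3))
# Nearly parallel components are redundant: the angle is close to zero.
redundant = mutual_angular_score(np.array([[1.0, 0.0], [1.0, 0.01]]))
print(diverse, redundant)
```

The `arccos` and `abs` operations are what make such a score non-smooth, consistent with the abstract's remark that the MAR is optimized through a smooth lower bound instead.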


Rejoinder: Latent variable graphical model selection via convex optimization

arXiv.org Machine Learning

We thank all the discussants for their careful reading of our paper, and for their insightful critiques. We would also like to thank the editors for organizing this discussion. Our paper contributes to the area of high-dimensional statistics which has received much attention over the past several years across the statistics, machine learning and signal processing communities. In this rejoinder we clarify and comment on some of the points raised in the discussions. Finally, we also remark on some interesting challenges that lie ahead in latent variable modeling. Briefly, we considered the problem of latent variable graphical model selection in the Gaussian setting.