Goto

Collaborating Authors

 information distance



Robustifying Algorithms of Learning Latent Trees with Vector Variables

Neural Information Processing Systems

We consider learning the structures of Gaussian latent tree models with vector observations when a subset of them are arbitrarily corrupted. First, we present the sample complexities of Recursive Grouping (RG) and Chow-Liu Recursive Grouping (CLRG) without the assumption that the effective depth is bounded in the number of observed nodes, significantly generalizing the results in Choi et al. (2011). We show that Chow-Liu initialization in CLRG greatly reduces the sample complexity of RG from being exponential in the diameter of the tree to only logarithmic in the diameter for the hidden Markov model (HMM).


RobustifyingAlgorithmsofLearningLatentTrees withVectorVariables

Neural Information Processing Systems

We consider learning the structures of Gaussian latent tree models with vector observations when a subset of them are arbitrarily corrupted. First, we present the sample complexities of Recursive Grouping (RG)and Chow-Liu Recursive Grouping (CLRG)without theassumption thattheeffectivedepth isbounded in the number of observed nodes, significantly generalizing the results in Choi et al. (2011). We show that Chow-Liu initialization inCLRG greatly reduces the sample complexity ofRG from being exponential in the diameter of the tree to onlylogarithmic inthediameter forthehidden Markovmodel (HMM).


Approximating Human-Like Few-shot Learning with GPT-based Compression

arXiv.org Artificial Intelligence

In this work, we conceptualize the learning process as information compression. We seek to equip generative pre-trained models with human-like learning capabilities that enable data compression during inference. We present a novel approach that utilizes the Generative Pre-trained Transformer (GPT) to approximate Kolmogorov complexity, with the aim of estimating the optimal Information Distance for few-shot learning. We first propose using GPT as a prior for lossless text compression, achieving a noteworthy compression ratio. Experiment with LLAMA2-7B backbone achieves a compression ratio of 15.5 on enwik9. We justify the pre-training objective of GPT models by demonstrating its equivalence to the compression length, and, consequently, its ability to approximate the information distance for texts. Leveraging the approximated information distance, our method allows the direct application of GPT models in quantitative text similarity measurements. Experiment results show that our method overall achieves superior performance compared to embedding and prompt baselines on challenging NLP tasks, including semantic similarity, zero and one-shot text classification, and zero-shot text ranking.


Monitoring the Dynamic Networks of Stock Returns

arXiv.org Artificial Intelligence

In this paper, we study the connection between the companies in the Swedish capital market. We consider 28 companies included in the determination of the market index OMX30. The network structure of the market is constructed using different methods to determine the distance between the companies. We use hierarchical clustering methods to find the relation among the companies in each window. Next, we obtain one-dimensional time series of the distances between the clustering trees that reflect the changes in the relationship between the companies in the market over time. The method of statistical process control, namely the Shewhart control chart, is applied to those time series to detect abnormal changes in the financial market.


Robustifying Algorithms of Learning Latent Trees with Vector Variables

arXiv.org Machine Learning

We consider learning the structures of Gaussian latent tree models with vector observations when a subset of them are arbitrarily corrupted. First, we present the sample complexities of Recursive Grouping (RG) and Chow-Liu Recursive Grouping (CLRG) without the assumption that the effective depth is bounded in the number of observed nodes, significantly generalizing the results in Choi et al. (2011). We show that Chow-Liu initialization in CLRG greatly reduces the sample complexity of RG from being exponential in the diameter of the tree to only logarithmic in the diameter for the hidden Markov model (HMM). Second, we robustify RG, CLRG, Neighbor Joining (NJ) and Spectral NJ (SNJ) by using the truncated inner product. These robustified algorithms can tolerate a number of corruptions up to the square root of the number of clean samples. Finally, we derive the first known instance-dependent impossibility result for structure learning of latent trees. The optimalities of the robust version of CLRG and NJ are verified by comparing their sample complexities and the impossibility result.


On Information (pseudo) Metric

arXiv.org Machine Learning

This short note revisit information metric, underlining that it is a pseudo metric on manifolds of observables (random variables), rather than as usual on probability laws. Geodesics are characterized in terms of their boundaries and conditional independence condition. Pythagorean theorem is given, providing in special case potentially interesting natural integer triplets. This metric is computed for illustration on Diabetes dataset using infotopo package.


MIM-Based Generative Adversarial Networks and Its Application on Anomaly Detection

arXiv.org Machine Learning

In terms of Generative Adversarial Networks (GANs), the information metric to discriminate the generative data and the real data, lies in the key point of generation efficiency, which plays an important role in GAN-based applications, especially in anomaly detection. As for the original GAN, the information metric based on Kullback-Leibler (KL) divergence has limitations on rare events generation and training performance for adversarial networks. Therefore, it is significant to investigate the metrics used in GANs to improve the generation ability as well as bring gains in the training process. In this paper, we adopt the exponential form, referred from the Message Importance Measure (MIM), to replace the logarithm form of the original GAN. This approach named MIM-based GAN, has dominant performance on training process and rare events generation. Specifically, we first discuss the characteristics of training process in this approach. Moreover, we also analyze its advantages on generating rare events in theory. In addition, we do simulations on the datasets of MNIST and ODDS to see that the MIM-based GAN achieves state-of-the-art performance on anomaly detection compared with some classical GANs.


Visualizing High Dimensional Dynamical Processes

arXiv.org Machine Learning

Manifold learning techniques for dynamical systems and time series have shown their utility for a broad spectrum of applications in recent years. While these methods are effective at learning a low-dimensional representation, they are often insufficient for visualizing the global and local structure of the data. In this paper, we present DIG (Dynamical Information Geometry), a visualization method for multivariate time series data that extracts an information geometry from a diffusion framework. Specifically, we implement a novel group of distances in the context of diffusion operators, which may be useful to reveal structure in the data that may not be accessible by the commonly used diffusion distances. Finally, we present a case study applying our visualization tool to EEG data to visualize sleep stages.


Scalable Latent Tree Model and its Application to Health Analytics

arXiv.org Machine Learning

Latent tree graphical models are a popular class of latent variable models, where a probability distribution involving observed and hidden variables are Markovian on a tree. Due to the fact that structure of (observable and hidden) variable interactions are approximated as a tree, inference on latent trees can be carried out exactly through a simple belief propagation [Pea88]. Therefore, latent tree graphical models present a good tradeoff between model accuracy and computational complexity. They are applicable in many domains, where it is natural to expect hierarchical or sequential relationships among the variables (through a hidden-Markov model). For instance, latent tree models have been employed for phylogenetic reconstruction [DEKM99], object recognition [CTW12a, CTW12b] and human pose estimation [WL13]. In this paper, we use latent tree model for discovering a hierarchy among diseases based on comorbidities exhibited in patients' health records, i.e. co-occurrences of diseases in patients. In particular, two large healthcare datasets of 30K and 1.6M patients are used to build the latent disease trees, where clinically meaningful disease clusters are identified as shown in fig 3 and 4. The task of learning a latent tree models consists of two parts: learning the tree structure, and learning the parameters of the tree. There exist many challenges which prohibit efficient or guaranteed learning of the latent tree graphical model, which will be addressed in this paper: 1. The location and the number of latent variables are hidden and the marginalized graph over the observable variables no longer conforms to a tree structure.