 mctm


Scalable Learning of Multivariate Distributions via Coresets

arXiv.org Machine Learning

Efficient and scalable non-parametric or semi-parametric regression analysis and density estimation are of crucial importance to the fields of statistics and machine learning. However, available methods are limited in their ability to handle large-scale data. We address this issue by developing a novel coreset construction for multivariate conditional transformation models (MCTMs) to enhance their scalability and training efficiency. To the best of our knowledge, these are the first coresets for semi-parametric distributional models. Our approach yields substantial data reduction via importance sampling. It ensures with high probability that the log-likelihood remains within multiplicative error bounds of $(1\pm\varepsilon)$ and thereby maintains statistical model accuracy. Compared to conventional full-parametric models, where coresets have been incorporated before, our semi-parametric approach exhibits enhanced adaptability, particularly in scenarios where complex distributions and non-linear relationships are present, but not fully understood. To address numerical problems associated with normalizing logarithmic terms, we follow a geometric approximation based on the convex hull of input data. This ensures feasible, stable, and accurate inference in scenarios involving large amounts of data. Numerical experiments demonstrate substantially improved computational efficiency when handling large and complex datasets, thus laying the foundation for a broad range of applications within the statistics and machine learning communities.
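
The importance-sampling idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's MCTM coreset construction: it assumes leverage scores as a stand-in for the sampling scores, uses a simple squared loss instead of the MCTM log-likelihood, and omits the convex-hull step for the logarithmic terms. The function names `leverage_scores` and `importance_sample` are hypothetical. The point is only to show how sampled points are reweighted by the inverse of their sampling probability so that the weighted loss on the subset approximates the loss on the full data.

```python
import numpy as np


def leverage_scores(X):
    """Row leverage scores of X, computed from a thin QR decomposition."""
    Q, _ = np.linalg.qr(X)  # default 'reduced' mode: Q has shape (n, d)
    return np.sum(Q ** 2, axis=1)


def importance_sample(X, k, rng):
    """Draw k row indices with probability proportional to (leverage + 1/n).

    Assumption: leverage scores plus a uniform term serve as a proxy for
    per-point importance; the paper's actual scores may differ.
    Returns the sampled indices and weights 1 / (k * p_i), which make the
    weighted subset sum an unbiased estimate of the full sum.
    """
    n = X.shape[0]
    scores = leverage_scores(X) + 1.0 / n
    probs = scores / scores.sum()
    idx = rng.choice(n, size=k, replace=True, p=probs)
    weights = 1.0 / (k * probs[idx])
    return idx, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5000, 3))
    beta = np.array([1.0, -2.0, 0.5])  # arbitrary illustrative parameter

    idx, w = importance_sample(X, k=500, rng=rng)

    full_loss = np.sum((X @ beta) ** 2)
    subset_loss = np.sum(w * (X[idx] @ beta) ** 2)
    print("relative error:", abs(subset_loss - full_loss) / full_loss)
```

Adding the uniform term `1/n` to the scores keeps every point's sampling probability bounded away from zero, so no single weight can blow up; this is a common design choice in sampling-based coresets, used here as an assumption rather than taken from the paper.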



Meta-Complementing the Semantics of Short Texts in Neural Topic Models

Neural Information Processing Systems

Orthogonal to existing works, we remedy this problem within the corpus itself by proposing a Meta-Complement Topic Model, which improves topic quality of short texts by transferring the semantic knowledge learned on long documents to complement semantically limited short texts.




A Multilayer Correlated Topic Model

arXiv.org Machine Learning

We proposed a novel multilayer correlated topic model (MCTM) to analyze how the main ideas inherit and vary between a document and its different segments, which helps understand an article's structure. The variational expectation-maximization (EM) algorithm was derived to estimate the posterior and parameters in MCTM. We introduced two potential applications of MCTM, including the paragraph-level document analysis and market basket data analysis. The effectiveness of MCTM in understanding the document structure has been verified by the great predictive performance on held-out documents and intuitive visualization. We also showed that MCTM could successfully capture customers' popular shopping patterns in the market basket analysis.


zx

#artificialintelligence

Edit: If you want to see MarkovComposer in action but don't want to mess with Java code, you can access a web version of it here. In the following article, I'll present some of the research I've been working on lately. Algorithms, or algorithmic composition, have been used to compose music for centuries. For example, Western punctus contra punctum can sometimes be reduced to algorithmic determinacy. So why not use fast-learning computers, capable of billions of calculations per second, to do what they do best: follow algorithms?