 Bayesian Inference


An Enhanced Model-based Approach for Short Text Clustering

arXiv.org Artificial Intelligence

Short text clustering has become increasingly important with the popularity of social media like Twitter, Google+, and Facebook. Existing methods can be broadly categorized into two paradigms: topic model-based approaches and deep representation learning-based approaches. This task is inherently challenging due to the sparse, large-scale, and high-dimensional characteristics of the short text data. Furthermore, the computational intensity required by representation learning significantly increases the running time. To address these issues, we propose a collapsed Gibbs Sampling algorithm for the Dirichlet Multinomial Mixture model (GSDMM), which effectively handles the sparsity and high dimensionality of short texts while identifying representative words for each cluster. Based on several aspects of GSDMM that warrant further refinement, we propose an improved approach, GSDMM+, designed to further optimize its performance. GSDMM+ reduces initialization noise and adaptively adjusts word weights based on entropy, achieving fine-grained clustering that reveals more topic-related information. Additionally, strategic cluster merging is employed to refine clustering granularity, better aligning the predicted distribution with the true category distribution. We conduct extensive experiments, comparing our methods with both classical and state-of-the-art approaches. The experimental results demonstrate the efficiency and effectiveness of our methods. The source code for our model is publicly available at https://github.com/chehaoa/VEMC. The proliferation of mobile internet has led to an exponential increase in user-generated data on online platforms, including video, text, and image data. Intelligent processing of such data can significantly enhance the quality of life across society and generate substantial economic benefits. Short text data are a prevalent and important form of user-generated data, consisting of concise texts such as microblogs and comments.
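To make the collapsed Gibbs sampling idea concrete, here is a minimal sketch of a sampler for a Dirichlet Multinomial Mixture. It is not the authors' code (see the repository linked above for that) and it omits every GSDMM+ refinement: initialization denoising, entropy-based word weights, and cluster merging. The function name `gsdmm`, its defaults, and the toy documents are assumptions made for illustration.

```python
import math
import random
from collections import Counter

def gsdmm(docs, K=8, alpha=0.1, beta=0.1, iters=15, seed=0):
    """Collapsed Gibbs sampling for a Dirichlet Multinomial Mixture (sketch).

    docs: list of token lists. Returns one cluster label per document.
    """
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})    # vocabulary size
    m = [0] * K                              # documents per cluster
    n = [0] * K                              # words per cluster
    nw = [Counter() for _ in range(K)]       # word counts per cluster
    counts = [Counter(d) for d in docs]      # word counts per document

    def move(i, k, sign):
        # Add (sign=+1) or remove (sign=-1) document i from cluster k.
        m[k] += sign
        n[k] += sign * len(docs[i])
        for w, c in counts[i].items():
            nw[k][w] += sign * c

    z = [rng.randrange(K) for _ in docs]     # random initial assignment
    for i, k in enumerate(z):
        move(i, k, +1)

    for _ in range(iters):
        for i in range(len(docs)):
            move(i, z[i], -1)                # exclude doc i from the counts
            logp = []
            for k in range(K):
                # Cluster popularity term; the shared normaliser is dropped.
                lp = math.log(m[k] + alpha)
                # Word-similarity term, handling repeated words in the doc.
                for w, c in counts[i].items():
                    for j in range(c):
                        lp += math.log(nw[k][w] + beta + j)
                for j in range(len(docs[i])):
                    lp -= math.log(n[k] + V * beta + j)
                logp.append(lp)
            mx = max(logp)                   # subtract max for stability
            weights = [math.exp(lp - mx) for lp in logp]
            z[i] = rng.choices(range(K), weights=weights)[0]
            move(i, z[i], +1)
    return z

# Hypothetical toy corpus with two obvious themes.
toy_docs = [["cat", "dog"], ["dog", "cat", "cat"],
            ["stock", "bond"], ["bond", "stock", "stock"]]
labels = gsdmm(toy_docs, K=4, seed=1)
```

Because each document is assigned to exactly one cluster, clusters that lose all their documents simply stay empty, so `K` acts as an upper bound on the number of clusters rather than a fixed count.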


Modeling Open-World Cognition as On-Demand Synthesis of Probabilistic Models

arXiv.org Artificial Intelligence

When faced with novel situations, people are able to marshal relevant considerations from a wide range of background knowledge and put these to use in inferences and predictions. What permits us to draw in globally relevant information and reason over it coherently? Here, we explore the hypothesis that people use a combination of distributed and symbolic representations to construct bespoke mental models tailored to novel situations. We propose a computational implementation of this idea -- a "Model Synthesis Architecture" (MSA) -- using language models to implement global relevance-based retrieval and model synthesis and probabilistic programs to implement bespoke, coherent world models. We evaluate our MSA as a model of human judgments on a novel reasoning dataset. The dataset -- built around a "Model Olympics" domain of sports vignettes -- tests models' capacity for human-like, open-ended reasoning by requiring (i) judgments about novel causal structures described in language; (ii) drawing on large bodies of background knowledge; and (iii) doing both in light of observations that introduce arbitrary novel variables. Our MSA approach captures human judgments better than language model-only baselines, under both direct and chain-of-thought generations from the LM that supports model synthesis. These results suggest that MSAs can be implemented in a way that mirrors people's ability to deliver locally coherent reasoning over globally relevant variables, offering a path to understanding and replicating human reasoning in open-ended domains.


Variational Inference for Latent Variable Models in High Dimensions

arXiv.org Machine Learning

In modern applications, these models typically involve a large number of parameters and latent variables, resulting in complex and high-dimensional posteriors that are computationally intractable. For such scenarios, traditional Markov chain Monte Carlo (MCMC) approaches often suffer from lengthy burn-in periods and generally lack scalability [11]. Recently, variational inference (VI) [31, 10, 52, 11] has emerged as a popular and scalable alternative method for approximating intractable posterior distributions in large-scale applications (where the number of observations and dimensionality are both large) and is typically orders of magnitude faster than MCMC methods. Among the various forms of VI, arguably the most widely used and important is mean-field variational inference (MFVI) [52, 11], which approximates the intractable posterior by a product distribution. This approach has been widely adopted in statistics and machine learning, thanks to efficient algorithmic implementations based on coordinate ascent variational inference (CAVI) [10, 11, 19, 7, 5, 36, 14, 34].
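As a toy illustration of coordinate ascent variational inference, the sketch below fits a product (mean-field) approximation q(x1)q(x2) to a correlated bivariate Gaussian target, a standard textbook case. It is not one of the latent variable models the paper studies. The function name `cavi_bivariate_gaussian` and the numbers in the usage line are assumptions chosen for illustration.

```python
def cavi_bivariate_gaussian(mu, prec, iters=50):
    """CAVI for a mean-field approximation to N(mu, prec^{-1}) in 2D (sketch).

    mu:   target mean (mu1, mu2)
    prec: target precision matrix [[l11, l12], [l21, l22]]
    Returns the factor means and the factor variances.
    """
    m1, m2 = 0.0, 0.0                        # initial factor means
    for _ in range(iters):
        # Each update holds the other factor fixed and maximises the ELBO
        # over one factor; for a Gaussian target this is closed-form.
        m1 = mu[0] - prec[0][1] * (m2 - mu[1]) / prec[0][0]
        m2 = mu[1] - prec[1][0] * (m1 - mu[0]) / prec[1][1]
    # Optimal factor variances are the inverse diagonal precisions.
    return (m1, m2), (1.0 / prec[0][0], 1.0 / prec[1][1])

means, variances = cavi_bivariate_gaussian((1.0, -2.0), [[2.0, 1.0], [1.0, 2.0]])
```

In this example the factor means converge to the true mean, while each factor variance (1/2) is smaller than the true marginal variance (2/3), the well-known tendency of mean-field approximations to understate uncertainty for correlated targets.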


Physics constrained learning of stochastic characteristics

arXiv.org Machine Learning

Accurate state estimation requires careful consideration of uncertainty surrounding the process and measurement models; these characteristics are usually not well-known and need an experienced designer to select the covariance matrices. An error in the selection of covariance matrices could impact the accuracy of the estimation algorithm and may sometimes cause the filter to diverge. Identifying noise characteristics has long been a challenging problem due to uncertainty surrounding noise sources and difficulties in systematic noise modeling. Most existing approaches try identifying unknown covariance matrices through an optimization algorithm involving innovation sequences. In recent years, learning approaches have been utilized to determine the stochastic characteristics of process and measurement models. We present a learning-based methodology with different loss functions to identify noise characteristics and test these approaches' performance for real-time vehicle state estimation.


When Pattern-by-Pattern Works: Theoretical and Empirical Insights for Logistic Models with Missing Values

arXiv.org Machine Learning

Predicting a response with partially missing inputs remains a challenging task even in parametric models, since parameter estimation in itself is not sufficient to predict on partially observed inputs. Several works study prediction in linear models. In this paper, we focus on logistic models, which present their own difficulties. From a theoretical perspective, we prove that a Pattern-by-Pattern strategy (PbP), which learns one logistic model per missingness pattern, accurately approximates Bayes probabilities in various missing data scenarios (MCAR, MAR and MNAR). Empirically, we thoroughly compare various methods (constant and iterative imputations, complete case analysis, PbP, and an EM algorithm) across classification, probability estimation, calibration, and parameter inference. Our analysis provides a comprehensive view of logistic regression with missing values. It reveals that mean imputation can be used as a baseline for low sample sizes, and improved performance is obtained via nonlinear multiple iterative imputation techniques with the labels (MICE.RF.Y). For large sample sizes, PbP is the best method for Gaussian mixtures, and we recommend MICE.RF.Y in the presence of nonlinear features.
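The Pattern-by-Pattern idea, one logistic model per missingness pattern, can be sketched in a few lines. This is a minimal illustration under stated assumptions and not the paper's implementation: missing entries are encoded as `None`, each model is fit by plain gradient ascent, and the helper names (`fit_logistic`, `pbp_fit`, `pbp_predict`) and toy data are hypothetical.

```python
import math
from collections import defaultdict

def sigmoid(s):
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Gradient ascent on the mean log-likelihood; last weight is the intercept."""
    d = len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[-1] + sum(wj * xj for wj, xj in zip(w, xi)))
            for j, xj in enumerate(xi):
                grad[j] += (yi - p) * xj
            grad[-1] += yi - p
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def pbp_fit(rows, labels):
    """Group rows by which entries are missing and fit one model per group."""
    groups = defaultdict(lambda: ([], []))
    for row, y in zip(rows, labels):
        pattern = tuple(v is None for v in row)
        groups[pattern][0].append([v for v in row if v is not None])
        groups[pattern][1].append(y)
    return {pat: fit_logistic(X, y) for pat, (X, y) in groups.items()}

def pbp_predict(models, row):
    """Predict P(y=1) with the model matching the row's missingness pattern."""
    pattern = tuple(v is None for v in row)
    w = models[pattern]                      # KeyError if pattern was never seen
    obs = [v for v in row if v is not None]
    return sigmoid(w[-1] + sum(wj * xj for wj, xj in zip(w, obs)))

# Hypothetical toy data: two features, None marks a missing value.
rows = [(-2.0, None), (-1.0, None), (1.0, None), (2.0, None),
        (None, -2.0), (None, 1.5), (0.5, 0.5), (-0.5, -1.0)]
labels = [0, 0, 1, 1, 0, 1, 1, 0]
models = pbp_fit(rows, labels)
prob = pbp_predict(models, (1.5, None))
```

The number of possible patterns grows exponentially with the number of features, so each per-pattern model sees only a slice of the data. This sketch does nothing about patterns that were never observed during training.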


Bayesian Modeling and Estimation of Linear Time-Variant Systems using Neural Networks and Gaussian Processes

arXiv.org Machine Learning

The identification of Linear Time-Variant (LTV) systems from input-output data is a fundamental yet challenging ill-posed inverse problem. This work introduces a unified Bayesian framework that models the system's impulse response, h(t,τ), as a stochastic process. We decompose the response into a posterior mean and a random fluctuation term, a formulation that provides a principled approach for quantifying uncertainty and naturally defines a new, useful system class we term Linear Time-Invariant in Expectation (LTIE). To perform inference, we leverage modern machine learning techniques, including Bayesian neural networks and Gaussian Processes, using scalable variational inference. We demonstrate through a series of experiments that our framework can robustly infer the properties of an LTI system from a single noisy observation, show superior data efficiency compared to classical methods in a simulated ambient noise tomography problem, and successfully track a continuously varying LTV impulse response by using a structured Gaussian Process prior. This work provides a flexible and robust methodology for uncertainty-aware system identification in dynamic environments.

1. Introduction

Linear Time-Variant (LTV) systems are fundamental to modeling dynamic processes in fields ranging from geophysics and communications to control theory (Kozachek et al., 2024; Lin et al., 2020). Unlike their time-invariant counterparts, an LTV system's behavior is described by an impulse response, h(t,τ), that changes over time, posing significant challenges for analysis and estimation (Kailath, 1962; Bello, 1963). The task of identifying h(t,τ) from input-output data is a severely ill-posed inverse problem, as one must infer a function of two variables from one-dimensional time series (Aubel and Bölcskei, 2015). This work introduces a Bayesian framework for modeling such systems, where the inherent uncertainty and time-varying nature are captured probabilistically.


(Exhaustive) Symbolic Regression and model selection by minimum description length

arXiv.org Artificial Intelligence

Symbolic regression is the machine learning method for learning functions from data. After a brief overview of the symbolic regression landscape, I will describe the two main challenges that traditional algorithms face: they have an unknown (and likely significant) probability of failing to find any given good function, and they suffer from ambiguity and poorly-justified assumptions in their function-selection procedure. To address these I propose an exhaustive search and model selection by the minimum description length principle, which allows accuracy and complexity to be directly traded off by measuring each in units of information. I showcase the resulting publicly available Exhaustive Symbolic Regression algorithm on three open problems in astrophysics: the expansion history of the universe, the effective behaviour of gravity in galaxies and the potential of the inflaton field. In each case the algorithm identifies many functions superior to the literature standards. This general purpose methodology should find widespread utility in science and beyond.


crowd-hpo: Realistic Hyperparameter Optimization and Benchmarking for Learning from Crowds with Noisy Labels

arXiv.org Artificial Intelligence

Crowdworking is a cost-efficient solution for acquiring class labels. Since these labels are subject to noise, various approaches to learning from crowds have been proposed. Typically, these approaches are evaluated with default hyperparameter configurations, resulting in unfair and suboptimal performance, or with hyperparameter configurations tuned via a validation set with ground truth class labels, representing an often unrealistic scenario. Moreover, both setups can produce different approach rankings, complicating study comparisons. Therefore, we introduce crowd-hpo as a framework for evaluating approaches to learning from crowds in combination with criteria to select well-performing hyperparameter configurations with access only to noisy crowd-labeled validation data. Extensive experiments with neural networks demonstrate that these criteria select hyperparameter configurations that improve the generalization performance of learning-from-crowds approaches, measured on separate test sets with ground truth labels. Hence, incorporating such criteria into experimental studies is essential for enabling fairer and more realistic benchmarking.


MC$^2$A: Enabling Algorithm-Hardware Co-Design for Efficient Markov Chain Monte Carlo Acceleration

arXiv.org Artificial Intelligence

An increasing number of applications are exploiting sampling-based algorithms for planning, optimization, and inference. The Markov Chain Monte Carlo (MCMC) algorithms form the computational backbone of this emerging branch of machine learning. Unfortunately, the high computational cost limits their feasibility for large-scale problems and real-world applications, and the existing MCMC acceleration solutions are either limited in hardware flexibility or fail to maintain efficiency at the system level across a variety of end-to-end applications. This paper introduces MC$^2$A, an algorithm-hardware co-design framework, enabling efficient and flexible optimization for MCMC acceleration. Firstly, MC$^2$A analyzes the MCMC workload diversity through an extension of the processor performance roofline model with a 3rd dimension to derive the optimal balance between the compute, sampling and memory parameters. Secondly, MC$^2$A proposes a parametrized hardware accelerator architecture with flexible and efficient support of MCMC kernels with a pipeline of ISA-programmable tree-structured processing units, reconfigurable samplers and a crossbar interconnect to support irregular access. Thirdly, the core of MC$^2$A is powered by a novel Gumbel sampler that eliminates exponential and normalization operations. In the end-to-end case study, MC$^2$A achieves an overall $307.6\times$, $1.4\times$, $2.0\times$, and $84.2\times$ speedup compared to the CPU, GPU, TPU and state-of-the-art MCMC accelerator, respectively. Evaluated on various representative MCMC workloads, this work demonstrates and exploits the feasibility of general hardware acceleration to popularize MCMC-based solutions in diverse application domains.
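The abstract does not describe the hardware Gumbel sampler itself, so the sketch below shows only the standard Gumbel-max trick that such samplers build on. It draws from a categorical distribution given unnormalized log-weights, without exponentiating the weights or computing a normalizing sum. The function name `gumbel_max_sample` and the example weights are assumptions for illustration.

```python
import math
import random

def gumbel_max_sample(log_weights, rng):
    """Draw index i with probability proportional to exp(log_weights[i]).

    Adds independent Gumbel(0, 1) noise to each log-weight and returns the
    argmax; the weights are never exponentiated or normalised.
    """
    best_i, best_v = -1, -math.inf
    for i, lw in enumerate(log_weights):
        u = max(rng.random(), 1e-300)        # keep u strictly inside (0, 1)
        g = -math.log(-math.log(u))          # Gumbel(0, 1) noise
        if lw + g > best_v:
            best_i, best_v = i, lw + g
    return best_i

rng = random.Random(0)
log_w = [math.log(0.1), math.log(0.2), math.log(0.7)]
draws = [gumbel_max_sample(log_w, rng) for _ in range(1000)]
```

A conventional categorical sampler would compute exp of each log-weight, sum them, and walk the cumulative distribution. Here the only per-element work is one noise draw, one addition, and one comparison.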


LLMs are Bayesian, in Expectation, not in Realization

arXiv.org Machine Learning

Large language models demonstrate remarkable in-context learning capabilities, adapting to new tasks without parameter updates. While this phenomenon has been successfully modeled as implicit Bayesian inference, recent empirical findings reveal a fundamental contradiction: transformers systematically violate the martingale property, a cornerstone requirement of Bayesian updating on exchangeable data. This violation challenges the theoretical foundations underlying uncertainty quantification in critical applications. Our theoretical analysis establishes four key results: (1) positional encodings induce martingale violations of order $\Theta(\log n / n)$; (2) transformers achieve information-theoretic optimality with excess risk $O(n^{-1/2})$ in expectation over orderings; (3) the implicit posterior representation converges to the true Bayesian posterior in the space of sufficient statistics; and (4) we derive the optimal chain-of-thought length as $k^* = \Theta(\sqrt{n}\log(1/\varepsilon))$ with explicit constants, providing a principled approach to reduce inference costs while maintaining performance. Empirical validation on GPT-3 confirms predictions (1)-(3), with transformers reaching 99% of theoretical entropy limits within 20 examples. Our framework provides practical methods for extracting calibrated uncertainty estimates from position-aware architectures and optimizing computational efficiency in deployment.
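For readers unfamiliar with the martingale property, the sketch below checks it for an exact Bayesian learner: a Beta-Bernoulli model, where the expected next-step predictive equals the current predictive. It illustrates the property the abstract says transformers violate and does not reproduce the paper's analysis. The function names and the Beta(2, 3) prior are assumptions for illustration.

```python
from fractions import Fraction

def predictive(a, b, ones, n):
    """P(next = 1 | data) under a Beta(a, b) prior after `ones` successes in n draws."""
    return Fraction(a + ones, a + b + n)

def expected_next_predictive(a, b, ones, n):
    """Expectation of the predictive after one more draw, averaged over that
    draw using the current predictive distribution."""
    p = predictive(a, b, ones, n)
    return (p * predictive(a, b, ones + 1, n + 1)
            + (1 - p) * predictive(a, b, ones, n + 1))

# Martingale property: the expected future predictive equals the current one.
current = predictive(2, 3, ones=4, n=7)
expected = expected_next_predictive(2, 3, ones=4, n=7)
is_martingale = (current == expected)
```

Exact rational arithmetic makes the equality hold without floating-point tolerance. The predictive also depends on the data only through the counts `(ones, n)`, so it is invariant to the order of the observations, which is the exchangeability the abstract refers to.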