Goto

Collaborating Authors

 Bayesian Learning


Tracking student skills real-time through a continuous-variable dynamic Bayesian network

arXiv.org Machine Learning

The field of Knowledge Tracing is focused on predicting the success rate of a student for a given skill. Modern methods like Deep Knowledge Tracing provide accurate estimates given enough data, but being based on neural networks they struggle to explain how these estimates are formed. More classical methods like Dynamic Bayesian Networks can do this, but they cannot give data on the accuracy of their estimates and often struggle to incorporate new observations in real-time due to their high computational load. This paper presents a novel method, Performance Distribution Tracing (PDT), in which the distribution of the success rate is traced live. It uses a Dynamic Bayesian Network with continuous random variables as nodes. By tracing the success rate distribution, there is always data available on the accuracy of any success rate estimation. In addition, it makes it possible to combine data from similar/related skills to come up with a more informed estimate of success rates. This makes it possible to predict exercise success rates, providing both explainability and an accuracy indication, even when an exercise requires a combination of different skills to solve. And through the use of the beta distribution functions as conjugate priors, all distributions are available in analytical form, allowing efficient online updates upon new observations. Experiments have shown that the resulting estimates generally feel sufficiently accurate to end-users such that they accept recommendations based on them.


Amortized Bayesian Mixture Models

arXiv.org Machine Learning

Finite mixtures are a broad class of models useful in scenarios where observed data is generated by multiple distinct processes but without explicit information about the responsible process for each data point. Estimating Bayesian mixture models is computationally challenging due to issues such as high-dimensional posterior inference and label switching. Furthermore, traditional methods such as MCMC are applicable only if the likelihoods for each mixture component are analytically tractable. Amortized Bayesian Inference (ABI) is a simulation-based framework for estimating Bayesian models using generative neural networks. This allows the fitting of models without explicit likelihoods, and provides fast inference. ABI is therefore an attractive framework for estimating mixture models. This paper introduces a novel extension of ABI tailored to mixture models. We factorize the posterior into a distribution of the parameters and a distribution of (categorical) mixture indicators, which allows us to use a combination of generative neural networks for parameter inference, and classification networks for mixture membership identification. The proposed framework accommodates both independent and dependent mixture models, enabling filtering and smoothing. We validate and demonstrate our approach through synthetic and real-world datasets.


DPERC: Direct Parameter Estimation for Mixed Data

arXiv.org Machine Learning

The covariance matrix is a foundation in numerous statistical and machine-learning applications such as Principle Component Analysis, Correlation Heatmap, etc. However, missing values within datasets present a formidable obstacle to accurately estimating this matrix. While imputation methods offer one avenue for addressing this challenge, they often entail a trade-off between computational efficiency and estimation accuracy. Consequently, attention has shifted towards direct parameter estimation, given its precision and reduced computational burden. In this paper, we propose Direct Parameter Estimation for Randomly Missing Data with Categorical Features (DPERC), an efficient approach for direct parameter estimation tailored to mixed data that contains missing values within continuous features. Our method is motivated by leveraging information from categorical features, which can significantly enhance covariance matrix estimation for continuous features. Our approach effectively harnesses the information embedded within mixed data structures. Through comprehensive evaluations of diverse datasets, we demonstrate the competitive performance of DPERC compared to various contemporary techniques. In addition, we also show by experiments that DPERC is a valuable tool for visualizing the correlation heatmap.


Goal-directed Generation of Discrete Structures with Conditional Generative Models

Neural Information Processing Systems

Despite recent advances, goal-directed generation of structured discrete data remains challenging. For problems such as program synthesis (generating source code) and materials design (generating molecules), finding examples which satisfy desired constraints or exhibit desired properties is difficult. In practice, expensive heuristic search or reinforcement learning algorithms are often employed. In this paper, we investigate the use of conditional generative models which directly attack this inverse problem, by modeling the distribution of discrete structures given properties of interest. Unfortunately, the maximum likelihood training of such models often fails with the samples from the generative model inadequately respecting the input properties.


The Poisson Gamma Belief Network

Neural Information Processing Systems

To infer a multilayer representation of high-dimensional count vectors, we propose the Poisson gamma belief network (PGBN) that factorizes each of its layers into the product of a connection weight matrix and the nonnegative real hidden units of the next layer. The PGBN's hidden layers are jointly trained with an upward-downward Gibbs sampler, each iteration of which upward samples Dirichlet distributed connection weight vectors starting from the first layer (bottom data layer), and then downward samples gamma distributed hidden units starting from the top hidden layer. The gamma-negative binomial process combined with a layer-wise training strategy allows the PGBN to infer the width of each layer given a fixed budget on the width of the first layer. The PGBN with a single hidden layer reduces to Poisson factor analysis. Example results on text analysis illustrate interesting relationships between the width of the first layer and the inferred network structure, and demonstrate that the PGBN, whose hidden units are imposed with correlated gamma priors, can add more layers to increase its performance gains over Poisson factor analysis, given the same limit on the width of the first layer.


ARMAX identification of low rank graphical models

arXiv.org Artificial Intelligence

In large-scale systems, complex internal relationships are often present. Such interconnected systems can be effectively described by low rank stochastic processes. When identifying a predictive model of low rank processes from sampling data, the rank-deficient property of spectral densities is often obscured by the inevitable measurement noise in practice. However, existing low rank identification approaches often did not take noise into explicit consideration, leading to non-negligible inaccuracies even under weak noise. In this paper, we address the identification issue of low rank processes under measurement noise. We find that the noisy measurement model admits a sparse plus low rank structure in latent-variable graphical models. Specifically, we first decompose the problem into a maximum entropy covariance extension problem, and a low rank graphical estimation problem based on an autoregressive moving-average with exogenous input (ARMAX) model. To identify the ARMAX low rank graphical models, we propose an estimation approach based on maximum likelihood. The identifiability and consistency of this approach are proven under certain conditions. Simulation results confirm the reliable performance of the entire algorithm in both the parameter estimation and noisy data filtering.


Utilizing AI Language Models to Identify Prognostic Factors for Coronary Artery Disease: A Study in Mashhad Residents

arXiv.org Artificial Intelligence

Abstract: Background: Understanding cardiovascular artery disease risk factors, the leading global cause of mortality, is crucial for influencing its etiology, prevalence, and treatment. This study aims to evaluate prognostic markers for coronary artery disease in Mashhad using Naive Bayes, REP Tree, J48, CART, and CHAID algorithms. Methods: Using data from the 2009 MASHAD STUDY, prognostic factors for coronary artery disease were determined with Naive Bayes, REP Tree, J48, CART, CHAID, and Random Forest algorithms using R 3.5.3 and WEKA 3.9.4. Model efficiency was compared by sensitivity, specificity, and accuracy. Cases were patients with coronary artery disease; each had three controls (totally 940). Results: Prognostic factors for coronary artery disease in Mashhad residents varied by algorithm. CHAID identified age, myocardial infarction history, and hypertension. CART included depression score and physical activity. REP added education level and anxiety score. NB included diabetes and family history. J48 highlighted father's heart disease and weight loss. CHAID had the highest accuracy (0.80). Conclusion: Key prognostic factors for coronary artery disease in CART and CHAID models include age, myocardial infarction history, hypertension, depression score, physical activity, and BMI. NB, REP Tree, and J48 identified numerous factors. CHAID had the highest accuracy, sensitivity, and specificity. CART offers simpler interpretation, aiding physician and paramedic model selection based on specific. Keywords: RF, Na\"ive Bayes, REP, J48 algorithms, Coronary Artery Disease (CAD).


Understanding Mental Health Content on Social Media and Its Effect Towards Suicidal Ideation

arXiv.org Artificial Intelligence

This review underscores the critical need for effective strategies to identify and support individuals with suicidal ideation, exploiting technological innovations in ML and DL to further suicide prevention efforts. The study details the application of these technologies in analyzing vast amounts of unstructured social media data to detect linguistic patterns, keywords, phrases, tones, and contextual cues associated with suicidal thoughts. It explores various ML and DL models like SVMs, CNNs, LSTM, neural networks, and their effectiveness in interpreting complex data patterns and emotional nuances within text data. The review discusses the potential of these technologies to serve as a life-saving tool by identifying at-risk individuals through their digital traces. Furthermore, it evaluates the real-world effectiveness, limitations, and ethical considerations of employing these technologies for suicide prevention, stressing the importance of responsible development and usage. The study aims to fill critical knowledge gaps by analyzing recent studies, methodologies, tools, and techniques in this field. It highlights the importance of synthesizing current literature to inform practical tools and suicide prevention efforts, guiding innovation in reliable, ethical systems for early intervention. This research synthesis evaluates the intersection of technology and mental health, advocating for the ethical and responsible application of ML, DL, and NLP to offer life-saving potential worldwide while addressing challenges like generalizability, biases, privacy, and the need for further research to ensure these technologies do not exacerbate existing inequities and harms.


Reward-Guided Controlled Generation for Inference-Time Alignment in Diffusion Models: Tutorial and Review

arXiv.org Machine Learning

This tutorial provides an in-depth guide on inference-time guidance and alignment methods for optimizing downstream reward functions in diffusion models. While diffusion models are renowned for their generative modeling capabilities, practical applications in fields such as biology often require sample generation that maximizes specific metrics (e.g., stability, affinity in proteins, closeness to target structures). In these scenarios, diffusion models can be adapted not only to generate realistic samples but also to explicitly maximize desired measures at inference time without fine-tuning. This tutorial explores the foundational aspects of such inference-time algorithms. We review these methods from a unified perspective, demonstrating that current techniques -- such as Sequential Monte Carlo (SMC)-based guidance, value-based sampling, and classifier guidance -- aim to approximate soft optimal denoising processes (a.k.a. policies in RL) that combine pre-trained denoising processes with value functions serving as look-ahead functions that predict from intermediate states to terminal rewards. Within this framework, we present several novel algorithms not yet covered in the literature. Furthermore, we discuss (1) fine-tuning methods combined with inference-time techniques, (2) inference-time algorithms based on search algorithms such as Monte Carlo tree search, which have received limited attention in current research, and (3) connections between inference-time algorithms in language models and diffusion models. The code of this tutorial on protein design is available at https://github.com/masa-ue/AlignInversePro


Identifying Information from Observations with Uncertainty and Novelty

arXiv.org Machine Learning

A machine learning tasks from observations must encounter and process uncertainty and novelty, especially when it is expected to maintain performance when observing new information and to choose the best fitting hypothesis to the currently observed information. In this context, some key questions arise: what is information, how much information did the observations provide, how much information is required to identify the data-generating process, how many observations remain to get that information, and how does a predictor determine that it has observed novel information? This paper strengthens existing answers to these questions by formalizing the notion of "identifiable information" that arises from the language used to express the relationship between distinct states. Model identifiability and sample complexity are defined via computation of an indicator function over a set of hypotheses. Their properties and asymptotic statistics are described for data-generating processes ranging from deterministic processes to ergodic stationary stochastic processes. This connects the notion of identifying information in finite steps with asymptotic statistics and PAC-learning. The indicator function's computation naturally formalizes novel information and its identification from observations with respect to a hypothesis set. We also proved that computable PAC-Bayes learners' sample complexity distribution is determined by its moments in terms of the the prior probability distribution over a fixed finite hypothesis set.