Bayesian Learning
How transformers learn structured data: insights from hierarchical filtering
Garnier-Brun, Jerome, Mézard, Marc, Moscato, Emanuele, Saglietti, Luca
We introduce a hierarchical filtering procedure for generative models of sequences on trees, enabling control over the range of positional correlations in the data. Leveraging this controlled setting, we provide evidence that vanilla encoder-only transformer architectures can implement the optimal Belief Propagation algorithm on both root classification and masked language modeling tasks. Correlations at larger distances corresponding to increasing layers of the hierarchy are sequentially included as the network is trained. We analyze how the transformer layers succeed by focusing on attention maps from models trained with varying degrees of filtering. These attention maps show clear evidence for iterative hierarchical reconstruction of correlations, and we can relate these observations to a plausible implementation of the exact inference algorithm for the network sizes considered. Transformer-based large language models have revolutionized natural language processing, and have notably demonstrated their capacity to perfectly assimilate the grammatical rules of the languages they are trained on. While this evidence shows that transformers can handle and exploit the subtle long-range correlations that emerge in natural language, their inner workings remain largely unclear. Due to the complexity of the standard multi-layer transformer architecture (Vaswani et al., 2017), understanding what strategy is precisely implemented via the attention mechanism to solve a given problem has been limited so far to very simple tasks (Weiss et al., 2021; Zhong et al., 2024; Behrens et al., 2024). Nonetheless, significant results have been obtained by studying transformers on simplified models of language known as Context-Free Grammars (CFGs).
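The Belief Propagation computation for root classification can be illustrated on a toy tree model. This is a minimal sketch under simplifying assumptions, not the paper's setup: a full binary tree, children drawn independently from their parent through a single hypothetical transition matrix `M`, and a uniform prior on the root. The names `upward_message`, `root_posterior`, and `brute_force_posterior` are illustrative.

```python
import itertools
import numpy as np

def upward_message(leaves, M):
    """Return P(observed leaves | root state) for a full binary tree.

    leaves: observed leaf symbols, length a power of two.
    M[a, b]: probability that a child has symbol b given its parent has symbol a
             (assumption: the two children are independent given the parent).
    """
    q = M.shape[0]
    # A leaf's message is an indicator vector of its observed symbol.
    messages = [np.eye(q)[s] for s in leaves]
    while len(messages) > 1:
        # Merge sibling pairs: a parent in state a explains both subtrees independently.
        messages = [(M @ left) * (M @ right)
                    for left, right in zip(messages[0::2], messages[1::2])]
    return messages[0]

def root_posterior(leaves, M, prior):
    """Posterior over the root symbol given all leaves."""
    unnorm = prior * upward_message(leaves, M)
    return unnorm / unnorm.sum()

def brute_force_posterior(leaves, M, prior):
    """Four-leaf check: sum explicitly over the root and its two hidden children."""
    q = M.shape[0]
    post = np.zeros(q)
    for r, a, b in itertools.product(range(q), repeat=3):
        p = prior[r] * M[r, a] * M[r, b]
        p *= M[a, leaves[0]] * M[a, leaves[1]] * M[b, leaves[2]] * M[b, leaves[3]]
        post[r] += p
    return post / post.sum()

M = np.array([[0.8, 0.2],
              [0.3, 0.7]])
prior = np.array([0.5, 0.5])
leaves = [0, 0, 1, 0]

bp = root_posterior(leaves, M, prior)
exact = brute_force_posterior(leaves, M, prior)
print("BP posterior:   ", bp)
print("exact posterior:", exact)
```

The message passing costs time linear in the number of leaves, whereas the brute-force sum grows exponentially with the number of hidden nodes, which is why it is only used here as a check on the smallest tree.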
Measuring text summarization factuality using atomic facts entailment metrics in the context of retrieval augmented generation
The use of large language models (LLMs) has significantly increased since the introduction of ChatGPT in 2022, demonstrating their value across various applications. However, a major challenge for enterprise and commercial adoption of LLMs is their tendency to generate inaccurate information, a phenomenon known as "hallucination." This project proposes a method for estimating the factuality of a summary generated by LLMs when compared to a source text. Our approach utilizes Naive Bayes classification to assess the accuracy of the content produced.
Quotient Normalized Maximum Likelihood Criterion for Learning Bayesian Network Structures
Silander, Tomi, Leppä-aho, Janne, Jääsaari, Elias, Roos, Teemu
We introduce an information theoretic criterion for Bayesian network structure learning which we call quotient normalized maximum likelihood (qNML). In contrast to the closely related factorized normalized maximum likelihood criterion, qNML satisfies the property of score equivalence. It is also decomposable and completely free of adjustable hyperparameters. For practical computations, we identify a remarkably accurate approximation proposed earlier by Szpankowski and Weinberger. Experiments on both simulated and real data demonstrate that the new criterion leads to parsimonious models with good predictive accuracy.
Ensemble Prediction via Covariate-dependent Stacking
Wakayama, Tomoya, Sugasawa, Shonosuke
This study proposes a novel approach to ensemble prediction, called "covariate-dependent stacking" (CDST). Unlike traditional stacking methods, CDST allows model weights to vary flexibly as a function of covariates, thereby enhancing predictive performance in complex scenarios. We formulate the covariate-dependent weights through combinations of basis functions, estimate them by optimizing cross-validation, and develop an expectation-maximization algorithm, ensuring computational efficiency. To analyze the theoretical properties, we establish an oracle inequality regarding the expected loss to be minimized for estimating model weights. Through comprehensive simulation studies and an application to large-scale land price prediction, we demonstrate that CDST consistently outperforms conventional model averaging methods, particularly on datasets where some models fail to capture the underlying complexity. Our findings suggest that CDST is especially valuable for, but not limited to, spatio-temporal prediction problems, offering a powerful tool for researchers and practitioners in various data analysis fields.
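The idea of weights that vary with covariates through basis functions can be sketched as follows. This is an illustrative simplification, not the authors' estimator: the weights are left unconstrained and fitted by ordinary least squares instead of the expectation-maximization algorithm, the polynomial basis is an arbitrary choice, and the two columns of `preds` are hypothetical stand-ins for out-of-fold predictions of two base models.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=400)
y = np.where(x < 0, np.sin(3 * x), x ** 2) + 0.05 * rng.normal(size=x.size)

# Hypothetical base-model predictions: model 0 is accurate for x < 0, model 1 for x >= 0.
preds = np.column_stack([np.sin(3 * x), x ** 2])

def basis(x):
    """Small polynomial basis with an intercept (an illustrative choice)."""
    return np.column_stack([np.ones_like(x), x, x ** 2, x ** 3])

def fit_weights(phi, preds, y):
    """Least-squares fit of theta in y ~ sum_k (phi @ theta[:, k]) * preds[:, k]."""
    n, n_basis = phi.shape
    n_models = preds.shape[1]
    # Column j * n_models + k of the design matrix holds phi[:, j] * preds[:, k].
    design = (phi[:, :, None] * preds[:, None, :]).reshape(n, n_basis * n_models)
    theta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return theta.reshape(n_basis, n_models)

def stacked_prediction(phi, preds, theta):
    weights = phi @ theta              # shape (n, n_models): one weight per model per point
    return (weights * preds).sum(axis=1)

def mse(phi):
    theta = fit_weights(phi, preds, y)
    return float(np.mean((y - stacked_prediction(phi, preds, theta)) ** 2))

const_mse = mse(np.ones((x.size, 1)))  # classical stacking: one constant weight per model
cdst_mse = mse(basis(x))               # weights vary with the covariate x
print(f"constant weights: {const_mse:.4f}, covariate-dependent weights: {cdst_mse:.4f}")
```

Because the basis contains an intercept, constant-weight stacking is a special case of the covariate-dependent fit, so the in-sample error of the latter can never be larger. A fair comparison of predictive performance would use held-out data.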
Points2Plans: From Point Clouds to Long-Horizon Plans with Composable Relational Dynamics
Huang, Yixuan, Agia, Christopher, Wu, Jimmy, Hermans, Tucker, Bohg, Jeannette
We present Points2Plans, a framework for composable planning with a relational dynamics model that enables robots to solve long-horizon manipulation tasks from partial-view point clouds. Given a language instruction and a point cloud of the scene, our framework initiates a hierarchical planning procedure, whereby a language model generates a high-level plan and a sampling-based planner produces constraint-satisfying continuous parameters for manipulation primitives sequenced according to the high-level plan. Key to our approach is the use of a relational dynamics model as a unifying interface between the continuous and symbolic representations of states and actions, thus facilitating language-driven planning from high-dimensional perceptual input such as point clouds. Whereas previous relational dynamics models require training on datasets of multi-step manipulation scenarios that align with the intended test scenarios, Points2Plans uses only single-step simulated training data while generalizing zero-shot to a variable number of steps during real-world evaluations. We evaluate our approach on tasks involving geometric reasoning, multi-object interactions, and occluded object reasoning in both simulated and real-world settings. Results demonstrate that Points2Plans offers strong generalization to unseen long-horizon tasks in the real world, where it solves over 85% of evaluated tasks while the next best baseline solves only 50%. Qualitative demonstrations of our approach operating on a mobile manipulator platform are made available at sites.google.com/stanford.edu/points2plans.
Estimating Causal Effects from Learned Causal Networks
Raichev, Anna, Ihler, Alexander, Tian, Jin, Dechter, Rina
The standard approach to answering an identifiable causal-effect query (e.g., $P(Y|do(X))$) when given a causal diagram and observational data is to first generate an estimand, or probabilistic expression over the observable variables, which is then evaluated using the observational data. In this paper, we propose an alternative paradigm for answering causal-effect queries over discrete observable variables. We propose to instead learn the causal Bayesian network and its confounding latent variables directly from the observational data. Then, efficient probabilistic graphical model (PGM) algorithms can be applied to the learned model to answer queries. Perhaps surprisingly, we show that this model completion learning approach can be more effective than estimand approaches, particularly for larger models in which the estimand expressions become computationally difficult. We illustrate our method's potential using a benchmark collection of Bayesian networks and synthetically generated causal models.
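To make the notion of an estimand concrete, the sketch below evaluates the simplest one, the backdoor adjustment $P(y|do(x)) = \sum_z P(y|x,z)P(z)$, on a toy model. It assumes a single observed binary confounder Z of X and Y, which is much simpler than the latent-confounder settings the abstract targets; all probability tables are hypothetical.

```python
import numpy as np

# Hypothetical model Z -> X, Z -> Y, X -> Y with binary variables.
p_z = np.array([0.5, 0.5])                    # P(Z)
p_x_given_z = np.array([[0.8, 0.2],           # P(X | Z=0)
                        [0.2, 0.8]])          # P(X | Z=1)
p_y1_given_xz = np.array([[0.1, 0.5],         # P(Y=1 | X=0, Z=z)
                          [0.3, 0.7]])        # P(Y=1 | X=1, Z=z)

# Observational joint distribution, indexed as joint[z, x, y].
joint = np.zeros((2, 2, 2))
for z in range(2):
    for x in range(2):
        p1 = p_y1_given_xz[x, z]
        joint[z, x, 1] = p_z[z] * p_x_given_z[z, x] * p1
        joint[z, x, 0] = p_z[z] * p_x_given_z[z, x] * (1.0 - p1)

def backdoor(joint, x, y):
    """Estimand sum_z P(y | x, z) P(z), computed from the observational joint."""
    pz = joint.sum(axis=(1, 2))
    p_y_given_xz = joint[:, x, y] / joint[:, x, :].sum(axis=1)
    return float((p_y_given_xz * pz).sum())

def conditional(joint, x, y):
    """Plain conditional P(y | x), which is biased by the confounder."""
    return float(joint[:, x, y].sum() / joint[:, x, :].sum())

causal = backdoor(joint, x=1, y=1)
observed = conditional(joint, x=1, y=1)
print(f"P(Y=1 | do(X=1)) = {causal:.2f}, P(Y=1 | X=1) = {observed:.2f}")
```

The gap between the two numbers is the confounding bias. With more variables and latent confounders, the corresponding estimand expressions grow in size, which is the computational difficulty the abstract refers to.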
Correntropy-Based Improper Likelihood Model for Robust Electrophysiological Source Imaging
Li, Yuanhao, Chen, Badong, Hu, Zhongxu, Suzuki, Keita, Bai, Wenjun, Koike, Yasuharu, Yamashita, Okito
Bayesian learning provides a unified framework for solving the electrophysiological source imaging task. From this perspective, existing source imaging algorithms use a Gaussian assumption for the observation noise to build the likelihood function for Bayesian inference. However, electromagnetic measurements of brain activity are usually affected by miscellaneous artifacts, leading to a potentially non-Gaussian distribution for the observation noise. Hence the conventional Gaussian likelihood model is a suboptimal choice for the real-world source imaging task. In this study, we aim to solve this problem by proposing a new likelihood model that is robust to non-Gaussian noise. Motivated by the robust maximum correntropy criterion, we propose a new improper distribution model for the noise assumption. This new noise distribution is used to construct a robust likelihood function, which is integrated with hierarchical prior distributions to estimate source activities by variational inference. In particular, score matching is adopted to determine the hyperparameters of the improper likelihood model. A comprehensive performance evaluation compares the proposed noise assumption to the conventional Gaussian model. Simulation results with known ground truth show that the proposed method achieves more precise source reconstruction. Results on a real-world dataset from a visual perception task also demonstrate the superiority of the new method. This study provides a new backbone for Bayesian source imaging, which would facilitate its application to real-world noisy brain signals.
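The robustness of the maximum correntropy criterion can be seen on a scalar toy problem. The sketch below is not the paper's improper likelihood or its variational inference; it only estimates a location parameter by maximizing the average Gaussian kernel of the residuals with a standard fixed-point iteration. The kernel width `sigma` and the synthetic data are assumptions.

```python
import numpy as np

def correntropy_location(x, sigma=1.0, n_iter=50):
    """Location estimate maximizing sum_i exp(-(x_i - mu)^2 / (2 sigma^2)).

    Each fixed-point step is a weighted mean in which samples far from the
    current estimate receive exponentially small weight.
    """
    mu = np.median(x)  # robust starting point
    for _ in range(n_iter):
        w = np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2))
        mu = np.sum(w * x) / np.sum(w)
    return float(mu)

rng = np.random.default_rng(0)
clean = rng.normal(loc=2.0, scale=0.5, size=200)   # well-behaved measurements
artifacts = np.full(20, 30.0)                      # gross outliers
data = np.concatenate([clean, artifacts])

mcc_estimate = correntropy_location(data)
mean_estimate = float(data.mean())   # least-squares (Gaussian likelihood) estimate
print(f"correntropy: {mcc_estimate:.3f}, mean: {mean_estimate:.3f}")
```

The sample mean, which is the maximum-likelihood estimate under Gaussian noise, is pulled toward the outliers, while the correntropy estimate stays near the true location of 2.0.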
Dynamic Pricing for Electric Vehicle Charging
Kalakanti, Arun Kumar, Rao, Shrisha
Dynamic pricing is a promising strategy to address the challenges of smart charging, as traditional time-of-use (ToU) rates and stationary pricing (SP) do not dynamically react to changes in operating conditions, reducing revenue for charging station (CS) vendors and affecting grid stability. Previous studies evaluated single objectives or linear combinations of objectives for EV CS pricing solutions, simplifying trade-offs and preferences among objectives. We develop a novel formulation for the dynamic pricing problem that addresses multiple conflicting objectives instead of focusing solely on one objective or metric, as in earlier works. We find optimal trade-offs, or Pareto solutions, efficiently using Non-dominated Sorting Genetic Algorithms (NSGA) II and NSGA III. A dynamic pricing model quantifies the relationship between demand and price while simultaneously handling multiple conflicting objectives, such as revenue, quality of service (QoS), and peak-to-average ratio (PAR). A single method cannot address all of these aspects of dynamic pricing comprehensively. We therefore present a three-part dynamic pricing approach using a Bayesian model, multi-objective optimization, and multi-criteria decision-making (MCDM) using pseudo-weight vectors. To address the research gap in CS pricing, our method selects solutions using revenue, QoS, and PAR metrics simultaneously. We validate our approach on real-world data from two charging sites in California.
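The last two stages, finding Pareto solutions and choosing one with pseudo-weight vectors, can be sketched on a toy set of candidates. This is an assumption-laden illustration: a brute-force dominance filter replaces NSGA-II/III, the pseudo-weight formula is the commonly used normalized distance from the worst value of each objective, and the two objective columns in `F` are hypothetical values to be minimized (for example, negative revenue and PAR).

```python
import numpy as np

def non_dominated(F):
    """Boolean mask of rows of F (all objectives minimized) that no other row dominates."""
    keep = np.ones(F.shape[0], dtype=bool)
    for i in range(F.shape[0]):
        dominates_i = np.all(F <= F[i], axis=1) & np.any(F < F[i], axis=1)
        keep[i] = not dominates_i.any()
    return keep

def pseudo_weights(front):
    """Relative closeness of each solution to the best value of every objective."""
    f_min, f_max = front.min(axis=0), front.max(axis=0)
    closeness = (f_max - front) / (f_max - f_min)
    return closeness / closeness.sum(axis=1, keepdims=True)

def select(front, preference):
    """Index of the front solution whose pseudo-weight vector is nearest the preference."""
    w = pseudo_weights(front)
    return int(np.argmin(np.linalg.norm(w - np.asarray(preference), axis=1)))

# Hypothetical candidate pricing policies scored on two objectives (lower is better).
F = np.array([[1.0, 9.0],
              [2.0, 7.0],
              [3.0, 8.0],
              [4.0, 4.0],
              [6.0, 5.0],
              [8.0, 1.0]])

mask = non_dominated(F)
front = F[mask]
balanced = select(front, preference=[0.5, 0.5])
print("Pareto front:\n", front)
print("balanced choice:", front[balanced])
```

A preference vector such as `[0.5, 0.5]` expresses equal importance of both objectives, so the selected solution is the one on the front that trades them off most evenly.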
Lemon and Orange Disease Classification using CNN-Extracted Features and Machine Learning Classifier
Arifin, Khandoker Nosiba, Rupa, Sayma Akter, Anwar, Md Musfique, Jahan, Israt
Lemons and oranges are among the most economically significant citrus fruits globally. Their production is severely affected by diseases during the growth stages, and fruit quality degrades due to the presence of flaws. It is therefore necessary to diagnose diseases accurately to avoid major losses of lemons and oranges. To improve citrus farming, we propose a disease classification approach for lemons and oranges. This approach would enable early disease detection and intervention, reduce yield losses, and optimize resource allocation. For the initial modeling of disease classification, the research uses deep learning architectures such as VGG16, VGG19 and ResNet50. In addition, to achieve better accuracy, standard machine learning classifiers are applied, including Random Forest, Naive Bayes, K-Nearest Neighbors (KNN) and Logistic Regression. The proposed model classifies lemon and orange diseases more accurately (95.0% for lemon and 99.69% for orange). The model's base features are extracted from the pre-trained ResNet50 model and the diseases are classified by Logistic Regression, which beats the performance obtained with VGG16 and VGG19 and with the other classifiers. Experimental results show that the proposed model also outperforms existing models, most of which classify the diseases with a Softmax classifier rather than a separate classifier.
ALIAS: DAG Learning with Efficient Unconstrained Policies
Duong, Bao, Le, Hung, Nguyen, Thin
Recently, reinforcement learning (RL) has proved a promising alternative to conventional local heuristics in score-based approaches to learning directed acyclic causal graphs (DAGs) from observational data. However, the intricate acyclicity constraint still challenges the efficient exploration of the vast space of DAGs in existing methods. In this study, we introduce ALIAS (reinforced dAg Learning wIthout Acyclicity conStraints), a novel approach to causal discovery powered by the RL machinery. Our method features an efficient policy for generating DAGs in just a single step with an optimal quadratic complexity, fueled by a novel parametrization of DAGs that directly translates a continuous space to the space of all DAGs, bypassing the need for explicitly enforcing acyclicity constraints. This approach enables us to navigate the search space more effectively by utilizing policy gradient methods and established scoring functions. In addition, we provide compelling empirical evidence for the strong performance of ALIAS in comparison with state-of-the-art methods in causal discovery over increasingly difficult experimental conditions on both synthetic and real datasets.
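One way a continuous parameter vector can be turned into a DAG without any acyclicity check is sketched below. It is an illustrative construction and not necessarily the parametrization used in ALIAS: each node gets a real-valued priority, each ordered pair gets a real-valued score, and an edge is kept only if its score is positive and it points from lower to higher priority. The function names are hypothetical.

```python
import numpy as np

def params_to_dag(priority, scores):
    """Map unconstrained parameters to a DAG adjacency matrix in O(d^2) time.

    Edge i -> j is included iff scores[i, j] > 0 and priority[i] < priority[j].
    Every edge increases the priority, so no directed cycle can exist.
    """
    earlier = priority[:, None] < priority[None, :]
    return ((scores > 0) & earlier).astype(int)

def is_acyclic(adj):
    """A directed graph on d nodes is acyclic iff its adjacency matrix satisfies A^d = 0."""
    d = adj.shape[0]
    return not np.linalg.matrix_power(adj, d).any()

rng = np.random.default_rng(0)
d = 8
priority = rng.normal(size=d)       # continuous node parameters
scores = rng.normal(size=(d, d))    # continuous edge parameters

dag = params_to_dag(priority, scores)
print(dag)
print("acyclic:", is_acyclic(dag))

# For contrast, a directed 3-cycle fails the same check.
cycle = np.array([[0, 1, 0],
                  [0, 0, 1],
                  [1, 0, 0]])
print("3-cycle acyclic:", is_acyclic(cycle))
```

Since any real-valued `priority` and `scores` yield a valid DAG, a policy can sample these parameters freely and score the resulting graph, which is the kind of unconstrained search the abstract describes.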