Goto

Collaborating Authors

 Bayesian Learning


Audio-FLAN: A Preliminary Release

arXiv.org Artificial Intelligence

Recent advancements in audio tokenization have significantly enhanced the integration of audio capabilities into large language models (LLMs). However, audio understanding and generation are often treated as distinct tasks, hindering the development of truly unified audio-language models. While instruction tuning has demonstrated remarkable success in improving generalization and zero-shot learning across text and vision, its application to audio remains largely unexplored. A major obstacle is the lack of comprehensive datasets that unify audio understanding and generation. To address this, we introduce Audio-FLAN, a large-scale instruction-tuning dataset covering 80 diverse tasks across speech, music, and sound domains, with over 100 million instances. Audio-FLAN lays the foundation for unified audio-language models that can seamlessly handle both understanding (e.g., transcription, comprehension) and generation (e.g., speech, music, sound) tasks across a wide range of audio domains in a zero-shot manner. The Audio-FLAN dataset is available on HuggingFace and GitHub and will be continuously updated.


A Review of Causal Decision Making

arXiv.org Machine Learning

To make effective decisions, it is important to have a thorough understanding of the causal relationships among actions, environments, and outcomes. This review aims to surface three crucial aspects of decision-making through a causal lens: 1) the discovery of causal relationships through causal structure learning, 2) understanding the impacts of these relationships through causal effect learning, and 3) applying the knowledge gained from the first two aspects to support decision making via causal policy learning. Moreover, we identify challenges that hinder the broader utilization of causal decision-making and discuss recent advances in overcoming these challenges. Finally, we provide future research directions to address these challenges and to further enhance the implementation of causal decision-making in practice, with real-world applications illustrated based on the proposed causal decision-making. We aim to offer a comprehensive methodology and practical implementation framework by consolidating various methods in this area into a Python-based collection. URL: https://causaldm.github.io/Causal-Decision-Making.


Auto-ADMET: An Effective and Interpretable AutoML Method for Chemical ADMET Property Prediction

arXiv.org Artificial Intelligence

Machine learning (ML) has been playing important roles in drug discovery in the past years by providing (pre-)screening tools for prioritising chemical compounds to pass through wet lab experiments. One of the main ML tasks in drug discovery is to build quantitative structure-activity relationship (QSAR) models, associating the molecular structure of chemical compounds with an activity or property. These properties -- including absorption, distribution, metabolism, excretion and toxicity (ADMET) -- are essential to model compound behaviour, activity and interactions in the organism. Although several methods exist, the majority of them do not provide an appropriate model's personalisation, yielding to bias and lack of generalisation to new data since the chemical space usually shifts from application to application. This fact leads to low predictive performance when completely new data is being tested by the model. The area of Automated Machine Learning (AutoML) emerged aiming to solve this issue, outputting tailored ML algorithms to the data at hand. Although an important task, AutoML has not been practically used to assist cheminformatics and computational chemistry researchers often, with just a few works related to the field. To address these challenges, this work introduces Auto-ADMET, an interpretable evolutionary-based AutoML method for chemical ADMET property prediction. Auto-ADMET employs a Grammar-based Genetic Programming (GGP) method with a Bayesian Network Model to achieve comparable or better predictive performance against three alternative methods -- standard GGP method, pkCSM and XGBOOST model -- on 12 benchmark chemical ADMET property prediction datasets. The use of a Bayesian Network model on Auto-ADMET's evolutionary process assisted in both shaping the search procedure and interpreting the causes of its AutoML performance.


Network Tomography with Path-Centric Graph Neural Network

arXiv.org Artificial Intelligence

Network tomography is a crucial problem in network monitoring, where the observable path performance metric values are used to infer the unobserved ones, making it essential for tasks such as route selection, fault diagnosis, and traffic control. However, most existing methods either assume complete knowledge of network topology and metric formulas-an unrealistic expectation in many real-world scenarios with limited observability-or rely entirely on black-box end-to-end models. To tackle this, in this paper, we argue that a good network tomography requires synergizing the knowledge from both data and appropriate inductive bias from (partial) prior knowledge. To see this, we propose Deep Network Tomography (DeepNT), a novel framework that leverages a path-centric graph neural network to predict path performance metrics without relying on predefined hand-crafted metrics, assumptions, or the real network topology. The path-centric graph neural network learns the path embedding by inferring and aggregating the embeddings of the sequence of nodes that compose this path. Training path-centric graph neural networks requires learning the neural netowrk parameters and network topology under discrete constraints induced by the observed path performance metrics, which motivates us to design a learning objective that imposes connectivity and sparsity constraints on topology and path performance triangle inequality on path performance. Extensive experiments on real-world and synthetic datasets demonstrate the superiority of DeepNT in predicting performance metrics and inferring graph topology compared to state-of-the-art methods.


Toward a Flexible Framework for Linear Representation Hypothesis Using Maximum Likelihood Estimation

arXiv.org Artificial Intelligence

Linear representation hypothesis posits that high-level concepts are encoded as linear directions in the representation spaces of LLMs. Park et al. (2024) formalize this notion by unifying multiple interpretations of linear representation, such as 1-dimensional subspace representation and interventions, using a causal inner product. However, their framework relies on single-token counterfactual pairs and cannot handle ambiguous contrasting pairs, limiting its applicability to complex or context-dependent concepts. We introduce a new notion of binary concepts as unit vectors in a canonical representation space, and utilize LLMs' (neural) activation differences along with maximum likelihood estimation (MLE) to compute concept directions (i.e., steering vectors). Our method, Sum of Activation-base Normalized Difference (SAND), formalizes the use of activation differences modeled as samples from a von Mises-Fisher (vMF) distribution, providing a principled approach to derive concept directions. We extend the applicability of Park et al. (2024) by eliminating the dependency on unembedding representations and single-token pairs. Through experiments with LLaMA models across diverse concepts and benchmarks, we demonstrate that our lightweight approach offers greater flexibility, superior performance in activation engineering tasks like monitoring and manipulation.


Does Your AI Agent Get You? A Personalizable Framework for Approximating Human Models from Argumentation-based Dialogue Traces

arXiv.org Artificial Intelligence

Explainable AI is increasingly employing argumentation methods to facilitate interactive explanations between AI agents and human users. While existing approaches typically rely on predetermined human user models, there remains a critical gap in dynamically learning and updating these models during interactions. In this paper, we present a framework that enables AI agents to adapt their understanding of human users through argumentation-based dialogues. Our approach, called Persona, draws on prospect theory and integrates a probability weighting function with a Bayesian belief update mechanism that refines a probability distribution over possible human models based on exchanged arguments. Through empirical evaluations with human users in an applied argumentation setting, we demonstrate that Persona effectively captures evolving human beliefs, facilitates personalized interactions, and outperforms state-of-the-art methods.


Direct Alignment with Heterogeneous Preferences

arXiv.org Artificial Intelligence

This tension in assumptions is readily apparent in standard human-AI alignment methods--such as reinforcement learning from human feedback (RLHF) [6, 7, 8] and direct preference optimization (DPO) [9]--which assume a single reward function captures the interests of the entire population. We examine the limits of the preference homogeneity assumption when individuals belong to user types, each characterized by a specific reward function. Recent work has shown that in this setting, the homogeneity assumption can lead to unexpected behavior [10, 11, 12]. One challenge is that, under this assumption, learning from human preferences becomes unrealizable, as a single reward function cannot capture the complexity of population preferences with multiple reward functions [13, 14]. Both RLHF and DPO rely on maximum likelihood estimation (MLE) to optimize the reward or policy. Unrealizability implies their likelihood functions cannot fully represent the underlying preference data distribution, resulting in a nontrivial optimal MLE solution. From another perspective, learning a universal reward or policy from a heterogeneous population inherently involves an aggregation of diverse interests, and this aggregation is nontrivial. In the quest for a single policy that accommodates a heterogeneous population with multiple user types, we show that the only universal reward yielding a well-defined alignment problem is an affine Equal contribution Work done while visiting Harvard Equal advising 1 arXiv:2502.16320v1


Statistical Inference in Reinforcement Learning: A Selective Survey

arXiv.org Machine Learning

Thus, the observed data can be summarized into a sequence of "observation-action-reward" triplets ( O t, A t, R t) t 0. It is worth noting that the observation O t at each time step is not equivalent to the environment's state S t. Indeed, the state can be viewed as a special observation with the Markov property, and we will elaborate on the difference between the two later. Policies: The goal of RL is to learn an optimal policy ฯ€ based on the observation-action-reward triplets to maximize the agent's cumulative reward. Mathematically, a policy is defined as a conditional probability distribution function mapping the agent's observed data history to the action space. It specifies the probability of the agent taking different actions at each time step. Below, we introduce three types of policies (see Figure 1(b) for a visualization of their relationships): (1) History-dependent policy: This is the most general form of policy. At each time t, we define H t as the set containing the current observation O t and all prior historical information (O i, A i, R i) i


Synthetic vs. Gold: The Role of LLM-Generated Labels and Data in Cyberbullying Detection

arXiv.org Artificial Intelligence

This study investigates the role of LLM-generated synthetic data in cyberbullying detection. We conduct a series of experiments where we replace some or all of the authentic data with synthetic data, or augment the authentic data with synthetic data. We find that synthetic cyberbullying data can be the basis for training a classifier for harm detection that reaches performance close to that of a classifier trained with authentic data. Combining authentic with synthetic data shows improvements over the baseline of training on authentic data alone for the test data for all three LLMs tried. These results highlight the viability of synthetic data as a scalable, ethically viable alternative in cyberbullying detection while emphasizing the critical impact of LLM selection on performance outcomes.


Automating Curriculum Learning for Reinforcement Learning using a Skill-Based Bayesian Network

arXiv.org Artificial Intelligence

A major challenge for reinforcement learning is automatically generating curricula to reduce training time or improve performance in some target task. We introduce SEBNs (Skill-Environment Bayesian Networks) which model a probabilistic relationship between a set of skills, a set of goals that relate to the reward structure, and a set of environment features to predict policy performance on (possibly unseen) tasks. We develop an algorithm that uses the inferred estimates of agent success from SEBN to weigh the possible next tasks by expected improvement. We evaluate the benefit of the resulting curriculum on three environments: a discrete gridworld, continuous control, and simulated robotics. The results show that curricula constructed using SEBN frequently outperform other baselines.