Bayesian Learning
Bridging Algorithmic Information Theory and Machine Learning: A New Approach to Kernel Learning
Hamzi, Boumediene, Hutter, Marcus, Owhadi, Houman
Machine Learning (ML) and Algorithmic Information Theory (AIT) look at Complexity from different points of view. We explore the interface between AIT and Kernel Methods (that are prevalent in ML) by adopting an AIT perspective on the problem of learning kernels from data, in kernel ridge regression, through the method of Sparse Kernel Flows. In particular, by looking at the differences and commonalities between Minimal Description Length (MDL) and Regularization in Machine Learning (RML), we prove that the method of Sparse Kernel Flows is the natural approach to adopt to learn kernels from data. This paper shows that it is not necessary to use the statistical route to derive Sparse Kernel Flows and that one can directly work with code-lengths and complexities that are concepts that show up in AIT.
Knowledge Augmented Machine Learning with Applications in Autonomous Driving: A Survey
Wรถrmann, Julian, Bogdoll, Daniel, Brunner, Christian, Bรผhrle, Etienne, Chen, Han, Chuo, Evaristus Fuh, Cvejoski, Kostadin, van Elst, Ludger, Gottschall, Philip, Griesche, Stefan, Hellert, Christian, Hesels, Christian, Houben, Sebastian, Joseph, Tim, Keil, Niklas, Kelsch, Johann, Keser, Mert, Kรถnigshof, Hendrik, Kraft, Erwin, Kreuser, Leonie, Krone, Kevin, Latka, Tobias, Mattern, Denny, Matthes, Stefan, Motzkus, Franz, Munir, Mohsin, Nekolla, Moritz, Paschke, Adrian, von Pilchau, Stefan Pilar, Pintz, Maximilian Alexander, Qiu, Tianming, Qureishi, Faraz, Rizvi, Syed Tahseen Raza, Reichardt, Jรถrg, von Rueden, Laura, Sagel, Alexander, Sasdelli, Diogo, Scholl, Tobias, Schunk, Gerhard, Schwalbe, Gesina, Shen, Hao, Shoeb, Youssef, Stapelbroek, Hendrik, Stehr, Vera, Srinivas, Gurucharan, Tran, Anh Tuan, Vivekanandan, Abhishek, Wang, Ya, Wasserrab, Florian, Werner, Tino, Wirth, Christian, Zwicklbauer, Stefan
The availability of representative datasets is an essential prerequisite for many successful artificial intelligence and machine learning models. However, in real life applications these models often encounter scenarios that are inadequately represented in the data used for training. There are various reasons for the absence of sufficient data, ranging from time and cost constraints to ethical considerations. As a consequence, the reliable usage of these models, especially in safety-critical applications, is still a tremendous challenge. Leveraging additional, already existing sources of knowledge is key to overcome the limitations of purely data-driven approaches. Knowledge augmented machine learning approaches offer the possibility of compensating for deficiencies, errors, or ambiguities in the data, thus increasing the generalization capability of the applied models. Even more, predictions that conform with knowledge are crucial for making trustworthy and safe decisions even in underrepresented scenarios. This work provides an overview of existing techniques and methods in the literature that combine data-driven models with existing knowledge. The identified approaches are structured according to the categories knowledge integration, extraction and conformity. In particular, we address the application of the presented methods in the field of autonomous driving.
Provably Efficient CVaR RL in Low-rank MDPs
Zhao, Yulai, Zhan, Wenhao, Hu, Xiaoyan, Leung, Ho-fung, Farnia, Farzan, Sun, Wen, Lee, Jason D.
We study risk-sensitive Reinforcement Learning (RL), where we aim to maximize the Conditional Value at Risk (CVaR) with a fixed risk tolerance $\tau$. Prior theoretical work studying risk-sensitive RL focuses on the tabular Markov Decision Processes (MDPs) setting. To extend CVaR RL to settings where state space is large, function approximation must be deployed. We study CVaR RL in low-rank MDPs with nonlinear function approximation. Low-rank MDPs assume the underlying transition kernel admits a low-rank decomposition, but unlike prior linear models, low-rank MDPs do not assume the feature or state-action representation is known. We propose a novel Upper Confidence Bound (UCB) bonus-driven algorithm to carefully balance the interplay between exploration, exploitation, and representation learning in CVaR RL. We prove that our algorithm achieves a sample complexity of $\tilde{O}\left(\frac{H^7 A^2 d^4}{\tau^2 \epsilon^2}\right)$ to yield an $\epsilon$-optimal CVaR, where $H$ is the length of each episode, $A$ is the capacity of action space, and $d$ is the dimension of representations. Computational-wise, we design a novel discretized Least-Squares Value Iteration (LSVI) algorithm for the CVaR objective as the planning oracle and show that we can find the near-optimal policy in a polynomial running time with a Maximum Likelihood Estimation oracle. To our knowledge, this is the first provably efficient CVaR RL algorithm in low-rank MDPs.
Sentiment Analysis of Twitter Posts on Global Conflicts
Sasikumar, Ujwal, Zaman, Ank, Mawlood-Yunis, Abdul-Rahman, Chatterjee, Prosenjit
Sentiment analysis of social media data is an emerging field with vast applications in various domains. In this study, we developed a sentiment analysis model to analyze social media sentiment, especially tweets, during global conflicting scenarios. To establish our research experiment, we identified a recent global dispute incident on Twitter and collected around 31,000 filtered Tweets for several months to analyze human sentiment worldwide.
MiniAnDE: a reduced AnDE ensemble to deal with microarray data
Torrijos, Pablo, Gรกmez, Josรฉ A., Puerta, Josรฉ M.
This article focuses on the supervised classification of datasets with a large number of variables and a small number of instances. This is the case, for example, for microarray data sets commonly used in bioinformatics. Complex classifiers that require estimating statistics over many variables are not suitable for this type of data. Probabilistic classifiers with low-order probability tables, e.g. NB and AODE, are good alternatives for dealing with this type of data. AODE usually improves NB in accuracy, but suffers from high spatial complexity since $k$ models, each with $n+1$ variables, are included in the AODE ensemble. In this paper, we propose MiniAnDE, an algorithm that includes only a small number of heterogeneous base classifiers in the ensemble, i.e., each model only includes a different subset of the $k$ predictive variables. Experimental evaluation shows that using MiniAnDE classifiers on microarray data is feasible and outperforms NB and other ensembles such as bagging and random forest.
Ovarian Cancer Data Analysis using Deep Learning: A Systematic Review from the Perspectives of Key Features of Data Analysis and AI Assurance
Hira, Muta Tah, Razzaque, Mohammad A., Sarker, Mosharraf
Background and objectives: By extracting this information, Machine or Deep Learning (ML/DL)-based autonomous data analysis tools can assist clinicians and cancer researchers in discovering patterns and relationships from complex data sets. Many DL-based analyses on ovarian cancer (OC) data have recently been published. These analyses are highly diverse in various aspects of cancer (e.g., subdomain(s) and cancer type they address) and data analysis features. However, a comprehensive understanding of these analyses in terms of these features and AI assurance (AIA) is currently lacking. This systematic review aims to fill this gap by examining the existing literature and identifying important aspects of OC data analysis using DL, explicitly focusing on the key features and AI assurance perspectives. Methods: The PRISMA framework was used to conduct comprehensive searches in three journal databases. Only studies published between 2015 and 2023 in peer-reviewed journals were included in the analysis. Results: In the review, a total of 96 DL-driven analyses were examined. The findings reveal several important insights regarding DL-driven ovarian cancer data analysis: - Most studies 71% (68 out of 96) focused on detection and diagnosis, while no study addressed the prediction and prevention of OC. - The analyses were predominantly based on samples from a non-diverse population (75% (72/96 studies)), limited to a geographic location or country. - Only a small proportion of studies (only 33% (32/96)) performed integrated analyses, most of which used homogeneous data (clinical or omics). - Notably, a mere 8.3% (8/96) of the studies validated their models using external and diverse data sets, highlighting the need for enhanced model validation, and - The inclusion of AIA in cancer data analysis is in a very early stage; only 2.1% (2/96) explicitly addressed AIA through explainability.
Beyond Boundaries: A Comprehensive Survey of Transferable Attacks on AI Systems
Wang, Guangjing, Zhou, Ce, Wang, Yuanda, Chen, Bocheng, Guo, Hanqing, Yan, Qiben
Artificial Intelligence (AI) systems such as autonomous vehicles, facial recognition, and speech recognition systems are increasingly integrated into our daily lives. However, despite their utility, these AI systems are vulnerable to a wide range of attacks such as adversarial, backdoor, data poisoning, membership inference, model inversion, and model stealing attacks. In particular, numerous attacks are designed to target a particular model or system, yet their effects can spread to additional targets, referred to as transferable attacks. Although considerable efforts have been directed toward developing transferable attacks, a holistic understanding of the advancements in transferable attacks remains elusive. In this paper, we comprehensively explore learning-based attacks from the perspective of transferability, particularly within the context of cyber-physical security. We delve into different domains -- the image, text, graph, audio, and video domains -- to highlight the ubiquitous and pervasive nature of transferable attacks. This paper categorizes and reviews the architecture of existing attacks from various viewpoints: data, process, model, and system. We further examine the implications of transferable attacks in practical scenarios such as autonomous driving, speech recognition, and large language models (LLMs). Additionally, we outline the potential research directions to encourage efforts in exploring the landscape of transferable attacks. This survey offers a holistic understanding of the prevailing transferable attacks and their impacts across different domains.
Leveraging Uncertainty Estimates To Improve Classifier Performance
Arora, Gundeep, Merugu, Srujana, Saladi, Anoop, Rastogi, Rajeev
Binary classification involves predicting the label of an instance based on whether the model score for the positive class exceeds a threshold chosen based on the application requirements (e.g., maximizing recall for a precision bound). However, model scores are often not aligned with the true positivity rate. This is especially true when the training involves a differential sampling across classes or there is distributional drift between train and test settings. In this paper, we provide theoretical analysis and empirical evidence of the dependence of model score estimation bias on both uncertainty and score itself. Further, we formulate the decision boundary selection in terms of both model score and uncertainty, prove that it is NP-hard, and present algorithms based on dynamic programming and isotonic regression. Evaluation of the proposed algorithms on three real-world datasets yield 25%-40% gain in recall at high precision bounds over the traditional approach of using model score alone, highlighting the benefits of leveraging uncertainty.
Causal Structure Learning Supervised by Large Language Model
Ban, Taiyu, Chen, Lyuzhou, Lyu, Derui, Wang, Xiangyu, Chen, Huanhuan
Causal discovery from observational data is pivotal for deciphering complex relationships. Causal Structure Learning (CSL), which focuses on deriving causal Directed Acyclic Graphs (DAGs) from data, faces challenges due to vast DAG spaces and data sparsity. The integration of Large Language Models (LLMs), recognized for their causal reasoning capabilities, offers a promising direction to enhance CSL by infusing it with knowledge-based causal inferences. However, existing approaches utilizing LLMs for CSL have encountered issues, including unreliable constraints from imperfect LLM inferences and the computational intensity of full pairwise variable analyses. In response, we introduce the Iterative LLM Supervised CSL (ILS-CSL) framework. ILS-CSL innovatively integrates LLM-based causal inference with CSL in an iterative process, refining the causal DAG using feedback from LLMs. This method not only utilizes LLM resources more efficiently but also generates more robust and high-quality structural constraints compared to previous methodologies. Our comprehensive evaluation across eight real-world datasets demonstrates ILS-CSL's superior performance, setting a new standard in CSL efficacy and showcasing its potential to significantly advance the field of causal discovery. The codes are available at \url{https://github.com/tyMadara/ILS-CSL}.
Towards a Transportable Causal Network Model Based on Observational Healthcare Data
Bernasconi, Alice, Zanga, Alessio, Lucas, Peter J. F., Scutari, Marco, Stella, Fabio
Over the last decades, many prognostic models based on artificial intelligence techniques have been used to provide detailed predictions in healthcare. Unfortunately, the real-world observational data used to train and validate these models are almost always affected by biases that can strongly impact the outcomes validity: two examples are values missing not-at-random and selection bias. Addressing them is a key element in achieving transportability and in studying the causal relationships that are critical in clinical decision making, going beyond simpler statistical approaches based on probabilistic association. In this context, we propose a novel approach that combines selection diagrams, missingness graphs, causal discovery and prior knowledge into a single graphical model to estimate the cardiovascular risk of adolescent and young females who survived breast cancer. We learn this model from data comprising two different cohorts of patients. The resulting causal network model is validated by expert clinicians in terms of risk assessment, accuracy and explainability, and provides a prognostic model that outperforms competing machine learning methods.