Goto

Collaborating Authors

 Bayesian Learning


Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures

arXiv.org Artificial Intelligence

The enduring legacy of Euclidean geometry underpins classical machine learning, which, for decades, has been primarily developed for data lying in Euclidean space. Yet, modern machine learning increasingly encounters richly structured data that is inherently nonEuclidean. This data can exhibit intricate geometric, topological and algebraic structure: from the geometry of the curvature of space-time, to topologically complex interactions between neurons in the brain, to the algebraic transformations describing symmetries of physical systems. Extracting knowledge from such non-Euclidean data necessitates a broader mathematical perspective. Echoing the 19th-century revolutions that gave rise to non-Euclidean geometry, an emerging line of research is redefining modern machine learning with non-Euclidean structures. Its goal: generalizing classical methods to unconventional data types with geometry, topology, and algebra. In this review, we provide an accessible gateway to this fast-growing field and propose a graphical taxonomy that integrates recent advances into an intuitive unified framework. We subsequently extract insights into current challenges and highlight exciting opportunities for future development in this field.


Bayesian Learning-driven Prototypical Contrastive Loss for Class-Incremental Learning

arXiv.org Artificial Intelligence

The primary objective of methods in continual learning is to learn tasks in a sequential manner over time from a stream of data, while mitigating the detrimental phenomenon of catastrophic forgetting. In this paper, we focus on learning an optimal representation between previous class prototypes and newly encountered ones. We propose a prototypical network with a Bayesian learning-driven contrastive loss (BLCL) tailored specifically for class-incremental learning scenarios. Therefore, we introduce a contrastive loss that incorporates new classes into the latent representation by reducing the intra-class distance and increasing the inter-class distance. Our approach dynamically adapts the balance between the cross-entropy and contrastive loss functions with a Bayesian learning technique. Empirical evaluations conducted on both the CIFAR-10 and CIFAR-100 dataset for image classification and images of a GNSS-based dataset for interference classification validate the efficacy of our method, showcasing its superiority over existing state-of-the-art approaches.


Dynamic neural network with memristive CIM and CAM for 2D and 3D vision

arXiv.org Artificial Intelligence

The brain is dynamic, associative and efficient. It reconfigures by associating the inputs with past experiences, with fused memory and processing. In contrast, AI models are static, unable to associate inputs with past experiences, and run on digital computers with physically separated memory and processing. We propose a hardware-software co-design, a semantic memory-based dynamic neural network (DNN) using memristor. The network associates incoming data with the past experience stored as semantic vectors. The network and the semantic memory are physically implemented on noise-robust ternary memristor-based Computing-In-Memory (CIM) and Content-Addressable Memory (CAM) circuits, respectively. We validate our co-designs, using a 40nm memristor macro, on ResNet and PointNet++ for classifying images and 3D points from the MNIST and ModelNet datasets, which not only achieves accuracy on par with software but also a 48.1% and 15.9% reduction in computational budget. Moreover, it delivers a 77.6% and 93.3% reduction in energy consumption.


Positive and Unlabeled Data: Model, Estimation, Inference, and Classification

arXiv.org Machine Learning

This study introduces a new approach to addressing positive and unlabeled (PU) data through the double exponential tilting model (DETM). Traditional methods often fall short because they only apply to selected completely at random (SCAR) PU data, where the labeled positive and unlabeled positive data are assumed to be from the same distribution. In contrast, our DETM's dual structure effectively accommodates the more complex and underexplored selected at random PU data, where the labeled and unlabeled positive data can be from different distributions. We rigorously establish the theoretical foundations of DETM, including identifiability, parameter estimation, and asymptotic properties. Additionally, we move forward to statistical inference by developing a goodness-of-fit test for the SCAR condition and constructing confidence intervals for the proportion of positive instances in the target domain. We leverage an approximated Bayes classifier for classification tasks, demonstrating DETM's robust performance in prediction. Through theoretical insights and practical applications, this study highlights DETM as a comprehensive framework for addressing the challenges of PU data.


Advanced Graph Clustering Methods: A Comprehensive and In-Depth Analysis

arXiv.org Machine Learning

Graph clustering, which aims to divide a graph into several homogeneous groups, is a critical area of study with applications that span various fields such as social network analysis, bioinformatics, and image segmentation. This paper explores both traditional and more recent approaches to graph clustering. Firstly, key concepts and definitions in graph theory are introduced. The background section covers essential topics, including graph Laplacians and the integration of Deep Learning in graph analysis. The paper then delves into traditional clustering methods, including Spectral Clustering and the Leiden algorithm. Following this, state-of-the-art clustering techniques that leverage deep learning are examined. A comprehensive comparison of these methods is made through experiments. The paper concludes with a discussion of the practical applications of graph clustering and potential future research directions.


Parameter inference from a non-stationary unknown process

arXiv.org Machine Learning

Non-stationary systems are found throughout the world, from climate patterns under the influence of variation in carbon dioxide concentration, to brain dynamics driven by ascending neuromodulation. Accordingly, there is a need for methods to analyze non-stationary processes, and yet most time-series analysis methods that are used in practice, on important problems across science and industry, make the simplifying assumption of stationarity. One important problem in the analysis of non-stationary systems is the problem class that we refer to as Parameter Inference from a Non-stationary Unknown Process (PINUP). Given an observed time series, this involves inferring the parameters that drive non-stationarity of the time series, without requiring knowledge or inference of a mathematical model of the underlying system. Here we review and unify a diverse literature of algorithms for PINUP. We formulate the problem, and categorize the various algorithmic contributions. This synthesis will allow researchers to identify gaps in the literature and will enable systematic comparisons of different methods. We also demonstrate that the most common systems that existing methods are tested on - notably the non-stationary Lorenz process and logistic map - are surprisingly easy to perform well on using simple statistical features like windowed mean and variance, undermining the practice of using good performance on these systems as evidence of algorithmic performance. We then identify more challenging problems that many existing methods perform poorly on and which can be used to drive methodological advances in the field. Our results unify disjoint scientific contributions to analyzing non-stationary systems and suggest new directions for progress on the PINUP problem and the broader study of non-stationary phenomena.


Meta-Analysis with Untrusted Data

arXiv.org Machine Learning

[See paper for full abstract] Meta-analysis is a crucial tool for answering scientific questions. It is usually conducted on a relatively small amount of ``trusted'' data -- ideally from randomized, controlled trials -- which allow causal effects to be reliably estimated with minimal assumptions. We show how to answer causal questions much more precisely by making two changes. First, we incorporate untrusted data drawn from large observational databases, related scientific literature and practical experience -- without sacrificing rigor or introducing strong assumptions. Second, we train richer models capable of handling heterogeneous trials, addressing a long-standing challenge in meta-analysis. Our approach is based on conformal prediction, which fundamentally produces rigorous prediction intervals, but doesn't handle indirect observations: in meta-analysis, we observe only noisy effects due to the limited number of participants in each trial. To handle noise, we develop a simple, efficient version of fully-conformal kernel ridge regression, based on a novel condition called idiocentricity. We introduce noise-correcting terms in the residuals and analyze their interaction with a ``variance shaving'' technique. In multiple experiments on healthcare datasets, our algorithms deliver tighter, sounder intervals than traditional ones. This paper charts a new course for meta-analysis and evidence-based medicine, where heterogeneity and untrusted data are embraced for more nuanced and precise predictions.


Estimation of Distribution Algorithms with Matrix Transpose in Bayesian Learning

arXiv.org Machine Learning

Estimation of distribution algorithms (EDAs) constitute a new branch of evolutionary optimization algorithms, providing effective and efficient optimization performance in a variety of research areas. Recent studies have proposed new EDAs that employ mutation operators in standard EDAs to increase the population diversity. We present a new mutation operator, a matrix transpose, specifically designed for Bayesian structure learning, and we evaluate its performance in Bayesian structure learning. The results indicate that EDAs with transpose mutation give markedly better performance than conventional EDAs. Introduction Estimation of distribution algorithms (EDAs) constitute a new branch of evolutionary optimization algorithms [1]; their workflow is similar to that of conventional GAs.


Cloud Atlas: Efficient Fault Localization for Cloud Systems using Language Models and Causal Insight

arXiv.org Artificial Intelligence

Runtime failure and performance degradation is commonplace in modern cloud systems. For cloud providers, automatically determining the root cause of incidents is paramount to ensuring high reliability and availability as prompt fault localization can enable faster diagnosis and triage for timely resolution. A compelling solution explored in recent work is causal reasoning using causal graphs to capture relationships between varied cloud system performance metrics. To be effective, however, systems developers must correctly define the causal graph of their system, which is a time-consuming, brittle, and challenging task that increases in difficulty for large and dynamic systems and requires domain expertise. Alternatively, automated data-driven approaches have limited efficacy for cloud systems due to the inherent rarity of incidents. In this work, we present Atlas, a novel approach to automatically synthesizing causal graphs for cloud systems. Atlas leverages large language models (LLMs) to generate causal graphs using system documentation, telemetry, and deployment feedback. Atlas is complementary to data-driven causal discovery techniques, and we further enhance Atlas with a data-driven validation step. We evaluate Atlas across a range of fault localization scenarios and demonstrate that Atlas is capable of generating causal graphs in a scalable and generalizable manner, with performance that far surpasses that of data-driven algorithms and is commensurate to the ground-truth baseline.


Mitigating Cognitive Biases in Multi-Criteria Crowd Assessment

arXiv.org Artificial Intelligence

Despite recent advances in AI and machine learning technologies, many applications still require human assessment because the characteristics of objects that can explain human subjectivity are sometimes unknown or too vague to be extracted automatically, which is a serious bottleneck when conducting large-scale automated quality assessments. The use of crowdsourcing is a promising way to implement this with the wisdom of the crowd. One challenge in crowdsourced quality assessments is the uncertainty of human judgments. Since workers have different competences, expertise, or motivations, their responses are sometimes too noisy to analyze and extract useful knowledge. A straightforward solution is to assign multiple crowdworkers to each evaluation target and aggregate the redundantly collected evaluations using majority voting. More sophisticated statistical methods, such as Bayesian generative models, have also been explored for better aggregations. Various factors of human error have been introduced into statistical models, such as the ability of workers (Dawid & Skene, 1979), difficulty of the questions (Whitehill et al., 2009; Welinder et al., 2011), and presence of malicious workers (Raykar & Yu, 2011).