AITopics

--In this work, we investigate causal learning of independent causal mechanisms from a Bayesian perspective. Confirming previous claims from the literature, we show in a didactically accessible manner that unlabeled data (i.e., cause realizations) do not improve the estimation of the parameters defining the mechanism. Furthermore, we observe the importance of choosing an appropriate prior for the cause and mechanism parameters, respectively. Specifically, we show that a factorized prior results in a factorized posterior, which resonates with Janz-ing and Sch olkopf's definition of independent causal mechanisms via the Kolmogorov complexity of the involved distributions and with the concept of parameter independence of Heckerman et al. Impact Statement --Learning the effect from a given cause is an important problem in many engineering disciplines, specifically in the field of surrogate modeling, which aims to reduce the computational cost of numerical simulations. Causal learning, however, cannot make use of unlabeled data - i.e., cause realizations - if the mechanism that produces the effect is independent from the cause. In this work, we recover this well-known fact from a Bayesian perspective.

artificial intelligence, machine learning, realization, (16 more...)

arXiv.org Machine Learning

doi: 10.1109/TAI.2024.3522867

2504.01424

Country: Europe > Austria (0.29)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.70)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Springer, Sebastian, Scaffidi, Andre, Autenrieth, Maximilian, Contardo, Gabriella, Laio, Alessandro, Trotta, Roberto, Haario, Heikki

Detecting Localized Density Anomalies in Multivariate Data via Coin-Flip Statistics

Detecting localized density differences in multivariate data is a crucial task in computational science. Such anomalies can indicate a critical system failure, lead to a groundbreaking scientific discovery, or reveal unexpected changes in data distribution. We introduce EagleEye, an anomaly detection method to compare two multivariate datasets with the aim of identifying local density anomalies, namely over- or under-densities affecting only localised regions of the feature space. Anomalies are detected by modelling, for each point, the ordered sequence of its neighbours' membership label as a coin-flipping process and monitoring deviations from the expected behaviour of such process. A unique advantage of our method is its ability to provide an accurate, entirely unsupervised estimate of the local signal purity. We demonstrate its effectiveness through experiments on both synthetic and real-world datasets. In synthetic data, EagleEye accurately detects anomalies in multiple dimensions even when they affect a tiny fraction of the data. When applied to a challenging resonant anomaly detection benchmark task in simulated Large Hadron Collider data, EagleEye successfully identifies particle decay events present in just 0.3% of the dataset. In global temperature data, EagleEye uncovers previously unidentified, geographically localised changes in temperature fields that occurred in the most recent years. Thanks to its key advantages of conceptual simplicity, computational efficiency, trivial parallelisation, and scalability, EagleEye is widely applicable across many fields.

anomaly, data mining, machine learning, (19 more...)

arXiv.org Machine Learning

2503.23927

Country: Europe (1.00)

Genre: Research Report > Experimental Study (0.93)

Industry:

Health & Medicine (0.67)
Government (0.46)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Lefringhausen, Robert, Hanna, Sami Leon Noel Aziz, August, Elias, Hirche, Sandra

Barrier Certificates for Unknown Systems with Latent States and Polynomial Dynamics using Bayesian Inference

-- Certifying safety in dynamical systems is crucial, but barrier certificates -- widely used to verify that system trajectories remain within a safe region -- typically require explicit system models. When dynamics are unknown, data-driven methods can be used instead, yet obtaining a valid certificate requires rigorous uncertainty quantification. For this purpose, existing methods usually rely on full-state measurements, limiting their applicability. This paper proposes a novel approach for synthesizing barrier certificates for unknown systems with latent states and polynomial dynamics. A Bayesian framework is employed, where a prior in state-space representation is updated using input-output data via a targeted marginal Metropolis-Hastings sampler . The resulting samples are used to construct a candidate barrier certificate through a sum-of-squares program. It is shown that if the candidate satisfies the required conditions on a test set of additional samples, it is also valid for the true, unknown system with high probability. The approach and its probabilistic guarantees are illustrated through a numerical simulation. Ensuring the safety of dynamical systems is a critical concern in applications such as human-robot interaction, autonomous driving, and medical devices, where failures can lead to severe consequences. In such scenarios, safety constraints typically mandate that the system state remains within a predefined allowable region. Barrier certificates [1] provide a systematic framework for verifying safety by establishing mathematical conditions that guarantee that system trajectories remain within these regions.

artificial intelligence, bayesian inference, machine learning, (19 more...)

arXiv.org Machine Learning

2504.01807

Country: Europe (0.28)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine (0.54)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

Identifying Obfuscated Code through Graph-Based Semantic Analysis of Binary Code

Cohen, Roxane, David, Robin, Yger, Florian, Rossi, Fabrice

Protecting sensitive program content is a critical issue in various situations, ranging from legitimate use cases to unethical contexts. Obfuscation is one of the most used techniques to ensure such protection. Consequently, attackers must first detect and characterize obfuscation before launching any attack against it. This paper investigates the problem of function-level obfuscation detection using graph-based approaches, comparing algorithms, from elementary baselines to promising techniques like GNN (Graph Neural Networks), on different feature choices. We consider various obfuscation types and obfuscators, resulting in two complex datasets. Our findings demonstrate that GNNs need meaningful features that capture aspects of function semantics to outperform baselines. Our approach shows satisfactory results, especially in a challenging 11-class classification task and in a practical malware analysis example.

data mining, machine learning, obfuscation, (17 more...)

arXiv.org Machine Learning

2504.01481

Country:

North America > United States (0.46)
Europe > France (0.29)

Genre: Research Report > New Finding (0.86)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)

Maity, Dipankar, Chakrabarti, Kushal

On Model Protection in Federated Learning against Eavesdropping Attacks

-- In this study, we investigate the protection offered by Federated Learning algorithms against eavesdropping adversaries. In our model, the adversary is capable of intercepting model updates transmitted from clients to the server, enabling it to create its own estimate of the model. Unlike previous research, which predominantly focuses on safeguarding client data, our work shifts attention to protecting the client model itself. Through a theoretical analysis, we examine how various factors--such as the probability of client selection, the structure of local objective functions, global aggregation at the server, and the eavesdropper's capabilities--impact the overall level of protection. We further validate our findings through numerical experiments, assessing the protection by evaluating the model accuracy achieved by the adversary. Finally, we compare our results with methods based on differential privacy, underscoring their limitations in this specific context. Traditionally, deep learning techniques require centralized data collection and processing that may be infeasible in collaborative scenarios, such as healthcare, credit scoring, vehicle fleet learning, internet-of-things, e-commerce, and natural language processing, due to the high scalability of modern networks, growing sensitive data privacy concerns, and legal regulations such as GDPR [1]-[3]. In these domains, data is often distributed among multiple parties of interest, with no single trusted authority. Federated Learning (FL) has emerged as a distributed collaborative learning paradigm, which allows coordination among multiple clients to perform training without sharing raw data. Instead, they participate in the learning process by training models locally and sharing only the model parameters with a central server. This server aggregates the updates and redistributes the improved model to all participants [4], [5]. Based on the distribution/partition of data among the clients, FL can be classified into horizontal (HFL), vertical (VFL), and transfer (TFL) federated learning [1], [6].

adversary, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

2504.02114

Country: North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

On the Geometry of Receiver Operating Characteristic and Precision-Recall Curves

Sameni, Reza

We study the geometry of Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves in binary classification problems. The key finding is that many of the most commonly used binary classification metrics are merely functions of the composition function $G := F_p \circ F_n^{-1}$, where $F_p(\cdot)$ and $F_n(\cdot)$ are the class-conditional cumulative distribution functions of the classifier scores in the positive and negative classes, respectively. This geometric perspective facilitates the selection of operating points, understanding the effect of decision thresholds, and comparison between classifiers. It also helps explain how the shapes and geometry of ROC/PR curves reflect classifier behavior, providing objective tools for building classifiers optimized for specific applications with context-specific constraints. We further explore the conditions for classifier dominance, present analytical and numerical examples demonstrating the effects of class separability and variance on ROC and PR geometries, and derive a link between the positive-to-negative class leakage function $G(\cdot)$ and the Kullback--Leibler divergence. The framework highlights practical considerations, such as model calibration, cost-sensitive optimization, and operating point selection under real-world capacity constraints, enabling more informed approaches to classifier deployment and decision-making.

artificial intelligence, classifier, machine learning, (18 more...)

arXiv.org Machine Learning

2504.02169

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Liu, Xuan, Chang, Xiaobin

LoRA Subtraction for Drift-Resistant Space in Exemplar-Free Continual Learning

In continual learning (CL), catastrophic forgetting often arises due to feature drift. This challenge is particularly prominent in the exemplar-free continual learning (EFCL) setting, where samples from previous tasks cannot be retained, making it difficult to preserve prior knowledge. To address this issue, some EFCL methods aim to identify feature spaces that minimize the impact on previous tasks while accommodating new ones. However, they rely on static features or outdated statistics stored from old tasks, which prevents them from capturing the dynamic evolution of the feature space in CL, leading to performance degradation over time. In this paper, we introduce the Drift-Resistant Space (DRS), which effectively handles feature drifts without requiring explicit feature modeling or the storage of previous tasks. A novel parameter-efficient fine-tuning approach called Low-Rank Adaptation Subtraction (LoRA-) is proposed to develop the DRS. This method subtracts the LoRA weights of old tasks from the initial pre-trained weight before processing new task data to establish the DRS for model training. Therefore, LoRA- enhances stability, improves efficiency, and simplifies implementation. Furthermore, stabilizing feature drifts allows for better plasticity by learning with a triplet loss. Our method consistently achieves state-of-the-art results, especially for long task sequences, across multiple datasets.

artificial intelligence, learning, machine learning, (12 more...)

arXiv.org Machine Learning

2503.18985

Genre: Research Report (1.00)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Meunier, Matthieu, Reisinger, Christoph, Zhang, Yufei

Efficient Learning for Entropy-Regularized Markov Decision Processes via Multilevel Monte Carlo

Designing efficient learning algorithms with complexity guarantees for Markov decision processes (MDPs) with large or continuous state and action spaces remains a fundamental challenge. We address this challenge for entropy-regularized MDPs with Polish state and action spaces, assuming access to a generative model of the environment. We propose a novel family of multilevel Monte Carlo (MLMC) algorithms that integrate fixed-point iteration with MLMC techniques and a generic stochastic approximation of the Bellman operator. We quantify the precise impact of the chosen approximate Bellman operator on the accuracy of the resulting MLMC estimator. Leveraging this error analysis, we show that using a biased plain MC estimate for the Bellman operator results in quasi-polynomial sample complexity, whereas an unbiased randomized multilevel approximation of the Bellman operator achieves polynomial sample complexity in expectation. Notably, these complexity bounds are independent of the dimensions or cardinalities of the state and action spaces, distinguishing our approach from existing algorithms whose complexities scale with the sizes of these spaces. We validate these theoretical performance guarantees through numerical experiments.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Machine Learning

2503.21224

Country: Europe > United Kingdom (0.45)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)

Ricci, Fabiola, Bardone, Lorenzo, Goldt, Sebastian

Feature learning from non-Gaussian inputs: the case of Independent Component Analysis in high dimensions

Deep neural networks learn structured features from complex, non-Gaussian inputs, but the mechanisms behind this process remain poorly understood. Our work is motivated by the observation that the first-layer filters learnt by deep convolutional neural networks from natural images resemble those learnt by independent component analysis (ICA), a simple unsupervised method that seeks the most non-Gaussian projections of its inputs. This similarity suggests that ICA provides a simple, yet principled model for studying feature learning. Here, we leverage this connection to investigate the interplay between data structure and optimisation in feature learning for the most popular ICA algorithm, FastICA, and stochastic gradient descent (SGD), which is used to train deep networks. We rigorously establish that FastICA requires at least $n\gtrsim d^4$ samples to recover a single non-Gaussian direction from $d$-dimensional inputs on a simple synthetic data model. We show that vanilla online SGD outperforms FastICA, and prove that the optimal sample complexity $n \gtrsim d^2$ can be reached by smoothing the loss, albeit in a data-dependent way. We finally demonstrate the existence of a search phase for FastICA on ImageNet, and discuss how the strong non-Gaussianity of said images compensates for the poor sample complexity of FastICA.

artificial intelligence, fastica, machine learning, (16 more...)

arXiv.org Machine Learning

2503.23896

Country:

North America > United States (0.46)
Europe (0.46)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Spatio-temporal Prediction of Fine-Grained Origin-Destination Matrices with Applications in Ridesharing

Yang, Run, Dai, Runpeng, Gao, Siran, Tang, Xiaocheng, Zhou, Fan, Zhu, Hongtu

Accurate spatial-temporal prediction of network-based travelers' requests is crucial for the effective policy design of ridesharing platforms. Having knowledge of the total demand between various locations in the upcoming time slots enables platforms to proactively prepare adequate supplies, thereby increasing the likelihood of fulfilling travelers' requests and redistributing idle drivers to areas with high potential demand to optimize the global supply-demand equilibrium. This paper delves into the prediction of Origin-Destination (OD) demands at a fine-grained spatial level, especially when confronted with an expansive set of local regions. While this task holds immense practical value, it remains relatively unexplored within the research community. To fill this gap, we introduce a novel prediction model called OD-CED, which comprises an unsupervised space coarsening technique to alleviate data sparsity and an encoder-decoder architecture to capture both semantic and geographic dependencies. Through practical experimentation, OD-CED has demonstrated remarkable results. It achieved an impressive reduction of up to 45% reduction in root-mean-square error and 60% in weighted mean absolute percentage error over traditional statistical methods when dealing with OD matrices exhibiting a sparsity exceeding 90%.

artificial intelligence, data mining, machine learning, (22 more...)

arXiv.org Machine Learning

2503.24237

Country:

North America > United States (0.46)
Asia > China (0.28)

Genre: Research Report (1.00)

Industry:

Transportation > Infrastructure & Services (0.48)
Transportation > Ground > Road (0.34)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
(3 more...)