AITopics

Deploying artificial intelligence (AI) models on edge devices involves a delicate balance between meeting stringent complexity constraints, such as limited memory and energy resources, and ensuring reliable performance in sensitive decision-making tasks. One way to enhance reliability is through uncertainty quantification via Bayesian inference. This approach, however, typically necessitates maintaining and running multiple models in an ensemble, which may exceed the computational limits of edge devices. This paper introduces a low-complexity methodology to address this challenge by distilling calibration information from a more complex model. In an offline phase, predictive probabilities generated by a high-complexity cloud-based model are leveraged to determine a threshold based on the typical divergence between the cloud and edge models. At run time, this threshold is used to construct credal sets -- ranges of predictive probabilities that are guaranteed, with a user-selected confidence level, to include the predictions of the cloud model. The credal sets are obtained through thresholding of a divergence measure in the simplex of predictive probabilities. Experiments on visual and language tasks demonstrate that the proposed approach, termed Conformalized Distillation for Credal Inference (CD-CI), significantly improves calibration performance compared to low-complexity Bayesian methods, such as Laplace approximation, making it a practical and efficient solution for edge AI deployments.
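The offline and run-time phases described in this abstract can be sketched roughly as follows. This is a minimal illustration of split-conformal thresholding, not the authors' implementation: the function names, the choice of KL divergence, and its direction (cloud relative to edge) are all assumptions made for the example.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists (assumed divergence)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def conformal_threshold(cloud_probs, edge_probs, alpha):
    """Offline phase: conformal quantile of the cloud-vs-edge divergences.

    alpha is the user-selected miscoverage level (confidence is 1 - alpha).
    """
    scores = sorted(kl_divergence(c, e) for c, e in zip(cloud_probs, edge_probs))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed order statistic
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return scores[rank - 1]

def in_credal_set(candidate, edge_prob, threshold):
    """Run time: a candidate distribution belongs to the credal set when its
    divergence from the edge prediction does not exceed the threshold."""
    return kl_divergence(candidate, edge_prob) <= threshold

# Hypothetical calibration data: paired cloud and edge predictive probabilities.
cloud = [[0.7, 0.3], [0.6, 0.4], [0.9, 0.1], [0.5, 0.5]]
edge = [[0.6, 0.4], [0.6, 0.4], [0.7, 0.3], [0.5, 0.5]]
threshold = conformal_threshold(cloud, edge, alpha=0.25)
```

With only four calibration pairs, a confidence level above 80% cannot be certified, which is why the sketch returns an infinite threshold (the whole simplex) in that case.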

artificial intelligence, machine learning, small-scale model

2501.06066

Country: Europe > United Kingdom > England (0.14)

Genre: Research Report (0.82)

Industry:

Education (0.46)
Information Technology > Services (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

van Nierop, Wessel L., Shlezinger, Nir, van Sloun, Ruud J. G.

Deep Variational Sequential Monte Carlo for High-Dimensional Observations

Sequential Monte Carlo (SMC), or particle filtering, is widely used in nonlinear state-space systems, but its performance often suffers from poorly approximated proposal and state-transition distributions. This work introduces a differentiable particle filter that leverages the unsupervised variational SMC objective to parameterize the proposal and transition distributions with a neural network, designed to learn from high-dimensional observations. Experimental results demonstrate that our approach outperforms established baselines in tracking the challenging Lorenz attractor from high-dimensional and partial observations. Furthermore, an evaluation based on the evidence lower bound indicates that our method offers a more accurate representation of the posterior distribution.
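For readers unfamiliar with SMC, the propose, weight, and resample loop that the abstract builds on can be sketched as below. This is a plain bootstrap particle filter on a hypothetical one-dimensional random-walk model with Gaussian noise, not the differentiable variational filter proposed in the paper; the model, parameter names, and defaults are assumptions chosen for illustration.

```python
import math
import random

def bootstrap_particle_filter(observations, n_particles=1000,
                              process_std=1.0, obs_std=1.0, seed=0):
    """Return the posterior-mean state estimate after each observation."""
    rng = random.Random(seed)
    # Initial particles drawn from an assumed N(0, 1) prior.
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # Propose: the state-transition model doubles as the proposal
        # (the "bootstrap" choice, which learned proposals aim to improve on).
        particles = [x + rng.gauss(0.0, process_std) for x in particles]
        # Weight: Gaussian observation likelihood, computed in log space.
        log_w = [-0.5 * ((y - x) / obs_std) ** 2 for x in particles]
        max_log_w = max(log_w)
        weights = [math.exp(lw - max_log_w) for lw in log_w]
        total = sum(weights)
        weights = [w / total for w in weights]
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        # Resample: multinomial resampling proportional to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates

estimates = bootstrap_particle_filter([1.0, 2.0, 3.0, 4.0, 5.0])
```

Because the proposal here ignores the current observation, many particles can land in low-likelihood regions; that inefficiency is the motivation for parameterizing the proposal, as the abstract describes.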

artificial intelligence, machine learning, particle filter

2501.05982

Country:

Europe (0.29)
North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Middle East > Israel (0.14)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Patra, Aswini Kumar, Devi, Soraisham Elizabeth, Gajurel, Tejashwini

MRI Patterns of the Hippocampus and Amygdala for Predicting Stages of Alzheimer's Progression: A Minimal Feature Machine Learning Framework

Alzheimer's disease (AD) progresses through distinct stages, from early mild cognitive impairment (EMCI) to late mild cognitive impairment (LMCI) and eventually to AD. Accurate identification of these stages, especially distinguishing LMCI from EMCI, is crucial for developing pre-dementia treatments but remains challenging due to subtle and overlapping imaging features. This study proposes a minimal-feature machine learning framework that leverages structural MRI data, focusing on the hippocampus and amygdala as regions of interest. The framework addresses the curse of dimensionality through feature selection, utilizes region-specific voxel information, and implements innovative data organization to enhance classification performance by reducing noise. The methodology integrates dimensionality reduction techniques such as PCA and t-SNE with state-of-the-art classifiers, achieving the highest accuracy of 88.46%. This framework demonstrates the potential for efficient and accurate staging of AD progression while providing valuable insights for clinical applications.

accuracy, artificial intelligence, machine learning

2501.05852

Country: Asia > India (0.47)

Genre:

Research Report > New Finding (0.47)
Research Report > Experimental Study (0.47)

Industry: Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Kungurtsev, Vyacheslav, Moore, Leonardo Christov, Sir, Gustav, Krutsky, Martin

Cause

Causal Learning has emerged as a major theme of AI in recent years, promising to use special techniques to reveal the true nature of cause and effect in a number of important domains. We consider the Epistemology of learning and recognizing true cause and effect phenomena. Through thought exercises on the customary use of the word "cause", especially in scientific domains, we investigate what, in practice, constitutes a valid causal claim. We recognize the word's uses across scientific domains in disparate form but consistent function within the scientific paradigm. We highlight fundamental distinctions of practice that can be performed in the natural and social sciences, highlight the importance of many systems of interest being open and irreducible and identify the important notion of Hermeneutic knowledge for social science inquiry. We posit that the distinct properties require that definitive causal claims can only come through an agglomeration of consistent evidence across multiple domains and levels of abstraction, such as empirical, physiological, biochemical, etc. We present Cognitive Science as an exemplary multi-disciplinary field providing omnipresent opportunity for such a Research Program, and highlight the main general modes of practice of scientific inquiry that can adequately merge, rather than place as incorrigibly conflictual, multi-domain multi-abstraction scientific practices and language games.

logic & formal reasoning, machine learning, natural language

2501.05844

Country: North America > United States > California (0.27)

Genre:

Research Report > Experimental Study (1.00)
Research Report > Strength High (0.93)
Research Report > New Finding (0.67)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Consumer Health (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.93)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

arXiv.org Machine Learning, Jan-10-2025

Covariate Dependent Mixture of Bayesian Networks

Marchant, Roman, Draca, Dario, Francis, Gilad, Assadzadeh, Sahand, Varidel, Mathew, Iorfino, Frank, Cripps, Sally

Learning the structure of Bayesian networks from data provides insights into underlying processes and the causal relationships that generate the data, but its usefulness depends on the homogeneity of the data population, a condition often violated in real-world applications. In such cases, using a single network structure for inference can be misleading, as it may not capture sub-population differences. To address this, we propose a novel approach of modelling a mixture of Bayesian networks where component probabilities depend on individual characteristics. Our method identifies both network structures and demographic predictors of sub-population membership, aiding personalised interventions. We evaluate our method through simulations and a youth mental health case study, demonstrating its potential to improve tailored interventions in health, education, and social policy.

artificial intelligence, bayesian inference, machine learning

2501.05745

Country: Oceania > Australia (0.15)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Physics-Driven Learning for Inverse Problems in Quantum Chromodynamics

Aarts, Gert, Fukushima, Kenji, Hatsuda, Tetsuo, Ipp, Andreas, Shi, Shuzhe, Wang, Lingxiao, Zhou, Kai

The integration of deep learning techniques and physics-driven designs is reforming the way we address inverse problems, in which accurate physical properties are extracted from complex data sets. This is particularly relevant for quantum chromodynamics (QCD), the theory of strong interactions, with its inherent limitations in observational data and demanding computational approaches. This perspective highlights advances and potential of physics-driven learning methods, focusing on predictions of physical quantities towards QCD physics, and drawing connections to machine learning (ML). It is shown that the fusion of ML and physics can lead to more efficient and reliable problem-solving strategies. Key ideas of ML, methodology of embedding physics priors, and generative models as inverse modelling of physical probability distributions are introduced. Specific applications cover first-principle lattice calculations, and QCD physics of hadrons, neutron stars, and heavy-ion collisions. These examples provide a structured and concise overview of how incorporating prior knowledge such as symmetry, continuity and equations into deep learning designs can address diverse inverse problems across different physical sciences.

artificial intelligence, doi, machine learning

doi: 10.1038/s42254-024-00798-x

2501.0558

Country:

Europe (1.00)
Asia > China (0.46)
Asia > Japan > Honshū (0.28)

Genre: Research Report (0.82)

Industry: Energy > Oil & Gas > Upstream (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Generative Flow Networks: Theory and Applications to Structure Learning

Deleu, Tristan

forward and backward transition probability, intractable normalization constant, terminating state distribution

Without any assumptions about data generation, multiple causal models may explain our observations equally well. To avoid selecting a single arbitrary model that could result in unsafe decisions if it does not match reality, it is therefore essential to maintain a notion of epistemic uncertainty about our possible candidates. This thesis studies the problem of structure learning from a Bayesian perspective, approximating the posterior distribution over the structure of a causal model, represented as a directed acyclic graph (DAG), given data. It introduces Generative Flow Networks (GFlowNets), a novel class of probabilistic models designed for modeling distributions over discrete and compositional objects such as graphs. They treat generation as a sequential decision making problem, constructing samples of a target distribution defined up to a normalization constant piece by piece. In the first part of this thesis, we present the mathematical foundations of GFlowNets, their connections to existing domains of machine learning and statistics such as variational inference and reinforcement learning, and their extensions beyond discrete problems. In the second part of this thesis, we show how GFlowNets can approximate the posterior distribution over DAG structures of causal Bayesian Networks, along with the parameters of its causal mechanisms, given observational and experimental data.

2501.05498

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.13)
North America > Canada > Ontario > Toronto (0.13)
North America > Canada > Quebec > Montreal (0.04)

Genre:

Overview (0.92)
Personal > Honors (0.67)
Research Report > New Finding (0.45)
Instructional Material > Course Syllabus & Notes (0.45)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Energy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Bringing Order Amidst Chaos: On the Role of Artificial Intelligence in Secure Software Engineering

Esposito, Matteo

baseline versus isolation prediction, learning-based vulnerability detection, process labelling dimension assessed

Context. Developing secure and reliable software remains a key challenge in software engineering (SE). The ever-evolving technological landscape offers both opportunities and threats, creating a dynamic space where chaos and order compete. Secure software engineering (SSE) must continuously address vulnerabilities that endanger software systems and carry broader socio-economic risks, such as compromising critical national infrastructure and causing significant financial losses. Researchers and practitioners have explored methodologies like Static Application Security Testing Tools (SASTTs) and artificial intelligence (AI) approaches, including machine learning (ML) and large language models (LLMs), to detect and mitigate these vulnerabilities. Each method has unique strengths and limitations. Aim. This thesis seeks to bring order to the chaos in SSE by addressing domain-specific differences that impact AI accuracy. Methodology. The research employs a mix of empirical strategies, such as evaluating effort-aware metrics, analyzing SASTTs, conducting method-level analysis, and leveraging evidence-based techniques like systematic dataset reviews. These approaches help characterize vulnerability prediction datasets. Results. Key findings include limitations in static analysis tools for identifying vulnerabilities, gaps in SASTT coverage of vulnerability types, weak relationships among vulnerability severity scores, improved defect prediction accuracy using just-in-time modeling, and threats posed by untouched methods. Conclusions. This thesis highlights the complexity of SSE and the importance of contextual knowledge in improving AI-driven vulnerability and defect prediction. The comprehensive analysis advances effective prediction models, benefiting both researchers and practitioners.

2501.05165

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Washington > King County > Seattle (0.13)
North America > Canada > Quebec > Montreal (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)
Research Report > Promising Solution (0.67)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Education (1.00)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Software Engineering (1.00)
Information Technology > Data Science > Data Mining (1.00)

Evidential Deep Learning for Uncertainty Quantification and Out-of-Distribution Detection in Jet Identification using Deep Neural Networks

Khot, Ayush, Wang, Xiwei, Roy, Avik, Kindratenko, Volodymyr, Neubauer, Mark S.

Current methods commonly used for uncertainty quantification (UQ) in deep learning (DL) models utilize Bayesian methods which are computationally expensive and time-consuming. In this paper, we provide a detailed study of UQ based on evidential deep learning (EDL) for deep neural network models designed to identify jets in high energy proton-proton collisions at the Large Hadron Collider and explore its utility in anomaly detection. EDL is a DL approach that treats learning as an evidence acquisition process designed to provide confidence (or epistemic uncertainty) about test data. Using publicly available datasets for jet classification benchmarking, we explore hyperparameter optimizations for EDL applied to the challenge of UQ for jet identification. We also investigate how the uncertainty is distributed for each jet class, how this method can be implemented for the detection of anomalies, how the uncertainty compares with Bayesian ensemble methods, and how the uncertainty maps onto latent spaces for the models. Our studies uncover some pitfalls of EDL applied to anomaly detection and a more effective way to quantify uncertainty from EDL as compared with the foundational EDL setup. These studies illustrate a methodological approach to interpreting EDL in jet classification models, providing new insights on how EDL quantifies uncertainty and detects out-of-distribution data which may lead to improved EDL methods for DL models applied to classification tasks.
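As a point of reference for the "foundational EDL setup" mentioned above, the commonly used Dirichlet formulation turns non-negative per-class evidence into class probabilities and a single uncertainty score. The sketch below assumes that formulation (Dirichlet parameters equal to evidence plus one, uncertainty equal to the number of classes divided by the Dirichlet strength); the function name is hypothetical, and the alternative uncertainty measure the paper advocates is not reproduced here.

```python
def edl_probabilities_and_uncertainty(evidence):
    """Map non-negative per-class evidence to (expected probabilities, uncertainty).

    Assumes the standard Dirichlet parameterization alpha_k = evidence_k + 1.
    """
    num_classes = len(evidence)
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)                      # total Dirichlet strength S
    probabilities = [a / strength for a in alpha]
    uncertainty = num_classes / strength       # equals 1.0 when there is no evidence
    return probabilities, uncertainty

# No evidence at all: uniform prediction with maximal uncertainty.
uniform_probs, max_uncertainty = edl_probabilities_and_uncertainty([0.0, 0.0, 0.0])
# Strong evidence for the first class: confident prediction, low uncertainty.
confident_probs, low_uncertainty = edl_probabilities_and_uncertainty([8.0, 0.0, 0.0])
```

An out-of-distribution detector in this setup would flag inputs whose uncertainty exceeds a chosen cutoff; the cutoff itself is a design choice not specified in the abstract.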

artificial intelligence, deep learning, machine learning

2501.05656

Country: North America > United States > Illinois (0.28)

Genre: Research Report > New Finding (0.92)

Industry: Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Regularized Top-$k$: A Bayesian Framework for Gradient Sparsification

Bereyhi, Ali, Liang, Ben, Boudreau, Gary, Afana, Ali

Error accumulation is effective for gradient sparsification in distributed settings: initially-unselected gradient entries are eventually selected as their accumulated error exceeds a certain level. The accumulation essentially behaves as a scaling of the learning rate for the selected entries. Although this property prevents the slow-down of lateral movements in distributed gradient descent, it can deteriorate convergence in some settings. This work proposes a novel sparsification scheme that controls the learning rate scaling of error accumulation. The development of this scheme follows two major steps: first, gradient sparsification is formulated as an inverse probability (inference) problem, and the Bayesian optimal sparsification mask is derived as a maximum-a-posteriori estimator. Using the prior distribution inherited from Top-$k$, we derive a new sparsification algorithm which can be interpreted as a regularized form of Top-$k$. We call this algorithm regularized Top-$k$ (RegTop-$k$). It utilizes past aggregated gradients to evaluate posterior statistics of the next aggregation. It then prioritizes the local accumulated gradient entries based on these posterior statistics. We validate our derivation through numerical experiments. In distributed linear regression, it is observed that while Top-$k$ remains at a fixed distance from the global optimum, RegTop-$k$ converges to the global optimum at significantly higher compression ratios. We further demonstrate the generalization of this observation by employing RegTop-$k$ in distributed training of ResNet-18 on CIFAR-10, where it noticeably outperforms Top-$k$.
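The error-accumulation behaviour described in the first sentences can be sketched with plain Top-$k$ on a single worker. This is the baseline scheme only, not RegTop-$k$: the posterior-based prioritization is not reproduced, and the function names and the toy gradient are assumptions for illustration.

```python
def top_k_indices(values, k):
    """Indices of the k entries with the largest magnitude."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    return set(order[:k])

def sparsify_with_error_accumulation(gradient, error, k):
    """One worker step: add the carried-over error, send the top-k entries,
    and keep everything that was not sent as the new error."""
    accumulated = [g + e for g, e in zip(gradient, error)]
    selected = top_k_indices(accumulated, k)
    sparse = [a if i in selected else 0.0 for i, a in enumerate(accumulated)]
    new_error = [0.0 if i in selected else a for i, a in enumerate(accumulated)]
    return sparse, new_error

# Toy run with a constant local gradient and k = 1.
gradient = [1.0, 0.75, 0.25]
error = [0.0, 0.0, 0.0]
sent_1, error = sparsify_with_error_accumulation(gradient, error, k=1)
sent_2, error = sparsify_with_error_accumulation(gradient, error, k=1)
```

In the second step the initially unselected entry is sent with value 1.5, twice its raw gradient of 0.75, which illustrates how accumulation acts like a learning-rate scaling on the selected entries.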

bayesian inference, gradient sparsification, machine learning

2501.05633

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.40)