AITopics

2501.05633

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.40)

Zhou, Xiaoshan, Menassa, Carol M., Kamat, Vineet R.

Towards Probabilistic Inference of Human Motor Intentions by Assistive Mobile Robots Controlled via a Brain-Computer Interface

arXiv.org Artificial IntelligenceJan-9-2025

Assistive mobile robots are a transformative technology that helps persons with disabilities regain the ability to move freely. Although autonomous wheelchairs significantly reduce user effort, they still require human input to allow users to maintain control and adapt to changing environments. Brain Computer Interface (BCI) stands out as a highly user-friendly option that does not require physical movement. Current BCI systems can understand whether users want to accelerate or decelerate, but they implement these changes in discrete speed steps rather than allowing for smooth, continuous velocity adjustments. This limitation prevents the systems from mimicking the natural, fluid speed changes seen in human self-paced motion. The authors aim to address this limitation by redesigning the perception-action cycle in a BCI controlled robotic system: improving how the robotic agent interprets the user's motion intentions (world state) and implementing these actions in a way that better reflects natural physical properties of motion, such as inertia and damping. The scope of this paper focuses on the perception aspect. We asked and answered a normative question "what computation should the robotic agent carry out to optimally perceive incomplete or noisy sensory observations?" Empirical EEG data were collected, and probabilistic representation that served as world state distributions were learned and evaluated in a Generative Adversarial Network framework. The ROS framework was established that connected with a Gazebo environment containing a digital twin of an indoor space and a virtual model of a robotic wheelchair. Signal processing and statistical analyses were implemented to identity the most discriminative features in the spatial-spectral-temporal dimensions, which are then used to construct the world model for the robotic agent to interpret user motion intentions as a Bayesian observer.

artificial intelligence, bayesian inference, machine learning, (14 more...)

2501.0561

Country: North America > United States > Michigan > Washtenaw County > Ann Arbor (0.15)

Genre: Research Report > Experimental Study (0.88)

Industry:

Information Technology > Security & Privacy (0.68)
Health & Medicine > Therapeutic Area > Neurology (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

arXiv.org Artificial IntelligenceJan-9-2025

Monotonic Learning in the PAC Framework: A New Perspective

Li, Ming, Zhang, Chenyi, Li, Qin

Monotone learning refers to learning processes in which expected performance consistently improves as more training data is introduced. Non-monotone behavior of machine learning has been the topic of a series of recent works, with various proposals that ensure monotonicity by applying transformations or wrappers on learning algorithms. In this work, from a different perspective, we tackle the topic of monotone learning within the framework of Probably Approximately Correct (PAC) learning theory. Following the mechanism that estimates sample complexity of a PAC-learnable problem, we derive a performance lower bound for that problem, and prove the monotonicity of that bound as the sample sizes increase. By calculating the lower bound distribution, we are able to prove that given a PAC-learnable problem with a hypothesis space that is either of finite size or of finite VC dimension, any learning algorithm based on Empirical Risk Minimization (ERM) is monotone if training samples are independent and identically distributed (i.i.d.). We further carry out an experiment on two concrete machine learning problems, one of which has a finite hypothesis set, and the other of finite VC dimension, and compared the experimental data for the empirical risk distributions with the estimated theoretical bound. The results of the comparison have confirmed the monotonicity of learning for the two PAC-learnable problems.

artificial intelligence, generalization loss, machine learning, (18 more...)

2501.05493

Country: North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

A Plug-and-Play Bregman ADMM Module for Inferring Event Branches in Temporal Point Processes

Wang, Qingmei, Wu, Yuxin, Long, Yujie, Huang, Jing, Ran, Fengyuan, Su, Bing, Xu, Hongteng

An event sequence generated by a temporal point process is often associated with a hidden and structured event branching process that captures the triggering relations between its historical and current events. In this study, we design a new plug-and-play module based on the Bregman ADMM (BADMM) algorithm, which infers event branches associated with event sequences in the maximum likelihood estimation framework of temporal point processes (TPPs). Specifically, we formulate the inference of event branches as an optimization problem for the event transition matrix under sparse and low-rank constraints, which is embedded in existing TPP models or their learning paradigms. We can implement this optimization problem based on subspace clustering and sparse group-lasso, respectively, and solve it using the Bregman ADMM algorithm, whose unrolling leads to the proposed BADMM module. When learning a classic TPP (e.g., Hawkes process) by the expectation-maximization algorithm, the BADMM module helps derive structured responsibility matrices in the E-step. Similarly, the BADMM module helps derive low-rank and sparse attention maps for the neural TPPs with self-attention layers. The structured responsibility matrices and attention maps, which work as learned event transition matrices, indicate event branches, e.g., inferring isolated events and those key events triggering many subsequent events. Experiments on both synthetic and real-world data show that plugging our BADMM module into existing TPP models and learning paradigms can improve model performance and provide us with interpretable structured event branches.

event branch, matrix, transition matrix, (15 more...)

2501.04529

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > China > Hubei Province > Wuhan (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.49)

Industry:

Media (0.69)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)

Shi, Yaozhong, Ross, Zachary E., Asimaki, Domniki, Azizzadenesheli, Kamyar

Stochastic Process Learning via Operator Flow Matching

Expanding on neural operators, we propose a novel framework for stochastic process learning across arbitrary domains. In particular, we develop operator flow matching (OFM) for learning stochastic process priors on function spaces. OFM provides the probability density of the values of any collection of points and enables mathematically tractable functional regression at new points with mean and density estimation. Our method outperforms state-of-the-art models in stochastic process learning, functional regression, and prior learning.

ofm, posterior sample, regression, (13 more...)

2501.04126

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry: Energy (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Minh, Tâm Le, Arbel, Julyan, Möllenhoff, Thomas, Khan, Mohammad Emtiyaz, Forbes, Florence

Natural Variational Annealing for Multimodal Optimization

arXiv.org Machine LearningJan-8-2025

We introduce a new multimodal optimization approach called Natural Variational Annealing (NVA) that combines the strengths of three foundational concepts to simultaneously search for multiple global and local modes of black-box nonconvex objectives. First, it implements a simultaneous search by using variational posteriors, such as, mixtures of Gaussians. Second, it applies annealing to gradually trade off exploration for exploitation. Finally, it learns the variational search distribution using natural-gradient learning where updates resemble well-known and easy-to-implement algorithms. The three concepts come together in NVA giving rise to new algorithms and also allowing us to incorporate "fitness shaping", a core concept from evolutionary algorithms. We assess the quality of search on simulations and compare them to methods using gradient descent and evolution strategies. We also provide an application to a real-world inverse problem in planetary science.

algorithm, gaussian mixture, gradient, (16 more...)

arXiv.org Machine Learning

2501.04667

Country:

Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
Oceania > Australia (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report > Promising Solution (0.45)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
(2 more...)

arXiv.org Machine LearningJan-8-2025

A Statistical Theory of Contrastive Pre-training and Multimodal Generative AI

Oko, Kazusato, Lin, Licong, Cai, Yuhang, Mei, Song

Multi-modal generative AI systems, such as those combining vision and language, rely on contrastive pre-training to learn representations across different modalities. While their practical benefits are widely acknowledged, a rigorous theoretical understanding of the contrastive pre-training framework remains limited. This paper develops a theoretical framework to explain the success of contrastive pre-training in downstream tasks, such as zero-shot classification, conditional diffusion models, and vision-language models. We introduce the concept of approximate sufficient statistics, a generalization of the classical sufficient statistics, and show that near-minimizers of the contrastive pre-training loss are approximately sufficient, making them adaptable to diverse downstream tasks. We further propose the Joint Generative Hierarchical Model for the joint distribution of images and text, showing that transformers can efficiently approximate relevant functions within this model via belief propagation. Building on this framework, we derive sample complexity guarantees for multi-modal learning based on contrastive pre-trained representations. Numerical simulations validate these theoretical findings, demonstrating the strong generalization performance of contrastively pre-trained transformers in various multi-modal tasks.

algorithm, representation, txq, (16 more...)

arXiv.org Machine Learning

2501.04641

Country:

Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (0.87)

Industry: Energy (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.92)

Sainsbury-Dale, Matthew, Zammit-Mangion, Andrew, Cressie, Noel, Huser, Raphaël

Neural Parameter Estimation with Incomplete Data

arXiv.org Machine LearningJan-8-2025

Advancements in artificial intelligence (AI) and deep learning have led to neural networks being used to generate lightning-speed answers to complex questions, to paint like Monet, or to write like Proust. Leveraging their computational speed and flexibility, neural networks are also being used to facilitate fast, likelihood-free statistical inference. However, it is not straightforward to use neural networks with data that for various reasons are incomplete, which precludes their use in many applications. A recently proposed approach to remedy this issue inputs an appropriately padded data vector and a vector that encodes the missingness pattern to a neural network. While computationally efficient, this "masking" approach can result in statistically inefficient inferences. Here, we propose an alternative approach that is based on the Monte Carlo expectation-maximization (EM) algorithm. Our EM approach is likelihood-free, substantially faster than the conventional EM algorithm as it does not require numerical optimization at each iteration, and more statistically efficient than the masking approach. This research represents a prototype problem that indicates how improvements could be made in AI by introducing Bayesian statistical thinking. We compare the two approaches to missingness using simulated incomplete data from two models: a spatial Gaussian process model, and a spatial Potts model. The utility of the methodology is shown on Arctic sea-ice data and cryptocurrency data.

algorithm, estimator, inference, (17 more...)

arXiv.org Machine Learning

2501.0433

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
(10 more...)

Genre: Research Report > New Finding (0.92)

Industry:

Health & Medicine (1.00)
Banking & Finance > Trading (1.00)
Government (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

A 65 nm Bayesian Neural Network Accelerator with 360 fJ/Sample In-Word GRNG for AI Uncertainty Estimation

Enciso, Zephan M., Cheng, Boyang, Pei, Likai, Liu, Jianbo, Davis, Steven, Cao, Ningyuan, Niemier, Michael

Uncertainty estimation is an indispensable capability for AI-enabled, safety-critical applications, e.g. autonomous vehicles or medical diagnosis. Bayesian neural networks (BNNs) use Bayesian statistics to provide both classification predictions and uncertainty estimation, but they suffer from high computational overhead associated with random number generation and repeated sample iterations. Furthermore, BNNs are not immediately amenable to acceleration through compute-in-memory architectures due to the frequent memory writes necessary after each RNG operation. To address these challenges, we present an ASIC that integrates 360 fJ/Sample Gaussian RNG directly into the SRAM memory words. This integration reduces RNG overhead and enables fully-parallel compute-in-memory operations for BNNs. The prototype chip achieves 5.12 GSa/s RNG throughput and 102 GOp/s neural network throughput while occupying 0.45 mm2, bringing AI uncertainty estimation to edge computation.

artificial intelligence, bayesian inference, machine learning, (15 more...)

2501.04577

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Spitzer, Philipp, Martin, Dominik, Eichberger, Laurin, Kühl, Niklas

Towards a Problem-Oriented Domain Adaptation Framework for Machine Learning

Domain adaptation is a sub-field of machine learning that involves transferring knowledge from a source domain to perform the same task in the target domain. It is a typical challenge in machine learning that arises, e.g., when data is obtained from various sources or when using a data basis that changes over time. Recent advances in the field offer promising methods, but it is still challenging for researchers and practitioners to determine if domain adaptation is suitable for a given problem -- and, subsequently, to select the appropriate approach. This article employs design science research to develop a problem-oriented framework for domain adaptation, which is matured in three evaluation episodes. We describe a framework that distinguishes between five domain adaptation scenarios, provides recommendations for addressing each scenario, and offers guidelines for determining if a problem falls into one of these scenarios. During the multiple evaluation episodes, the framework is tested on artificial and real-world datasets and an experimental study involving 100 participants. The evaluation demonstrates that the framework has the explanatory power to capture any domain adaptation problem effectively. In summary, we provide clear guidance for researchers and practitioners who want to employ domain adaptation but lack in-depth knowledge of the possibilities.

artificial intelligence, domain adaptation, machine learning, (18 more...)

2501.04528

Country:

North America > United States (1.00)
Europe (1.00)

Genre:

Research Report > Promising Solution (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)