AITopics

2503.20544

Country:

North America > United States > New Jersey > Mercer County > Princeton (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report > Experimental Study (0.46)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

arXiv.org Artificial IntelligenceMar-26-2025

Data Mixture Optimization: A Multi-fidelity Multi-scale Bayesian Framework

Yen, Thomson, Siah, Andrew Wei Tung, Chen, Haozhe, Peng, Tianyi, Guetta, Daniel, Namkoong, Hongseok

Careful curation of data sources can significantly improve the performance of LLM pre-training, but predominant approaches rely heavily on intuition or costly trial-and-error, making them difficult to generalize across different data domains and downstream tasks. Although scaling laws can provide a principled and general approach for data curation, standard deterministic extrapolation from small-scale experiments to larger scales requires strong assumptions on the reliability of such extrapolation, whose brittleness has been highlighted in prior works. In this paper, we introduce a $\textit{probabilistic extrapolation framework}$ for data mixture optimization that avoids rigid assumptions and explicitly models the uncertainty in performance across decision variables. We formulate data curation as a sequential decision-making problem$\unicode{x2013}$multi-fidelity, multi-scale Bayesian optimization$\unicode{x2013}$where $\{$data mixtures, model scale, training steps$\}$ are adaptively selected to balance training cost and potential information gain. Our framework naturally gives rise to algorithm prototypes that leverage noisy information from inexpensive experiments to systematically inform costly training decisions. To accelerate methodological progress, we build a simulator based on 472 language model pre-training runs with varying data compositions from the SlimPajama dataset. We observe that even simple kernels and acquisition functions can enable principled decisions across training models from 20M to 1B parameters and achieve $\textbf{2.6x}$ and $\textbf{3.3x}$ speedups compared to multi-fidelity BO and random search baselines. Taken together, our framework underscores potential efficiency gains achievable by developing principled and transferable data mixture optimization methods.

data mixture, large language model, machine learning, (20 more...)

2503.21023

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Virginia (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
(2 more...)

Kurscheidt, Leander, Morettin, Paolo, Sebastiani, Roberto, Passerini, Andrea, Vergari, Antonio

A Probabilistic Neuro-symbolic Layer for Algebraic Constraint Satisfaction

arXiv.org Artificial IntelligenceMar-25-2025

In safety-critical applications, guaranteeing the satisfaction of constraints over continuous environments is crucial, e.g., an autonomous agent should never crash into obstacles or go off-road. Neural models struggle in the presence of these constraints, especially when they involve intricate algebraic relationships. To address this, we introduce a differentiable probabilistic layer that guarantees the satisfaction of non-convex algebraic constraints over continuous variables. This probabilistic algebraic layer (PAL) can be seamlessly plugged into any neural architecture and trained via maximum likelihood without requiring approximations. PAL defines a distribution over conjunctions and disjunctions of linear inequalities, parameterized by polynomials. This formulation enables efficient and exact renormalization via symbolic integration, which can be amortized across different data points and easily parallelized on a GPU. We showcase PAL and our integration scheme on a number of benchmarks for algebraic constraint integration and on real-world trajectory data.

artificial intelligence, constraint, machine learning, (19 more...)

2503.19466

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Austria > Vienna (0.14)
(16 more...)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
(2 more...)

Mortezanejad, Seyedeh Azadeh Fallah, Wang, Ruochen, Mohammad-Djafari, Ali

Physics-Informed Neural Networks with Unknown Partial Differential Equations: an Application in Multivariate Time Series

arXiv.org Artificial IntelligenceMar-25-2025

A significant advancement in Neural Network (NN) research is the integration of domain-specific knowledge through custom loss functions. This approach addresses a crucial challenge: how can models utilize physics or mathematical principles to enhance predictions when dealing with sparse, noisy, or incomplete data? Physics-Informed Neural Networks (PINNs) put this idea into practice by incorporating physical equations, such as Partial Differential Equations (PDEs), as soft constraints. This guidance helps the networks find solutions that align with established laws. Recently, researchers have expanded this framework to include Bayesian NNs (BNNs), which allow for uncertainty quantification while still adhering to physical principles. But what happens when the governing equations of a system are not known? In this work, we introduce methods to automatically extract PDEs from historical data. We then integrate these learned equations into three different modeling approaches: PINNs, Bayesian-PINNs (B-PINNs), and Bayesian Linear Regression (BLR). To assess these frameworks, we evaluate them on a real-world Multivariate Time Series (MTS) dataset. We compare their effectiveness in forecasting future states under different scenarios: with and without PDE constraints and accuracy considerations. This research aims to bridge the gap between data-driven discovery and physics-guided learning, providing valuable insights for practical applications.

artificial intelligence, bayesian inference, machine learning, (20 more...)

2503.20144

Country:

North America > United States (0.14)
Europe > France (0.04)
Asia > China > Jiangsu Province (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)

Durand, Jean, Annadani, Yashas, Bauer, Stefan, Parbhoo, Sonali

Causal Bayesian Optimization with Unknown Graphs

arXiv.org Machine LearningMar-25-2025

Causal Bayesian Optimization (CBO) is a methodology designed to optimize an outcome variable by leveraging known causal relationships through targeted interventions. Traditional CBO methods require a fully and accurately specified causal graph, which is a limitation in many real-world scenarios where such graphs are unknown. To address this, we propose a new method for the CBO framework that operates without prior knowledge of the causal graph. Consistent with causal bandit theory, we demonstrate through theoretical analysis and that focusing on the direct causal parents of the target variable is sufficient for optimization, and provide empirical validation in the context of CBO. Furthermore we introduce a new method that learns a Bayesian posterior over the direct parents of the target variable. This allows us to optimize the outcome variable while simultaneously learning the causal structure. Our contributions include a derivation of the closed-form posterior distribution for the linear case. In the nonlinear case where the posterior is not tractable, we present a Gaussian Process (GP) approximation that still enables CBO by inferring the parents of the outcome variable. The proposed method performs competitively with existing benchmarks and scales well to larger graphs, making it a practical tool for real-world applications where causal information is incomplete.

artificial intelligence, bayesian inference, machine learning, (17 more...)

2503.19554

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Hungary > Hajdú-Bihar County > Debrecen (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.68)
Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

arXiv.org Machine LearningMar-25-2025

Interpretable Deep Regression Models with Interval-Censored Failure Time Data

Yuan, Changhui, Zhao, Shishun, Li, Shuwei, Song, Xinyuan, Chen, Zhao

Deep neural networks (DNNs) have become powerful tools for modeling complex data structures through sequentially integrating simple functions in each hidden layer. In survival analysis, recent advances of DNNs primarily focus on enhancing model capabilities, especially in exploring nonlinear covariate effects under right censoring. However, deep learning methods for interval-censored data, where the unobservable failure time is only known to lie in an interval, remain underexplored and limited to specific data type or model. This work proposes a general regression framework for interval-censored data with a broad class of partially linear transformation models, where key covariate effects are modeled parametrically while nonlinear effects of nuisance multi-modal covariates are approximated via DNNs, balancing interpretability and flexibility. We employ sieve maximum likelihood estimation by leveraging monotone splines to approximate the cumulative baseline hazard function. To ensure reliable and tractable estimation, we develop an EM algorithm incorporating stochastic gradient descent. We establish the asymptotic properties of parameter estimators and show that the DNN estimator achieves minimax-optimal convergence. Extensive simulations demonstrate superior estimation and prediction accuracy over state-of-the-art methods. Applying our method to the Alzheimer's Disease Neuroimaging Initiative dataset yields novel insights and improved predictive performance compared to traditional approaches.

artificial intelligence, interpretable deep regression model, machine learning, (3 more...)

2503.19763

Genre: Research Report (0.69)

Industry:

Law > Civil Rights & Constitutional Law (1.00)
Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.53)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.53)

Rey, Samuel, Curbelo, Ernesto, Martino, Luca, Llorente, Fernando, Marques, Antonio G.

Enhancing Graphical Lasso: A Robust Scheme for Non-Stationary Mean Data

arXiv.org Artificial IntelligenceMar-25-2025

This work addresses the problem of graph learning from data following a Gaussian Graphical Model (GGM) with a time-varying mean. Graphical Lasso (GL), the standard method for estimating sparse precision matrices, assumes that the observed data follows a zero-mean Gaussian distribution. However, this assumption is often violated in real-world scenarios where the mean evolves over time due to external influences, trends, or regime shifts. When the mean is not properly accounted for, applying GL directly can lead to estimating a biased precision matrix, hence hindering the graph learning task. To overcome this limitation, we propose Graphical Lasso with Adaptive Targeted Adaptive Importance Sampling (GL-ATAIS), an iterative method that jointly estimates the time-varying mean and the precision matrix. Our approach integrates Bayesian inference with frequentist estimation, leveraging importance sampling to obtain an estimate of the mean while using a regularized maximum likelihood estimator to infer the precision matrix. By iteratively refining both estimates, GL-ATAIS mitigates the bias introduced by time-varying means, leading to more accurate graph recovery. Our numerical evaluation demonstrates the impact of properly accounting for time-dependent means and highlights the advantages of GL-ATAIS over standard GL in recovering the true graph structure.

artificial intelligence, bayesian inference, machine learning, (15 more...)

2503.19651

Country:

Europe > Spain > Galicia > Madrid (0.05)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States (0.04)
Europe > Italy (0.04)

Genre:

Research Report (0.82)
Workflow (0.69)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)

Peng, Shaoting, Chen, Haonan, Driggs-Campbell, Katherine

Towards Uncertainty Unification: A Case Study for Preference Learning

arXiv.org Artificial IntelligenceMar-24-2025

Learning human preferences is essential for human-robot interaction, as it enables robots to adapt their behaviors to align with human expectations and goals. However, the inherent uncertainties in both human behavior and robotic systems make preference learning a challenging task. While probabilistic robotics algorithms offer uncertainty quantification, the integration of human preference uncertainty remains underexplored. To bridge this gap, we introduce uncertainty unification and propose a novel framework, uncertainty-unified preference learning (UUPL), which enhances Gaussian Process (GP)-based preference learning by unifying human and robot uncertainties. Specifically, UUPL includes a human preference uncertainty model that improves GP posterior mean estimation, and an uncertainty-weighted Gaussian Mixture Model (GMM) that enhances GP predictive variance accuracy. Additionally, we design a user-specific calibration process to align uncertainty representations across users, ensuring consistency and reliability in the model performance. Comprehensive experiments and user studies demonstrate that UUPL achieves state-of-the-art performance in both prediction accuracy and user rating. An ablation study further validates the effectiveness of human uncertainty model and uncertainty-weighted GMM of UUPL.

artificial intelligence, human uncertainty, machine learning, (15 more...)

2503.19317

Country:

North America > United States > Illinois > Champaign County > Urbana (0.04)
Europe > Denmark (0.04)

Genre:

Research Report (0.82)
Questionnaire & Opinion Survey (0.57)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Nadifar, Mahsa, Bekker, Andriette, Arashi, Mohammad, Ramoelo, Abel

Bayesian Semi-Parametric Spatial Dispersed Count Model for Precipitation Analysis

arXiv.org Machine LearningMar-24-2025

The appropriateness of the Poisson model is frequently challenged when examining spatial count data marked by unbalanced distributions, over-dispersion, or under-dispersion. Moreover, traditional parametric models may inadequately capture the relationships among variables when covariates display ambiguous functional forms or when spatial patterns are intricate and indeterminate. To tackle these issues, we propose an innovative Bayesian hierarchical modeling system. This method combines non-parametric techniques with an adapted dispersed count model based on renewal theory, facilitating the effective management of unequal dispersion, non-linear correlations, and complex geographic dependencies in count data. We illustrate the efficacy of our strategy by applying it to lung and bronchus cancer mortality data from Iowa, emphasizing environmental and demographic factors like ozone concentrations, PM2.5, green space, and asthma prevalence. Our analysis demonstrates considerable regional heterogeneity and non-linear relationships, providing important insights into the impact of environmental and health-related factors on cancer death rates. This application highlights the significance of our methodology in public health research, where precise modeling and forecasting are essential for guiding policy and intervention efforts. Additionally, we performed a simulation study to assess the resilience and accuracy of the suggested method, validating its superiority in managing dispersion and capturing intricate spatial patterns relative to conventional methods. The suggested framework presents a flexible and robust instrument for geographical count analysis, offering innovative insights for academics and practitioners in disciplines such as epidemiology, environmental science, and spatial statistics.

artificial intelligence, machine learning, regression model, (20 more...)

2503.19117

Country:

North America > United States > Iowa (0.25)
North America > Canada > Alberta (0.14)
Africa > South Africa > Gauteng > Pretoria (0.04)
(2 more...)

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Public Health (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)

arXiv.org Machine LearningMar-23-2025

A New Stochastic Approximation Method for Gradient-based Simulated Parameter Estimation

Li, Zehao, Peng, Yijie

This paper tackles the challenge of parameter calibration in stochastic models, particularly in scenarios where the likelihood function is unavailable in an analytical form. We introduce a gradient-based simulated parameter estimation framework, which employs a multi-time scale stochastic approximation algorithm. This approach effectively addresses the ratio bias that arises in both maximum likelihood estimation and posterior density estimation problems. The proposed algorithm enhances estimation accuracy and significantly reduces computational costs, as demonstrated through extensive numerical experiments. Our work extends the GSPE framework to handle complex models such as hidden Markov models and variational inference-based problems, offering a robust solution for parameter estimation in challenging stochastic environments.

artificial intelligence, gradient-based simulated parameter estimation, machine learning, (2 more...)

2503.18319

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.53)