Deep Distributional Learning with Non-crossing Quantile Network

arXiv.org Machine Learning

In this paper, we introduce a non-crossing quantile (NQ) network for conditional distribution learning. By leveraging non-negative activation functions, the NQ network ensures that the estimated quantile curves are monotone in the quantile level, effectively eliminating quantile crossing. Furthermore, the NQ-network-based deep distributional learning framework is highly adaptable, applicable to a wide range of problems, from classical non-parametric quantile regression to more advanced tasks such as causal effect estimation and distributional reinforcement learning (RL). We also develop a comprehensive theoretical foundation for the deep NQ estimator and its application to distributional RL, providing an in-depth analysis that demonstrates its effectiveness across these domains. Our experimental results further highlight the robustness and versatility of the NQ network.
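A minimal sketch of one way such a non-crossing constraint can be built in, consistent with the abstract's mention of non-negative activations: the lowest quantile is an unconstrained output and every higher quantile adds a non-negative increment, so the predicted quantiles are monotone by construction. The architecture and the pinball-loss training below are our illustration, not necessarily the paper's exact NQ design.

```python
# Hypothetical non-crossing quantile network: monotone outputs by design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonCrossingQuantileNet(nn.Module):
    def __init__(self, in_dim: int, n_quantiles: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_quantiles),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.body(x)                        # raw head outputs
        base = h[:, :1]                         # lowest quantile, unconstrained
        deltas = F.softplus(h[:, 1:])           # non-negative gaps between levels
        return torch.cat([base, base + torch.cumsum(deltas, dim=1)], dim=1)

def pinball_loss(pred, y, taus):
    """Check loss; taus e.g. torch.tensor([0.1, 0.5, 0.9])."""
    u = y.unsqueeze(1) - pred                   # (batch, n_quantiles)
    return torch.mean(torch.maximum(taus * u, (taus - 1.0) * u))
```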


Gradient-based Sample Selection for Faster Bayesian Optimization

arXiv.org Machine Learning

Bayesian optimization (BO) is an effective technique for black-box optimization. However, its applicability is typically limited to moderate-budget problems due to the cubic complexity, in the number of observations, of computing the Gaussian process (GP) surrogate model. In large-budget scenarios, directly employing the standard GP model faces significant challenges in computation time and resource requirements. In this paper, we propose a novel approach, gradient-based sample selection Bayesian optimization (GSSBO), to enhance the computational efficiency of BO. The GP model is constructed on a selected set of samples instead of the whole dataset, with samples chosen using gradient information to maintain diversity and representativeness. We provide a theoretical analysis of the gradient-based sample selection strategy and obtain explicit sublinear regret bounds for the proposed framework. Extensive experiments on synthetic and real-world tasks demonstrate that our approach significantly reduces the computational cost of GP fitting in BO while maintaining optimization performance comparable to baseline methods.
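As a rough illustration of the general idea (the paper's actual selection rule is not reproduced here), the sketch below scores each observation with a crude finite-difference gradient proxy, greedily adds far-apart high-gradient points, and fits the GP surrogate on that subset only. All names and the scoring heuristic are our assumptions.

```python
# Hypothetical gradient-based subset selection for cheaper GP fitting.
import numpy as np

def select_subset(X, y, m, eps=1e-3):
    """Pick m of n points: gradient-magnitude proxy times a diversity bonus."""
    n = len(X)
    grads = np.empty(n)
    for i in range(n):
        j = np.argsort(np.linalg.norm(X - X[i], axis=1))[1]  # nearest other point
        grads[i] = abs(y[j] - y[i]) / (np.linalg.norm(X[j] - X[i]) + eps)
    chosen = [int(np.argmax(grads))]
    while len(chosen) < m:
        dists = np.min(np.linalg.norm(X[:, None, :] - X[chosen][None, :, :], axis=2), axis=1)
        score = grads * dists          # favour steep regions far from chosen points
        score[chosen] = -np.inf
        chosen.append(int(np.argmax(score)))
    return np.asarray(chosen)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (1000, 3))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(1000)
idx = select_subset(X, y, m=100)       # GP fit cost drops from O(n^3) to O(m^3)
```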


Unifying and extending Diffusion Models through PDEs for solving Inverse Problems

arXiv.org Machine Learning

Diffusion models have emerged as powerful generative tools with applications in computer vision and scientific machine learning (SciML), where they have been used to solve large-scale probabilistic inverse problems. Traditionally, these models have been derived using principles of variational inference, denoising, statistical signal processing, and stochastic differential equations. In contrast to the conventional presentation, in this study we derive diffusion models using ideas from linear partial differential equations and demonstrate that this approach has several benefits that include a constructive derivation of the forward and reverse processes, a unified derivation of multiple formulations and sampling strategies, and the discovery of a new class of models. We also apply the conditional version of these models to solving canonical conditional density estimation problems and challenging inverse problems. These problems help establish benchmarks for systematically quantifying the performance of different formulations and sampling strategies in this study, and for future studies. Finally, we identify and implement a mechanism through which a single diffusion model can be applied to measurements obtained from multiple measurement operators. Taken together, the contents of this manuscript provide a new understanding and several new directions in the application of diffusion models to solving physics-based inverse problems.
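For orientation, a standard instance of the linear-PDE viewpoint (our illustration, not the paper's specific construction): an Ornstein-Uhlenbeck forward process, the Fokker-Planck equation that its marginal densities $p_t$ satisfy, and the reverse-time SDE, which requires only the score $\nabla_x \log p_t$:

```latex
\mathrm{d}x_t = -x_t\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}W_t,
\qquad
\partial_t p_t = \nabla \cdot (x\,p_t) + \Delta p_t,
\qquad
\mathrm{d}x_t = \bigl[-x_t - 2\,\nabla_x \log p_t(x_t)\bigr]\,\mathrm{d}t
  + \sqrt{2}\,\mathrm{d}\bar{W}_t .
```

The middle equation is linear in $p_t$, which is the property a PDE-based derivation can exploit.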


Universal Architectures for the Learning of Polyhedral Norms and Convex Regularizers

arXiv.org Machine Learning

This paper addresses the task of learning convex regularizers to guide the reconstruction of images from limited data. By imposing that the reconstruction be amplitude-equivariant, we narrow down the class of admissible functionals to those that can be expressed as a power of a seminorm. We then show that such functionals can be approximated to arbitrary precision with the help of polyhedral norms. In particular, we identify two dual parameterizations of such systems: (i) a synthesis form with an $\ell_1$-penalty that involves some learnable dictionary; and (ii) an analysis form with an $\ell_\infty$-penalty that involves a trainable regularization operator. After having provided geometric insights and proved that the two forms are universal, we propose an implementation that relies on a specific architecture (tight frame with a weighted $\ell_1$ penalty) that is easy to train. We illustrate its use for denoising and the reconstruction of biomedical images. We find that the proposed framework outperforms the sparsity-based methods of compressed sensing, while it offers essentially the same convergence and robustness guarantees.
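One plausible way to write the two dual parameterizations down, with $D$ the learnable dictionary and $L$ the trainable regularization operator (our formalization of the abstract's description, not necessarily the paper's notation):

```latex
R_{\mathrm{syn}}(x) \;=\; \min_{z \,:\, Dz = x} \|z\|_1,
\qquad
R_{\mathrm{ana}}(x) \;=\; \|Lx\|_\infty .
```

Both are polyhedral norms (their unit balls are polytopes), and for $L = D^\top$ the two norms are dual to each other.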


Optimizing Power Grid Topologies with Reinforcement Learning: A Survey of Methods and Challenges

arXiv.org Machine Learning

Power grid operation is becoming increasingly complex due to the rising integration of renewable energy sources and the need for more adaptive control strategies. Reinforcement Learning (RL) has emerged as a promising approach to power network control (PNC), offering the potential to enhance decision-making in dynamic and uncertain environments. The Learning To Run a Power Network (L2RPN) competitions have played a key role in accelerating research by providing standardized benchmarks and problem formulations, leading to rapid advancements in RL-based methods. This survey provides a comprehensive and structured overview of RL applications for power grid topology optimization, categorizing existing techniques, highlighting key design choices, and identifying gaps in current research. Additionally, we present a comparative numerical study evaluating the impact of commonly applied RL-based methods, offering insights into their practical effectiveness. By consolidating existing research and outlining open challenges, this survey aims to provide a foundation for future advancements in RL-driven power grid optimization.


Performance of Rank-One Tensor Approximation on Incomplete Data

arXiv.org Machine Learning

We are interested in the estimation of a rank-one tensor signal when only a fraction $\varepsilon$ of its noisy observation is available. We show that the study of this problem can be reduced to that of a random matrix model whose spectral analysis gives access to the reconstruction performance. These results quantify the loss of performance induced by artificially reducing the memory cost of a tensor through the deletion of a random subset of its entries.
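The observation model is straightforward to reproduce: a spiked rank-one tensor whose entries are each kept independently with probability $\varepsilon$. The sketch below measures alignment with the planted signal; the parameter values and the power-iteration fit are our illustration, not the paper's method.

```python
# Rank-one spiked tensor with Bernoulli(eps) entry deletion.
import numpy as np

rng = np.random.default_rng(0)
n, beta, eps = 50, 3.0, 0.3                   # size, signal strength, kept fraction

x = rng.standard_normal(n); x /= np.linalg.norm(x)
T = beta * np.einsum("i,j,k->ijk", x, x, x) + rng.standard_normal((n, n, n))
T_obs = T * (rng.random((n, n, n)) < eps)     # unobserved entries zeroed out

# Tensor power iteration: best rank-one fit to the masked observation.
u = rng.standard_normal(n); u /= np.linalg.norm(u)
for _ in range(100):
    u = np.einsum("ijk,j,k->i", T_obs, u, u)
    u /= np.linalg.norm(u)
print("alignment |<u, x>| =", abs(u @ x))     # degrades as eps shrinks
```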


All Optical Echo State Network Reservoir Computing

arXiv.org Machine Learning

We propose an innovative design for an all-optical Echo State Network (ESN), an advanced type of reservoir computer known for its universal computational capabilities. Our design enables fully optical implementation of arbitrary ESNs, featuring complete flexibility in optical matrix multiplication and nonlinear activation. Leveraging the nonlinear characteristics of stimulated Brillouin scattering (SBS), the architecture efficiently realizes measurement-free operations crucial for reservoir computing. The approach significantly reduces computational overhead and energy consumption compared to traditional software-based methods. Comprehensive simulations validate the system's memory capacity, nonlinear processing strength, and polynomial algebra capabilities, showcasing performance comparable to software ESNs across key benchmark tasks. Our design establishes a feasible, scalable, and universally applicable framework for optical reservoir computing, suitable for diverse machine learning applications.
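For readers unfamiliar with ESNs, here is a minimal software simulation of the standard update that the optical architecture would realize physically, with tanh standing in for the SBS nonlinearity (an assumption on our part); only the linear readout is trained.

```python
# Classic echo state network: fixed random reservoir, trained linear readout.
import numpy as np

rng = np.random.default_rng(1)
n_res, n_in = 200, 1
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.standard_normal((n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))    # spectral radius < 1 (echo state)

def run_reservoir(u_seq):
    x, states = np.zeros(n_res), []
    for u in u_seq:
        x = np.tanh(W @ x + W_in @ np.atleast_1d(u))   # reservoir update
        states.append(x.copy())
    return np.array(states)

# Delayed-recall toy task: predict the input from 10 steps ago.
u = rng.uniform(-1, 1, 500)
X, y = run_reservoir(u)[10:], u[:-10]
W_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(n_res), X.T @ y)  # ridge readout
print("train MSE:", np.mean((X @ W_out - y) ** 2))
```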


DDPM Score Matching and Distribution Learning

arXiv.org Machine Learning

Score estimation is the backbone of score-based generative models (SGMs), especially denoising diffusion probabilistic models (DDPMs). A key result in this area shows that with accurate score estimates, SGMs can efficiently generate samples from any realistic data distribution (Chen et al., ICLR'23; Lee et al., ALT'23). This distribution learning result, where the learned distribution is implicitly that of the sampler's output, does not explain how score estimation relates to the classical tasks of parameter and density estimation. This paper introduces a framework that reduces score estimation to these two tasks, with several implications for statistical and computational learning theory:

Parameter estimation: Koehler et al. (ICLR'23) demonstrate that a score-matching variant is statistically inefficient for the parametric estimation of multimodal densities common in practice. In contrast, we show that under mild conditions, denoising score matching in DDPMs is asymptotically efficient.

Density estimation: By linking generation to score estimation, we lift existing score estimation guarantees to $(\epsilon,\delta)$-PAC density estimation, i.e., a function approximating the target log-density within $\epsilon$ on all but a $\delta$-fraction of the space. We provide (i) minimax rates for density estimation over H\"older classes and (ii) a quasi-polynomial PAC density estimation algorithm for the classical Gaussian location mixture model, building on and addressing an open problem from Gatmiry et al. (arXiv'24).

Lower bounds for score estimation: Our framework offers the first principled method to prove computational lower bounds for score estimation across general distributions. As an application, we establish cryptographic lower bounds for score estimation in general Gaussian mixture models, conceptually recovering Song's (NeurIPS'24) result and making progress on his key open problem.
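For reference, the denoising score-matching objective the abstract builds on, in standard notation, with $x_t = \alpha_t x_0 + \sigma_t z$:

```latex
\mathcal{L}(\theta)
  = \mathbb{E}_{t}\,
    \mathbb{E}_{x_0 \sim p_{\mathrm{data}},\; z \sim \mathcal{N}(0, I)}
    \Bigl\| s_\theta(\alpha_t x_0 + \sigma_t z,\, t) + \tfrac{z}{\sigma_t} \Bigr\|_2^2 ,
```

whose population minimizer over a rich enough class satisfies $s_\theta(x, t) \approx \nabla_x \log p_t(x)$.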


Sparse Optimization for Transfer Learning: An L0-Regularized Framework for Multi-Source Domain Adaptation

arXiv.org Machine Learning

This paper explores transfer learning in heterogeneous multi-source environments with distributional divergence between the target and auxiliary domains. To address challenges in statistical bias and computational efficiency, we propose a Sparse Optimization for Transfer Learning (SOTL) framework based on L0-regularization. The method extends the Joint Estimation Transferred from Strata (JETS) paradigm with two key innovations: (1) L0-constrained exact sparsity for parameter-space compression and complexity reduction, and (2) a refined optimization focus that emphasizes target parameters over redundant ones. Simulations show that SOTL significantly improves both estimation accuracy and computational speed, especially under adversarial auxiliary-domain conditions. Empirical validation on the Communities and Crime benchmark demonstrates the statistical robustness of the SOTL method in cross-domain transfer.
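To make the exact-sparsity ingredient concrete, the sketch below enforces an L0 constraint via iterative hard thresholding on a plain least-squares objective. It deliberately omits the SOTL/JETS specifics (strata structure, target-versus-auxiliary weighting); everything here is our illustrative assumption.

```python
# Iterative hard thresholding: min ||y - Xb||^2 subject to ||b||_0 <= k.
import numpy as np

def iht(X, y, k, n_iter=200):
    n, p = X.shape
    step = 1.0 / np.linalg.norm(X, 2) ** 2        # safe step from spectral norm
    b = np.zeros(p)
    for _ in range(n_iter):
        b = b + step * X.T @ (y - X @ b)          # gradient step
        keep = np.argsort(np.abs(b))[-k:]         # k largest coordinates survive
        mask = np.zeros(p, dtype=bool); mask[keep] = True
        b[~mask] = 0.0                            # hard threshold: exact L0 sparsity
    return b

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 50))
b_true = np.zeros(50); b_true[:5] = 2.0
y = X @ b_true + 0.1 * rng.standard_normal(200)
print(np.nonzero(iht(X, y, k=5))[0])              # recovers the planted support
```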


Dimension-Free Convergence of Diffusion Models for Approximate Gaussian Mixtures

arXiv.org Machine Learning

Diffusion models are distinguished by their exceptional generative performance, particularly in producing high-quality samples through iterative denoising. While current theory suggests that the number of denoising steps required for accurate sample generation should scale linearly with data dimension, this does not reflect the practical efficiency of widely used algorithms like Denoising Diffusion Probabilistic Models (DDPMs). This paper investigates the effectiveness of diffusion models in sampling from complex high-dimensional distributions that can be well-approximated by Gaussian Mixture Models (GMMs). For these distributions, our main result shows that DDPM takes at most $\widetilde{O}(1/\varepsilon)$ iterations to attain an $\varepsilon$-accurate distribution in total variation (TV) distance, independent of both the ambient dimension $d$ and the number of components $K$, up to logarithmic factors. Furthermore, this result remains robust to score estimation errors. These findings highlight the remarkable effectiveness of diffusion models in high-dimensional settings given the universal approximation capability of GMMs, and provide theoretical insights into their practical success.
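The setting is easy to simulate because the score of a noised Gaussian mixture is available in closed form, so the iteration behavior can be examined without score-estimation error. The sketch below (a DDPM-style ancestral sampler with a simple linear schedule; these choices are ours, not the paper's) samples from a 2-D mixture with unit-covariance components.

```python
# DDPM sampling from a Gaussian mixture using the exact closed-form score.
import numpy as np

rng = np.random.default_rng(3)
mus = np.array([[-4.0, 0.0], [4.0, 0.0], [0.0, 4.0]])   # K unit-covariance modes
K, d = mus.shape

def score(x, a, s2):
    """Exact grad log p_t for x_t = a*x0 + sqrt(s2)*z, x0 ~ uniform GMM."""
    v = a**2 + s2                                        # per-component variance
    diffs = a * mus[None, :, :] - x[:, None, :]          # (n, K, d)
    logw = -np.sum(diffs**2, axis=2) / (2 * v)
    w = np.exp(logw - logw.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                    # posterior responsibilities
    return np.einsum("nk,nkd->nd", w, diffs) / v

T = 500
betas = np.linspace(1e-4, 0.02, T)                       # variance-preserving schedule
alphas = np.cumprod(1.0 - betas)

x = rng.standard_normal((2000, d))                       # start from N(0, I)
for t in range(T - 1, -1, -1):
    b = betas[t]
    sc = score(x, np.sqrt(alphas[t]), 1.0 - alphas[t])
    x = (x + b * sc) / np.sqrt(1.0 - b)                  # reverse mean update
    if t > 0:
        x += np.sqrt(b) * rng.standard_normal(x.shape)   # ancestral noise
assign = np.argmin(((x[:, None, :] - mus[None, :, :]) ** 2).sum(2), axis=1)
print("fraction of samples per mode:", np.bincount(assign, minlength=K) / len(x))
```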