Closed-Form Last Layer Optimization

Galashov, Alexandre, Da Costa, Nathaël, Xu, Liyuan, Hennig, Philipp, Gretton, Arthur

arXiv.org Machine Learning

Neural networks are typically optimized with variants of stochastic gradient descent. Under a squared loss, however, the optimal linear last-layer weights are known in closed form. We propose to leverage this during optimization, treating the last layer as a function of the backbone parameters and optimizing solely for those parameters. We show this is equivalent to alternating between gradient descent steps on the backbone and closed-form updates on the last layer. We adapt the method to the setting of stochastic gradient descent by trading off the loss on the current batch against the accumulated information from previous batches. Further, we prove that, in the Neural Tangent Kernel regime, convergence of this method to an optimal solution is guaranteed. Finally, we demonstrate the effectiveness of our approach compared with standard SGD on a squared loss in several supervised tasks -- both regression and classification -- including Fourier Neural Operators and Instrumental Variable Regression.
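A minimal sketch of the closed-form step the abstract refers to: under a squared loss, the last-layer weights solve a linear least-squares problem in the backbone features. The ridge term and all names here are our assumptions for numerical stability, not details from the paper.

```python
import numpy as np

def closed_form_last_layer(Phi, Y, reg=1e-6):
    """Ridge-regularized least squares: argmin_W ||Phi @ W - Y||^2 + reg * ||W||^2."""
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + reg * np.eye(d), Phi.T @ Y)

# Toy check: features from a fixed "backbone", noiseless linear targets.
rng = np.random.default_rng(0)
Phi = rng.normal(size=(100, 5))     # backbone features phi(x)
W_true = rng.normal(size=(5, 2))
Y = Phi @ W_true
W = closed_form_last_layer(Phi, Y)  # recovers W_true up to the tiny ridge bias
```

In the alternating scheme the abstract describes, a solve like this would replace the gradient step on the last layer after each backbone update.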



An Accelerated Bregman Algorithm for ReLU-based Symmetric Matrix Decomposition

Wang, Qingsong

arXiv.org Artificial Intelligence

Symmetric matrix decomposition is an active research area in machine learning. This paper focuses on exploiting the low-rank structure of non-negative and sparse symmetric matrices via the rectified linear unit (ReLU) activation function. We propose the ReLU-based nonlinear symmetric matrix decomposition (ReLU-NSMD) model, introduce an accelerated alternating partial Bregman (AAPB) method for its solution, and present the algorithm's convergence results. Our algorithm leverages the Bregman proximal gradient framework to overcome the challenge of estimating the global $L$-smooth constant in the classic proximal gradient algorithm. Numerical experiments on synthetic and real datasets validate the effectiveness of our model and algorithm.
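To make the model concrete, a hedged sketch of the ReLU-NSMD objective (our notation; the paper's exact formulation may include additional regularization): a nonnegative, sparse, symmetric target $A$ is approximated by $\mathrm{ReLU}(WW^\top)$ for a low-rank factor $W$.

```python
import numpy as np

def relu(X):
    return np.maximum(X, 0.0)

def nsmd_objective(A, W):
    """F(W) = 0.5 * ||A - ReLU(W @ W.T)||_F^2 for a symmetric nonnegative target A."""
    return 0.5 * np.linalg.norm(A - relu(W @ W.T)) ** 2

# A rank-3 factor yields a nonnegative, sparse, symmetric matrix under ReLU,
# and the objective vanishes at the ground-truth factor.
rng = np.random.default_rng(1)
W_true = rng.normal(size=(20, 3))
A = relu(W_true @ W_true.T)
```

The nonsmoothness of the ReLU inside the Frobenius norm is what rules out a global $L$-smooth constant and motivates the Bregman proximal gradient framework.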


An Accelerated Alternating Partial Bregman Algorithm for ReLU-based Matrix Decomposition

Wang, Qingsong, Qu, Yunfei, Cui, Chunfeng, Han, Deren

arXiv.org Artificial Intelligence

Despite the remarkable success of low-rank estimation in data mining, its effectiveness diminishes when applied to data that inherently lacks low-rank structure. To address this limitation, in this paper, we focus on non-negative sparse matrices and aim to investigate the intrinsic low-rank characteristics of the rectified linear unit (ReLU) activation function. We first propose a novel nonlinear matrix decomposition framework incorporating a comprehensive regularization term designed to simultaneously promote useful structures in clustering and compression tasks, such as low-rankness, sparsity, and non-negativity in the resulting factors. This formulation presents significant computational challenges due to its multi-block structure, non-convexity, non-smoothness, and the absence of global gradient Lipschitz continuity. To address these challenges, we develop an accelerated alternating partial Bregman proximal gradient method (AAPB), whose distinctive feature lies in its capability to enable simultaneous updates of multiple variables. Under mild and theoretically justified assumptions, we establish both sublinear and global convergence properties of the proposed algorithm. Through careful selection of kernel generating distances tailored to various regularization terms, we derive corresponding closed-form solutions while ensuring that the $L$-smooth adaptable property holds for any $L\ge 1$. Numerical experiments on graph-regularized clustering and sparse NMF basis compression confirm the effectiveness of our model and algorithm.
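One ingredient behind such closed-form updates is that common regularizers admit explicit proximal maps. As a generic sketch (not the paper's exact operator), the prox of an $\ell_1$ penalty combined with a nonnegativity constraint is a shifted ReLU:

```python
import numpy as np

def prox_nonneg_l1(V, lam, step):
    """Closed-form prox of step * (lam * ||.||_1 + indicator{V >= 0}): a shifted ReLU."""
    return np.maximum(V - step * lam, 0.0)

# Entries below the threshold are zeroed (sparsity); negatives are clipped (nonnegativity).
V = np.array([2.0, -1.0, 0.5])
```

Choosing a kernel generating distance adapted to each block is what keeps updates like this closed-form inside the Bregman framework.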


Combining Wasserstein-1 and Wasserstein-2 proximals: robust manifold learning via well-posed generative flows

Gu, Hyemin, Katsoulakis, Markos A., Rey-Bellet, Luc, Zhang, Benjamin J.

arXiv.org Machine Learning

We formulate well-posed continuous-time generative flows for learning distributions that are supported on low-dimensional manifolds through Wasserstein proximal regularizations of $f$-divergences. Wasserstein-1 proximal operators regularize $f$-divergences so that singular distributions can be compared. Meanwhile, Wasserstein-2 proximal operators regularize the paths of the generative flows by adding an optimal transport cost, i.e., a kinetic energy penalization. Via mean-field game theory, we show that the combination of the two proximals is critical for formulating well-posed generative flows. Generative flows can be analyzed through optimality conditions of a mean-field game (MFG): a system of partial differential equations (PDEs) coupling a backward Hamilton-Jacobi (HJ) equation with a forward continuity equation, whose solution characterizes the optimal generative flow. For learning distributions that are supported on low-dimensional manifolds, the MFG theory shows that the Wasserstein-1 proximal, which addresses the HJ terminal condition, and the Wasserstein-2 proximal, which addresses the HJ dynamics, are both necessary for the corresponding backward-forward PDE system to be well-defined and have a unique solution with provably linear flow trajectories. This implies that the corresponding generative flow is also unique and can therefore be learned in a robust manner, even for high-dimensional distributions supported on low-dimensional manifolds. The generative flows are learned through adversarial training of continuous-time flows, which bypasses the need for reverse simulation. We demonstrate the efficacy of our approach for generating high-dimensional images without the need to resort to autoencoders or specialized architectures.
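As a hedged sketch of how the two proximals enter (our simplified notation, not necessarily the paper's): the Wasserstein-1 proximal is an inf-convolution that lets the $f$-divergence compare singular distributions,

$$ D_f^{W_1}(P \,\|\, Q) \;=\; \inf_{R}\; \big\{ D_f(R \,\|\, Q) + \lambda\, W_1(P, R) \big\}, $$

while the Wasserstein-2 proximal penalizes the kinetic energy of the flow $(\rho_t, v_t)$ satisfying the continuity equation $\partial_t \rho_t + \nabla \cdot (\rho_t v_t) = 0$:

$$ \inf_{\rho, v}\; D_f^{W_1}(\rho_1 \,\|\, Q) \;+\; \frac{1}{2} \int_0^1 \!\!\int \|v_t(x)\|^2 \, \rho_t(dx)\, dt. $$

The optimality conditions of this variational problem are precisely the backward HJ / forward continuity system described above.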


Learning heavy-tailed distributions with Wasserstein-proximal-regularized $\alpha$-divergences

Chen, Ziyu, Gu, Hyemin, Katsoulakis, Markos A., Rey-Bellet, Luc, Zhu, Wei

arXiv.org Machine Learning

Heavy tails are ubiquitous, emerging in various fields such as extreme events in ocean waves [9], floods [21], social sciences [27, 16], human activities [17, 35], biology [18] and computer sciences [29]. Learning to generate heavy-tailed target distributions has been explored using GANs through tail estimation [10, 15, 1]. While estimating the tail behavior of a heavy-tailed distribution is important, selecting objectives that measure discrepancies between these distributions and facilitate stable learning is equally crucial. In generative modeling, the goal is to generate samples that mimic those from an underlying data distribution, typically by designing algorithms that minimize a probability divergence between the generated and target distributions. Thus, it is crucial to choose a divergence that flexibly and accurately respects the behavior of the data distribution.


LLM-Assisted Rule Based Machine Translation for Low/No-Resource Languages

Coleman, Jared, Krishnamachari, Bhaskar, Iskarous, Khalil, Rosales, Ruben

arXiv.org Artificial Intelligence

We propose a new paradigm for machine translation that is particularly useful for no-resource languages (those without any publicly available bilingual or monolingual corpora): LLM-RBMT (LLM-Assisted Rule Based Machine Translation). Using the LLM-RBMT paradigm, we design the first language education/revitalization-oriented machine translator for Owens Valley Paiute (OVP), a critically endangered Indigenous American language for which there is virtually no publicly available data. We present a detailed evaluation of the translator's components: a rule-based sentence builder, an OVP to English translator, and an English to OVP translator. We also discuss the potential of the paradigm, its limitations, and the many avenues for future research that it opens up.


Learning to Control an Unstable System with Forward Modeling

Neural Information Processing Systems

The forward modeling approach is a methodology for learning control when data is available in distal coordinate systems. We extend previous work by considering how this methodology can be applied to the optimization of quantities that are distal not only in space but also in time. In many learning control problems, the output variables of the controller are not the natural coordinates in which to specify tasks and evaluate performance. Tasks are generally more naturally specified in "distal" coordinate systems (e.g., endpoint coordinates for manipulator motion) than in the "proximal" coordinate system of the controller (e.g., joint angles or torques). Furthermore, the relationship between proximal coordinates and distal coordinates is often not known a priori and, if known, not easily inverted.
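A toy NumPy sketch of the two-stage idea (a hypothetical linear plant, not the paper's networks): first fit a forward model from proximal commands to distal outcomes, then optimize a command for a distal target by descending the error through the frozen model, with no need to invert the plant.

```python
import numpy as np

# Hypothetical plant mapping proximal commands u to distal outcomes y.
def plant(u):
    return np.array([u[0] + 0.5 * u[1], 0.5 * u[0] - u[1]])

# Stage 1: fit a linear forward model y ~ u @ M from observed command/outcome pairs.
rng = np.random.default_rng(2)
U = rng.normal(size=(50, 2))
Y = np.array([plant(u) for u in U])
M, *_ = np.linalg.lstsq(U, Y, rcond=None)

# Stage 2: optimize the proximal command for a distal target by gradient
# descent on the distal error, backpropagated through the (frozen) model.
target = np.array([1.0, 0.0])
u = np.zeros(2)
for _ in range(200):
    err = u @ M - target      # error in distal coordinates
    u -= 0.1 * (err @ M.T)    # chain rule through the forward model
```

The controller never needs the plant's inverse; the forward model supplies the gradient from distal error back to proximal commands.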