AITopics | null

Collaborating Authors

null

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

b4d168b48157c623fbd095b4a565b5bb-AuthorFeedback.pdf

Neural Information Processing SystemsFeb-13-2026, 17:55:15 GMT

generalization experiment, training data, validation data, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.35)

Add feedback

4 pages to just discussions and positioning, including nearly a page of related work), we now believe after careful

Neural Information Processing SystemsAug-20-2025, 00:17:23 GMT

What's surprising here is that th e optimal rate behaves as min {E

discrepancy, just discussion and positioning, main contribution, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.51)

Add feedback

A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models

Suh, Namjoon, Cheng, Guang

arXiv.org Artificial IntelligenceJan-13-2024

In this article, we review the literature on statistical theories of neural networks from three perspectives. In the first part, results on excess risks for neural networks are reviewed in the nonparametric framework of regression or classification. These results rely on explicit constructions of neural networks, leading to fast convergence rates of excess risks, in that tools from the approximation theory are adopted. Through these constructions, the width and depth of the networks can be expressed in terms of sample size, data dimension, and function smoothness. Nonetheless, their underlying analysis only applies to the global minimizer in the highly non-convex landscape of deep neural networks. This motivates us to review the training dynamics of neural networks in the second part. Specifically, we review papers that attempt to answer ``how the neural network trained via gradient-based methods finds the solution that can generalize well on unseen data.'' In particular, two well-known paradigms are reviewed: the Neural Tangent Kernel (NTK) paradigm, and Mean-Field (MF) paradigm. In the last part, we review the most recent theoretical advancements in generative models including Generative Adversarial Networks (GANs), diffusion models, and in-context learning (ICL) in the Large Language Models (LLMs). The former two models are known to be the main pillars of the modern generative AI era, while ICL is a strong capability of LLMs in learning from a few examples in the context. Finally, we conclude the paper by suggesting several promising directions for deep learning theory.

artificial intelligence, machine learning, survey article, (15 more...)

arXiv.org Artificial Intelligence

2401.07187

Country:

North America > United States (0.45)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Health & Medicine (0.92)
Energy > Oil & Gas (0.67)
Information Technology > Security & Privacy (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Add feedback

A Nearly Optimal and Low-Switching Algorithm for Reinforcement Learning with General Function Approximation

Zhao, Heyang, He, Jiafan, Gu, Quanquan

arXiv.org Machine LearningNov-26-2023

The exploration-exploitation dilemma has been a central challenge in reinforcement learning (RL) with complex model classes. In this paper, we propose a new algorithm, Monotonic Q-Learning with Upper Confidence Bound (MQL-UCB) for RL with general function approximation. Our key algorithmic design includes (1) a general deterministic policy-switching strategy that achieves low switching cost, (2) a monotonic value function structure with carefully controlled function class complexity, and (3) a variance-weighted regression scheme that exploits historical trajectories with high data efficiency. MQL-UCB achieves minimax optimal regret of $\tilde{O}(d\sqrt{HK})$ when $K$ is sufficiently large and near-optimal policy switching cost of $\tilde{O}(dH)$, with $d$ being the eluder dimension of the function class, $H$ being the planning horizon, and $K$ being the number of episodes. Our work sheds light on designing provably sample-efficient and deployment-efficient Q-learning with nonlinear function approximation.

inequality hold, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

2311.15238

Country: North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre: Research Report (0.63)

Industry: Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.81)

Add feedback

Anytime Model Selection in Linear Bandits

Kassraie, Parnian, Emmenegger, Nicolas, Krause, Andreas, Pacchiano, Aldo

arXiv.org Machine LearningNov-12-2023

Model selection in the context of bandit optimization is a challenging problem, as it requires balancing exploration and exploitation not only for action selection, but also for model selection. One natural approach is to rely on online learning algorithms that treat different models as experts. Existing methods, however, scale poorly ($\text{poly}M$) with the number of models $M$ in terms of their regret. Our key insight is that, for model selection in linear bandits, we can emulate full-information feedback to the online learner with a favorable bias-variance trade-off. This allows us to develop ALEXP, which has an exponentially improved ($\log M$) dependence on $M$ for its regret. ALEXP has anytime guarantees on its regret, and neither requires knowledge of the horizon $n$, nor relies on an initial purely exploratory stage. Our approach utilizes a novel time-uniform analysis of the Lasso, establishing a new connection between online learning and high-dimensional statistics.

artificial intelligence, machine learning, survey article, (19 more...)

arXiv.org Machine Learning

2307.12897

Country: Europe (0.46)

Genre:

Research Report (0.63)
Overview (0.46)

Industry:

Education (0.54)
Energy > Oil & Gas > Upstream (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Utilising the CLT Structure in Stochastic Gradient based Sampling : Improved Analysis and Faster Algorithms

Das, Aniket, Nagaraj, Dheeraj, Raj, Anant

arXiv.org Artificial IntelligenceOct-8-2023

We consider stochastic approximations of sampling algorithms, such as Stochastic Gradient Langevin Dynamics (SGLD) and the Random Batch Method (RBM) for Interacting Particle Dynamcs (IPD). We observe that the noise introduced by the stochastic approximation is nearly Gaussian due to the Central Limit Theorem (CLT) while the driving Brownian motion is exactly Gaussian. We harness this structure to absorb the stochastic approximation error inside the diffusion process, and obtain improved convergence guarantees for these algorithms. For SGLD, we prove the first stable convergence rate in KL divergence without requiring uniform warm start, assuming the target density satisfies a Log-Sobolev Inequality. Our result implies superior first-order oracle complexity compared to prior works, under significantly milder assumptions. We also prove the first guarantees for SGLD under even weaker conditions such as H\"{o}lder smoothness and Poincare Inequality, thus bridging the gap between the state-of-the-art guarantees for LMC and SGLD. Our analysis motivates a new algorithm called covariance correction, which corrects for the additional noise introduced by the stochastic approximation by rescaling the strength of the diffusion. Finally, we apply our techniques to analyze RBM, and significantly improve upon the guarantees in prior works (such as removing exponential dependence on horizon), under minimal assumptions.

artificial intelligence, machine learning, null, (16 more...)

arXiv.org Artificial Intelligence

2206.03792

Country:

Asia > India (0.14)
North America > United States > Illinois (0.14)

Genre: Research Report (0.70)

Industry: Energy > Oil & Gas > Upstream (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)

Add feedback

On Penalty-based Bilevel Gradient Descent Method

Shen, Han, Xiao, Quan, Chen, Tianyi

arXiv.org Machine LearningSep-12-2023

Bilevel optimization enjoys a wide range of applications in hyper-parameter optimization, meta-learning and reinforcement learning. However, bilevel optimization problems are difficult to solve. Recent progress on scalable bilevel algorithms mainly focuses on bilevel optimization problems where the lower-level objective is either strongly convex or unconstrained. In this work, we tackle the bilevel problem through the lens of the penalty method. We show that under certain conditions, the penalty reformulation recovers the solutions of the original bilevel problem. Further, we propose the penalty-based bilevel gradient descent (PBGD) algorithm and establish its finite-time convergence for the constrained bilevel problem without lower-level strong convexity. Experiments showcase the efficiency of the proposed PBGD algorithm.

artificial intelligence, local solution, machine learning, (14 more...)

arXiv.org Machine Learning

2302.05185

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.84)

Add feedback

Neural-prior stochastic block model

Duranthon, O., Zdeborová, L.

arXiv.org Machine LearningSep-6-2023

The stochastic block model (SBM) is widely studied as a benchmark for graph clustering aka community detection. In practice, graph data often come with node attributes that bear additional information about the communities. Previous works modeled such data by considering that the node attributes are generated from the node community memberships. In this work, motivated by a recent surge of works in signal processing using deep neural networks as priors, we propose to model the communities as being determined by the node attributes rather than the opposite. We define the corresponding model; we call it the neural-prior SBM. We propose an algorithm, stemming from statistical physics, based on a combination of belief propagation and approximate message passing. We analyze the performance of the algorithm as well as the Bayes-optimal performance. We identify detectability and exact recovery phase transitions, as well as an algorithmically hard region. The proposed model and algorithm can be used as a benchmark for both theory and algorithms. To illustrate this, we compare the optimal performances to the performance of simple graph neural networks.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

doi: 10.1088/2632-2153/ace60f

2303.09995

Country: Europe > Switzerland (0.14)

Genre: Research Report (0.64)

Industry: Energy > Oil & Gas (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Wasserstein Mirror Gradient Flow as the limit of the Sinkhorn Algorithm

Deb, Nabarun, Kim, Young-Heon, Pal, Soumik, Schiebinger, Geoffrey

arXiv.org Machine LearningJul-31-2023

We prove that the sequence of marginals obtained from the iterations of the Sinkhorn algorithm or the iterative proportional fitting procedure (IPFP) on joint densities, converges to an absolutely continuous curve on the $2$-Wasserstein space, as the regularization parameter $\varepsilon$ goes to zero and the number of iterations is scaled as $1/\varepsilon$ (and other technical assumptions). This limit, which we call the Sinkhorn flow, is an example of a Wasserstein mirror gradient flow, a concept we introduce here inspired by the well-known Euclidean mirror gradient flows. In the case of Sinkhorn, the gradient is that of the relative entropy functional with respect to one of the marginals and the mirror is half of the squared Wasserstein distance functional from the other marginal. Interestingly, the norm of the velocity field of this flow can be interpreted as the metric derivative with respect to the linearized optimal transport (LOT) distance. An equivalent description of this flow is provided by the parabolic Monge-Amp\`{e}re PDE whose connection to the Sinkhorn algorithm was noticed by Berman (2020). We derive conditions for exponential convergence for this limiting flow. We also construct a Mckean-Vlasov diffusion whose marginal distributions follow the Sinkhorn flow.

artificial intelligence, machine learning, null 1, (17 more...)

arXiv.org Machine Learning

2307.16421

Country:

Europe > United Kingdom (0.14)
North America > Canada > British Columbia (0.14)
North America > United States > Washington > King County > Seattle (0.14)

Genre: Research Report > New Finding (0.67)

Industry: Energy > Oil & Gas > Upstream (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection

Bai, Yu, Chen, Fan, Wang, Huan, Xiong, Caiming, Mei, Song

arXiv.org Artificial IntelligenceJul-6-2023

Neural sequence models based on the transformer architecture have demonstrated remarkable \emph{in-context learning} (ICL) abilities, where they can perform new tasks when prompted with training and test examples, without any parameter update to the model. This work first provides a comprehensive statistical theory for transformers to perform ICL. Concretely, we show that transformers can implement a broad class of standard machine learning algorithms in context, such as least squares, ridge regression, Lasso, learning generalized linear models, and gradient descent on two-layer neural networks, with near-optimal predictive power on various in-context data distributions. Using an efficient implementation of in-context gradient descent as the underlying mechanism, our transformer constructions admit mild size bounds, and can be learned with polynomially many pretraining sequences. Building on these ``base'' ICL algorithms, intriguingly, we show that transformers can implement more complex ICL procedures involving \emph{in-context algorithm selection}, akin to what a statistician can do in real life -- A \emph{single} transformer can adaptively select different base ICL algorithms -- or even perform qualitatively different tasks -- on different input sequences, without any explicit prompting of the right algorithm or task. We both establish this in theory by explicit constructions, and also observe this phenomenon experimentally. In theory, we construct two general mechanisms for algorithm selection with concrete examples: pre-ICL testing, and post-ICL validation. As an example, we use the post-ICL validation mechanism to construct a transformer that can perform nearly Bayes-optimal ICL on a challenging task -- noisy linear models with mixed noise levels. Experimentally, we demonstrate the strong in-context algorithm selection capabilities of standard transformer architectures.

artificial intelligence, machine learning, null, (17 more...)

arXiv.org Artificial Intelligence

2306.04637

Country: Europe > Austria > Vienna (0.13)

Genre: Research Report (0.64)

Industry:

Materials > Chemicals > Industrial Gases > Liquified Gas (0.45)
Materials > Chemicals > Commodity Chemicals > Petrochemicals > LNG (0.45)
Energy > Oil & Gas > Midstream (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback