AITopics | nullnull null 2

Collaborating Authors

nullnull null 2

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Hierarchical Federated Learning with Multi-Timescale Gradient Correction

Neural Information Processing SystemsFeb-16-2026, 15:03:47 GMT

We analytically characterize the convergence behavior of MTGC under general non-convex settings, overcoming challenges associated with couplings between correction terms.

artificial intelligence, machine learning, nullnull null 2, (17 more...)

Neural Information Processing Systems

Country: North America > Canada > Ontario > Toronto (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology (0.67)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)

Add feedback

f4661398cb1a3abd3ffe58600bf11322-Supplemental.pdf

Neural Information Processing SystemsFeb-11-2026, 02:48:15 GMT

inequality, nullg null 2, nullnull null 2, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.68)

Add feedback

Statistical Guarantees for Approximate Stationary Points of Shallow Neural Networks

Taheri, Mahsa, Xie, Fang, Lederer, Johannes

arXiv.org Artificial IntelligenceDec-8-2025

Since statistical guarantees for neural networks are usually restricted to global optima of intricate objective functions, it is unclear whether these theories explain the performances of actual outputs of neural network pipelines. The goal of this paper is, therefore, to bring statistical theory closer to practice. We develop statistical guarantees for shallow linear neural networks that coincide up to logarithmic factors with the global optima but apply to stationary points and the points nearby. These results support the common notion that neural networks do not necessarily need to be optimized globally from a mathematical perspective. We then extend our statistical guarantees to shallow ReLU neural networks, assuming the first layer weight matrices are nearly identical for the stationary network and the target. More generally, despite being limited to shallow neural networks for now, our theories make an important step forward in describing the practical properties of neural networks in mathematical terms.

artificial intelligence, machine learning, neural network, (19 more...)

arXiv.org Artificial Intelligence

2205.04491

Country: Asia (0.27)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Dimension-free error estimate for diffusion model and optimal scheduling

de Bortoli, Valentin, Elie, Romuald, Kazeykina, Anna, Ren, Zhenjie, Zhang, Jiacheng

arXiv.org Machine LearningDec-2-2025

Diffusion generative models have emerged as powerful tools for producing synthetic data from an empirically observed distribution. A common approach involves simulating the time-reversal of an Ornstein-Uhlenbeck (OU) process initialized at the true data distribution. Since the score function associated with the OU process is typically unknown, it is approximated using a trained neural network. This approximation, along with finite time simulation, time discretization and statistical approximation, introduce several sources of error whose impact on the generated samples must be carefully understood. Previous analyses have quantified the error between the generated and the true data distributions in terms of Wasserstein distance or Kullback-Leibler (KL) divergence. However, both metrics present limitations: KL divergence requires absolute continuity between distributions, while Wasserstein distance, though more general, leads to error bounds that scale poorly with dimension, rendering them impractical in high-dimensional settings. In this work, we derive an explicit, dimension-free bound on the discrepancy between the generated and the true data distributions. The bound is expressed in terms of a smooth test functional with bounded first and second derivatives. The key novelty lies in the use of this weaker, functional metric to obtain dimension-independent guarantees, at the cost of higher regularity on the test functions. As an application, we formulate and solve a variational problem to minimize the time-discretization error, leading to the derivation of an optimal time-scheduling strategy for the reverse-time diffusion. Interestingly, this scheduler has appeared previously in the literature in a different context; our analysis provides a new justification for its optimality, now grounded in minimizing the discretization bias in generative sampling.

arxiv, data distribution, diffusion model, (14 more...)

arXiv.org Machine Learning

2512.0182

Country:

Europe > France (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

An Automated Theorem Generator with Theoretical Foundation Based on Rectangular Standard Contradiction

Xu, Yang, Liu, Peiyao, Chen, Shuwei, Liu, Jun

arXiv.org Artificial IntelligenceNov-7-2025

Currently, there is a lack of rigorous theoretical system for systematically generating non-trivial and logically valid theorems. Addressing this critical gap, this paper conducts research to propose a novel automated theorem generation theory and tool. Based on the concept of standard contradiction which possesses unique deductive advantages, this paper defines and proves, for the first time, a new logical structure known as rectangular standard contradiction. Centered on this structure, a complete Automated Theorem Generation (ATG) theory is put forward. Theoretical proofs clarify two core properties of rectangular standard contradiction: first, it is a standard contradiction (necessarily unsatisfiable); second, it exhibits non-redundancy (the remaining clause set becomes satisfiable after removing any clause). Leveraging these properties, this paper proves that partitioning a rectangular standard contradiction into a premise subset $A$ and negation of its complement $H$, a valid theorem $A \vdash \neg H$ can be formed, and all such theorems are logically equivalent. To implement this theory, an efficient template-based ATG algorithm is designed, and a Rectangular Automated Theorem Generator is developed. This research enables machines to transition from "verifiers" to "discoverers", opening up new avenues for fundamental research in the fields of logic and artificial intelligence.

artificial intelligence, logic & formal reasoning, standard contradiction, (14 more...)

arXiv.org Artificial Intelligence

2511.04092

Country: Asia > China > Sichuan Province (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)

Add feedback

Hierarchical Federated Learning with Multi-Timescale Gradient Correction

Neural Information Processing SystemsOct-11-2025, 00:31:21 GMT

We analytically characterize the convergence behavior of MTGC under general non-convex settings, overcoming challenges associated with couplings between correction terms.

algorithm, gradient, nullnull null 2, (15 more...)

Neural Information Processing Systems

Country: North America > Canada > Ontario > Toronto (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology (0.67)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)

Add feedback

Guided by the Experts: Provable Feature Learning Dynamic of Soft-Routed Mixture-of-Experts

Liao, Fangshuo, Kyrillidis, Anastasios

arXiv.org Artificial IntelligenceOct-9-2025

Mixture-of-Experts (MoE) architectures have emerged as a cornerstone of modern AI systems. In particular, MoEs route inputs dynamically to specialized experts whose outputs are aggregated through weighted summation. Despite their widespread application, theoretical understanding of MoE training dynamics remains limited to either separate expert-router optimization or only top-1 routing scenarios with carefully constructed datasets. This paper advances MoE theory by providing convergence guarantees for joint training of soft-routed MoE models with non-linear routers and experts in a student-teacher framework. We prove that, with moderate over-parameterization, the student network undergoes a feature learning phase, where the router's learning process is ``guided'' by the experts, that recovers the teacher's parameters. Moreover, we show that a post-training pruning can effectively eliminate redundant neurons, followed by a provably convergent fine-tuning process that reaches global optimality. To our knowledge, our analysis is the first to bring novel insights in understanding the optimization landscape of the MoE architecture.

artificial intelligence, machine learning, nullnull null, (15 more...)

arXiv.org Artificial Intelligence

2510.07205

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)

Add feedback

Computation- and Communication-Efficient Online FL for Resource-Constrained Aerial Vehicles

Pervej, Ferdous, Jin, Richeng, Chowdhury, Md Moin Uddin, Singh, Simran, Güvenç, İsmail, Dai, Huaiyu

arXiv.org Artificial IntelligenceAug-28-2025

Privacy-preserving distributed machine learning (ML) and aerial connected vehicle (ACV)-assisted edge computing have drawn significant attention lately. Since the onboard sensors of ACVs can capture new data as they move along their trajectories, the continual arrival of such 'newly' sensed data leads to online learning and demands carefully crafting the trajectories. Besides, as typical ACVs are inherently resource-constrained, computation- and communication-efficient ML solutions are needed. Therefore, we propose a computation- and communication-efficient online aerial federated learning (2CEOAFL) algorithm to take the benefits of continual sensed data and limited onboard resources of the ACVs. In particular, considering independently owned ACVs act as selfish data collectors, we first model their trajectories according to their respective time-varying data distributions. We then propose a 2CEOAFL algorithm that allows the flying ACVs to (a) prune the received dense ML model to make it shallow, (b) train the pruned model, and (c) probabilistically quantize and offload their trained accumulated gradients to the central server (CS). Our extensive simulation results show that the proposed 2CEOAFL algorithm delivers comparable performances to its non-pruned and nonquantized, hence, computation- and communication-inefficient counterparts.

artificial intelligence, machine learning, nullnull null 2, (15 more...)

arXiv.org Artificial Intelligence

2506.02972

Country: North America > United States (0.67)

Genre: Research Report (0.83)

Industry: Education > Educational Setting > Online (0.87)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Underdamped Langevin MCMC with third order convergence

Scott, Maximilian, O'Kane, Dáire, Jelinčič, Andraž, Foster, James

arXiv.org Machine LearningAug-25-2025

In this paper, we propose a new numerical method for the underdamped Langevin diffusion (ULD) and present a non-asymptotic analysis of its sampling error in the 2-Wasserstein distance when the $d$-dimensional target distribution $p(x)\propto e^{-f(x)}$ is strongly log-concave and has varying degrees of smoothness. Precisely, under the assumptions that the gradient and Hessian of $f$ are Lipschitz continuous, our algorithm achieves a 2-Wasserstein error of $\varepsilon$ in $\mathcal{O}(\sqrt{d}/\varepsilon)$ and $\mathcal{O}(\sqrt{d}/\sqrt{\varepsilon})$ steps respectively. Therefore, our algorithm has a similar complexity as other popular Langevin MCMC algorithms under matching assumptions. However, if we additionally assume that the third derivative of $f$ is Lipschitz continuous, then our algorithm achieves a 2-Wasserstein error of $\varepsilon$ in $\mathcal{O}(\sqrt{d}/\varepsilon^{\frac{1}{3}})$ steps. To the best of our knowledge, this is the first gradient-only method for ULD with third order convergence. To support our theory, we perform Bayesian logistic regression across a range of real-world datasets, where our algorithm achieves competitive performance compared to an existing underdamped Langevin MCMC algorithm and the popular No U-Turn Sampler (NUTS).

artificial intelligence, machine learning, quicsort method, (14 more...)

arXiv.org Machine Learning

2508.16485

Country: