AITopics | tnull 2


tnull 2



d4ca950da1d6fd954520c45ab19fef1c-Supplemental.pdf

Neural Information Processing Systems | Feb-10-2026, 13:58:33 GMT

hr 2, theorem 2, tnull 2, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
North America > Canada (0.04)

Genre: Overview (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)


Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits

Neural Information Processing Systems | Aug-22-2025, 02:08:27 GMT

We provide a tight regret analysis for ExpTS that yields both the finite-time and the asymptotic regret bounds.

artificial intelligence, data mining, machine learning, (19 more...)

Neural Information Processing Systems

Country:

Asia > Singapore (0.04)
North America > United States > California (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.82)


Making Non-Stochastic Control (Almost) as Easy as Stochastic

Neural Information Processing Systems | Aug-16-2025, 15:14:05 GMT

Recent literature has made much progress in understanding online LQR: a modern learning-theoretic take on the classical control problem where a learner attempts to optimally control an unknown linear dynamical system with fully observed state, perturbed by i.i.d.

hr 2, theorem 2, tnull 2, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
North America > Canada (0.04)

Genre: Overview (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)


CDMA: A Practical Cross-Device Federated Learning Algorithm for General Minimax Problems

Xie, Jiahao, Zhang, Chao, Shen, Zebang, Liu, Weijie, Qian, Hui

arXiv.org Artificial Intelligence | Jun-28-2023

Minimax problems arise in a wide range of important applications including robust adversarial learning and Generative Adversarial Network (GAN) training. Recently, algorithms for minimax problems in the Federated Learning (FL) paradigm have received considerable interest. Existing federated algorithms for general minimax problems require full aggregation (i.e., aggregation of local model information from all clients) in each training round. Thus, they are inapplicable to an important setting of FL known as the cross-device setting, which involves numerous unreliable mobile/IoT devices. In this paper, we develop the first practical algorithm, named CDMA, for general minimax problems in the cross-device FL setting. CDMA is based on a Start-Immediately-With-Enough-Responses mechanism, in which the server first signals a subset of clients to perform local computation and then starts to aggregate the local results reported by clients once it receives responses from enough clients in each round. With this mechanism, CDMA is resilient to low client availability. In addition, CDMA incorporates a lightweight global correction in the local update steps of clients, which mitigates the impact of slow network connections. We establish theoretical guarantees for CDMA under different choices of hyperparameters and conduct experiments on AUC maximization, robust adversarial network training, and GAN training tasks. Theoretical and experimental results demonstrate the efficiency of CDMA.

artificial intelligence, machine learning, tnull 2, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1609/aaai.v37i9.26246

2105.14216

Country:

Asia > Middle East > Jordan (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)


Accelerated Single-Call Methods for Constrained Min-Max Optimization

Cai, Yang, Zheng, Weiqiang

arXiv.org Artificial Intelligence | May-14-2023

We study first-order methods for constrained min-max optimization. Existing methods require either two gradient calls or two projections in each iteration, which may be costly in some applications. In this paper, we first show that a variant of the Optimistic Gradient (OG) method, a single-call single-projection algorithm, has an $O(\frac{1}{\sqrt{T}})$ best-iterate convergence rate for inclusion problems with operators that satisfy the weak Minty variational inequality (MVI). Our second result is the first single-call single-projection algorithm -- the Accelerated Reflected Gradient (ARG) method -- that achieves the optimal $O(\frac{1}{T})$ last-iterate convergence rate for inclusion problems that satisfy negative comonotonicity. Both the weak MVI and negative comonotonicity are well-studied assumptions and capture a rich set of non-convex non-concave min-max optimization problems. Finally, we show that the Reflected Gradient (RG) method, another single-call single-projection algorithm, has an $O(\frac{1}{\sqrt{T}})$ last-iterate convergence rate for constrained convex-concave min-max optimization, answering an open problem of [Hsieh et al., 2019]. Our convergence rates hold for standard measures such as the tangent residual and the natural residual.
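
To make "single-call single-projection" concrete, here is a minimal sketch of the classical projected reflected gradient update, z_{t+1} = P(z_t - eta * F(2 z_t - z_{t-1})), on a toy problem. The toy objective f(x, y) = x * y on the box [-1, 1]^2, the step size, the starting point, and all function names are assumptions chosen for illustration; this is not the accelerated variant or the analysis from the paper.

```python
def clip(v, lo=-1.0, hi=1.0):
    """Project a scalar onto [lo, hi]; applied per coordinate this projects onto the box."""
    return max(lo, min(hi, v))


def operator(x, y):
    """F(x, y) = (df/dx, -df/dy) for the toy objective f(x, y) = x * y."""
    return y, -x


def reflected_gradient(x0, y0, eta=0.2, steps=2000):
    """Projected reflected gradient: one operator call and one projection per step."""
    x_prev, y_prev = x0, y0
    x, y = x0, y0
    for _ in range(steps):
        # Reflected point 2 * z_t - z_{t-1}.
        rx, ry = 2.0 * x - x_prev, 2.0 * y - y_prev
        gx, gy = operator(rx, ry)          # the single operator (gradient) call
        x_prev, y_prev = x, y
        x, y = clip(x - eta * gx), clip(y - eta * gy)  # the single projection
    return x, y


x_final, y_final = reflected_gradient(0.8, -0.6)
print(x_final, y_final)  # both coordinates should be close to the saddle point (0, 0)
```

Extragradient-style methods would instead evaluate the operator twice per iteration; the reflection step reuses the previous iterate so that one evaluation suffices.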

artificial intelligence, machine learning, optimization problem, (16 more...)

arXiv.org Artificial Intelligence

2210.03096

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)


Why (and When) does Local SGD Generalize Better than SGD?

Gu, Xinran, Lyu, Kaifeng, Huang, Longbo, Arora, Sanjeev

arXiv.org Artificial Intelligence | Mar-8-2023

Local SGD is a communication-efficient variant of SGD for large-scale training, where multiple GPUs perform SGD independently and average the model parameters periodically. It has been recently observed that Local SGD can not only achieve the design goal of reducing the communication overhead but also lead to higher test accuracy than the corresponding SGD baseline (Lin et al., 2020b), though the training regimes for this to happen are still in debate (Ortiz et al., 2021). This paper aims to understand why (and when) Local SGD generalizes better based on Stochastic Differential Equation (SDE) approximation. The main contributions of this paper include (i) the derivation of an SDE that captures the long-term behavior of Local SGD in the small learning rate regime, showing how noise drives the iterate to drift and diffuse after it has reached close to the manifold of local minima, (ii) a comparison between the SDEs of Local SGD and SGD, showing that Local SGD induces a stronger drift term that can result in a stronger effect of regularization, e.g., a faster reduction of sharpness, and (iii) empirical evidence validating that having a small learning rate and long enough training time enables the generalization improvement over SGD but removing either of the two conditions leads to no improvement.
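
The Local SGD procedure described in the abstract (independent SGD on each worker, periodic parameter averaging) can be sketched as follows. This is a toy simulation under stated assumptions: a one-dimensional quadratic loss with minimizer 3.0, Gaussian gradient noise, and hypothetical parameter names (`num_workers`, `local_steps`, `rounds`); it illustrates the communication pattern only, not the SDE analysis or the experiments of the paper.

```python
import random


def local_sgd(num_workers=4, local_steps=8, rounds=50, lr=0.1, noise=0.1, seed=0):
    """Simulate Local SGD on f(w) = 0.5 * (w - 3)^2 with noisy gradients."""
    rng = random.Random(seed)
    target = 3.0       # minimizer of the toy loss (an assumption of this sketch)
    w_global = 0.0
    for _ in range(rounds):
        local_models = []
        for _ in range(num_workers):
            w = w_global                    # every worker starts from the shared model
            for _ in range(local_steps):    # independent local SGD steps, no communication
                grad = (w - target) + rng.gauss(0.0, noise)
                w -= lr * grad
            local_models.append(w)
        # One communication per round: average the worker parameters.
        w_global = sum(local_models) / num_workers
    return w_global


print(local_sgd())  # should land near the minimizer 3.0
```

Setting `local_steps=1` recovers parameter averaging after every step, which corresponds to the synchronous baseline with the most communication.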

artificial intelligence, local sgd, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2303.01215

Country:

Europe > Germany > Saxony > Dresden (0.04)
Asia > Singapore (0.04)
Asia > China (0.04)
Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Understanding Edge-of-Stability Training Dynamics with a Minimalist Example

Zhu, Xingyu, Wang, Zixuan, Wang, Xiang, Zhou, Mo, Ge, Rong

arXiv.org Artificial Intelligence | Feb-21-2023

Recently, researchers observed that gradient descent for deep neural networks operates in an "edge-of-stability" (EoS) regime: the sharpness (maximum eigenvalue of the Hessian) is often larger than the stability threshold $2/\eta$ (where $\eta$ is the step size). Despite this, the loss oscillates and converges in the long run, and the sharpness at the end is just slightly below $2/\eta$. While many other well-understood nonconvex objectives such as matrix factorization or two-layer networks can also converge despite large sharpness, there is often a larger gap between the sharpness of the endpoint and $2/\eta$. In this paper, we study the EoS phenomenon by constructing a simple function that has the same behavior. We give a rigorous analysis of its training dynamics in a large local region and explain why the final converging point has sharpness close to $2/\eta$. Globally, we observe that the training dynamics for our example have an interesting bifurcating behavior, which was also observed in the training of neural nets.
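
The stability threshold $2/\eta$ mentioned above is easiest to see on a one-dimensional quadratic. For f(x) = 0.5 * a * x^2 the sharpness is a, and gradient descent gives x_{t+1} = (1 - eta * a) * x_t, which contracts exactly when a < 2 / eta. The sketch below checks this numerically; the quadratic, the step size, and the function name are illustrative assumptions and not the minimalist example constructed in the paper.

```python
def gradient_descent_on_quadratic(sharpness, eta, x0=1.0, steps=200):
    """Run gradient descent on f(x) = 0.5 * sharpness * x^2 and return the final iterate."""
    x = x0
    for _ in range(steps):
        x -= eta * sharpness * x   # gradient of f at x is sharpness * x
    return x


eta = 0.1
threshold = 2.0 / eta  # stability threshold 2 / eta

# Sharpness slightly below the threshold: the iterates shrink toward 0.
below = abs(gradient_descent_on_quadratic(0.95 * threshold, eta))
# Sharpness slightly above the threshold: the iterates blow up.
above = abs(gradient_descent_on_quadratic(1.05 * threshold, eta))

print(below, above)
```

On a pure quadratic, exceeding the threshold always diverges; the EoS observation is notable precisely because neural network training hovers around this threshold without diverging.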

artificial intelligence, initialization, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2210.03294

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)


Learning from many trajectories

Tu, Stephen, Frostig, Roy, Soltanolkotabi, Mahdi

arXiv.org Artificial Intelligence | Jan-31-2023

We initiate a study of supervised learning from many independent sequences ("trajectories") of non-independent covariates, reflecting tasks in sequence modeling, control, and reinforcement learning. Conceptually, our multi-trajectory setup sits between two traditional settings in statistical learning theory: learning from independent examples and learning from a single auto-correlated sequence. Our conditions for efficient learning generalize the former setting--trajectories must be non-degenerate in ways that extend standard requirements for independent examples. Notably, we do not require that trajectories be ergodic, long, or strictly stable. For linear least-squares regression, given $n$-dimensional examples produced by $m$ trajectories, each of length $T$, we observe a notable change in statistical efficiency as the number of trajectories increases from a few (namely $m \lesssim n$) to many (namely $m \gtrsim n$). Specifically, we establish that the worst-case error rate of this problem is $\Theta(n / m T)$ whenever $m \gtrsim n$. Meanwhile, when $m \lesssim n$, we establish a (sharp) lower bound of $\Omega(n^2 / m^2 T)$ on the worst-case error rate, realized by a simple, marginally unstable linear dynamical system. A key upshot is that, in domains where trajectories regularly reset, the error rate eventually behaves as if all of the examples were independent, drawn from their marginals. As a corollary of our analysis, we also improve guarantees for the linear system identification problem.

artificial intelligence, machine learning, trajectory, (18 more...)

arXiv.org Artificial Intelligence

2203.17193

Country:

North America > United States > California (0.13)
Asia > Middle East > Jordan (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(2 more...)

Genre: Research Report > New Finding (0.92)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.74)


Zeroth-Order Alternating Randomized Gradient Projection Algorithms for General Nonconvex-Concave Minimax Problems

Xu, Zi, Shen, Jingjing, Wang, Ziqi, Dai, Yuhong

arXiv.org Machine Learning | Aug-4-2021

In this paper, we study zeroth-order algorithms for nonconvex-concave minimax problems, which have attracted wide attention in machine learning, signal processing and many other fields in recent years. We propose a zeroth-order alternating randomized gradient projection (ZO-AGP) algorithm for smooth nonconvex-concave minimax problems, and its iteration complexity to obtain an $\varepsilon$-stationary point is bounded by $\mathcal{O}(\varepsilon^{-4})$, and the number of function value estimations is bounded by $\mathcal{O}(d_{x}\varepsilon^{-4}+d_{y}\varepsilon^{-6})$ per iteration. Moreover, we propose a zeroth-order block alternating randomized proximal gradient algorithm (ZO-BAPG) for solving block-wise nonsmooth nonconvex-concave minimax optimization problems, and the iteration complexity to obtain an $\varepsilon$-stationary point is bounded by $\mathcal{O}(\varepsilon^{-4})$ and the number of function value estimations per iteration is bounded by $\mathcal{O}(K d_{x}\varepsilon^{-4}+d_{y}\varepsilon^{-6})$. To the best of our knowledge, this is the first time that zeroth-order algorithms with iteration complexity guarantees are developed for solving both general smooth and block-wise nonsmooth nonconvex-concave minimax problems. Numerical results on a data poisoning attack problem validate the efficiency of the proposed algorithms.
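
Zeroth-order methods replace gradients with estimates built from function values only. As a rough illustration of the idea, the sketch below uses a plain coordinate-wise forward-difference estimator, which costs d + 1 function evaluations in dimension d. This is a generic textbook estimator and an assumption of this example; it is not the randomized estimator used by ZO-AGP or ZO-BAPG, and the function names and smoothing parameter `mu` are hypothetical.

```python
def zeroth_order_gradient(f, x, mu=1e-5):
    """Estimate the gradient of f at x from function values only (forward differences)."""
    base = f(x)                      # one evaluation at the current point
    grad = []
    for i in range(len(x)):          # one extra evaluation per coordinate
        shifted = list(x)
        shifted[i] += mu
        grad.append((f(shifted) - base) / mu)
    return grad


def sum_of_squares(x):
    """Toy test function with known gradient 2 * x."""
    return sum(v * v for v in x)


estimate = zeroth_order_gradient(sum_of_squares, [1.0, 2.0, -3.0])
print(estimate)  # should be close to the exact gradient [2.0, 4.0, -6.0]
```

An estimate of this kind can be plugged into a projected gradient step in place of the true gradient, which is the general pattern that zeroth-order projection methods follow.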

algorithm, artificial intelligence, optimization problem, (14 more...)

arXiv.org Machine Learning

2108.00473

Country:

Asia > China (0.28)
North America > Canada (0.14)
North America > United States > Texas (0.14)
Europe > Spain (0.14)

Genre: Research Report (0.64)

Industry: Energy > Oil & Gas (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)


How Fine-Tuning Allows for Effective Meta-Learning

Chua, Kurtland, Lei, Qi, Lee, Jason D.

arXiv.org Machine Learning | May-5-2021

Representation learning has been widely studied in the context of meta-learning, enabling rapid learning of new tasks through shared representations. Recent works such as MAML have explored using fine-tuning-based metrics, which measure the ease by which fine-tuning can achieve good performance, as proxies for obtaining representations. We present a theoretical framework for analyzing representations derived from a MAML-like algorithm, assuming the available tasks use approximately the same underlying representation. We then provide risk bounds on the best predictor found by fine-tuning via gradient descent, demonstrating that the algorithm can provably leverage the shared structure. The upper bound applies to general function classes, which we demonstrate by instantiating the guarantees of our framework in the logistic regression and neural network settings. In contrast, we establish the existence of settings where any algorithm, using a representation trained with no consideration for task-specific fine-tuning, performs as well as a learner with no access to source tasks in the worst case. This separation result underscores the benefit of fine-tuning-based methods, such as MAML, over methods with "frozen representation" objectives in few-shot learning.

log 1, probability, representation, (16 more...)

arXiv.org Machine Learning

2105.02221

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)
