Off-Policy Reinforcement Learning with High Dimensional Reward
Lee, Dong Neuck, Kosorok, Michael R.
Conventional off-policy reinforcement learning (RL) focuses on maximizing the expected return of scalar rewards. Distributional RL (DRL), in contrast, studies the distribution of returns with the distributional Bellman operator in a Euclidean space, leading to highly flexible choices for utility. This paper establishes robust theoretical foundations for DRL. We prove the contraction property of the Bellman operator even when the reward space is an infinite-dimensional separable Banach space. Furthermore, we demonstrate that the behavior of high- or infinite-dimensional returns can be effectively approximated using a lower-dimensional Euclidean space. Leveraging these theoretical insights, we propose a novel DRL algorithm that tackles problems which have been previously intractable using conventional reinforcement learning approaches.
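To make the distributional perspective concrete, here is a minimal sketch of a sample-based distributional Bellman backup with vector-valued returns, followed by a random projection into a lower-dimensional Euclidean space in the spirit of the paper's approximation result. The particle representation, the Gaussian projection, and all names are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def distributional_bellman_backup(particles, reward, gamma=0.99):
    """One sample-based distributional Bellman backup.

    `particles` is an (n, d) array of return samples at the successor
    state and `reward` is a d-dimensional reward vector; the particle
    representation and vector-valued reward are illustrative assumptions.
    """
    return reward + gamma * particles

def random_projection(particles, k, rng):
    """Project d-dimensional return samples to k dimensions, echoing the
    idea that high-dimensional returns can be approximated in a
    lower-dimensional Euclidean space."""
    d = particles.shape[1]
    P = rng.standard_normal((d, k)) / np.sqrt(k)
    return particles @ P

rng = np.random.default_rng(0)
returns = rng.standard_normal((256, 100))   # 256 samples of a 100-dim return
reward = rng.standard_normal(100)
updated = distributional_bellman_backup(returns, reward)
low_dim = random_projection(updated, k=8, rng=rng)
print(low_dim.shape)                        # (256, 8)
```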
Module checking of pushdown multi-agent systems
Bozzelli, Laura, Murano, Aniello, Peron, Adriano
In this paper, we investigate the module-checking problem of pushdown multi-agent systems (PMS) against ATL and ATL* specifications. We establish that for ATL, module checking of PMS is 2EXPTIME-complete, which is the same complexity as pushdown module-checking for CTL. On the other hand, we show that ATL* module-checking of PMS turns out to be 4EXPTIME-complete, hence exponentially harder than both CTL* pushdown module-checking and ATL* model-checking of PMS. Our result for ATL* provides a rare example of a natural decision problem that is elementary yet has complexity higher than triply exponential time.
Stochastic Two Points Method for Deep Model Zeroth-order Optimization
Large foundation models, such as large language models, have performed exceptionally well in various application scenarios. Building or fully fine-tuning such large models is usually prohibitive due to either hardware budget or lack of access to backpropagation. Zeroth-order methods offer a promising direction for tackling this challenge, where only forward passes are needed to update the model. This paper introduces an efficient Stochastic Two-Point (S2P) approach within the gradient-free regime. We present the theoretical convergence properties of S2P under general and relaxed smoothness assumptions. These properties also motivate a faster and more stable variant, Accelerated S2P (AS2P), which exploits our new convergence results to better capture the training dynamics of deep models. Our comprehensive empirical results show that AS2P is highly effective in optimizing objectives for large deep models, including language models, and outperforms standard methods across various model types and scales, achieving a $2\times$ speed-up in training over most of the tasks evaluated.
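As context, the generic two-point zeroth-order estimator underlying methods of this kind uses just two forward evaluations along a random direction. The sketch below shows that estimator on a toy quadratic; it is a hedged illustration of the general technique, not the paper's exact S2P/AS2P update rules.

```python
import numpy as np

def two_point_estimate(f, x, mu=1e-3, rng=None):
    """Zeroth-order gradient estimate from two forward evaluations of f
    along a random unit direction u (no backpropagation needed)."""
    rng = rng or np.random.default_rng()
    u = rng.standard_normal(x.shape)
    u /= np.linalg.norm(u)
    return (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u

# Toy usage: minimize a quadratic with zeroth-order updates only.
f = lambda x: 0.5 * np.sum(x ** 2)
x = np.ones(10)
rng = np.random.default_rng(0)
for _ in range(500):
    x -= 0.1 * two_point_estimate(f, x, rng=rng)
print(f(x))  # decreases toward 0
```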
EPISODE: Episodic Gradient Clipping with Periodic Resampled Corrections for Federated Learning with Heterogeneous Data
Crawshaw, Michael, Bao, Yajie, Liu, Mingrui
Gradient clipping is an important technique for deep neural networks with exploding gradients, such as recurrent neural networks. Recent studies have shown that the loss functions of these networks do not satisfy the conventional smoothness condition, but instead satisfy a relaxed smoothness condition, i.e., the Lipschitz constant of the gradient scales linearly in terms of the gradient norm. Based on this observation, several gradient clipping algorithms have been developed for nonconvex and relaxed-smooth functions. However, the existing algorithms only apply to the single-machine setting or the multiple-machine setting with homogeneous data across machines. It remains unclear how to design provably efficient gradient clipping algorithms in the general Federated Learning (FL) setting with heterogeneous data and limited communication rounds. In this paper, we design EPISODE, the very first algorithm to solve FL problems with heterogeneous data in the nonconvex and relaxed smoothness setting. The key ingredients of the algorithm are two new techniques called \textit{episodic gradient clipping} and \textit{periodic resampled corrections}. At the beginning of each round, EPISODE resamples stochastic gradients from each client and obtains the global averaged gradient, which is used to (1) determine whether to apply gradient clipping for the entire round and (2) construct local gradient corrections for each client. Notably, our algorithm and analysis provide a unified framework for both homogeneous and heterogeneous data under any noise level of the stochastic gradient, and achieve state-of-the-art complexity results. In particular, we prove that EPISODE can achieve linear speedup in the number of machines, and it requires significantly fewer communication rounds. Experiments on several heterogeneous datasets show the superior performance of EPISODE over several strong baselines in FL.
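The round structure described above (resample at the start of each round, decide clipping once from the global averaged gradient, correct local gradients) can be sketched as follows. The step size, clipping rule, and toy clients are illustrative assumptions based on the abstract, not the paper's precise algorithm.

```python
import numpy as np

def episode_round(x, clients, eta=0.1, gamma=1.0, local_steps=10):
    """One round of an EPISODE-style update (simplified sketch).

    `clients` is a list of callables mapping parameters to a stochastic
    gradient; eta, gamma, and the exact clipping/correction rules are
    illustrative assumptions.
    """
    # Round start: resample one stochastic gradient per client, then average.
    G_i = [grad(x) for grad in clients]
    G = np.mean(G_i, axis=0)
    # Decide once per round whether to clip, from the global gradient norm.
    clip = np.linalg.norm(G) > gamma / eta
    xs = []
    for grad, Gi in zip(clients, G_i):
        xi = x.copy()
        for _ in range(local_steps):
            # Local gradient plus the periodic resampled correction (G - Gi).
            d = grad(xi) - Gi + G
            if clip and np.linalg.norm(d) > 0:
                d = d * (gamma / (eta * np.linalg.norm(d)))
            xi -= eta * d
        xs.append(xi)
    return np.mean(xs, axis=0)  # communicate: average the local iterates

# Toy heterogeneous clients, each pulling toward a different optimum b.
rng = np.random.default_rng(0)
clients = [lambda x, b=b: x - b for b in rng.standard_normal((4, 5))]
x = np.zeros(5)
for _ in range(20):
    x = episode_round(x, clients)
```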
A Communication-Efficient Distributed Gradient Clipping Algorithm for Training Deep Neural Networks
Liu, Mingrui, Zhuang, Zhenxun, Lei, Yunwei, Liao, Chunyang
In distributed training of deep neural networks, people usually run Stochastic Gradient Descent (SGD) or its variants on each machine and communicate with other machines periodically. However, SGD might converge slowly in training some deep neural networks (e.g., RNN, LSTM) because of the exploding gradient issue. Gradient clipping is usually employed to address this issue in the single-machine setting, but exploring this technique in the distributed setting is still in its infancy: it remains unclear whether the gradient clipping scheme can take advantage of multiple machines to enjoy parallel speedup. The main technical difficulty lies in dealing with a nonconvex loss function, a non-Lipschitz continuous gradient, and skipped communication rounds simultaneously. In this paper, we explore a relaxed-smoothness assumption of the loss landscape which LSTM was shown to satisfy in previous works, and design a communication-efficient gradient clipping algorithm. This algorithm can be run on multiple machines, where each machine employs a gradient clipping scheme and communicates with other machines after multiple steps of gradient-based updates. Our algorithm is proved to have $O\left(\frac{1}{N\epsilon^4}\right)$ iteration complexity and $O\left(\frac{1}{\epsilon^3}\right)$ communication complexity for finding an $\epsilon$-stationary point in the homogeneous data setting, where $N$ is the number of machines. This indicates that our algorithm enjoys linear speedup and reduced communication rounds. Our proof relies on novel analysis techniques for estimating truncated random variables, which we believe are of independent interest. Our experiments on several benchmark datasets and various scenarios demonstrate that our algorithm indeed exhibits fast convergence speed in practice and thus validate our theory.
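A minimal sketch of the general recipe, clipped local updates with periodic averaging (one communication per call), is given below; the constants and clipping rule are illustrative, not the paper's exact choices.

```python
import numpy as np

def local_clipped_sgd(x, grads, eta=0.1, gamma=1.0, local_steps=5):
    """Each machine takes `local_steps` clipped gradient steps, then the
    iterates are averaged, i.e. one communication round per call."""
    xs = []
    for grad in grads:                       # one callable per machine
        xi = x.copy()
        for _ in range(local_steps):
            g = grad(xi)
            norm = np.linalg.norm(g)
            step = min(eta, gamma / norm) if norm > 0 else eta
            xi -= step * g                   # clipped update
        xs.append(xi)
    return np.mean(xs, axis=0)               # periodic averaging

# Toy homogeneous setting with a steep (exploding-gradient-like) objective.
rng = np.random.default_rng(0)
grads = [lambda x: 4 * x ** 3 + 0.1 * rng.standard_normal(x.shape) for _ in range(4)]
x = np.full(3, 2.0)
for _ in range(50):                          # 50 communication rounds
    x = local_clipped_sgd(x, grads)
```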
Improved Analysis of Clipping Algorithms for Non-convex Optimization
Zhang, Bohang, Jin, Jikai, Fang, Cong, Wang, Liwei
Gradient clipping is commonly used in training deep neural networks partly due to its practicability in relieving the exploding gradient problem. Recently, \citet{zhang2019gradient} show that clipped (stochastic) Gradient Descent (GD) converges faster than vanilla GD/SGD by introducing a new assumption called $(L_0, L_1)$-smoothness, which characterizes the violent fluctuation of gradients typically encountered in deep neural networks. However, their iteration complexities in the problem-dependent parameters are rather pessimistic, and theoretical justification of clipping combined with other crucial techniques, e.g., momentum acceleration, is still lacking. In this paper, we bridge the gap by presenting a general framework for studying clipping algorithms, which also takes momentum methods into consideration. We provide convergence analysis of the framework in both the deterministic and stochastic settings, and demonstrate the tightness of our results by comparing them with existing lower bounds. Our results imply that the efficiency of clipping methods will not degenerate even in highly non-smooth regions of the landscape. Experiments confirm the superiority of clipping-based methods in deep learning tasks.
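For concreteness, the kind of clipped momentum scheme covered by such a framework can be sketched as follows; the parameterization is an illustrative assumption, not the paper's exact update.

```python
import numpy as np

def clipped_momentum_gd(grad, x0, steps=300, eta=0.1, gamma=1.0, beta=0.9):
    """Gradient descent with momentum and norm-based clipping; the step
    size is capped so a single move never exceeds gamma in norm."""
    x, m = x0.copy(), np.zeros_like(x0)
    for _ in range(steps):
        m = beta * m + (1.0 - beta) * grad(x)   # momentum buffer
        norm = np.linalg.norm(m)
        step = min(eta, gamma / norm) if norm > 0 else eta
        x -= step * m                           # clipped update
    return x

# f(x) = x^4 has rapidly growing gradients away from 0; clipping tames them.
x = clipped_momentum_gd(lambda x: 4 * x ** 3, np.array([2.0]))
print(x)  # approaches the minimizer at 0
```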
Isotonic regression with unknown permutations: Statistics, computation, and adaptation
Pananjady, Ashwin, Samworth, Richard J.
Motivated by models for multiway comparison data, we consider the problem of estimating a coordinate-wise isotonic function on the domain $[0, 1]^d$ from noisy observations collected on a uniform lattice, but where the design points have been permuted along each dimension. While the univariate and bivariate versions of this problem have received significant attention, our focus is on the multivariate case $d \geq 3$. We study both the minimax risk of estimation (in empirical $L_2$ loss) and the fundamental limits of adaptation (quantified by the adaptivity index) to a family of piecewise constant functions. We provide a computationally efficient Mirsky partition estimator that is minimax optimal while also achieving the smallest adaptivity index possible for polynomial time procedures. Thus, from a worst-case perspective and in sharp contrast to the bivariate case, the latent permutations in the model do not introduce significant computational difficulties over and above vanilla isotonic regression. On the other hand, the fundamental limits of adaptation are significantly different with and without unknown permutations: Assuming a hardness conjecture from average-case complexity theory, a statistical-computational gap manifests in the former case. In a complementary direction, we show that natural modifications of existing estimators fail to satisfy at least one of the desiderata of optimal worst-case statistical performance, computational efficiency, and fast adaptation. Along the way to showing our results, we improve adaptation results in the special case $d = 2$ and establish some properties of estimators for vanilla isotonic regression, both of which may be of independent interest.
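As a toy illustration of the two-stage difficulty, the sketch below estimates the unknown row/column permutations of a noisy monotone matrix by sorting marginal means, a simple score-based stand-in (not the Mirsky partition estimator from the paper), after which vanilla isotonic regression would apply.

```python
import numpy as np

def unpermute_lattice(Y):
    """Estimate the unknown row/column permutations of a noisy monotone
    matrix by sorting rows and columns by their means (a crude score
    estimator, used here only for illustration)."""
    row_order = np.argsort(Y.mean(axis=1))
    col_order = np.argsort(Y.mean(axis=0))
    return Y[np.ix_(row_order, col_order)], row_order, col_order

# Toy: a monotone matrix observed under random permutations plus noise.
rng = np.random.default_rng(0)
n = 20
theta = np.add.outer(np.linspace(0, 1, n), np.linspace(0, 1, n))
Y = theta[rng.permutation(n)][:, rng.permutation(n)] + 0.1 * rng.standard_normal((n, n))
Y_sorted, _, _ = unpermute_lattice(Y)
# After unpermuting, a vanilla isotonic fit (e.g. block averaging) applies.
```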
ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution
Bahmani, Zeinab, Bertossi, Leopoldo, Vasiloglou, Nikolaos
Entity resolution (ER), an important and common data cleaning problem, is about detecting duplicate representations of the same external entities and merging them into single representations. Relatively recently, declarative rules called matching dependencies (MDs) have been proposed for specifying similarity conditions under which attribute values in database records are merged. In this work we show the process and the benefits of integrating three components of ER: (a) classifiers for duplicate/non-duplicate record pairs built using machine learning (ML) techniques; (b) MDs for supporting both the blocking phase of ML and the merge itself; and (c) the use of the declarative language LogiQL, an extended form of Datalog supported by the LogicBlox platform, for data processing and for the specification and enforcement of MDs.
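A minimal, self-contained sketch of this kind of pipeline is shown below: an MD-like similarity condition drives blocking, a threshold rule stands in for the trained classifier, and matched records are merged. ERBlox itself uses ML classifiers and LogiQL/MDs on LogicBlox; everything here is an illustrative assumption.

```python
from difflib import SequenceMatcher
from itertools import combinations

def sim(a, b):
    """String similarity in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()

def er_pipeline(records, block_threshold=0.6, dup_threshold=0.85):
    """Blocking by an MD-like similarity condition, a threshold rule in
    place of a trained classifier, and an MD-style merge of matches."""
    # Blocking: only compare pairs whose names are roughly similar.
    candidates = [(i, j) for i, j in combinations(range(len(records)), 2)
                  if sim(records[i]["name"], records[j]["name"]) >= block_threshold]
    # "Classifier": declare duplicates above a stricter similarity threshold.
    matches = [(i, j) for i, j in candidates
               if sim(records[i]["name"], records[j]["name"]) >= dup_threshold]
    # Merge: union the non-empty attribute values of matched records.
    merged = {i: dict(r) for i, r in enumerate(records)}
    for i, j in matches:
        if i in merged and j in merged:
            merged[i].update({k: v for k, v in merged[j].items() if v})
            del merged[j]
    return list(merged.values())

records = [{"name": "Jon Smith", "email": ""},
           {"name": "John Smith", "email": "js@x.org"},
           {"name": "A. Jones", "email": ""}]
print(er_pipeline(records))  # the two Smith records are merged
```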