AITopics | rerm


e64c9ec33f19c7de745bd6b6d1a7a86e-Paper.pdf

Neural Information Processing Systems, Feb-11-2026, 16:02:40 GMT

algorithm, sco problem, sgd

Neural Information Processing Systems

Country: Europe > Czechia > Prague (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)


abb451a12cf1a9d93292e81f0d4fdd7a-Supplemental.pdf

Neural Information Processing Systems, Feb-9-2026, 18:53:10 GMT

Machine learning systems deployed in the real world interact with people through their decision making.

artificial intelligence, machine learning, rerm

Neural Information Processing Systems

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


On Convergence of FedProx: Local Dissimilarity Invariant Bounds, Non-smoothness and Beyond

Neural Information Processing Systems, Feb-8-2026, 16:06:32 GMT

Several widely used federated learning (FL) algorithms for this setting include FedAvg (McMahan et al., 2017), FedProx (Li et al., 2020b), … We analyze its convergence behavior, expose problems, and propose alternatives more suitable for scaling up and generalization.
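FedProx differs from FedAvg in one respect. Each client approximately minimizes its local loss plus a proximal term $(\mu/2)\|w - w^t\|^2$ that pulls it toward the current global model $w^t$. The sketch below is a minimal one-dimensional illustration under several assumptions. The quadratic client losses `0.5 * (w - center)**2`, the step size, the number of local steps and the equal-weight averaging are all hypothetical choices, not the setup analysed in the paper.

```python
from typing import List


def local_prox_update(w_global: float, center: float, mu: float,
                      lr: float = 0.1, steps: int = 20) -> float:
    """Inexact local solve of a FedProx-style client objective
        h_k(w) = F_k(w) + (mu / 2) * (w - w_global) ** 2
    by plain gradient descent. F_k(w) = 0.5 * (w - center) ** 2 is a
    hypothetical 1-D quadratic client loss chosen only for illustration."""
    w = w_global  # start the local solver from the current global model
    for _ in range(steps):
        grad = (w - center) + mu * (w - w_global)  # grad F_k + proximal pull
        w -= lr * grad
    return w


def fedprox(centers: List[float], mu: float = 0.1, rounds: int = 50,
            w0: float = 0.0) -> float:
    """Run FedProx-style rounds with full participation and equal weights.
    Each client's loss is centred at a different point (heterogeneous data)."""
    w = w0
    for _ in range(rounds):
        local_models = [local_prox_update(w, c, mu) for c in centers]
        w = sum(local_models) / len(local_models)  # server: simple average
    return w


print(fedprox([-1.0, 0.5, 3.5]))          # FedProx with mu = 0.1
print(fedprox([-1.0, 0.5, 3.5], mu=0.0))  # mu = 0: plain local GD, FedAvg-style
```

The proximal weight `mu` limits how far each local model drifts from the current global model within a round. With `mu = 0` the local update reduces to ordinary local gradient descent on $F_k$, as in FedAvg. In this toy every client has the same curvature, so the global optimum is the mean of the centres and both settings reach it.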

artificial intelligence, ft 1, machine learning

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > United States > Washington > King County > Bellevue (0.04)
North America > United States > Texas > Travis County > Austin (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


SGD: The Role of Implicit Regularization, Batch-size and Multiple Epochs

Neural Information Processing Systems, Aug-18-2025, 07:45:43 GMT

Our main contributions are threefold: 1. We show that for any regularizer, there is a stochastic convex optimization (SCO) problem for which Regularized Empirical Risk Minimization fails to learn.

algorithm, artificial intelligence, machine learning

Neural Information Processing Systems

Country: Europe > Czechia > Prague (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)


abb451a12cf1a9d93292e81f0d4fdd7a-Supplemental.pdf

Neural Information Processing Systems, Aug-15-2025, 17:40:08 GMT

adversary, online, stability parameter

Neural Information Processing Systems

Country:

North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry: Education > Educational Setting > Online (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


On the tightness of information-theoretic bounds on generalization error of learning algorithms

Wu, Xuetong, Manton, Jonathan H., Aickelin, Uwe, Zhu, Jingge

arXiv.org Artificial Intelligence, Mar-26-2023

A recent line of work, initiated by [1] and [2], has shown that the generalization error of a learning algorithm can be upper bounded by information measures. In most of these works, the convergence rate of the expected generalization error takes the form $O(\sqrt{\lambda/n})$, where $\lambda$ is an information-theoretic quantity such as the mutual information or conditional mutual information between the data and the learned hypothesis. However, such a learning rate is typically considered "slow" compared to the "fast rate" of $O(\lambda/n)$ achievable in many learning scenarios. In this work, we first show that the square root does not necessarily imply a slow rate: a fast-rate result can still be obtained from this bound under appropriate assumptions. Furthermore, we identify the critical condition needed for a fast-rate generalization error, which we call the $(\eta, c)$-central condition. Under this condition, we give information-theoretic bounds on the generalization error and excess risk, with a fast convergence rate for specific learning algorithms such as empirical risk minimization and its regularized version. Finally, several analytical examples are given to show the effectiveness of the bounds. The generalization error of a learning algorithm lies at the core of statistical learning theory, which makes estimating it crucial.
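The slow-versus-fast contrast is easiest to see numerically. One commonly cited form of such a bound applies when the loss is $\sigma$-sub-Gaussian: $|\mathbb{E}[\mathrm{gen}]| \le \sqrt{2\sigma^2 I(W;S)/n}$. The sketch below compares that expression with the $\lambda/n$ scaling. The sub-Gaussian constant, the value of $\lambda$ and the unit constant in the fast rate are assumptions for illustration. This is arithmetic only, not the paper's $(\eta, c)$-central-condition argument.

```python
import math


def sqrt_info_bound(info: float, n: int, sigma: float = 1.0) -> float:
    """Square-root mutual-information bound sqrt(2 * sigma**2 * info / n).
    Assumes a sigma-sub-Gaussian loss; `info` plays the role of lambda."""
    if n <= 0 or info < 0 or sigma < 0:
        raise ValueError("need n > 0, info >= 0 and sigma >= 0")
    return math.sqrt(2.0 * sigma ** 2 * info / n)


def fast_rate(info: float, n: int) -> float:
    """The lambda / n 'fast rate' scaling, with the hidden constant set to 1."""
    if n <= 0 or info < 0:
        raise ValueError("need n > 0 and info >= 0")
    return info / n


for n in (100, 10_000, 1_000_000):
    # 100x more samples: the square-root bound shrinks 10x, the fast rate 100x.
    print(n, sqrt_info_bound(2.0, n), fast_rate(2.0, n))

# If lambda itself decays like c / n, the square-root bound already scales as 1/n.
for n in (100, 10_000):
    print(n, sqrt_info_bound(5.0 / n, n))
```

The second loop shows one elementary reason a square root need not force a slow rate. When the information term itself shrinks like $1/n$, $\sqrt{2\sigma^2 (c/n)/n} = \sqrt{2\sigma^2 c}/n$.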

artificial intelligence, generalization error, machine learning

arXiv.org Artificial Intelligence

2303.14658

Country:

Oceania > Australia > Victoria (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)


SGD: The Role of Implicit Regularization, Batch-size and Multiple-epochs

Kale, Satyen, Sekhari, Ayush, Sridharan, Karthik

arXiv.org Artificial Intelligence, Jul-11-2021

Multi-epoch, small-batch Stochastic Gradient Descent (SGD) has been the method of choice for learning with large over-parameterized models. A popular theory for explaining why SGD works well in practice is that the algorithm has an implicit regularization that biases its output towards a good solution. Perhaps the theoretically best understood learning setting for SGD is Stochastic Convex Optimization (SCO), where it is well known that SGD learns at a rate of $O(1/\sqrt{n})$, where $n$ is the number of samples. In this paper, we consider the problem of SCO and explore the role of implicit regularization, batch size and multiple epochs for SGD. Our main contributions are threefold: (a) We show that for any regularizer, there is an SCO problem for which Regularized Empirical Risk Minimization (RERM) fails to learn. This automatically rules out any implicit-regularization-based explanation for the success of SGD. (b) We provide a separation between SGD and learning via Gradient Descent on the empirical loss (GD) in terms of sample complexity. We show that there is an SCO problem such that GD with any step size and number of iterations can only learn at a suboptimal rate: at least $\widetilde{\Omega}(1/n^{5/12})$. (c) We present a multi-epoch variant of SGD commonly used in practice. We prove that this algorithm is at least as good as single-pass SGD in the worst case. However, for certain SCO problems, taking multiple passes over the dataset can significantly outperform single-pass SGD. We extend our results to the general learning setting by showing a problem which is learnable for any data distribution, and for this problem, SGD is strictly better than RERM for any regularization function. We conclude by discussing the implications of our results for deep learning, and show a separation between SGD and ERM for two-layer diagonal neural networks.
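The three procedures compared in the abstract are single-pass SGD, multi-epoch SGD and full-batch GD on the empirical loss. The sketch below shows them side by side on a hypothetical one-dimensional strongly convex problem, $f(w; z) = \tfrac12 (w - z)^2$ with $z \sim \mathcal{N}(2, 1)$, projected onto $[-10, 10]$. The loss, step sizes, projection radius and reshuffling scheme are all assumptions. On such an easy problem every method succeeds, so the example illustrates how the algorithms differ mechanically. It does not reproduce the separations proved in the paper, which rely on specially constructed SCO instances.

```python
import random
from typing import List, Sequence


def grad(w: float, z: float) -> float:
    """Gradient in w of the hypothetical per-sample loss 0.5 * (w - z) ** 2."""
    return w - z


def project(w: float, radius: float = 10.0) -> float:
    """Euclidean projection onto the domain [-radius, radius]."""
    return max(-radius, min(radius, w))


def single_pass_sgd(samples: Sequence[float], lr: float) -> float:
    """One pass, each sample used exactly once; returns the averaged iterate."""
    w, total = 0.0, 0.0
    for z in samples:
        w = project(w - lr * grad(w, z))
        total += w
    return total / len(samples)


def multi_epoch_sgd(samples: Sequence[float], lr: float, epochs: int,
                    seed: int = 0) -> float:
    """Generic multi-epoch SGD: reshuffle and reuse the same samples each epoch.
    Not necessarily the exact variant analysed in the paper."""
    rng = random.Random(seed)
    order: List[float] = list(samples)
    w, total, steps = 0.0, 0.0, 0
    for _ in range(epochs):
        rng.shuffle(order)
        for z in order:
            w = project(w - lr * grad(w, z))
            total += w
            steps += 1
    return total / steps


def full_batch_gd(samples: Sequence[float], lr: float, iters: int) -> float:
    """Gradient descent on the empirical loss (1/n) * sum_i 0.5 * (w - z_i) ** 2."""
    w = 0.0
    for _ in range(iters):
        g = sum(grad(w, z) for z in samples) / len(samples)
        w = project(w - lr * g)
    return w


rng = random.Random(42)
data = [rng.gauss(2.0, 1.0) for _ in range(1000)]  # population minimiser is 2.0
n = len(data)
print(single_pass_sgd(data, lr=1.0 / n ** 0.5))  # step size ~ 1/sqrt(n)
print(multi_epoch_sgd(data, lr=0.01, epochs=5))
print(full_batch_gd(data, lr=0.5, iters=100))     # converges to the sample mean
```

Single-pass SGD sees each fresh sample once. Multi-epoch SGD reuses the same sample set. GD only ever touches the empirical loss. The gap between fresh-sample and empirical-loss behaviour is where the paper's separations live.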

algorithm, inequality, sgd

arXiv.org Artificial Intelligence

2107.05074

Country: Europe > Czechia > Prague (0.04)

Genre: Research Report > New Finding (0.85)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)


Online learning with dynamics: A minimax perspective

Bhatia, Kush, Sridharan, Karthik

arXiv.org Machine Learning, Dec-3-2020

We study the problem of online learning with dynamics, where a learner interacts with a stateful environment over multiple rounds. In each round of the interaction, the learner selects a policy to deploy and incurs a cost that depends on both the chosen policy and the current state of the world. The state-evolution dynamics and the costs are allowed to be time-varying, possibly adversarially. In this setting, we study the problem of minimizing policy regret and provide non-constructive upper bounds on the minimax rate for the problem. Our main results provide sufficient conditions for online learnability in this setup, with corresponding rates. The rates are characterized by 1) a complexity term capturing the expressiveness of the underlying policy class under the dynamics of state change, and 2) a dynamics stability term measuring the deviation of the instantaneous loss from a certain counterfactual loss. Further, we provide matching lower bounds which show that both terms are indeed necessary. Our approach provides a unifying analysis that recovers regret bounds for several well-studied problems, including online learning with memory, online control of linear quadratic regulators, online Markov decision processes, and tracking adversarial targets. In addition, we show how our tools help obtain tight regret bounds for new problems (with non-linear dynamics and non-convex losses) for which such bounds were not known prior to our work.
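Policy regret compares the learner's cost with that of the best fixed policy *as if that policy had been played from the start*. Because the state depends on past actions, each comparator must be re-simulated on its own counterfactual trajectory. The toy sketch below makes this concrete under several assumptions, none of which come from the paper: scalar linear dynamics, a finite class of linear feedback gains, quadratic costs, and a disturbance sequence fixed in advance. The paper itself treats general, possibly adversarial and time-varying dynamics and costs.

```python
from typing import Sequence


def rollout_cost(gains: Sequence[float], disturbances: Sequence[float],
                 a: float = 0.9, x0: float = 1.0) -> float:
    """Total cost of playing linear feedback u_t = -gains[t] * x_t on the
    hypothetical scalar system x_{t+1} = a * x_t + u_t + w_t, with per-round
    cost x_t**2 + u_t**2. The state depends on every gain played so far."""
    x, total = x0, 0.0
    for k, w in zip(gains, disturbances):
        u = -k * x
        total += x * x + u * u
        x = a * x + u + w
    return total


def policy_regret(learner_gains: Sequence[float], disturbances: Sequence[float],
                  policy_class: Sequence[float], a: float = 0.9,
                  x0: float = 1.0) -> float:
    """Learner's realised cost minus the best fixed policy's cost, where each
    fixed policy is replayed from x0 on its own counterfactual trajectory
    against the same (oblivious) disturbance sequence."""
    horizon = len(disturbances)
    learner = rollout_cost(learner_gains, disturbances, a, x0)
    best_fixed = min(rollout_cost([k] * horizon, disturbances, a, x0)
                     for k in policy_class)
    return learner - best_fixed


disturbances = [0.3, -0.2, 0.5, 0.0, -0.4, 0.1, 0.2, -0.1]
gain_class = [0.0, 0.3, 0.6, 0.9]
learner = [0.0, 0.0, 0.3, 0.3, 0.6, 0.6, 0.9, 0.9]  # hypothetical switching learner
print(policy_regret(learner, disturbances, gain_class))
```

Evaluating a fixed policy on the learner's own state sequence would give external regret instead. The counterfactual rollout is what makes the quantity policy regret, and it is why a stability term relating instantaneous and counterfactual losses appears in the rates.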

adversary, online, stability parameter

arXiv.org Machine Learning

2012.01705

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.49)

Industry: Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.83)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.61)


Fundamental Limits of Ridge-Regularized Empirical Risk Minimization in High Dimensions

Taheri, Hossein, Pedarsani, Ramtin, Thrampoulidis, Christos

arXiv.org Machine Learning, Jul-5-2020

Empirical Risk Minimization (ERM) algorithms are widely used in a variety of estimation and prediction tasks in signal processing and machine learning applications. Despite their popularity, a theory that explains their statistical properties in modern regimes, where both the number of measurements and the number of unknown parameters are large, is only recently emerging. In this paper, we characterize for the first time the fundamental limits on the statistical accuracy of convex ERM for inference in high-dimensional generalized linear models. For a stylized setting with Gaussian features and problem dimensions that grow large at a proportional rate, we start with sharp performance characterizations and then derive tight lower bounds on the estimation and prediction error that hold over a wide class of loss functions and for any value of the regularization parameter. Our precise analysis has several attributes. First, it leads to a recipe for optimally tuning the loss function and the regularization parameter. Second, it allows us to precisely quantify the sub-optimality of popular heuristic choices: for instance, we show that optimally tuned least squares is (perhaps surprisingly) approximately optimal for standard logistic data, but the sub-optimality gap grows drastically as the signal strength increases. Third, we use the bounds to precisely assess the merits of ridge regularization as a function of the over-parameterization ratio. Notably, our bounds are expressed in terms of the Fisher information of random variables that are simple functions of the data distribution, thus making ties to corresponding bounds in classical statistics.
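The setting in the abstract can be sketched as follows: ridge-regularized least squares on Gaussian features with labels from a logistic model, in a proportional regime where $d/n$ is fixed. This uses NumPy and several hypothetical choices: the dimensions, the signal strength $\|\beta\| = 3$ and the $\lambda$ grid. The metric is also our own assumption. Least squares on $\pm 1$ labels estimates $\beta$ only up to scale, so cosine similarity is used, which is not necessarily the error measure used in the paper.

```python
import numpy as np


def ridge_least_squares(X: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Minimiser of (1 / (2n)) * ||y - X b||^2 + (lam / 2) * ||b||^2,
    obtained from the normal equations (X^T X / n + lam * I) b = X^T y / n."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)


def logistic_data(n: int, d: int, signal: float, rng: np.random.Generator):
    """Gaussian features and labels y in {-1, +1} with P(y = +1 | x) =
    sigmoid(x @ beta), where ||beta|| = signal (a hypothetical setup)."""
    beta = rng.standard_normal(d)
    beta *= signal / np.linalg.norm(beta)
    X = rng.standard_normal((n, d))
    prob = 1.0 / (1.0 + np.exp(-X @ beta))
    y = np.where(rng.random(n) < prob, 1.0, -1.0)
    return X, y, beta


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity, insensitive to the scale mismatch between fit and beta."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


rng = np.random.default_rng(0)
X, y, beta = logistic_data(n=800, d=200, signal=3.0, rng=rng)  # d / n = 0.25
for lam in (1e-3, 1e-1, 1.0, 10.0):
    b_hat = ridge_least_squares(X, y, lam)
    print(lam, round(cosine(b_hat, beta), 3), round(float(np.linalg.norm(b_hat)), 3))
```

The printed values depend on the seed. The norm of the ridge estimate shrinks monotonically as `lam` grows. Sweeping `lam` (and `d / n`) in this way is the empirical counterpart of the tuning question the paper answers analytically.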

artificial intelligence, equation, machine learning

arXiv.org Machine Learning

2006.08917

Country:

North America > United States > California > Santa Barbara County > Santa Barbara (0.14)
Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
