AITopics | tilde

Collaborating Authors

tilde

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Limitations on Variance-Reduction and Acceleration Schemes for Finite Sums Optimization

Neural Information Processing SystemsMar-17-2026, 16:15:52 GMT

We study the conditions under which one is able to efficiently apply variance-reduction and acceleration schemes on finite sums problems. First, we show that perhaps surprisingly, the finite sum structure, by itself, is not sufficient for obtaining a complexity bound of $\tilde{\cO}((n+L/\mu)\ln(1/\epsilon))$ for $L$-smooth and $\mu$-strongly convex finite sums - one must also know exactly which individual function is being referred to by the oracle at each iteration. Next, we show that for a broad class of first-order and coordinate-descent finite sums algorithms (including, e.g., SDCA, SVRG, SAG), it is not possible to get an `accelerated' complexity bound of $\tilde{\cO}((n+\sqrt{n L/\mu})\ln(1/\epsilon))$, unless the strong convexity parameter is given explicitly. Lastly, we show that when this class of algorithms is used for minimizing $L$-smooth and non-strongly convex finite sums, the optimal complexity bound is $\tilde{\cO}(n+L/\epsilon)$, assuming that (on average) the same update rule is used for any iteration, and $\tilde{\cO}(n+\sqrt{nL/\epsilon})$, otherwise.

artificial intelligence, machine learning, proceedings, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)

Add feedback

Revisiting Perceptron: Efficient and Label-Optimal Learning of Halfspaces

Neural Information Processing SystemsMar-17-2026, 14:35:43 GMT

It has been a long-standing problem to efficiently learn a halfspace using as few labels as possible in the presence of noise. In this work, we propose an efficient Perceptron-based algorithm for actively learning homogeneous halfspaces under the uniform distribution over the unit sphere.

artificial intelligence, machine learning, proceedings, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Optimistic posterior sampling for reinforcement learning: worst-case regret bounds

Neural Information Processing SystemsMar-17-2026, 13:31:40 GMT

We present an algorithm based on posterior sampling (aka Thompson sampling) that achieves near-optimal worst-case regret bounds when the underlying Markov Decision Process (MDP) is communicating with a finite, though unknown, diameter. Our main result is a high probability regret upper bound of $\tilde{O}(D\sqrt{SAT})$ for any communicating MDP with $S$ states, $A$ actions and diameter $D$, when $T\ge S^5A$. Here, regret compares the total reward achieved by the algorithm to the total expected reward of an optimal infinite-horizon undiscounted average reward policy, in time horizon $T$. This result improves over the best previously known upper bound of $\tilde{O}(DS\sqrt{AT})$ achieved by any algorithm in this setting, and matches the dependence on $S$ in the established lower bound of $\Omega(\sqrt{DSAT})$ for this problem. Our techniques involve proving some novel results about the anti-concentration of Dirichlet distribution, which may be of independent interest.

artificial intelligence, machine learning, proceedings, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Third-order Smoothness Helps: Faster Stochastic Optimization Algorithms for Finding Local Minima

Neural Information Processing SystemsMar-17-2026, 02:39:00 GMT

We propose stochastic optimization algorithms that can find local minima faster than existing algorithms for nonconvex optimization problems, by exploiting the third-order smoothness to escape non-degenerate saddle points more efficiently.

artificial intelligence, machine learning, proceedings, (8 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

Add feedback

Stochastic Cubic Regularization for Fast Nonconvex Optimization

Neural Information Processing SystemsMar-17-2026, 01:36:54 GMT

This paper proposes a stochastic variant of a classic algorithm---the cubic-regularized Newton method [Nesterov and Polyak]. The proposed algorithm efficiently escapes saddle points and finds approximate local minima for general smooth, nonconvex functions in only $\mathcal{\tilde{O}}(\epsilon^{-3.5})$ stochastic gradient and stochastic Hessian-vector product evaluations. The latter can be computed as efficiently as stochastic gradients. This improves upon the $\mathcal{\tilde{O}}(\epsilon^{-4})$ rate of stochastic gradient descent. Our rate matches the best-known result for finding local minima without requiring any delicate acceleration or variance-reduction techniques.

artificial intelligence, machine learning, proceedings, (5 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Jordan (0.09)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD

Neural Information Processing SystemsMar-16-2026, 23:27:46 GMT

Stochastic gradient descent (SGD) gives an optimal convergence rate when minimizing convex stochastic objectives $f(x)$. However, in terms of making the gradients small, the original SGD does not give an optimal rate, even when $f(x)$ is convex. If $f(x)$ is convex, to find a point with gradient norm $\varepsilon$, we design an algorithm SGD3 with a near-optimal rate $\tilde{O}(\varepsilon^{-2})$, improving the best known rate $O(\varepsilon^{-8/3})$. If $f(x)$ is nonconvex, to find its $\varepsilon$-approximate local minimum, we design an algorithm SGD5 with rate $\tilde{O}(\varepsilon^{-3.5})$,

artificial intelligence, machine learning, proceedings, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.61)

Add feedback

Recovering Unbalanced Communities in the Stochastic Block Model with Application to Clustering with a Faulty Oracle

Neural Information Processing SystemsFeb-5-2026, 18:41:11 GMT

The stochastic block model (SBM) is a fundamental model for studying graph clustering or community detection in networks. It has received great attention in the last decade and the balanced case, i.e., assuming all clusters have large size, has been well studied. However, our understanding of SBM with unbalanced communities (arguably, more relevant in practice) is still limited. In this paper, we provide a simple SVD-based algorithm for recovering the communities in the SBM with communities of varying sizes.We improve upon a result of Ailon, Chen and Xu [ICML 2013; JMLR 2015] by removing the assumption that there is a large interval such that the sizes of clusters do not fall in, and also remove the dependency of the size of the recoverable clusters on the number of underlying clusters. We further complement our theoretical improvements with experimental comparisons.Under the planted clique conjecture, the size of the clusters that can be recovered by our algorithm is nearly optimal (up to poly-logarithmic factors) when the probability parameters are constant. As a byproduct, we obtain an efficient clustering algorithm with sublinear query complexity in a faulty oracle model, which is capable of detecting all clusters larger than $\tilde{\Omega}({\sqrt{n}})$, even in the presence of $\Omega(n)$ small clusters in the graph. In contrast, previous efficient algorithms that use a sublinear number of queries are incapable of recovering any large clusters if there are more than $\tilde{\Omega}(n^{2/5})$ small clusters.

artificial intelligence, data mining, machine learning, (9 more...)

Neural Information Processing Systems

Technology:

Information Technology > Data Science > Data Mining (0.59)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.59)

Add feedback

STEM: A Stochastic Two-Sided Momentum Algorithm Achieving Near-Optimal Sample and Communication Complexities for Federated Learning

Neural Information Processing SystemsFeb-3-2026, 23:41:42 GMT

Federated Learning (FL) refers to the paradigm where multiple worker nodes (WNs) build a joint model by using local data. Despite extensive research, for a generic non-convex FL problem, it is not clear, how to choose the WNs' and the server's update directions, the minibatch sizes, and the local update frequency, so that the WNs use the minimum number of samples and communication rounds to achieve the desired solution. This work addresses the above question and considers a class of stochastic algorithms where the WNs perform a few local updates before communication. We show that when both the WN's and the server's directions are chosen based on certain stochastic momentum estimator, the algorithm requires $\tilde{\mathcal{O}}(\epsilon^{-3/2})$ samples and $\tilde{\mathcal{O}}(\epsilon^{-1})$ communication rounds to compute an $\epsilon$-stationary solution. To the best of our knowledge, this is the first FL algorithm that achieves such {\it near-optimal} sample and communication complexities simultaneously. Further, we show that there is a trade-off curve between local update frequencies and local minibatch sizes, on which the above sample and communication complexities can be maintained.

artificial intelligence, machine learning, proceedings, (12 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.41)

Add feedback

On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method

Neural Information Processing SystemsJan-31-2026, 10:30:35 GMT

Policy gradient (PG) gives rise to a rich class of reinforcement learning (RL) methods. Recently, there has been an emerging trend to augment the existing PG methods such as REINFORCE by the \emph{variance reduction} techniques. However, all existing variance-reduced PG methods heavily rely on an uncheckable importance weight assumption made for every single iteration of the algorithms. In this paper, a simple gradient truncation mechanism is proposed to address this issue. Moreover, we design a Truncated Stochastic Incremental Variance-Reduced Policy Gradient (TSIVR-PG) method, which is able to maximize not only a cumulative sum of rewards but also a general utility function over a policy's long-term visiting distribution.

artificial intelligence, machine learning, proceedings, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)

Add feedback

Entropy testing and its application to testing Bayesian networks

Neural Information Processing SystemsDec-27-2025, 12:46:53 GMT

This paper studies the problem of \emph{entropy identity testing}: given sample access to a distribution $p$ and a fully described distribution $q$ (both are discrete distributions over the support of size $k$), and the promise that either $p = q$ or $ | H(p) - H(q) | \geqslant \varepsilon$, where $H(\cdot)$ denotes the Shannon entropy, a tester needs to distinguish between the two cases with high probability.

artificial intelligence, name change, proceedings, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.38)

Add feedback