AITopics | near-optimal

Collaborating Authors

near-optimal

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Near-optimal learning with average Hölder smoothness

Neural Information Processing SystemsDec-24-2025, 21:46:39 GMT

We generalize the notion of average Lipschitz smoothness proposed by Ashlagi et al. (COLT 2021) by extending it to Hölder smoothness. This measure of the effective smoothness of a function is sensitive to the underlying distribution and can be dramatically smaller than its classic worst-case Hölder constant.We consider both the realizable and the agnostic (noisy) regression settings, proving upper and lower risk bounds in terms of the average Hölder smoothness; these rates improve upon both previously known rates even in the special case of average Lipschitz smoothness.Moreover, our lower bound is tight in the realizable setting up to log factors, thus we establish the minimax rate.From an algorithmic perspective, since our notion of average smoothness is defined with respect to the unknown underlying distribution, the learner does not have an explicit representation of the function class, hence is unable to execute ERM. Nevertheless, we provide distinct learning algorithms that achieve both (nearly) optimal learning rates.Our results hold in any totally bounded metric space, and are stated in terms of its intrinsic geometry.Overall, our results show that the classic worst-case notion of Hölder smoothness can be essentially replaced by its average, yielding considerably sharper guarantees.

lder smoothness, near-optimal, smoothness, (4 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.83)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Near-Optimal Distributed Minimax Optimization under the Second-Order Similarity

Neural Information Processing SystemsDec-24-2025, 20:25:55 GMT

Requests for name changes in the electronic proceedings will be accepted with no questions asked. However name changes may cause bibliographic tracking issues. Authors are asked to consider this carefully and discuss it with their co-authors prior to requesting a name change in the electronic proceedings. Use the Report an Issue link to request a name change.

artificial intelligence, minimax optimization, varepsilon, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.46)

Add feedback

Near-Optimal Distributed Minimax Optimization under the Second-Order Similarity

Neural Information Processing SystemsMay-26-2025, 20:37:39 GMT

This paper considers the distributed convex-concave minimax optimization under the second-order similarity.We propose stochastic variance-reduced optimistic gradient sliding (SVOGS) method, which takes the advantage of the finite-sum structure in the objective by involving the mini-batch client sampling and variance reduction.We prove SVOGS can achieve the \varepsilon -duality gap within communication rounds of {\mathcal O}(\delta D 2/\varepsilon), communication complexity of {\mathcal O}(n \sqrt{n}\delta D 2/\varepsilon),and local gradient calls of \tilde{\mathcal O}(n (\sqrt{n}\delta L)D 2/\varepsilon\log(1/\varepsilon)), where n is the number of nodes, \delta is the degree of the second-order similarity, L is the smoothness parameter and D is the diameter of the constraint set.We can verify that all of above complexity (nearly) matches the corresponding lower bounds.For the specific \mu -strongly-convex- \mu -strongly-convex case, our algorithm has the upper bounds on communication rounds, communication complexity, and local gradient calls of \mathcal O(\delta/\mu\log(1/\varepsilon)), {\mathcal O}((n \sqrt{n}\delta/\mu)\log(1/\varepsilon)), and \tilde{\mathcal O}(n (\sqrt{n}\delta L)/\mu)\log(1/\varepsilon)) respectively, which are also nearly tight.Furthermore, we conduct the numerical experiments to show the empirical advantages of proposed method.

artificial intelligence, second-order similarity, varepsilon, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.64)

Add feedback

Near-optimal learning with average Hölder smoothness

Neural Information Processing SystemsJan-13-2025, 14:13:02 GMT

We generalize the notion of average Lipschitz smoothness proposed by Ashlagi et al. (COLT 2021) by extending it to Hölder smoothness. This measure of the "effective smoothness" of a function is sensitive to the underlying distribution and can be dramatically smaller than its classic "worst-case" Hölder constant.We consider both the realizable and the agnostic (noisy) regression settings, proving upper and lower risk bounds in terms of the average Hölder smoothness; these rates improve upon both previously known rates even in the special case of average Lipschitz smoothness.Moreover, our lower bound is tight in the realizable setting up to log factors, thus we establish the minimax rate.From an algorithmic perspective, since our notion of average smoothness is defined with respect to the unknown underlying distribution, the learner does not have an explicit representation of the function class, hence is unable to execute ERM. Nevertheless, we provide distinct learning algorithms that achieve both (nearly) optimal learning rates.Our results hold in any totally bounded metric space, and are stated in terms of its intrinsic geometry.Overall, our results show that the classic worst-case notion of Hölder smoothness can be essentially replaced by its average, yielding considerably sharper guarantees.

lder smoothness, near-optimal, smoothness, (1 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.87)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.65)

Add feedback

Procrastinating with Confidence: Near-Optimal, Anytime, Adaptive Algorithm Configuration

Kleinberg, Robert, Leyton-Brown, Kevin, Lucier, Brendan, Graham, Devon

Neural Information Processing SystemsMar-19-2020, 00:15:49 GMT

Algorithm configuration methods optimize the performance of a parameterized heuristic algorithm on a given distribution of problem instances. Recent work introduced an algorithm configuration procedure ( Structured Procrastination'') that provably achieves near optimal performance with high probability and with nearly minimal runtime in the worst case. It also offers an anytime property: it keeps tightening its optimality guarantees the longer it is run. Unfortunately, Structured Procrastination is not adaptive to characteristics of the parameterized algorithm: it treats every input like the worst case. Follow-up work ( LeapsAndBounds'') achieves adaptivity but trades away the anytime property.

adaptive algorithm configuration, anytime property, structured procrastination, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.50)

Add feedback