Hinder, Oliver
The Price of Adaptivity in Stochastic Convex Optimization
Carmon, Yair, Hinder, Oliver
Stochastic optimization methods in modern machine learning often require carefully tuning sensitive algorithmic parameters at significant cost in time, computation, and expertise. This reality has led to sustained interest in developing adaptive (or parameter-free) algorithms that require minimal or no tuning [6, 8, 12, 21, 22, 24, 26, 29, 35-39, 43, 45-47]. However, a basic theoretical question remains open: Are existing methods "as adaptive as possible," or is there substantial room for improvement? Put differently, is there a fundamental price to be paid (in terms of rate of convergence) for not knowing the problem parameters in advance? To address these questions, we must formally define what it means for an adaptive algorithm to be efficient. The standard notion of minimax optimality [1] does not suffice, since it does not constrain the algorithm to be agnostic to the parameters defining the function class; stochastic gradient descent (SGD) is in many cases minimax optimal, but its step size requires problem-specific tuning. To motivate our solution, we observe that guarantees for adaptive algorithms admit the following interpretation: assuming that the input problem satisfies certain assumptions (e.g., Lipschitz continuity, smoothness, etc.), the adaptive algorithm attains performance close to the best performance that can be guaranteed given only these assumptions.
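For orientation, the tuning issue mentioned above can be made concrete with the classical SGD bound below. This is standard background rather than a result of the paper; the symbols D (initial distance to optimality), G (gradient norm bound), and T (iteration count) are introduced here only for illustration.

```latex
% Standard background, not a result of the paper: for a convex objective with
% stochastic gradient norms bounded by G and initial distance D = \|x_0 - x^*\|,
% averaged SGD with the fixed step size \eta = D / (G \sqrt{T}) satisfies
\[
  \mathbb{E}\big[f(\bar{x}_T) - f(x^*)\big] \;\le\; \frac{D G}{\sqrt{T}},
  \qquad \bar{x}_T = \frac{1}{T} \sum_{t=1}^{T} x_t ,
\]
% so attaining the minimax rate hinges on knowing D and G in advance --
% precisely the kind of problem-specific tuning the paper asks about.
```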
Datasets and Benchmarks for Nanophotonic Structure and Parametric Design Simulations
Kim, Jungtaek, Li, Mingxuan, Hinder, Oliver, Leu, Paul W.
Nanophotonic structures have versatile applications including solar cells, anti-reflective coatings, electromagnetic interference shielding, optical filters, and light-emitting diodes. To design and understand these nanophotonic structures, electrodynamic simulations are essential. These simulations enable us to model electromagnetic fields over time and calculate optical properties. In this work, we introduce frameworks and benchmarks to evaluate nanophotonic structures in the context of parametric structure design problems. The benchmarks are instrumental in assessing the performance of optimization algorithms and identifying an optimal structure based on target optical properties. Moreover, we explore the impact of varying grid sizes in electrodynamic simulations, shedding light on how evaluation fidelity can be strategically leveraged to enhance structure designs.
DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule
Ivgi, Maor, Hinder, Oliver, Carmon, Yair
We propose a tuning-free dynamic SGD step size formula, which we call Distance over Gradients (DoG). The DoG step sizes depend on simple empirical quantities (distance from the initial point and norms of gradients) and have no ``learning rate'' parameter. Theoretically, we show that a slight variation of the DoG formula enjoys strong parameter-free convergence guarantees for stochastic convex optimization assuming only \emph{locally bounded} stochastic gradients. Empirically, we consider a broad range of vision and language transfer learning tasks, and show that DoG's performance is close to that of SGD with tuned learning rate. We also propose a per-layer variant of DoG that generally outperforms tuned SGD, approaching the performance of tuned Adam. A PyTorch implementation is available at https://github.com/formll/dog
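As a concrete illustration of the "distance over gradients" ratio described above, here is a minimal NumPy sketch of a DoG-style update. The function names, the small r_eps floor on the initial distance, and the fixed iteration budget are assumptions of this sketch, not the authors' reference implementation (see the linked repository for that).

```python
import numpy as np

def dog_sgd(grad_fn, x0, steps=1000, r_eps=1e-6):
    """Sketch of a DoG-style dynamic step size: eta_t = rbar_t / sqrt(sum of
    squared gradient norms), where rbar_t is the maximum distance travelled
    from the initial point x0 (floored at r_eps to allow the first step)."""
    x0 = np.asarray(x0, dtype=float)
    x = x0.copy()
    rbar = r_eps          # max_{i <= t} ||x_i - x_0||
    grad_sq_sum = 0.0     # running sum of ||g_i||^2
    for _ in range(steps):
        g = grad_fn(x)
        grad_sq_sum += float(np.dot(g, g))
        eta = rbar / np.sqrt(grad_sq_sum + 1e-12)  # "distance over gradients"
        x = x - eta * g
        rbar = max(rbar, float(np.linalg.norm(x - x0)))
    return x

# Example: minimize a simple quadratic with no learning-rate parameter to tune.
x_opt = dog_sgd(lambda x: 2.0 * (x - 3.0), x0=np.zeros(5))
```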
Making SGD Parameter-Free
Carmon, Yair, Hinder, Oliver
Stochastic convex optimization (SCO) is a cornerstone of both the theory and practice of machine learning. Consequently, there is intense interest in developing SCO algorithms that require little to no prior knowledge of the problem parameters, and hence little to no tuning [27, 23, 20, 2, 22, 39]. In this work we consider the fundamental problem of non-smooth SCO (in a potentially unbounded domain) and seek methods that are adaptive to a key problem parameter: the initial distance to optimality. Current approaches for tackling this problem focus on the more general online learning problem of parameter-free regret minimization [8, 10, 11, 12, 21, 24, 25, 30, 32, 37], where the goal is to obtain regret guarantees that are valid for comparators with arbitrary norms. Research on parameter-free regret minimization has led to practical algorithms for stochastic optimization [9, 27, 32], methods that are able to adapt to many problem parameters simultaneously [37], and methods that can work with any norm [12].
Optimal Diagonal Preconditioning
Qu, Zhaonan, Gao, Wenzhi, Hinder, Oliver, Ye, Yinyu, Zhou, Zhengyuan
Preconditioning has long been a staple technique in optimization, often applied to reduce the condition number of a matrix and speed up the convergence of algorithms. Although there are many popular preconditioning techniques in practice, most lack guarantees on reductions in condition number. Moreover, the degree to which we can improve over existing heuristic preconditioners remains an important practical question. In this paper, we study the problem of optimal diagonal preconditioning that achieves maximal reduction in the condition number of any full-rank matrix by scaling its rows and/or columns. We first reformulate the problem as a quasi-convex problem and provide a simple algorithm based on bisection. Then we develop an interior point algorithm with $O(\log(1/\epsilon))$ iteration complexity, where each iteration consists of a Newton update based on the Nesterov-Todd direction. Next, we specialize to one-sided optimal diagonal preconditioning problems, and demonstrate that they can be formulated as standard dual SDP problems. We then develop efficient customized solvers and study the empirical performance of our optimal diagonal preconditioning procedures through extensive experiments on large matrices. Our findings suggest that optimal diagonal preconditioners can significantly improve upon existing heuristics-based diagonal preconditioners at reducing condition numbers and speeding up iterative methods. Moreover, our implementation of customized solvers, combined with a random row/column sampling step, can find near-optimal diagonal preconditioners for matrices up to size 200,000 in reasonable time, demonstrating their practical appeal.
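To illustrate what diagonal preconditioning does, the snippet below applies a simple inverse-row-norm (Jacobi-style) scaling and compares condition numbers before and after. This is one of the heuristic baselines alluded to above, not the paper's optimal diagonal preconditioning algorithm; the random test matrix is purely illustrative.

```python
import numpy as np

def condition_number(M):
    """Ratio of largest to smallest singular value of a full-rank matrix."""
    s = np.linalg.svd(M, compute_uv=False)
    return s[0] / s[-1]

# Illustrative badly row-scaled matrix (rows differ in magnitude by ~10^4).
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 50)) * rng.uniform(0.01, 100.0, size=(200, 1))

# Heuristic left diagonal preconditioner: scale each row to unit norm.
D = np.diag(1.0 / np.linalg.norm(A, axis=1))

print("kappa(A)   =", condition_number(A))
print("kappa(D A) =", condition_number(D @ A))  # typically much smaller
```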
An efficient nonconvex reformulation of stagewise convex optimization problems
Bunel, Rudy, Hinder, Oliver, Bhojanapalli, Srinadh, Dvijotham, Krishnamurthy
Convex optimization problems with staged structure appear in several contexts, including optimal control, verification of deep neural networks, and isotonic regression. Off-the-shelf solvers can solve these problems but may scale poorly. We develop a nonconvex reformulation designed to exploit this staged structure. Our reformulation has only simple bound constraints, enabling solution via projected gradient descent (PGD) and its accelerated variants. The method automatically generates a sequence of primal and dual feasible solutions to the original convex problem, making optimality certification easy. We establish theoretical properties of the nonconvex formulation, showing that it is (almost) free of spurious local minima and has the same global optimum as the convex problem. We modify PGD to avoid spurious local minimizers, so it always converges to the global minimizer. For neural network verification, our approach obtains small duality gaps in only a few gradient steps. Consequently, it can solve large-scale verification problems faster than both off-the-shelf and specialized solvers.
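For context on why having only simple bound constraints matters, here is a generic projected-gradient-descent sketch in which the projection onto a box is a single clip. It is not the paper's modified PGD that avoids spurious local minimizers; the step size, iteration count, and example objective are assumptions of this sketch.

```python
import numpy as np

def projected_gradient_descent(grad_fn, x0, lower, upper, step=1e-2, iters=500):
    """Generic PGD for bound-constrained problems: gradient step followed by
    projection onto the box [lower, upper], which here is just a clip."""
    x = np.clip(np.asarray(x0, dtype=float), lower, upper)
    for _ in range(iters):
        x = np.clip(x - step * grad_fn(x), lower, upper)
    return x

# Example: minimize ||x - c||^2 subject to 0 <= x <= 1.
c = np.array([2.0, -1.0, 0.3])
x_star = projected_gradient_descent(lambda x: 2.0 * (x - c), np.zeros(3), 0.0, 1.0)
# x_star is approximately [1.0, 0.0, 0.3], the projection of c onto the box.
```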
Near-Optimal Methods for Minimizing Star-Convex Functions and Beyond
Hinder, Oliver, Sidford, Aaron, Sohoni, Nimit Sharad
In this paper, we provide near-optimal accelerated first-order methods for minimizing a broad class of smooth nonconvex functions that are strictly unimodal on all lines through a minimizer. This function class, which we call the class of smooth quasar-convex functions, is parameterized by a constant $\gamma \in (0,1]$, where $\gamma = 1$ encompasses the classes of smooth convex and star-convex functions, and smaller values of $\gamma$ indicate that the function can be "more nonconvex." We develop a variant of accelerated gradient descent that computes an $\epsilon$-approximate minimizer of a smooth $\gamma$-quasar-convex function with at most $O(\gamma^{-1} \epsilon^{-1/2} \log(\gamma^{-1} \epsilon^{-1}))$ total function and gradient evaluations. We also derive a lower bound of $\Omega(\gamma^{-1} \epsilon^{-1/2})$ on the number of gradient evaluations required by any deterministic first-order method in the worst case, showing that, up to a logarithmic factor, no deterministic first-order algorithm can improve upon ours.
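For readers unfamiliar with the terminology, the snippet below states the standard definition of $\gamma$-quasar-convexity used in this line of work; consult the paper for the precise smoothness and domain conditions.

```latex
% Standard definition (stated for orientation; see the paper for the precise
% conditions on f and its domain): a differentiable function f is
% \gamma-quasar-convex with respect to a minimizer x^* if, for every x,
\[
  f(x^*) \;\ge\; f(x) + \frac{1}{\gamma}\,\langle \nabla f(x),\, x^* - x \rangle,
  \qquad \gamma \in (0, 1].
\]
% Setting \gamma = 1 recovers star-convexity (and is implied by convexity);
% smaller \gamma permits "more nonconvex" behavior, as described above.
```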