newton
One-step differentiation of iterative algorithms
For iterative algorithms, implicit differentiation alleviates this issue but requires custom implementation of Jacobian evaluation. In this paper, we study one-step differentiation, also known as Jacobian-free backpropagation, a method as easy as automatic differentiation and as efficient as implicit differentiation for fast algorithms (e.g., superlinear
- Europe > France > Occitanie > Haute-Garonne > Toulouse (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > France > Provence-Alpes-Côte d'Azur > Alpes-Maritimes > Nice (0.04)
- Europe > Finland > Uusimaa > Helsinki (0.04)
- North America > United States > New York > New York County > New York City (0.05)
- North America > United States > California > Alameda County > Berkeley (0.05)
- Oceania > Australia > New South Wales > Sydney (0.04)
- (4 more...)
- North America > United States > New York (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- North America > United States > Indiana (0.05)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (2 more...)
Distributed Newton Can Communicate Less and Resist Byzantine Workers
We develop a distributed second order optimization algorithm that is communication-efficient as well as robust against Byzantine failures of the worker machines. We propose an iterative approximate Newton-type algorithm, where the worker machines communicate \emph{only once} per iteration with the central machine. This is in sharp contrast with the state-of-the-art distributed second order algorithms like GIANT \cite{giant}, DINGO\cite{dingo}, where the worker machines send (functions of) local gradient and Hessian sequentially; thus ending up communicating twice with the central machine per iteration. Furthermore, we employ a simple norm based thresholding rule to filter-out the Byzantine worker machines. We establish the linear-quadratic rate of convergence of our proposed algorithm and establish that the communication savings and Byzantine resilience attributes only correspond to a small statistical error rate for arbitrary convex loss functions. To the best of our knowledge, this is the first work that addresses the issue of Byzantine resilience in second order distributed optimization. Furthermore, we validate our theoretical results with extensive experiments on synthetically generated and benchmark LIBSVM \cite{libsvm} data-set and demonstrate convergence guarantees.
- North America > United States > Indiana (0.05)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > Canada (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
Ubiquitous Symmetry at Critical Points Across Diverse Optimization Landscapes
Symmetry plays a crucial role in understanding the properties of mathematical structures and optimization problems. Recent work has explored this phenomenon in the context of neural networks, where the loss function is invariant under column and row permutations of the network weights. It has been observed that local minima exhibit significant symmetry with respect to the network weights (invariance to row and column permutations). And moreover no critical point was found that lacked symmetry. We extend this line of inquiry by investigating symmetry phenomena in real-valued loss functions defined on a broader class of spaces. We will introduce four more cases: the projective case over a finite field, the octahedral graph case, the perfect matching case, and the particle attraction case. We show that as in the neural network case, all the critical points observed have non-trivial symmetry. Finally we introduce a new measure of symmetry in the system and show that it reveals additional symmetry structures not captured by the previous measure.