Optimization of convex functions is considered a mature field in mathematics. Accordingly, one can use well-established tools and theories to answer the questions described in the last paragraph for optimization. However, optimization of complicated non-convex functions is hard to analyze. Since the optimization of deep neural networks (yes, linear ones also) is non-convex, how can we attempt to answer those questions? One might seek wide empirical evidence that SGD converges to global minima on real-world problems.
Dec-7-2021, 01:55:30 GMT