Hessian matrix
- Asia > Middle East > Jordan (0.04)
- Asia > China > Hong Kong (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
A Teacher-Student Perspective on the Dynamics of Learning Near the Optimal Point
Couto, Carlos, Mourão, José, Figueiredo, Mário A. T., Ribeiro, Pedro
Near an optimal learning point of a neural network, the learning performance of gradient descent dynamics is dictated by the Hessian matrix of the loss function with respect to the network parameters. We characterize the Hessian eigenspectrum for some classes of teacher-student problems, when the teacher and student networks have matching weights, showing that the smaller eigenvalues of the Hessian determine long-time learning performance. For linear networks, we analytically establish that for large networks the spectrum asymptotically follows a convolution of a scaled chi-square distribution with a scaled Marchenko-Pastur distribution. We numerically analyse the Hessian spectrum for polynomial and other non-linear networks. Furthermore, we show that the rank of the Hessian matrix can be seen as an effective number of parameters for networks using polynomial activation functions. For a generic non-linear activation function, such as the error function, we empirically observe that the Hessian matrix is always full rank.
- Europe > Portugal > Lisbon > Lisbon (0.04)
- North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
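The limiting law claimed in the abstract above (a convolution of a scaled chi-square with a scaled Marchenko-Pastur distribution) can be illustrated numerically: sampling from a convolution just means adding independent draws from the two components. The scale factors alpha and beta, the degrees of freedom k, and the aspect ratio q below are illustrative assumptions, not values from the paper.

```python
import math
import random

random.seed(0)

def chi_square(k):
    # chi-square with k degrees of freedom = sum of k squared standard normals
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(k))

# Marchenko-Pastur law with aspect ratio q < 1, supported on [a, b]
q = 0.25
a, b = (1 - math.sqrt(q)) ** 2, (1 + math.sqrt(q)) ** 2

def mp_density(x):
    return math.sqrt(max((b - x) * (x - a), 0.0)) / (2 * math.pi * q * x)

# envelope constant for rejection sampling, from a grid scan of the density
M = max(mp_density(a + (b - a) * i / 1000) for i in range(1, 1000))

def mp_sample():
    # rejection sampling: propose uniformly on [a, b], accept under the density
    while True:
        x = random.uniform(a, b)
        if random.uniform(0.0, M) <= mp_density(x):
            return x

# eigenvalue samples from the convolution: lambda = alpha * chi2_k + beta * MP_q
alpha, beta, k = 0.5, 1.0, 2
samples = [alpha * chi_square(k) + beta * mp_sample() for _ in range(20000)]
mean = sum(samples) / len(samples)
# sanity check: E[chi2_k] = k and the MP law has mean 1, so E[lambda] = alpha*k + beta
```

A histogram of `samples` then approximates the claimed limiting spectrum for this parameter choice.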
Comparing BFGS and OGR for Second-Order Optimization
Przybysz, Adrian, Kołek, Mikołaj, Sobota, Franciszek, Duda, Jarek
Across standard test functions and ablations with/without line search, OGR variants match or outperform BFGS in final objective and step efficiency, with particular gains in nonconvex landscapes where saddle handling matters. Exact Hessians (via automatic differentiation) are used only as an oracle baseline to evaluate estimation quality, not to form steps.

II. Online Gradient Regression (OGR)

Online Gradient Regression (OGR) is a second-order optimization framework that accelerates stochastic gradient descent (SGD) by online least-squares regression of noisy gradients to infer local curvature and the distance to a stationary point [3]. The central assumption is that, in a small neighborhood, the objective F(θ) is well approximated by a quadratic model, so the gradient varies approximately linearly with the parameters. OGR maintains exponentially weighted statistics of recent (θ_t, g_t) pairs and updates a local model at each iteration, at negligible extra cost compared to computing the gradient itself [2], [3].

A. Direct multivariate approach

At a given time T, based on recent gradients g_t ∈ R^d and positions θ_t ∈ R^d for t < T, we would like to locally approximate the behavior with a second-order polynomial, using the parametrization

f(θ) = h + (1/2) (θ − p)^T H (θ − p),    ∇f(θ) = H (θ − p),

for Hessian H ∈ R^{d×d} and p ∈ R^d the position of a saddle point or extremum. To capture local behavior we work with averages under weights w_t that decrease exponentially into the past, defining for any sequence v_t the weighted average v̄ = (Σ_{t<T} w_t v_t) / (Σ_{t<T} w_t).
- Europe > Poland > Masovia Province > Warsaw (0.04)
- Europe > Poland > Lesser Poland Province > Kraków (0.04)
- Europe > Poland > Greater Poland Province > Poznań (0.04)
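The regression idea in the OGR excerpt above can be sketched in one dimension: under the quadratic model, g ≈ H(θ − p), so a weighted least-squares fit of observed gradients against positions recovers the curvature H as the slope and the stationary point as p = θ̄ − ḡ/H. The toy objective, the constants, and the single Newton-like jump at the end are illustrative assumptions, not the authors' full algorithm.

```python
import random

random.seed(1)

# Toy quadratic F(theta) = 0.5 * H_true * (theta - p_true)^2; gradients observed with noise.
H_true, p_true = 3.0, 2.0

def noisy_grad(theta):
    return H_true * (theta - p_true) + random.gauss(0.0, 0.1)

beta = 0.98  # exponential forgetting of old (theta, g) pairs
s_w = s_t = s_g = s_tt = s_tg = 0.0  # weighted sums of 1, theta, g, theta^2, theta*g

theta = -5.0
for _ in range(50):
    g = noisy_grad(theta)
    # online update of the exponentially weighted statistics
    s_w = beta * s_w + 1.0
    s_t = beta * s_t + theta
    s_g = beta * s_g + g
    s_tt = beta * s_tt + theta * theta
    s_tg = beta * s_tg + theta * g
    theta -= 0.05 * g  # plain SGD steps generate the regression data

# weighted least-squares regression of g on theta: slope estimates the curvature
m_t, m_g = s_t / s_w, s_g / s_w
H_est = (s_tg / s_w - m_t * m_g) / (s_tt / s_w - m_t ** 2)
p_est = m_t - m_g / H_est  # modeled stationary point, from g = H*(theta - p)
theta = p_est              # Newton-like jump to the modeled optimum
```

With these settings the recovered slope lands near H_true = 3 and the jump lands near p_true = 2, at the cost of a few running sums per gradient evaluation.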
A hybrid stochastic Newton algorithm for logistic regression
Bercu, Bernard, Fredes, Luis, Gbaguidi, Eméric
In this paper, we investigate a second-order stochastic algorithm for solving large-scale binary classification problems. We propose to make use of a new hybrid stochastic Newton algorithm that includes two weighted components in the Hessian matrix estimation: the first one coming from the natural Hessian estimate and the second associated with the stochastic gradient information. Our motivation comes from the fact that both parts, evaluated at the true parameter of the logistic regression, are equal to the Hessian matrix. This new formulation has several advantages and enables us to prove the almost sure convergence of our stochastic algorithm to the true parameter. Moreover, we significantly improve the almost sure rate of convergence to the Hessian matrix. Furthermore, we establish the central limit theorem for our hybrid stochastic Newton algorithm. Finally, we show a surprising result on the almost sure convergence of the cumulative excess risk.
- Research Report > New Finding (0.85)
- Research Report > Experimental Study (0.61)
- Asia > Singapore (0.05)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- North America > United States > Virginia (0.05)
- Oceania > Australia > Queensland (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.65)
- Asia > China > Shanghai > Shanghai (0.04)
- Europe > Switzerland (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)
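The hybrid Hessian estimate described in the stochastic Newton abstract above can be sketched in the one-dimensional logistic model: the "natural" per-sample Hessian is μ(1−μ)x² and the gradient-based one is the squared score ((μ−y)x)², and both have the same expectation at the true parameter. The equal weighting alpha, the initialization, and the toy data below are illustrative assumptions, not the authors' exact weighting scheme.

```python
import math
import random

random.seed(2)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# synthetic logistic data: P(y = 1 | x) = sigmoid(theta_true * x)
theta_true = 1.0

def sample():
    x = random.gauss(0.0, 1.0)
    y = 1 if random.random() < sigmoid(theta_true * x) else 0
    return x, y

theta = 0.0
S = 1.0      # running sum of Hessian estimates (initialized > 0 for stability)
alpha = 0.5  # weight between the two Hessian components

for _ in range(20000):
    x, y = sample()
    mu = sigmoid(theta * x)
    g = (mu - y) * x                 # per-sample gradient of the log-loss
    h_nat = mu * (1.0 - mu) * x * x  # natural (Fisher) Hessian estimate
    h_grad = g * g                   # squared-gradient Hessian estimate
    S += alpha * h_nat + (1.0 - alpha) * h_grad
    theta -= g / S                   # Newton-type step; 1/n decay comes from S growing
```

Since S accumulates roughly n times the Fisher information, the step size decays like 1/n with a Newton-type scaling, and theta settles near theta_true.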
How Sparse Can We Prune A Deep Network: A Fundamental Limit Perspective
Network pruning is a commonly used measure to alleviate the storage and computational burden of deep neural networks. However, a characterization of the fundamental limit of network pruning is still lacking. To close this gap, in this work we take a first-principles approach: we directly impose the sparsity constraint on the loss function and leverage the framework of statistical dimension in convex geometry, which enables us to characterize the sharp phase-transition point that can be regarded as the fundamental limit of the pruning ratio. Through this limit, we identify two key factors that determine the pruning-ratio limit, namely, weight magnitude and network sharpness.
- North America > Canada > Ontario > Toronto (0.14)
- Asia > China (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Asia > Middle East > Israel (0.04)
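To give a flavor of the statistical-dimension machinery the pruning abstract above invokes, the sketch below evaluates the standard closed-form expression (in the style of Amelunxen, Lotz, McCoy, and Tropp) for the normalized statistical dimension of the ℓ1-norm descent cone at a point with sparsity fraction ρ, which marks the phase transition in the classic sparse-recovery setting. This illustrates the framework only; it is not the paper's own pruning-ratio limit.

```python
import math

def normal_pdf(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def normal_tail(u):
    # P(Z > u) for a standard normal Z
    return 0.5 * math.erfc(u / math.sqrt(2.0))

def stat_dim_fraction(rho, grid=2000, tau_max=10.0):
    # delta/d = inf over tau >= 0 of
    #   rho*(1 + tau^2) + (1 - rho)*2*((1 + tau^2)*P(Z > tau) - tau*phi(tau)),
    # where the second term is 2 * E[(Z - tau)_+^2]; minimized by a grid scan.
    best = float("inf")
    for i in range(grid + 1):
        tau = tau_max * i / grid
        val = rho * (1.0 + tau * tau) + (1.0 - rho) * 2.0 * (
            (1.0 + tau * tau) * normal_tail(tau) - tau * normal_pdf(tau)
        )
        best = min(best, val)
    return best

# e.g. at 10% sparsity the transition sits near a measurement fraction of ~0.33
frac = stat_dim_fraction(0.1)
```

The returned fraction is monotone in ρ and sits strictly between ρ and 1, matching the intuition that sparser targets tolerate more aggressive compression.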