Gaussian initialization
- North America > United States > California > Los Angeles County > Los Angeles (0.29)
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > Canada > Alberta > Census Division No. 6 > Calgary Metropolitan Region > Calgary (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Escaping from the Barren Plateau via Gaussian Initializations in Deep Variational Quantum Circuits
Variational quantum circuits have been widely employed in quantum simulation and quantum machine learning in recent years. However, quantum circuits with random structures have poor trainability due to gradients that vanish exponentially in the circuit depth and the qubit number. This result has led to a general belief that deep quantum circuits are not feasible for practical tasks. In this work, we propose an initialization strategy with theoretical guarantees against the vanishing-gradient problem in general deep quantum circuits. Specifically, we prove that under properly Gaussian-initialized parameters, the norm of the gradient decays at most polynomially as the qubit number and the circuit depth increase. Our theoretical results hold for both local and global observables, where the latter was believed to suffer vanishing gradients even for very shallow circuits. Experimental results verify our theoretical findings in quantum simulation and quantum chemistry.
- Information Technology > Hardware (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (0.78)
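The claimed polynomial (rather than exponential) gradient decay can be probed numerically. The sketch below is a toy state-vector simulation under assumed settings: RY layers with a ring of CZ entanglers, a local Z observable, and a Gaussian variance of 1/(4L) chosen purely for illustration. It is not the paper's exact circuit or variance schedule; gradients come from the parameter-shift rule, which is exact for RY rotations.

```python
import numpy as np

rng = np.random.default_rng(0)
n, L = 4, 20  # qubits, layers (small enough for brute-force simulation)

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def kron_all(mats):
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

def cz_pair(i, j):
    # Diagonal CZ gate on qubits i, j (qubit 0 = most significant bit).
    U = np.eye(2 ** n)
    for b in range(2 ** n):
        if (b >> (n - 1 - i)) & 1 and (b >> (n - 1 - j)) & 1:
            U[b, b] = -1.0
    return U

ENT = np.eye(2 ** n)  # fixed entangling layer: CZ on a ring
for i in range(n):
    ENT = cz_pair(i, (i + 1) % n) @ ENT

OBS = kron_all([Z] + [I2] * (n - 1))  # local observable: Z on qubit 0

def energy(params):
    psi = np.zeros(2 ** n)
    psi[0] = 1.0
    for layer in range(L):
        psi = ENT @ (kron_all([ry(t) for t in params[layer]]) @ psi)
    return psi @ OBS @ psi

def grad_norm(params):
    # Parameter-shift rule: dE/dtheta = [E(theta+pi/2) - E(theta-pi/2)] / 2.
    g = np.zeros_like(params)
    for l in range(L):
        for q in range(n):
            shift = np.zeros_like(params)
            shift[l, q] = np.pi / 2
            g[l, q] = 0.5 * (energy(params + shift) - energy(params - shift))
    return np.linalg.norm(g)

uniform = np.mean([grad_norm(rng.uniform(0, 2 * np.pi, size=(L, n)))
                   for _ in range(5)])
gauss = np.mean([grad_norm(rng.normal(0.0, np.sqrt(1 / (4 * L)), size=(L, n)))
                 for _ in range(5)])
print(f"mean grad norm, uniform init:  {uniform:.4f}")
print(f"mean grad norm, Gaussian init: {gauss:.4f}")
```

On an instance this small both initializations yield nonzero gradients; the paper's claim concerns the scaling as n and L grow, which a brute-force simulator cannot reach.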
Reparameterized LLM Training via Orthogonal Equivalence Transformation
Qiu, Zeju, Buchholz, Simon, Xiao, Tim Z., Dax, Maximilian, Schölkopf, Bernhard, Liu, Weiyang
While large language models (LLMs) are driving the rapid advancement of artificial intelligence, effectively and reliably training these large models remains one of the field's most significant challenges. To address this challenge, we propose POET, a novel reParameterized training algorithm that uses Orthogonal Equivalence Transformation to optimize neurons. Specifically, POET reparameterizes each neuron with two learnable orthogonal matrices and a fixed random weight matrix. Because of its provable preservation of spectral properties of weight matrices, POET can stably optimize the objective function with improved generalization. We further develop efficient approximations that make POET flexible and scalable for training large-scale neural networks. Extensive experiments validate the effectiveness and scalability of POET in training LLMs.
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China > Tianjin Province > Tianjin (0.04)
- Asia > China > Hong Kong (0.04)
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
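The spectral-preservation property at the heart of the abstract is easy to check numerically. The sketch below is an illustration, not the authors' implementation: `random_orthogonal` stands in for the two learnable orthogonal factors, which POET would optimize rather than sample. It verifies that W = R W0 Q has exactly the singular values of the fixed random W0.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in = 8, 6

W0 = rng.normal(size=(d_out, d_in))  # fixed random weight matrix (never trained)

def random_orthogonal(d):
    # QR decomposition of a Gaussian matrix yields an orthogonal factor.
    q, r = np.linalg.qr(rng.normal(size=(d, d)))
    return q * np.sign(np.diag(r))   # fix column signs

R = random_orthogonal(d_out)  # stand-ins for the learnable orthogonal matrices
Q = random_orthogonal(d_in)
W = R @ W0 @ Q                # effective weight after reparameterization

sv0 = np.linalg.svd(W0, compute_uv=False)
sv = np.linalg.svd(W, compute_uv=False)
print("spectrum preserved:", np.allclose(sv0, sv))  # True
```

Because every gradient step moves R and Q only within the orthogonal group, this invariance holds throughout training, which is the source of the claimed stability.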
Global Convergence of Four-Layer Matrix Factorization under Random Initialization
Luo, Minrui, Xu, Weihang, Gao, Xiang, Fazel, Maryam, Du, Simon Shaolei
Gradient descent dynamics on the deep matrix factorization problem are extensively studied as a simplified theoretical model for deep neural networks. Although the convergence theory for two-layer matrix factorization is well established, no global convergence guarantee for general deep matrix factorization under random initialization has been established to date. To address this gap, we provide a polynomial-time global convergence guarantee for randomly initialized gradient descent on four-layer matrix factorization, given certain conditions on the target matrix and a standard balanced regularization term. Our analysis employs new techniques to show saddle-avoidance properties of gradient descent dynamics, and extends previous theories to characterize the change in eigenvalues of layer weights. Here F ∈ {C, R}, as we consider both real and complex matrices in this paper. Following a long line of works (Arora et al., 2019a; Jiang et al., 2023; Ye & Du, 2021; Chou et al., 2024), we aim to understand the dynamics of gradient descent (GD) on this problem. (Work done while Minrui Luo was visiting the University of Washington.) While the model's representation power is independent of the depth N, the deep matrix factorization problem is naturally motivated by the goal of understanding the benefits of depth in deep learning (see, e.g., Arora et al. (2019b)).
- North America > United States > Washington > King County > Seattle (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
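The setting can be made concrete with a small experiment. The snippet below runs gradient descent on a four-layer factorization with a balanced regularizer of the assumed form (λ/2) Σᵢ ‖Wᵢ₊₁ᵀWᵢ₊₁ − WᵢWᵢᵀ‖²; the step size, initialization scale, and low-rank target are illustrative choices, not the paper's conditions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 6, 2
M = rng.normal(size=(d, r)) @ rng.normal(size=(r, d))  # low-rank target

Ws = [0.2 * rng.normal(size=(d, d)) for _ in range(4)]  # small random init
lr, lam, steps = 0.005, 0.1, 3000

def loss(Ws):
    E = Ws[3] @ Ws[2] @ Ws[1] @ Ws[0] - M
    bal = sum(np.linalg.norm(Ws[i + 1].T @ Ws[i + 1] - Ws[i] @ Ws[i].T) ** 2
              for i in range(3))
    return 0.5 * np.linalg.norm(E) ** 2 + 0.5 * lam * bal

def grads(Ws):
    W1, W2, W3, W4 = Ws
    E = W4 @ W3 @ W2 @ W1 - M
    # Chain rule through the product, layer by layer.
    g = [(W4 @ W3 @ W2).T @ E,
         (W4 @ W3).T @ E @ W1.T,
         W4.T @ E @ (W2 @ W1).T,
         E @ (W3 @ W2 @ W1).T]
    # Balanced-regularizer gradients.
    for i in range(3):
        D = Ws[i + 1].T @ Ws[i + 1] - Ws[i] @ Ws[i].T
        g[i + 1] += 2 * lam * Ws[i + 1] @ D
        g[i] -= 2 * lam * D @ Ws[i]
    return g

loss0 = loss(Ws)
for _ in range(steps):
    Ws = [W - lr * G for W, G in zip(Ws, grads(Ws))]
print(f"loss: {loss0:.3f} -> {loss(Ws):.3f}")
```

The characteristic behavior near the origin — a long plateau before the iterates escape the saddle at zero — is visible if the loss is logged per step, and is exactly the regime the saddle-avoidance analysis has to handle.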
Algorithm-Dependent Generalization Bounds for Overparameterized Deep Residual Networks
Frei, Spencer, Cao, Yuan, Gu, Quanquan
Compared with its rapid and widespread adoption, the theoretical understanding of why deep learning works so well has lagged significantly. This is particularly the case in the common setup of an overparameterized network, where the number of parameters in the network greatly exceeds the number of training examples and input dimension.
- North America > United States > California > Los Angeles County > Los Angeles (0.29)
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > Canada > Alberta > Census Division No. 6 > Calgary Metropolitan Region > Calgary (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
GS-SDF: LiDAR-Augmented Gaussian Splatting and Neural SDF for Geometrically Consistent Rendering and Reconstruction
Liu, Jianheng, Wan, Yunfei, Wang, Bowen, Zheng, Chunran, Lin, Jiarong, Zhang, Fu
Digital twins are fundamental to the development of autonomous driving and embodied artificial intelligence. However, achieving high-granularity surface reconstruction and high-fidelity rendering remains a challenge. Gaussian splatting offers efficient photorealistic rendering but struggles with geometric inconsistencies due to fragmented primitives and sparse observational data in robotics applications. Existing regularization methods, which rely on render-derived constraints, often fail in complex environments. Moreover, effectively integrating sparse LiDAR data with Gaussian splatting remains challenging. We propose a unified LiDAR-visual system that synergizes Gaussian splatting with a neural signed distance field. The accurate LiDAR point clouds enable a trained neural signed distance field to offer a manifold geometry field. This motivates us to offer an SDF-based Gaussian initialization for physically grounded primitive placement and a comprehensive geometric regularization for geometrically consistent rendering and reconstruction. Experiments demonstrate superior reconstruction accuracy and rendering quality across diverse trajectories. To benefit the community, the code will be released at https://github.com/hku-mars/GS-SDF.
- Information Technology (0.48)
- Transportation > Ground (0.34)
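One simple way to realize SDF-based initialization — a hedged sketch of the general idea only, not the GS-SDF implementation — is to project candidate points onto the zero level set via x ← x − f(x)∇f(x) and use the projected points as Gaussian centers. Below, an analytic unit sphere stands in for the learned neural SDF, and the random candidates stand in for LiDAR-derived samples.

```python
import numpy as np

def sdf(x):
    # Signed distance to the unit sphere (stand-in for a trained neural SDF).
    return np.linalg.norm(x, axis=-1) - 1.0

def sdf_grad(x):
    # Analytic SDF gradient: the unit outward normal.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
pts = rng.normal(size=(1000, 3)) * 2.0  # noisy candidate points

# A few Newton-style projection steps onto the zero level set.
for _ in range(3):
    pts = pts - sdf(pts)[:, None] * sdf_grad(pts)

print(f"max |sdf| after projection: {np.max(np.abs(sdf(pts))):.2e}")
```

For a true distance field one step is exact; for a learned SDF, whose gradient is only approximately unit-norm, the iteration is repeated until the residual is small, and the surviving points give physically grounded primitive placement.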
Reviews: Initialization of ReLUs for Dynamical Isometry
The response did elaborate on the relationship between the approaches to ReLU initialization considered and the earlier portion of the paper; this relationship should be made clearer in the paper itself. However, as pointed out by the other reviewers, the structure in the proposed Gaussian-submatrix initialization has previously been proposed in Balduzzi et al. [2]. The paper analyzes how signals are transformed through the layers of a feedforward neural network, assuming weights are initialized from Gaussian distributions. Previous work used a mean-field assumption to study these dynamics, and used the results to identify Gaussian parameters that ensure stable propagation of the mean of the signal variance through the layers, a necessary condition for training deep networks. This work instead considers how the full distribution of the initial signal variance is transformed through the layers of the network.
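The mean-field variance-propagation claim the review summarizes can be illustrated numerically. This is a sketch with illustrative sizes, not the paper's experiment: with weights Wᵢⱼ ~ N(0, 2/fan_in), the second moment of the signal is roughly preserved through many ReLU layers, whereas N(0, 1/fan_in) halves it at every layer (ReLU discards half the variance in expectation).

```python
import numpy as np

rng = np.random.default_rng(0)
width, depth, batch = 512, 30, 64

def propagate(weight_var):
    """Second moment of the signal after `depth` ReLU layers with
    weights drawn i.i.d. from N(0, weight_var / fan_in)."""
    x = rng.normal(size=(batch, width))
    for _ in range(depth):
        W = rng.normal(0.0, np.sqrt(weight_var / width), size=(width, width))
        x = np.maximum(x @ W, 0.0)  # ReLU
    return np.mean(x ** 2)

preserved = propagate(2.0)  # variance roughly constant across layers
decayed = propagate(1.0)    # variance halves per layer: roughly 2**-30
print(f"var 2/fan_in: {preserved:.3f}   var 1/fan_in: {decayed:.3e}")
```

At finite width the per-layer factor fluctuates around its mean-field value, which is precisely the distributional effect this paper studies rather than assumes away.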
Escaping from the Barren Plateau via Gaussian Initializations in Deep Variational Quantum Circuits
Variational quantum circuits have been widely employed in quantum simulation and quantum machine learning in recent years. However, quantum circuits with random structures have poor trainability due to gradients that vanish exponentially in the circuit depth and the qubit number. This result has led to a general belief that deep quantum circuits are not feasible for practical tasks. In this work, we propose an initialization strategy with theoretical guarantees against the vanishing-gradient problem in general deep quantum circuits. Specifically, we prove that under properly Gaussian-initialized parameters, the norm of the gradient decays at most polynomially as the qubit number and the circuit depth increase.
- Information Technology > Hardware (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (0.85)
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)