Gaussian initialization
- North America > United States > California > Los Angeles County > Los Angeles (0.29)
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > Canada > Alberta > Census Division No. 6 > Calgary Metropolitan Region > Calgary (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Escaping from the Barren Plateau via Gaussian Initializations in Deep Variational Quantum Circuits
Variational quantum circuits have been widely employed in quantum simulation and quantum machine learning in recent years. However, quantum circuits with random structures have poor trainability due to gradients that vanish exponentially in the circuit depth and the qubit number. This result has led to a general belief that deep quantum circuits are not feasible for practical tasks. In this work, we propose an initialization strategy with theoretical guarantees against the vanishing-gradient problem in general deep quantum circuits. Specifically, we prove that under properly Gaussian-initialized parameters, the norm of the gradient decays at most polynomially as the qubit number and the circuit depth increase. Our theoretical results hold for both local and global observables, where the latter was believed to suffer vanishing gradients even for very shallow circuits. Experimental results verify our theoretical findings in quantum simulation and quantum chemistry.
- Information Technology > Hardware (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (0.78)
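The claimed polynomial (rather than exponential) gradient decay can be probed numerically. The sketch below is a toy state-vector simulation under assumed settings: RY layers with a ring of CZ entanglers, a local Z observable, and a Gaussian variance of 1/(4L) chosen purely for illustration. It is not the paper's exact circuit or variance schedule; gradients come from the parameter-shift rule, which is exact for RY rotations.

```python
import numpy as np

rng = np.random.default_rng(0)
n, L = 4, 20  # qubits, layers (small enough for brute-force simulation)

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def kron_all(mats):
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

def cz_pair(i, j):
    # Diagonal CZ gate on qubits i, j (qubit 0 = most significant bit).
    U = np.eye(2 ** n)
    for b in range(2 ** n):
        if (b >> (n - 1 - i)) & 1 and (b >> (n - 1 - j)) & 1:
            U[b, b] = -1.0
    return U

ENT = np.eye(2 ** n)  # fixed entangling layer: CZ on a ring
for i in range(n):
    ENT = cz_pair(i, (i + 1) % n) @ ENT

OBS = kron_all([Z] + [I2] * (n - 1))  # local observable: Z on qubit 0

def energy(params):
    psi = np.zeros(2 ** n)
    psi[0] = 1.0
    for layer in range(L):
        psi = ENT @ (kron_all([ry(t) for t in params[layer]]) @ psi)
    return psi @ OBS @ psi

def grad_norm(params):
    # Parameter-shift rule: dE/dtheta = [E(theta+pi/2) - E(theta-pi/2)] / 2.
    g = np.zeros_like(params)
    for l in range(L):
        for q in range(n):
            shift = np.zeros_like(params)
            shift[l, q] = np.pi / 2
            g[l, q] = 0.5 * (energy(params + shift) - energy(params - shift))
    return np.linalg.norm(g)

uniform = np.mean([grad_norm(rng.uniform(0, 2 * np.pi, size=(L, n)))
                   for _ in range(5)])
gauss = np.mean([grad_norm(rng.normal(0.0, np.sqrt(1 / (4 * L)), size=(L, n)))
                 for _ in range(5)])
print(f"mean grad norm, uniform init:  {uniform:.4f}")
print(f"mean grad norm, Gaussian init: {gauss:.4f}")
```

On an instance this small both initializations yield nonzero gradients; the paper's claim concerns the scaling as n and L grow, which a brute-force simulator cannot reach.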
Reparameterized LLM Training via Orthogonal Equivalence Transformation
Qiu, Zeju, Buchholz, Simon, Xiao, Tim Z., Dax, Maximilian, Schölkopf, Bernhard, Liu, Weiyang
While large language models (LLMs) are driving the rapid advancement of artificial intelligence, effectively and reliably training these large models remains one of the field's most significant challenges. To address this challenge, we propose POET, a novel reParameterized training algorithm that uses Orthogonal Equivalence Transformation to optimize neurons. Specifically, POET reparameterizes each neuron with two learnable orthogonal matrices and a fixed random weight matrix. Because of its provable preservation of spectral properties of weight matrices, POET can stably optimize the objective function with improved generalization. We further develop efficient approximations that make POET flexible and scalable for training large-scale neural networks. Extensive experiments validate the effectiveness and scalability of POET in training LLMs.
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China > Tianjin Province > Tianjin (0.04)
- Asia > China > Hong Kong (0.04)
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
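The spectral-preservation property at the heart of the abstract is easy to check numerically. The sketch below is an illustration, not the authors' implementation: `random_orthogonal` stands in for the two learnable orthogonal factors, which POET would optimize rather than sample. It verifies that W = R W0 Q has exactly the singular values of the fixed random W0.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in = 8, 6

W0 = rng.normal(size=(d_out, d_in))  # fixed random weight matrix (never trained)

def random_orthogonal(d):
    # QR decomposition of a Gaussian matrix yields an orthogonal factor.
    q, r = np.linalg.qr(rng.normal(size=(d, d)))
    return q * np.sign(np.diag(r))   # fix column signs

R = random_orthogonal(d_out)  # stand-ins for the learnable orthogonal matrices
Q = random_orthogonal(d_in)
W = R @ W0 @ Q                # effective weight after reparameterization

sv0 = np.linalg.svd(W0, compute_uv=False)
sv = np.linalg.svd(W, compute_uv=False)
print("spectrum preserved:", np.allclose(sv0, sv))  # True
```

Because every gradient step moves R and Q only within the orthogonal group, this invariance holds throughout training, which is the source of the claimed stability.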
Global Convergence of Four-Layer Matrix Factorization under Random Initialization
Luo, Minrui, Xu, Weihang, Gao, Xiang, Fazel, Maryam, Du, Simon Shaolei
Gradient descent dynamics on the deep matrix factorization problem are extensively studied as a simplified theoretical model for deep neural networks. Although the convergence theory for two-layer matrix factorization is well established, no global convergence guarantee for general deep matrix factorization under random initialization has been established to date. To address this gap, we provide a polynomial-time global convergence guarantee for randomly initialized gradient descent on four-layer matrix factorization, given certain conditions on the target matrix and a standard balanced regularization term. Our analysis employs new techniques to show saddle-avoidance properties of gradient descent dynamics, and extends previous theories to characterize the change in eigenvalues of layer weights. Here F ∈ {C, R}, as we consider both real and complex matrices in this paper. Following a long line of works (Arora et al., 2019a; Jiang et al., 2023; Ye & Du, 2021; Chou et al., 2024), we aim to understand the dynamics of gradient descent (GD) on this problem. (Work done while Minrui Luo was visiting the University of Washington.) While the model's representation power is independent of the depth N, the deep matrix factorization problem is naturally motivated by the goal of understanding the benefits of depth in deep learning (see, e.g., Arora et al. (2019b)).
- North America > United States > Washington > King County > Seattle (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
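The setting can be made concrete with a small experiment. The snippet below runs gradient descent on a four-layer factorization with a balanced regularizer of the assumed form (λ/2) Σᵢ ‖Wᵢ₊₁ᵀWᵢ₊₁ − WᵢWᵢᵀ‖²; the step size, initialization scale, and low-rank target are illustrative choices, not the paper's conditions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 6, 2
M = rng.normal(size=(d, r)) @ rng.normal(size=(r, d))  # low-rank target

Ws = [0.2 * rng.normal(size=(d, d)) for _ in range(4)]  # small random init
lr, lam, steps = 0.005, 0.1, 3000

def loss(Ws):
    E = Ws[3] @ Ws[2] @ Ws[1] @ Ws[0] - M
    bal = sum(np.linalg.norm(Ws[i + 1].T @ Ws[i + 1] - Ws[i] @ Ws[i].T) ** 2
              for i in range(3))
    return 0.5 * np.linalg.norm(E) ** 2 + 0.5 * lam * bal

def grads(Ws):
    W1, W2, W3, W4 = Ws
    E = W4 @ W3 @ W2 @ W1 - M
    # Chain rule through the product, layer by layer.
    g = [(W4 @ W3 @ W2).T @ E,
         (W4 @ W3).T @ E @ W1.T,
         W4.T @ E @ (W2 @ W1).T,
         E @ (W3 @ W2 @ W1).T]
    # Balanced-regularizer gradients.
    for i in range(3):
        D = Ws[i + 1].T @ Ws[i + 1] - Ws[i] @ Ws[i].T
        g[i + 1] += 2 * lam * Ws[i + 1] @ D
        g[i] -= 2 * lam * D @ Ws[i]
    return g

loss0 = loss(Ws)
for _ in range(steps):
    Ws = [W - lr * G for W, G in zip(Ws, grads(Ws))]
print(f"loss: {loss0:.3f} -> {loss(Ws):.3f}")
```

The characteristic behavior near the origin — a long plateau before the iterates escape the saddle at zero — is visible if the loss is logged per step, and is exactly the regime the saddle-avoidance analysis has to handle.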
Algorithm-Dependent Generalization Bounds for Overparameterized Deep Residual Networks
Frei, Spencer, Cao, Yuan, Gu, Quanquan
Compared with its rapid and widespread adoption, the theoretical understanding of why deep learning works so well has lagged significantly. This is particularly the case in the common setup of an overparameterized network, where the number of parameters in the network greatly exceeds the number of training examples and input dimension.
- North America > United States > California > Los Angeles County > Los Angeles (0.29)
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > Canada > Alberta > Census Division No. 6 > Calgary Metropolitan Region > Calgary (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
GS-SDF: LiDAR-Augmented Gaussian Splatting and Neural SDF for Geometrically Consistent Rendering and Reconstruction
Liu, Jianheng, Wan, Yunfei, Wang, Bowen, Zheng, Chunran, Lin, Jiarong, Zhang, Fu
Digital twins are fundamental to the development of autonomous driving and embodied artificial intelligence. However, achieving high-granularity surface reconstruction and high-fidelity rendering remains a challenge. Gaussian splatting offers efficient photorealistic rendering but struggles with geometric inconsistencies due to fragmented primitives and sparse observational data in robotics applications. Existing regularization methods, which rely on render-derived constraints, often fail in complex environments. Moreover, effectively integrating sparse LiDAR data with Gaussian splatting remains challenging. We propose a unified LiDAR-visual system that synergizes Gaussian splatting with a neural signed distance field. The accurate LiDAR point clouds enable a trained neural signed distance field to offer a manifold geometry field. This motivates us to offer an SDF-based Gaussian initialization for physically grounded primitive placement and a comprehensive geometric regularization for geometrically consistent rendering and reconstruction. Experiments demonstrate superior reconstruction accuracy and rendering quality across diverse trajectories. To benefit the community, the code will be released at https://github.com/hku-mars/GS-SDF.
- Information Technology (0.48)
- Transportation > Ground (0.34)
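One simple way to realize SDF-based initialization — a hedged sketch of the general idea only, not the GS-SDF implementation — is to project candidate points onto the zero level set via x ← x − f(x)∇f(x) and use the projected points as Gaussian centers. Below, an analytic unit sphere stands in for the learned neural SDF, and the random candidates stand in for LiDAR-derived samples.

```python
import numpy as np

def sdf(x):
    # Signed distance to the unit sphere (stand-in for a trained neural SDF).
    return np.linalg.norm(x, axis=-1) - 1.0

def sdf_grad(x):
    # Analytic SDF gradient: the unit outward normal.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
pts = rng.normal(size=(1000, 3)) * 2.0  # noisy candidate points

# A few Newton-style projection steps onto the zero level set.
for _ in range(3):
    pts = pts - sdf(pts)[:, None] * sdf_grad(pts)

print(f"max |sdf| after projection: {np.max(np.abs(sdf(pts))):.2e}")
```

For a true distance field one step is exact; for a learned SDF, whose gradient is only approximately unit-norm, the iteration is repeated until the residual is small, and the surviving points give physically grounded primitive placement.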
Reviews: Initialization of ReLUs for Dynamical Isometry
The response did elaborate on the relationship between the approaches to ReLU initialization considered and the earlier portion of the paper; this relationship should be made clearer in the paper itself. However, as pointed out by the other reviewers, the structure in the proposed Gaussian-submatrix initialization has previously been proposed in Balduzzi et al. [2]. The paper analyzes how signals are transformed through the layers of a feedforward neural network, assuming weights are initialized from Gaussian distributions. Previous work used a mean-field assumption to study these dynamics, and used the results to identify Gaussian parameters that ensure stable propagation of the mean of the signal variance through the layers, a necessary condition for training deep networks. This work instead considers how the full distribution of the initial signal variance is transformed through the layers of the network.
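The mean-field variance-propagation claim the review summarizes can be illustrated numerically. This is a sketch with illustrative sizes, not the paper's experiment: with weights Wᵢⱼ ~ N(0, 2/fan_in), the second moment of the signal is roughly preserved through many ReLU layers, whereas N(0, 1/fan_in) halves it at every layer (ReLU discards half the variance in expectation).

```python
import numpy as np

rng = np.random.default_rng(0)
width, depth, batch = 512, 30, 64

def propagate(weight_var):
    """Second moment of the signal after `depth` ReLU layers with
    weights drawn i.i.d. from N(0, weight_var / fan_in)."""
    x = rng.normal(size=(batch, width))
    for _ in range(depth):
        W = rng.normal(0.0, np.sqrt(weight_var / width), size=(width, width))
        x = np.maximum(x @ W, 0.0)  # ReLU
    return np.mean(x ** 2)

preserved = propagate(2.0)  # variance roughly constant across layers
decayed = propagate(1.0)    # variance halves per layer: roughly 2**-30
print(f"var 2/fan_in: {preserved:.3f}   var 1/fan_in: {decayed:.3e}")
```

At finite width the per-layer factor fluctuates around its mean-field value, which is precisely the distributional effect this paper studies rather than assumes away.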
Escaping from the Barren Plateau via Gaussian Initializations in Deep Variational Quantum Circuits
Variational quantum circuits have been widely employed in quantum simulation and quantum machine learning in recent years. However, quantum circuits with random structures have poor trainability due to gradients that vanish exponentially in the circuit depth and the qubit number. This result has led to a general belief that deep quantum circuits are not feasible for practical tasks. In this work, we propose an initialization strategy with theoretical guarantees against the vanishing-gradient problem in general deep quantum circuits. Specifically, we prove that under properly Gaussian-initialized parameters, the norm of the gradient decays at most polynomially as the qubit number and the circuit depth increase.
- Information Technology > Hardware (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (0.85)
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)