condition hold
A Corrections to the main paper
B Problem setup
In the course of preparing the supplementary materials we identified the following two mistakes. For the convenience of the reader we provide the full, corrected table below, where C is an appropriately chosen constant.

Table: corrected conditions on n, m, and γ under Frei et al. (2022), Xu & Gu (2023), and Theorems 3.1, 3.6, and 3.8.

The same mistake also affects the sentence starting on line 188 that begins "Comparing …".

In order to provide a convenient reference for the reader, we summarize our notation as follows. We typically resort to using a generically large enough constant C.

For the reader's convenience we recap the data model studied in this work. We assume test data are drawn mutually i.i.d. In regard to the initialization of the network weights, for convenience we place a simplifying assumption on each neuron's initialization.

To this end, we introduce the following notation, where $p \in \{-1, 1\}$. We have
$$\mathbb{P}\big((B < \kappa T) \cap (T > 0) \mid \langle w, v\rangle > 0\big) \;\ge\; 1 - \mathbb{P}\big(T = 0 \mid \langle w, v\rangle > 0\big) - \mathbb{P}\big(B \ge \kappa T \mid \langle w, v\rangle > 0\big),$$
therefore it suffices to upper bound the two probabilities on the right-hand side.

Using a variant of Hoeffding's bound for sampling without replacement (see Proposition …), … Based on Lemma B.2, the following lemma bounds the probability that … In terms of the counting functions, we write in particular that the two counting probabilities sum to $1/2$, and hence we conclude $p + q = 1/2$. Observe, by the data model described in Section B.2, that … We will often make use of the following similar but more pessimistic bounds on the activations.
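The probability decomposition displayed above is De Morgan's law followed by a union bound. A minimal derivation, under the reading (suggested by the surrounding counting argument) that $T$ is a nonnegative count, so that the complement of $\{T > 0\}$ is $\{T = 0\}$:

$$\mathbb{P}\big((B < \kappa T) \cap (T > 0) \mid \langle w, v\rangle > 0\big) = 1 - \mathbb{P}\big(\{B \ge \kappa T\} \cup \{T = 0\} \mid \langle w, v\rangle > 0\big) \ge 1 - \mathbb{P}\big(T = 0 \mid \langle w, v\rangle > 0\big) - \mathbb{P}\big(B \ge \kappa T \mid \langle w, v\rangle > 0\big).$$

The equality uses $\{(B < \kappa T) \cap (T > 0)\}^{c} = \{B \ge \kappa T\} \cup \{T = 0\}$, and the inequality is the union bound applied to the two events in that union.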
Appendix of " Complex-valued Neurons Can Learn More but Slower than Real-valued Neurons via Gradient Descent " A Preliminaries
In this section, we first summarize frequently used notations in the following table.

Table 4: Frequently used notations (columns: Notation, Description; entries include the constant C, …).

Lemma 7. Let d = 1. … Combining the cases above completes the proof.

Subsection B.2 proves several convergence rate lemmas. Subsection B.3 gives some technical lemmas. We are now ready to prove Theorem 1.

Proof of Theorem 1. …
Understanding the Generalization of Stochastic Gradient Adam in Learning Neural Networks
Xuan Tang, Han Zhang, Yuan Cao, Difan Zou
Adam is a popular and widely used adaptive gradient method in deep learning, and it has received considerable attention in theoretical research. However, most existing theoretical work analyzes its full-batch version, which differs fundamentally from the stochastic variant used in practice: unlike SGD, stochastic Adam does not converge to its full-batch counterpart even with infinitesimal learning rates. We present the first theoretical characterization of how batch size affects Adam's generalization, analyzing two-layer over-parameterized CNNs on image data. Our results reveal that while both full-batch Adam and AdamW with proper weight decay $λ$ converge to solutions with poor test error, their mini-batch variants can achieve near-zero test error. We further prove that Adam has a strictly smaller effective weight-decay bound than AdamW, which theoretically explains why Adam requires more sensitive tuning of $λ$. Extensive experiments validate our findings, demonstrating the critical role of batch size and weight decay in Adam's generalization performance.
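To make the Adam versus AdamW distinction concrete, the sketch below contrasts the standard single-step update rules of the two optimizers and shows where the weight decay $λ$ enters each. This is a minimal illustration of the usual optimizers under common defaults, not the paper's exact analyzed variant (the analyzed setting of mini-batch sampling and parameterization may differ); the function names adam_step/adamw_step and the toy quadratic objective are assumptions made here for illustration.

```python
# Illustrative sketch: Adam (coupled weight decay) vs. AdamW (decoupled weight decay).
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, lam=0.0):
    """One Adam step with coupled weight decay: lam * w is added to the gradient."""
    g = grad + lam * w                       # decay term enters the moment estimates
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

def adamw_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, lam=0.0):
    """One AdamW step with decoupled weight decay: weights shrink by lr * lam * w."""
    m = beta1 * m + (1 - beta1) * grad       # moments see only the raw gradient
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + lam * w)
    return w, m, v

# Toy usage (assumed objective): both minimize 0.5 * ||w||^2, whose gradient is w.
rng = np.random.default_rng(0)
w0 = rng.normal(size=5)
w_a, m_a, v_a = w0.copy(), np.zeros(5), np.zeros(5)
w_w, m_w, v_w = w0.copy(), np.zeros(5), np.zeros(5)
for t in range(1, 201):
    w_a, m_a, v_a = adam_step(w_a, w_a, m_a, v_a, t, lr=0.01, lam=0.1)
    w_w, m_w, v_w = adamw_step(w_w, w_w, m_w, v_w, t, lr=0.01, lam=0.1)
print("Adam  :", np.round(w_a, 4))
print("AdamW :", np.round(w_w, 4))
```

In Adam the decay term $λ w$ is folded into the gradient and therefore also passes through the $1/\sqrt{v}$ normalization, whereas in AdamW the weights shrink directly by the learning rate times $λ w$. This coupling is why the same nominal $λ$ can act very differently in the two methods, which is the distinction behind the effective weight decay comparison above.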