AITopics

2603.22248

Country:

Asia > China > Hong Kong (0.04)
North America > United States (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing Systems, Feb-12-2026, 00:13:15 GMT

Appendices

The Hessian of f(Z) can be viewed as a KN × KN matrix by vectorizing the matrix Z. For deeper linear networks, it can be shown that flat saddle points exist at the origin, but there are no spurious local minima [34,37]. While most of these results based on the bottom-up approach explain optimization and generalization of certain types of deep neural networks, they provide limited insights into the practice of deep learning. In fact, our proof techniques are inspired by recent results on low-rank matrix recovery [77,80]. Some of the metrics are similar to those presented in [1]. Figure 7 depicts the learning curves in terms of both the training and test accuracy for all three optimization algorithms (i.e., SGD, Adam, and LBFGS).

artificial intelligence, deep learning, machine learning, (19 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Neural Information Processing Systems, Feb-7-2026, 16:26:22 GMT

18a9042b3fc5b02fe3d57fea87d6992f-Supplemental.pdf

Projecting this differential equation on the last coordinate gives dH^{e+1}_t = dt, that is, H^{e+1}_t = t. Finally, let (a^{(n)})_{n ∈ N} be a Cauchy sequence in T. Straightforward calculations yield the equality, valid for any x ∈ R, tanh(x) = 2σ(2x) − 1. But, for any n ≥ 1, Next, it is clear that the signature of a constant path is equal to 1 = (1, 0, ..., 0, ...), which is the null element in T. More precisely, for k = 1, C(1;0) = 1 00 = 1 and C(1;1) = 0 01 = 0. Assume that the formula is true at order k. Then, at order k + 1, there are two cases.
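
The identity relating tanh to the logistic function σ, quoted in the excerpt above, can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the sample points are illustrative assumptions and do not come from the supplemental material.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh_via_sigmoid(x: float) -> float:
    """Evaluate tanh(x) through the identity tanh(x) = 2 * sigma(2x) - 1."""
    return 2.0 * sigmoid(2.0 * x) - 1.0


def max_identity_gap(points) -> float:
    """Largest absolute difference between math.tanh and the sigmoid form."""
    return max(abs(math.tanh(x) - tanh_via_sigmoid(x)) for x in points)


# Hypothetical sample points, chosen only for illustration.
sample_points = [-3.0, -1.0, -0.25, 0.0, 0.5, 2.0]
gap = max_identity_gap(sample_points)
```

On these points the two expressions should agree up to floating-point rounding, so `gap` is expected to be tiny rather than exactly zero.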

artificial intelligence, deep learning, machine learning, (20 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neural Information Processing Systems, Feb-7-2026, 07:19:34 GMT

Proof. If there is a bingo on mode-k, the m-th row of the mode-k expansion of P is a constant multiple of the (m−1)-th row, where m is a number determined by the bingo position. Indeed, P

Non-negative matrix factorization with fixed row and column sums.

artificial intelligence, machine learning, pi1, (14 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.67)

Karki, Jony, Huang, Dongzhou, Zhao, Yunpeng

Variational Estimators for Node Popularity Models

arXiv.org Machine Learning, Nov-25-2025

Node popularity is recognized as a key factor in modeling real-world networks, capturing heterogeneity in connectivity across communities. This concept is equally important in bipartite networks, where nodes in different partitions may exhibit varying popularity patterns, motivating models such as the Two-Way Node Popularity Model (TNPM). Existing methods, such as the Two-Stage Divided Cosine (TSDC) algorithm, provide a scalable estimation approach but may have limitations in terms of accuracy or applicability across different types of networks. In this paper, we develop a computationally efficient and theoretically justified variational expectation-maximization (VEM) framework for the TNPM. We establish label consistency for the estimated community assignments produced by the proposed variational estimator in bipartite networks. Through extensive simulation studies, we show that our method achieves superior estimation accuracy across a range of bipartite as well as undirected networks compared to existing algorithms. Finally, we evaluate our method on real-world bipartite and undirected networks, further demonstrating its practical effectiveness and robustness.

algorithm, block model, matrix, (16 more...)

2511.17783

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Minnesota (0.04)
North America > United States > Colorado (0.04)

Genre: Research Report (0.82)

Industry: Government (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science > Data Mining (0.93)

Neural Information Processing Systems, Aug-19-2025, 22:18:36 GMT

fe47dd3fd8e7eb43187d42d65083e383-Supplemental-Conference.pdf

artificial intelligence, machine learning, node, (18 more...)

Genre: Research Report (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Oriol, Benoit, Miot, Alexandre

Ledoit-Wolf linear shrinkage with unknown mean

arXiv.org Machine Learning, Apr-14-2023

The empirical covariance estimator fails when dimension and number of samples are proportional and tend to infinity, settings known as Kolmogorov asymptotics. When the mean is known, Ledoit and Wolf (2004) proposed a linear shrinkage estimator and proved its convergence under those asymptotics. To the best of our knowledge, no formal proof has been proposed when the mean is unknown. To address this issue, we propose a new estimator and prove its quadratic convergence under the Ledoit and Wolf assumptions. Finally, we show empirically that it outperforms other standard estimators.
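
As a rough illustration of what linear shrinkage toward a scaled identity looks like when the mean has to be estimated from the data, here is a minimal pure-Python sketch. It is not the estimator proposed in the paper: the shrinkage intensity is a hand-picked, hypothetical parameter, whereas the paper derives a data-driven one. All function names and the toy data are assumptions made for this example.

```python
def sample_mean(rows):
    """Column-wise mean of a list of equal-length observations."""
    n, p = len(rows), len(rows[0])
    return [sum(r[j] for r in rows) / n for j in range(p)]


def sample_covariance(rows):
    """Sample covariance with the mean estimated from the data (divisor n - 1)."""
    n, p = len(rows), len(rows[0])
    mean = sample_mean(rows)
    cov = [[0.0] * p for _ in range(p)]
    for r in rows:
        dev = [r[j] - mean[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                cov[a][b] += dev[a] * dev[b]
    return [[c / (n - 1) for c in row] for row in cov]


def linear_shrinkage(cov, intensity):
    """Return (1 - intensity) * cov + intensity * mu * I, with mu = trace(cov) / p.

    `intensity` is a user-chosen value in [0, 1] (an assumption of this sketch).
    """
    p = len(cov)
    mu = sum(cov[i][i] for i in range(p)) / p
    return [
        [(1.0 - intensity) * cov[a][b] + (intensity * mu if a == b else 0.0)
         for b in range(p)]
        for a in range(p)
    ]


# Toy data: the second coordinate is twice the first, so the sample
# covariance is singular; shrinkage makes it invertible.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
cov = sample_covariance(rows)
shrunk = linear_shrinkage(cov, intensity=0.5)
det_cov = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
det_shrunk = shrunk[0][0] * shrunk[1][1] - shrunk[0][1] * shrunk[1][0]
```

The shrunk matrix keeps the trace of the sample covariance while pulling the eigenvalues toward their average, which is why the singular toy covariance becomes invertible.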

artificial intelligence, ik 1, machine learning, (18 more...)

2304.07045

Country:

Europe > Netherlands > South Holland > Dordrecht (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.46)

arXiv.org Artificial Intelligence, Mar-1-2023

Empowering Networks With Scale and Rotation Equivariance Using A Similarity Convolution

Sun, Zikai, Blu, Thierry

The translational equivariant nature of Convolutional Neural Networks (CNNs) is a reason for their great success in computer vision. However, networks do not enjoy more general equivariance properties such as rotation or scaling, ultimately limiting their generalization performance. To address this limitation, we devise a method that endows CNNs with simultaneous equivariance with respect to translation, rotation, and scaling. Our approach defines a convolution-like operation and ensures equivariance based on our proposed scalable Fourier-Argand representation. The method maintains similar efficiency as a traditional network and hardly introduces any additional learnable parameters, since it does not face the computational issue that often occurs in group-convolution operators. We validate the efficacy of our approach in the image classification task, demonstrating its robustness and the generalization ability to both scaled and rotated inputs. The remarkable success of network architectures can be largely attributed to the availability of large datasets and a large number of parameters, enabling them to "remember" vast amounts of information. In contrast, humans can learn new concepts with very little data and are able to generalize this knowledge.

artificial intelligence, equivariance, machine learning, (20 more...)

2303.00326

Country:

Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Sportisse, Aude, Biernacki, Christophe, Boyer, Claire, Josse, Julie, Lourdelle, Matthieu Marbac, Celeux, Gilles, Laporte, Fabien

Model-based Clustering with Missing Not At Random Data

arXiv.org Machine Learning, Dec-20-2021

In recent decades, technological advances have made it possible to collect large data sets. In this context, model-based clustering is a very popular, flexible and interpretable methodology for data exploration in a well-defined statistical framework. One of the ironies of the increase of large datasets is that missing values are more frequent. However, traditional approaches (such as discarding observations with missing values or imputation methods) are not designed for clustering. In addition, they rarely apply to the general case, though frequent in practice, of Missing Not At Random (MNAR) values, i.e. when the missingness depends on the unobserved data values and possibly on the observed data values. The goal of this paper is to propose a novel approach by embedding MNAR data directly within model-based clustering algorithms. We introduce a selection model for the joint distribution of data and missing-data indicator. It corresponds to a mixture model for the data distribution and a general MNAR model for the missing-data mechanism, which may depend on the underlying classes (unknown) and/or the values of the missing variables themselves. A large set of meaningful MNAR sub-models is derived and the identifiability of the parameters is studied for each of the sub-models, which is usually a key issue for any MNAR proposals. The EM and Stochastic EM algorithms are considered for estimation. Finally, we perform empirical evaluations for the proposed sub-models on synthetic data and we illustrate the relevance of our method on a medical register, the TraumaBase (R) dataset.

algorithm, ik 1, mis, (17 more...)

2112.10425

Country:

Europe > France > Provence-Alpes-Côte d'Azur (0.04)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France > Occitanie > Hérault > Montpellier (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Amovin-Assagba, Martial, Gannaz, Irène, Jacques, Julien

Outlier detection in multivariate functional data through a contaminated mixture model

arXiv.org Machine Learning, Jun-14-2021

This work is motivated by an application in an industrial context, where the activity of sensors is recorded at a high frequency. The objective is to automatically detect abnormal measurement behaviour. Considering the sensor measures as functional data, we are formally interested in detecting outliers in a multivariate functional data set. Due to the heterogeneity of this data set, the proposed contaminated mixture model both clusters the multivariate functional data into homogeneous groups and detects outliers. The main advantage of this procedure over its competitors is that it does not require us to specify the proportion of outliers. Model inference is performed through an Expectation-Conditional Maximization algorithm, and the BIC criterion is used to select the number of clusters. Numerical experiments on simulated data demonstrate the high performance achieved by the inference algorithm. In particular, the proposed model outperforms competitors. Its application on the real data which motivated this study allows us to correctly detect abnormal behaviours.

algorithm, functional data, outlier, (15 more...)

2106.07222

Country: Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)

Genre:

Research Report (1.00)
Overview (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)