AITopics

2504.1953

Country:

North America > United States > Minnesota (0.04)
Asia > Macao (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.48)

arXiv.org Artificial Intelligence Apr-18-2025

Stochastic Gradient Descent in Non-Convex Problems: Asymptotic Convergence with Relaxed Step-Size via Stopping Time Methods

Jin, Ruinan, Cheng, Difei, Qiao, Hong, Shi, Xin, Liu, Shaodong, Zhang, Bo

Stochastic Gradient Descent (SGD) is widely used in machine learning research. Previous convergence analyses of SGD under the vanishing step-size setting typically require Robbins-Monro conditions. However, in practice, a wider variety of step-size schemes are frequently employed, yet existing convergence results remain limited and often rely on strong assumptions. This paper bridges this gap by introducing a novel analytical framework based on a stopping-time method, enabling asymptotic convergence analysis of SGD under more relaxed step-size conditions and weaker assumptions. In the non-convex setting, we prove the almost sure convergence of SGD iterates for step-sizes $\{\epsilon_t\}_{t \geq 1}$ satisfying $\sum_{t=1}^{+\infty} \epsilon_t = +\infty$ and $\sum_{t=1}^{+\infty} \epsilon_t^p < +\infty$ for some $p > 2$. Compared with previous studies, our analysis eliminates the global Lipschitz continuity assumption on the loss function and relaxes the boundedness requirements for higher-order moments of stochastic gradients. Building upon the almost sure convergence results, we further establish $L_2$ convergence. These significantly relaxed assumptions make our theoretical results more general, thereby enhancing their applicability in practical scenarios.
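To make the relaxed step-size condition concrete, the sketch below uses a polynomial schedule proportional to $t^{-0.4}$. This schedule is an illustrative assumption, not one taken from the paper: its sum diverges, its squares also diverge (so the usual Robbins-Monro square-summability fails), but its cubes are summable, so the condition holds with $p = 3$. The toy loss, noise model, scale factor, and function names are all hypothetical.

```python
import random

def step_size(t, scale=0.1, exponent=0.4):
    """Polynomially decaying step size: scale * t**(-exponent)."""
    return scale * t ** (-exponent)

def partial_sum(power, n_terms, exponent=0.4):
    """Partial sum of (t**(-exponent))**power for t = 1..n_terms (scale omitted)."""
    return sum(t ** (-exponent * power) for t in range(1, n_terms + 1))

def grad(x):
    """Gradient of the non-convex toy loss f(x) = (x**2 - 1)**2 / 4."""
    return x * (x * x - 1.0)

def sgd(x0, n_steps, noise_std=1.0, seed=0):
    """Run SGD with additive Gaussian gradient noise; return the last iterate."""
    rng = random.Random(seed)
    x = x0
    for t in range(1, n_steps + 1):
        noisy_grad = grad(x) + rng.gauss(0.0, noise_std)
        x -= step_size(t) * noisy_grad
    return x

if __name__ == "__main__":
    n = 100_000
    print("sum of steps   :", partial_sum(1, n))  # keeps growing with n
    print("sum of squares :", partial_sum(2, n))  # also keeps growing with n
    print("sum of cubes   :", partial_sum(3, n))  # stays bounded as n grows
    x_final = sgd(x0=2.0, n_steps=5000)
    print("final iterate:", x_final, "gradient there:", grad(x_final))
```

The toy loss has stationary points at $x = \pm 1$ and $x = 0$ and its gradient is not globally Lipschitz, which is why the scale factor is kept small here. A single simulated run of this kind only illustrates the setting; it does not verify the almost sure convergence result.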

artificial intelligence, assumption 3, machine learning, (14 more...)

2504.12601

Country: Asia > China (0.46)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.91)

De, Saibal, Knitter, Oliver, Kodati, Rohan, Jayakumar, Paramsothy, Stokes, James, Veerapaneni, Shravan

Variational quantum and neural quantum states algorithms for the linear complementarity problem

arXiv.org Artificial Intelligence Apr-18-2025

Variational quantum algorithms (VQAs) are promising hybrid quantum-classical methods designed to leverage the computational advantages of quantum computing while mitigating the limitations of current noisy intermediate-scale quantum (NISQ) hardware. Although VQAs have been demonstrated as proofs of concept, their practical utility in solving real-world problems -- and whether quantum-inspired classical algorithms can match their performance -- remains an open question. We present a novel application of the variational quantum linear solver (VQLS) and its classical neural quantum states-based counterpart, the variational neural linear solver (VNLS), as key components within a minimum map Newton solver for a complementarity-based rigid body contact model. We demonstrate using the VNLS that our solver accurately simulates the dynamics of rigid spherical bodies during collision events. These results suggest that quantum and quantum-inspired linear algebra algorithms can serve as viable alternatives to standard linear algebra solvers for modeling certain physical systems.
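For orientation, a minimum map Newton iteration for a linear complementarity problem can be sketched as follows. It assumes the common formulation: find $x \geq 0$ with $Ax + b \geq 0$ and $x^\top (Ax + b) = 0$, rewritten as the root-finding problem $\min(x, Ax + b) = 0$. The dense Gaussian elimination in `solve_linear` is only a stand-in for the linear-solve step where, per the abstract, VQLS or VNLS would be used. All function names are hypothetical, and the sketch omits the line search and singular-matrix handling that a practical solver would need.

```python
def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def minimum_map(A, b, x):
    """Return H(x) = min(x, A x + b) componentwise, together with y = A x + b."""
    n = len(x)
    y = [sum(A[i][j] * x[j] for j in range(n)) + b[i] for i in range(n)]
    return [min(xi, yi) for xi, yi in zip(x, y)], y

def minimum_map_newton(A, b, x0, tol=1e-12, max_iter=50):
    """Newton iteration on H(x) = 0 without line search (assumes J stays invertible)."""
    x = list(x0)
    n = len(x)
    for _ in range(max_iter):
        H, y = minimum_map(A, b, x)
        if max(abs(h) for h in H) < tol:
            break
        # Generalized Jacobian: row i of A where y_i < x_i, identity row otherwise.
        J = [A[i][:] if y[i] < x[i] else [1.0 if j == i else 0.0 for j in range(n)]
             for i in range(n)]
        dx = solve_linear(J, [-h for h in H])
        x = [xi + di for xi, di in zip(x, dx)]
    return x

if __name__ == "__main__":
    A = [[2.0, 1.0], [1.0, 2.0]]
    b = [-1.0, 1.0]
    print(minimum_map_newton(A, b, [0.0, 0.0]))
```

Each Newton step needs exactly one linear solve, which is why swapping the solver is a self-contained change.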

artificial intelligence, linear system, machine learning, (19 more...)

2504.08141

Country: North America > United States > Michigan (0.46)

Genre: Research Report > New Finding (0.34)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Energy (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

arXiv.org Artificial Intelligence Apr-15-2025

A Nonlinear Hash-based Optimization Method for SpMV on GPUs

Yan, Chen, Diao, Boyu, Liu, Hangda, An, Zhulin, Xu, Yongjun

Chen Yan (a,b), Boyu Diao (a,b), Hangda Liu (a,b), Zhulin An (a,b), and Yongjun Xu (a,b). (a) Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China. (b) University of Chinese Academy of Sciences, Beijing, China. {yanchen23s, diaoboyu2012, liuhangda21s, anzhulin, xyj}@ict.ac.cn

Abstract: Sparse matrix-vector multiplication (SpMV) is a fundamental operation with a wide range of applications in scientific computing and artificial intelligence. However, the large scale and sparsity of the matrices often make it a performance bottleneck. In this paper, we highlight the effectiveness of hash-based techniques in optimizing sparse matrix reordering, introducing the Hash-based Partition (HBP) format, a lightweight SpMV approach. HBP retains the performance benefits of the 2D-partitioning method while leveraging the hash transformation's ability to group similar elements, thereby accelerating the pre-processing phase of sparse matrix reordering. Additionally, we achieve parallel load balancing across matrix blocks through a competitive method. Our experiments, conducted on both Nvidia Jetson AGX Orin and Nvidia RTX 4090, show that in the pre-processing step, our method offers an average speedup of 3.53 times compared to the sorting approach and 3.67 times compared to the dynamic programming method employed in Regu2D. Furthermore, in SpMV, our method achieves a maximum speedup of 3.32 times on Orin and 3.01 times on RTX 4090 against the CSR format on sparse matrices from the University of Florida Sparse Matrix Collection.

Introduction: Sparse matrix-vector multiplication (SpMV) has a wide range of applications, such as mathematical solutions for sparse linear equations [13], iterative algorithm-solving processing [15] [25], graph processing [9] [14] [24], and weight calculations for forward and backward propagation in neural networks [3] [12] [17] [19]. However, SpMV is actually the bottleneck for many algorithms. The sparse matrix used in SpMV has the following characteristics [4]: (1) Sparsity. On the one hand, sparse matrices contain a large number of zero elements.
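As a point of reference for the CSR baseline mentioned above, a sequential SpMV over the compressed sparse row layout can be sketched like this. It is not the HBP format and not a GPU kernel; the function and argument names are illustrative.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values  : nonzero entries, row by row
    col_idx : column index of each entry in `values`
    row_ptr : row i owns entries row_ptr[i] .. row_ptr[i + 1] - 1
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y

if __name__ == "__main__":
    # A = [[1, 0, 2],
    #      [0, 0, 3],
    #      [4, 5, 0]]
    values = [1.0, 2.0, 3.0, 4.0, 5.0]
    col_idx = [0, 2, 2, 0, 1]
    row_ptr = [0, 2, 3, 5]
    print(csr_spmv(values, col_idx, row_ptr, [1.0, 2.0, 3.0]))  # [7.0, 9.0, 14.0]
```

The inner loop length varies with the number of nonzeros per row, and the reads of `x` follow the irregular pattern in `col_idx`. Reordering and partitioning schemes typically target these two costs.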

artificial intelligence, machine learning, matrix block, (15 more...)

2504.0886

Country:

North America > United States (1.00)
Asia > China > Beijing > Beijing (0.44)

Genre: Research Report (0.40)

Industry: Information Technology > Hardware (0.56)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.55)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.55)

Broadbent, Dominic, Whiteley, Nick, Allison, Robert, Lovett, Tom

Conditional Distribution Compression via the Kernel Conditional Mean Embedding

arXiv.org Machine Learning Apr-14-2025

Existing distribution compression methods, like Kernel Herding (KH), were originally developed for unlabelled data. However, no existing approach directly compresses the conditional distribution of labelled data. To address this gap, we first introduce the Average Maximum Conditional Mean Discrepancy (AMCMD), a natural metric for comparing conditional distributions. We then derive a consistent estimator for the AMCMD and establish its rate of convergence. Next, we make a key observation: in the context of distribution compression, the cost of constructing a compressed set targeting the AMCMD can be reduced from $\mathcal{O}(n^3)$ to $\mathcal{O}(n)$. Building on this, we extend the idea of KH to develop Average Conditional Kernel Herding (ACKH), a linear-time greedy algorithm that constructs a compressed set targeting the AMCMD. To better understand the advantages of directly compressing the conditional distribution rather than doing so via the joint distribution, we introduce Joint Kernel Herding (JKH), a straightforward adaptation of KH designed to compress the joint distribution of labelled data. While herding methods provide a simple and interpretable selection process, they rely on a greedy heuristic. To explore alternative optimisation strategies, we propose Joint Kernel Inducing Points (JKIP) and Average Conditional Kernel Inducing Points (ACKIP), which jointly optimise the compressed set while maintaining linear complexity. Experiments show that directly preserving conditional distributions with ACKIP outperforms both joint distribution compression (via JKH and JKIP) and the greedy selection used in ACKH. Moreover, we see that JKIP consistently outperforms JKH.
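For readers unfamiliar with the starting point, plain Kernel Herding on unlabelled data can be sketched as below. This is not ACKH, JKH, or either inducing-point method. It assumes a Gaussian kernel, one-dimensional data, and selection restricted to distinct points of the dataset itself, and all names are illustrative. Precomputing the mean embedding this way costs $\mathcal{O}(n^2)$, so the sketch says nothing about the linear-time result.

```python
import math

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel on scalars."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))

def kernel_herding(data, m, kernel=gaussian_kernel):
    """Greedily pick m distinct points whose kernel mean tracks that of `data`."""
    n = len(data)
    # Empirical kernel mean embedding evaluated at every candidate point.
    mean_embedding = [sum(kernel(x, z) for z in data) / n for x in data]
    chosen = []  # indices into `data`
    for t in range(m):
        def score(i):
            # Herding objective: agree with the target mean, avoid earlier picks.
            penalty = sum(kernel(data[i], data[j]) for j in chosen) / (t + 1)
            return mean_embedding[i] - penalty
        candidates = [i for i in range(n) if i not in chosen]
        chosen.append(max(candidates, key=score))
    return [data[i] for i in chosen]

if __name__ == "__main__":
    data = [-2.1, -2.0, -1.9, 1.9, 2.0, 2.1]
    print(kernel_herding(data, 2))  # one point from each cluster
```

The penalty term is what spreads the selected points out: once a cluster is represented, further candidates near it score lower.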

artificial intelligence, machine learning, tr 3, (16 more...)

2504.10139

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > Virginia > Arlington County > Arlington (0.04)
Europe > United Kingdom > England > Bristol (0.04)
(5 more...)

Genre:

Research Report (0.64)
Overview (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)

Fan, Yefeng, White, Simon

Neural Posterior Estimation on Exponential Random Graph Models: Evaluating Bias and Implementation Challenges

arXiv.org Machine Learning Apr-12-2025

Exponential random graph models (ERGMs) are flexible probabilistic frameworks for modelling statistical networks through a variety of network summary statistics. Conventional Bayesian estimation for ERGMs involves iteratively exchanging with an auxiliary variable due to the intractability of ERGMs; however, this approach lacks scalability to large-scale implementations. Neural posterior estimation (NPE) is a recent advancement in simulation-based inference, using a neural network-based density estimator to infer the posterior for models with doubly intractable likelihoods for which simulations can be generated. While NPE has been successfully adopted in various fields such as cosmology, little research has investigated its use for ERGMs. Performing NPE on ERGMs not only provides a different angle for resolving estimation for the intractable ERGM likelihoods but also allows more efficient and scalable inference using the amortisation properties of NPE. We therefore investigate how NPE can be effectively implemented for ERGMs. In this study, we present the first systematic implementation of NPE for ERGMs, rigorously evaluating potential biases, interpreting the bias magnitudes, and comparing NPE fits against conventional Bayesian ERGM fits. More importantly, our work highlights ERGM-specific areas that may pose particular challenges for the adoption of NPE.

artificial intelligence, bayesian inference, machine learning, (18 more...)

2504.09349

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)

Genre: Research Report > New Finding (0.49)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

arXiv.org Machine Learning Apr-11-2025

Riemannian Optimization on Relaxed Indicator Matrix Manifold

Yuan, Jinghui, Xie, Fangyuan, Nie, Feiping, Li, Xuelong

The indicator matrix plays an important role in machine learning, but optimizing it is an NP-hard problem. We propose a new relaxation of the indicator matrix and prove that this relaxation forms a manifold, which we call the Relaxed Indicator Matrix Manifold (RIM manifold). Based on Riemannian geometry, we develop a Riemannian toolbox for optimization on the RIM manifold. Specifically, we provide several methods of Retraction, including a fast Retraction method to obtain geodesics. We point out that the RIM manifold is a generalization of the double stochastic manifold. Existing methods on the double stochastic manifold have a complexity of $ \mathcal{O}(n^3) $, while RIM manifold optimization is $ \mathcal{O}(n) $ and often yields better results. We conducted extensive experiments, including image denoising, with millions of variables to support our conclusion. We also apply the RIM manifold to Ratio Cut, provide a rigorous convergence proof, and achieve clustering results that outperform the state-of-the-art methods. Our code is available at https://github.com/Yuan-Jinghui/Riemannian-Optimization-on-Relaxed-Indicator-Matrix-Manifold.

data mining, machine learning, manifold, (18 more...)

2503.20505

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Iowa > Johnson County > Iowa City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(3 more...)

arXiv.org Machine Learning Apr-10-2025

Local Distance-Preserving Node Embeddings and Their Performance on Random Graphs

Le, My, Ruiz, Luana, Dhara, Souvik

Learning node representations is a fundamental problem in graph machine learning. While existing embedding methods effectively preserve local similarity measures, they often fail to capture global functions like graph distances. Inspired by Bourgain's seminal work on Hilbert space embeddings of metric spaces (1985), we study the performance of local distance-preserving node embeddings. Known as landmark-based algorithms, these embeddings approximate pairwise distances by computing shortest paths from a small subset of reference nodes (i.e., landmarks). Our main theoretical contribution shows that random graphs, such as Erdős-Rényi random graphs, require lower dimensions in landmark-based embeddings compared to worst-case graphs. Empirically, we demonstrate that the GNN-based approximations for the distances to landmarks generalize well to larger networks, offering a scalable alternative for graph representation learning.
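A landmark-based embedding of the kind described can be sketched with breadth-first search on an unweighted, connected graph. Each node is embedded as its vector of hop distances to the landmarks, and the triangle inequality then gives lower and upper bounds on any pairwise distance. The landmark choice, function names, and example graph are assumptions for illustration. The sketch computes exact landmark distances rather than the GNN-based approximations studied in the paper.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def landmark_embedding(adj, landmarks):
    """Map each node to its list of distances to the landmarks (graph assumed connected)."""
    per_landmark = [bfs_distances(adj, lm) for lm in landmarks]
    return {v: [d[v] for d in per_landmark] for v in adj}

def distance_bounds(emb, u, v):
    """Triangle-inequality bounds on d(u, v) computed from the embeddings alone."""
    lower = max(abs(a - b) for a, b in zip(emb[u], emb[v]))
    upper = min(a + b for a, b in zip(emb[u], emb[v]))
    return lower, upper

if __name__ == "__main__":
    # Path graph 0 - 1 - 2 - 3 - 4 with both end nodes as landmarks.
    adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    emb = landmark_embedding(adj, [0, 4])
    print(emb[1], emb[3])              # [1, 3] [3, 1]
    print(distance_bounds(emb, 1, 3))  # (2, 4); the true distance is 2
```

The embedding dimension equals the number of landmarks, which is the quantity the abstract's random-graph result concerns.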

artificial intelligence, data mining, machine learning, (19 more...)

2504.08216

Country:

North America > United States > Oregon (0.04)
North America > Canada > British Columbia > Vancouver (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(8 more...)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Horváth, András, Ballarini, Paolo, Cry, Pierre

Probabilistic Process Discovery with Stochastic Process Trees

arXiv.org Artificial Intelligence Apr-9-2025

In order to obtain a stochastic model that accounts for the stochastic aspects of the dynamics of a business process, usually the following steps are taken. Given an event log, a process tree is obtained through a process discovery algorithm, i.e., a process tree that is aimed at reproducing, as accurately as possible, the language of the log. The process tree is then transformed into a Petri net that generates the same set of sequences as the process tree. In order to capture the frequency of the sequences in the event log, weights are assigned to the transitions of the Petri net, resulting in a stochastic Petri net with a stochastic language in which each sequence is associated with a probability. In this paper we show that this procedure has unfavorable properties. First, the weights assigned to the transitions of the Petri net have an unclear role in the resulting stochastic language. We will show that a weight can have multiple, ambiguous impacts on the probability of the sequences generated by the Petri net. Second, a number of different Petri nets with different numbers of transitions can correspond to the same process tree. This means that the number of parameters (the number of weights) that determines the stochastic language is not well-defined. In order to avoid these ambiguities, we propose to add stochasticity directly to process trees. The result is a new formalism, called stochastic process trees, in which the number of parameters and their role in the associated stochastic language are clear and well-defined.

artificial intelligence, process tree, transition, (13 more...)

2504.05765

Country: Europe > Italy (0.28)

Genre:

Research Report (0.64)
Workflow (0.47)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.64)

Subedi, Unique, Tewari, Ambuj

Operator Learning: A Statistical Perspective

arXiv.org Machine Learning Apr-4-2025

Operator learning has emerged as a powerful tool in scientific computing for approximating mappings between infinite-dimensional function spaces. A primary application of operator learning is the development of surrogate models for the solution operators of partial differential equations (PDEs). These methods can also be used to develop black-box simulators to model system behavior from experimental data, even without a known mathematical model. In this article, we begin by formalizing operator learning as a function-to-function regression problem and review some recent developments in the field. We also discuss PDE-specific operator learning, outlining strategies for incorporating physical and mathematical constraints into architecture design and training processes. Finally, we end by highlighting key future directions such as active data collection and the development of rigorous uncertainty quantification frameworks.

artificial intelligence, machine learning, operator, (18 more...)

2504.03503

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > India > Tripura (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Health & Medicine (0.69)
Education (0.68)

Technology:

Information Technology > Mathematics of Computing (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(3 more...)