AITopics

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.65)

Neural Information Processing SystemsMay-27-2025, 04:37:25 GMT

CoLA: Exploiting Compositional Structure for Automatic and Efficient Numerical Linear Algebra

Moreover, CoLA provides memory efficient automatic differentiation, low precision computation, and GPU acceleration in both JAX and PyTorch, while also accommodating new objects, operations, and rules in downstream packages via multiple dispatch. CoLA can accelerate many algebraic operations, while making it easy to prototype matrix structures and algorithms, providing an appealing drop-in tool for virtually any computational effort that requires linear algebra. We showcase its efficacy across a broad range of applications, including partial differential equations, Gaussian processes, equivariant model construction, and unsupervised learning.

efficient numerical linear algebra, exploiting compositional structure, linear algebra, (5 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.81)

Neural Information Processing SystemsMay-27-2025, 03:17:37 GMT

MILP-StuDio: MILP Instance Generation via Block Structure Decomposition

Mixed-integer linear programming (MILP) is one of the most popular mathematical formulations with numerous applications. In practice, improving the performance of MILP solvers often requires a large amount of high-quality data, which can be challenging to collect. Researchers thus turn to generation techniques to generate additional MILP instances. However, existing approaches do not take into account specific block structures--which are closely related to the problem formulations--in the constraint coefficient matrices (CCMs) of MILPs. Consequently, they are prone to generate computationally trivial or infeasible instances due to the disruptions of block structures and thus problem formulations.

artificial intelligence, block structure decomposition, milp-studio, (4 more...)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.61)

Neural Information Processing SystemsMay-27-2025, 00:03:23 GMT

Modeling Continuous Stochastic Processes with Dynamic Normalizing Flows

Normalizing flows transform a simple base distribution into a complex target distribution and have proved to be powerful models for data generation and density estimation. In this work, we propose a novel type of normalizing flow driven by a differential deformation of the continuous-time Wiener process. As a result, we obtain a rich time series model whose observable process inherits many of the appealing properties of its base process, such as efficient computation of likelihoods and marginals. Furthermore, our continuous treatment provides a natural framework for irregular time series with an independent arrival process, including straightforward interpolation. We illustrate the desirable properties of the proposed model on popular stochastic processes and demonstrate its superior flexibility to variational RNN and latent ODE baselines in a series of experiments on synthetic and real-world data.

artificial intelligence, dynamic normalizing flow, modeling continuous stochastic process

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.66)

Neural Information Processing SystemsMay-26-2025, 20:12:29 GMT

A Combinatorial Algorithm for the Semi-Discrete Optimal Transport Problem

Optimal Transport (OT, also known as the Wasserstein distance) is a popular metric for comparing probability distributions and has been successfully used in many machine-learning applications.In the semi-discrete 2 -Wasserstein problem, we wish to compute the cheapest way to transport all the mass from a continuous distribution \mu to a discrete distribution u in \mathbb{R} d for d\ge 1, where the cost of transporting unit mass between points a and b is d(a,b) a-b 2 . When both distributions are discrete, a simple combinatorial framework has been used to find the exact solution (see e.g. In this paper, we propose a combinatorial framework for the semi-discrete OT, which can be viewed as an extension of the combinatorial framework for the discrete OT but requires several new ideas. We present a new algorithm that given \mu and u in \mathbb{R} 2 and a parameter \varepsilon 0, computes an \varepsilon -additive approximate semi-discrete transport plan in O(n {4}\log n\log \frac{1}{\varepsilon}) time (in the worst case), where n is the support-size of the discrete distribution u and we assume that the mass of \mu inside a triangle can be computed in O(1) time. Our algorithm is significantly faster than the known algorithms, and unlike many numerical algorithms, it does not make any assumptions on the smoothness of \mu .As an application of our algorithm, we describe a data structure to store a large discrete distribution \mu (with support size N) using O(N) space so that, given a query discrete distribution u (with support size k), an \varepsilon -additive approximate transport plan can be computed in O(k {3}\sqrt{N}\log \frac{1}{\varepsilon}) time in 2 dimensions.

artificial intelligence, machine learning, semi-discrete optimal transport problem, (12 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.40)
Information Technology > Artificial Intelligence > Machine Learning (0.39)

Neural Information Processing SystemsMay-26-2025, 18:56:25 GMT

Globally Q-linear Gauss-Newton Method for Overparameterized Non-convex Matrix Sensing

This paper focuses on the optimization of overparameterized, non-convex low-rank matrix sensing (LRMS)--an essential component in contemporary statistics and machine learning. Recent years have witnessed significant breakthroughs in first-order methods, such as gradient descent, for tackling this non-convex optimization problem. However, the presence of numerous saddle points often prolongs the time required for gradient descent to overcome these obstacles. In this paper, we introduce an approximated Gauss-Newton (AGN) method for tackling the non-convex LRMS problem. Notably, AGN incurs a computational cost comparable to gradient descent per iteration but converges much faster without being slowed down by saddle points. We prove that, despite the non-convexity of the objective function, AGN achieves Q-linear convergence from random initialization to the global optimal solution.

artificial intelligence, machine learning, overparameterized non-convex matrix sensing, (5 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.63)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.40)

Cortild, Daniel, Ketels, Lucas, Peypouquet, Juan, Garrigos, Guillaume

New Tight Bounds for SGD without Variance Assumption: A Computer-Aided Lyapunov Analysis

arXiv.org Machine LearningMay-26-2025

The analysis of Stochastic Gradient Descent (SGD) often relies on making some assumption on the variance of the stochastic gradients, which is usually not satisfied or difficult to verify in practice. This paper contributes to a recent line of works which attempt to provide guarantees without making any variance assumption, leveraging only the (strong) convexity and smoothness of the loss functions. In this context, we prove new theoretical bounds derived from the monotonicity of a simple Lyapunov energy, improving the current state-of-the-art and extending their validity to larger step-sizes. Our theoretical analysis is backed by a Performance Estimation Problem analysis, which allows us to claim that, empirically, the bias term in our bounds is tight within our framework.

artificial intelligence, assumption, machine learning, (17 more...)

2505.17965

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Netherlands (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.74)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Kawano, Keisuke, Kutsuna, Takuro, Hayashi, Naoki, Esaki, Yasushi, Tanaka, Hidenori

CT-OT Flow: Estimating Continuous-Time Dynamics from Discrete Temporal Snapshots

arXiv.org Machine LearningMay-26-2025

In many real-world scenarios, such as single-cell RNA sequencing, data are observed only as discrete-time snapshots spanning finite time intervals and subject to noisy timestamps, with no continuous trajectories available. Recovering the underlying continuous-time dynamics from these snapshots with coarse and noisy observation times is a critical and challenging task. We propose Continuous-Time Optimal Transport Flow (CT-OT Flow), which first infers high-resolution time labels via partial optimal transport and then reconstructs a continuous-time data distribution through a temporal kernel smoothing. This reconstruction enables accurate training of dynamics models such as ODEs and SDEs. CT-OT Flow consistently outperforms state-of-the-art methods on synthetic benchmarks and achieves lower reconstruction errors on real scRNA-seq and typhoon-track datasets. Our results highlight the benefits of explicitly modeling temporal discretization and timestamp uncertainty, offering an accurate and general framework for bridging discrete snapshots and continuous-time processes.

artificial intelligence, ct -ot flow, machine learning, (19 more...)

2505.17354

Country:

Asia > Southeast Asia (0.04)
Asia > Japan (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Communications > Networks (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.46)

Hristopulos, Dionissios T., Baxevani, Anastassia, Kaniadakis, Giorgio

Stochastic Processes with Modified Lognormal Distribution Featuring Flexible Upper Tail

arXiv.org Machine LearningMay-22-2025

Asymmetric, non-Gaussian probability distributions are often observed in the analysis of natural and engineering datasets. The lognormal distribution is a standard model for data with skewed frequency histograms and fat tails. However, the lognormal law severely restricts the asymptotic dependence of the probability density and the hazard function for high values. Herein we present a family of three-parameter non-Gaussian probability density functions that are based on generalized kappa-exponential and kappa-logarithm functions and investigate its mathematical properties. These kappa-lognormal densities represent continuous deformations of the lognormal with lighter right tails, controlled by the parameter kappa. In addition, bimodal distributions are obtained for certain parameter combinations. We derive closed-form analytic expressions for the main statistical functions of the kappa-lognormal distribution. For the moments, we derive bounds that are based on hypergeometric functions as well as series expansions. Explicit expressions for the gradient and Hessian of the negative log-likelihood are obtained to facilitate numerical maximum-likelihood estimates of the kappa-lognormal parameters from data. We also formulate a joint probability density function for kappa-lognormal stochastic processes by applying Jacobi's multivariate theorem to a latent Gaussian process. Estimation of the kappa-lognormal distribution based on synthetic and real data is explored. Furthermore, we investigate applications of kappa-lognormal processes with different covariance kernels in time series forecasting and spatial interpolation using warped Gaussian process regression. Our results are of practical interest for modeling skewed distributions in various scientific and engineering fields.

artificial intelligence, machine learning, modeling & simulation, (18 more...)

2505.14713

Genre: Research Report > New Finding (0.65)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.71)
(3 more...)

Lai, Jeffrey, Bao, Anthony, Gilpin, William

Panda: A pretrained forecast model for universal representation of chaotic dynamics

arXiv.org Machine LearningMay-21-2025

Chaotic systems are intrinsically sensitive to small errors, challenging efforts to construct predictive data-driven models of real-world dynamical systems such as fluid flows or neuronal activity. Prior efforts comprise either specialized models trained separately on individual time series, or foundation models trained on vast time series databases with little underlying dynamical structure. Motivated by dynamical systems theory, we present Panda, Patched Attention for Nonlinear DynAmics. We train Panda on a novel synthetic, extensible dataset of $2 \times 10^4$ chaotic dynamical systems that we discover using an evolutionary algorithm. Trained purely on simulated data, Panda exhibits emergent properties: zero-shot forecasting of unseen real world chaotic systems, and nonlinear resonance patterns in cross-channel attention heads. Despite having been trained only on low-dimensional ordinary differential equations, Panda spontaneously develops the ability to predict partial differential equations without retraining. We demonstrate a neural scaling law for differential equations, underscoring the potential of pretrained models for probing abstract mathematical domains like nonlinear dynamics.

evolutionary algorithm, large language model, machine learning, (19 more...)

2505.13755

Country:

North America > United States > Texas > Travis County > Austin (0.04)
Asia > India > Tripura (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.54)
(2 more...)