Neural Information Processing Systems

Let $A$ be an $n \times n$ Hermitian matrix and let $B$ be the $(n-1) \times (n-1)$ matrix constructed by deleting the $i$-th row and $i$-th column of $A$. Denote $\Phi = [\phi(x_1), \dots, \phi(x_n)]^\top \in \mathbb{R}^{n \times D}$, where $D$ is the dimension of the feature space $\mathcal{H}$. Performing a rank-$n$ singular value decomposition (SVD) on $\Phi$ gives $\Phi = H \Sigma V^\top$, where $H \in \mathbb{R}^{n \times n}$, $\Sigma \in \mathbb{R}^{n \times n}$ is a diagonal matrix whose diagonal elements are the singular values of $\Phi$, and $V \in \mathbb{R}^{D \times n}$. $F(\alpha)$ in Eq. (21) is proven differentiable, with $p$-th gradient component $\partial F(\alpha)/\partial \alpha_p$. A reduced gradient descent algorithm [26] is then adopted to optimize Eq. (21). The three deep neural networks are pre-trained on ImageNet [5].
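The rank-$n$ SVD step described above is easy to reproduce numerically. A minimal NumPy sketch, where the feature matrix `Phi` is a random stand-in for $[\phi(x_1),\dots,\phi(x_n)]^\top$ and the dimensions are illustrative:

```python
import numpy as np

# Stand-in feature matrix Phi with n samples and feature dimension D (n <= D).
rng = np.random.default_rng(0)
n, D = 5, 8
Phi = rng.standard_normal((n, D))

# Rank-n (thin) SVD: Phi = H @ Sigma @ V^T with H (n x n), Sigma (n x n), V (D x n).
H, s, Vt = np.linalg.svd(Phi, full_matrices=False)
Sigma = np.diag(s)
V = Vt.T

# The three factors reconstruct Phi up to numerical precision.
assert np.allclose(Phi, H @ Sigma @ V.T)
print(H.shape, Sigma.shape, V.shape)
```

Passing `full_matrices=False` is what makes the decomposition rank-$n$: the full SVD would instead return a $D \times D$ right factor.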


And for any $\alpha > 0$, the Laplacian satisfies $\Delta G_\lambda \le c_1 2^\alpha G_\lambda + 2Md$, where $M = c_2 3^\alpha / c_1 + c_2^2$. 2. If $G(x) = \|g(x)\|^2$ is $C^1$, then $\|\nabla G(x)\| \le c_2 \|g(x)\|$ and $\langle g(x), \nabla G(x) \rangle \ge c_1 G(x)$. Proof. Claim 1. Note that $\nabla G_\lambda = 2 \nabla^2 F_\lambda \, g_\lambda$ and that $\tfrac{1}{2} c_1 I \preceq \nabla^2 F_\lambda$.

Neural Information Processing Systems

As a remark, using the Krylov-Bogoliubov existence theorem (see Corollary 11.8 of [6]), fixed points of (4) exist as long as one can show $\{\rho_t, t \ge 0\}$ is tight. The learning rate is set differently for each task. Obviously, the HV indicator (Eq. (10)) can also be used as an objective function for optimizing solution sets. For example, [25, 7] greedily add new points to obtain the highest expected HV improvement. However, the landscape of the HV indicator is piecewise constant (similar to the 0-1 loss in classification) and is difficult to optimize with gradient descent. In particular, for all the dominated points in the solution set, the gradient is zero.
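The zero-gradient behavior for dominated points can be seen directly in a small example. A minimal 2-objective (minimization) hypervolume computation; the helper `hypervolume_2d`, the reference point, and the point sets are illustrative stand-ins, not taken from the paper:

```python
def hypervolume_2d(points, ref):
    """Hypervolume dominated by a set of 2-D points (minimization) w.r.t. ref."""
    # Keep only points that strictly dominate the reference point; sweep by x.
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    hv, prev_y = 0.0, ref[1]
    for x, y in pts:
        if y < prev_y:                          # non-dominated slice adds area
            hv += (ref[0] - x) * (prev_y - y)
            prev_y = y
    return hv

ref = (4.0, 4.0)
front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
base = hypervolume_2d(front, ref)

# A dominated point contributes nothing, and perturbing it changes nothing:
# the HV indicator has exactly zero gradient with respect to that point.
assert hypervolume_2d(front + [(2.5, 2.5)], ref) == base
assert hypervolume_2d(front + [(2.6, 2.6)], ref) == base
```

Moving the dominated point from (2.5, 2.5) to (2.6, 2.6) leaves the indicator value unchanged, which is the piecewise-constant landscape the text refers to.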


Hessian-guided Perturbed Wasserstein Gradient Flows for Escaping Saddle Points

Yamamoto, Naoya, Kim, Juno, Suzuki, Taiji

arXiv.org Machine Learning

Wasserstein gradient flow (WGF) is a common method to perform optimization over the space of probability measures. While WGF is guaranteed to converge to a first-order stationary point, for nonconvex functionals the converged solution does not necessarily satisfy the second-order optimality condition; i.e., it could converge to a saddle point. In this work, we propose a new algorithm for probability measure optimization, perturbed Wasserstein gradient flow (PWGF), that achieves second-order optimality for general nonconvex objectives. PWGF enhances WGF by injecting noisy perturbations near saddle points via a Gaussian process-based scheme. By pushing the measure forward along a random vector field generated from a Gaussian process, PWGF helps the solution escape saddle points efficiently by perturbing the solution towards the smallest eigenvalue direction of the Wasserstein Hessian. We theoretically derive the computational complexity for PWGF to achieve a second-order stationary point. Furthermore, we prove that PWGF converges to a global optimum in polynomial time for strictly benign objectives.
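The GP-based perturbation scheme itself is involved, but the underlying idea, injecting noise when the flow stalls at a stationary point so that the measure can fall toward a descent direction, can be illustrated with a toy particle discretization. This is a crude sketch on a 1-D double-well potential, not the paper's algorithm; isotropic Gaussian noise stands in for the Gaussian-process perturbation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy potential V(x) = x**4/4 - x**2/2: an unstable stationary point at x = 0,
# minima at x = +/-1. Particles approximate the probability measure.
grad_V = lambda x: x**3 - x

x = np.zeros(200)                 # all particles start at the stationary point 0
step, noise = 0.1, 0.01
for _ in range(500):
    g = grad_V(x)
    if np.abs(g).max() < 1e-3:    # near-stationary: inject a small perturbation
        x = x + noise * rng.standard_normal(x.size)
        g = grad_V(x)
    x = x - step * g              # plain particle gradient step

# Without the perturbation the particles would stay at x = 0 forever;
# with it they escape and settle near the minima at +/-1.
print(np.abs(x).mean())
```

The unperturbed flow is stuck because the gradient vanishes identically at $x = 0$; the perturbation breaks the symmetry, after which ordinary gradient steps suffice.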


ROCK: A variational formulation for occupation kernel methods in Reproducing Kernel Hilbert Spaces

Rielly, Victor, Lahouel, Kamel, Nguyen, Chau, Jedynak, Bruno

arXiv.org Machine Learning

We present a representer theorem for a large class of weak-formulation problems. We provide examples of applications of our formulation both in traditional machine learning and numerical methods, as well as in new and emerging techniques. Finally, we apply our formulation to generalize the multivariate occupation kernel (MOCK) method for learning dynamical systems from data, proposing the more general Riesz occupation kernel (ROCK) method. Our generalized methods are both more computationally efficient and more performant on most of the benchmarks we test against.


Sinkhorn algorithms and linear programming solvers for optimal partial transport problems

Bai, Yikun

arXiv.org Artificial Intelligence

In this note, we generalize the classical optimal partial transport (OPT) problem by modifying the mass destruction/creation term to function-based terms, introducing what we term ``generalized optimal partial transport'' problems. We then discuss the dual formulation of these problems and the associated Sinkhorn solver. Finally, we explore how these new OPT problems relate to classical optimal transport (OT) problems and introduce a linear programming solver tailored for these generalized scenarios.
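For context, the balanced entropic-OT Sinkhorn iteration that such solvers build on fits in a few lines. This is the classical scheme, not the partial-transport variant introduced in the paper, and the grid, cost, and regularization strength are illustrative:

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, iters=1000):
    """Classical balanced entropic-OT Sinkhorn iterations (not the partial variant)."""
    K = np.exp(-C / eps)              # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)             # alternate scaling to match column marginal b
        u = a / (K @ v)               # ... and row marginal a
    return u[:, None] * K * v[None, :]   # transport plan diag(u) K diag(v)

# Two small histograms on a 1-D grid with squared-distance cost.
x = np.linspace(0, 1, 6)
a = np.full(6, 1 / 6)
b = np.full(6, 1 / 6)
C = (x[:, None] - x[None, :]) ** 2

P = sinkhorn(a, b, C)
# At convergence the plan's marginals match a and b.
assert np.allclose(P.sum(axis=1), a, atol=1e-6)
assert np.allclose(P.sum(axis=0), b, atol=1e-6)
```

The partial-transport generalization relaxes exactly these two marginal constraints, replacing them with (function-based) mass destruction/creation penalties.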


Understanding Test-Time Augmentation

Kimura, Masanari

arXiv.org Artificial Intelligence

Test-Time Augmentation (TTA) is a powerful heuristic that leverages data augmentation at test time, averaging the model's outputs over augmented copies of the input. Despite the experimental effectiveness of TTA, there has been insufficient discussion of its theoretical aspects. In this paper, we aim to give theoretical guarantees for TTA and to clarify its behavior.
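The heuristic itself is simple to state in code. A minimal sketch, where the "model" (a logistic unit), the augmentation (additive jitter standing in for image flips/crops), and the averaging count are all illustrative stand-ins:

```python
import numpy as np

# A stand-in "model": class-1 probability from a 1-D input (a logistic unit).
def predict(x):
    return 1.0 / (1.0 + np.exp(-x))

def tta_predict(x, augment, n=64, rng=None):
    """Average the model's outputs over n random augmentations of the input."""
    rng = rng or np.random.default_rng(0)
    return float(np.mean([predict(augment(x, rng)) for _ in range(n)]))

# Illustrative augmentation: additive jitter, a 1-D stand-in for flips/crops.
jitter = lambda x, rng: x + 0.3 * rng.standard_normal()

p_plain = predict(1.0)
p_tta = tta_predict(1.0, jitter)
print(p_plain, p_tta)
```

Note that TTA averages the *outputs*, not the inputs; for a nonlinear model the two differ, which is precisely what makes its theoretical behavior worth analyzing.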


Learning High-Dimensional Nonparametric Differential Equations via Multivariate Occupation Kernel Functions

Rielly, Victor, Lahouel, Kamel, Lew, Ethan, Wells, Michael, Haney, Vicky, Jedynak, Bruno

arXiv.org Artificial Intelligence

Learning a nonparametric system of ordinary differential equations (ODEs) from $n$ trajectory snapshots in a $d$-dimensional state space requires learning $d$ functions of $d$ variables. Explicit formulations scale quadratically in $d$ unless additional knowledge about system properties, such as sparsity and symmetries, is available. In this work, we propose a linear approach to learning using the implicit formulation provided by vector-valued Reproducing Kernel Hilbert Spaces. By rewriting the ODEs in a weaker integral form, which we subsequently minimize, we derive our learning algorithm. The minimization problem's solution for the vector field relies on multivariate occupation kernel functions associated with the solution trajectories. We validate our approach through experiments on highly nonlinear simulated and real data, where $d$ may exceed 100. We further demonstrate the versatility of the proposed method by learning a nonparametric first-order quasilinear partial differential equation.
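The weak-form recipe can be sketched in one dimension: approximate each occupation-kernel feature $\Psi_i(y) = \int k(x_i(t), y)\,dt$ by a Riemann sum over snapshots, match the trajectory increments $x(T) - x(0)$, and solve a small regularized linear system. The kernel, bandwidth, regularization, and toy system below are illustrative choices, not the paper's:

```python
import numpy as np

# Learn f in dx/dt = f(x) from snapshots via the integral (weak) form
#   x(T) - x(0) = \int_0^T f(x(t)) dt.
k = lambda u, v: np.exp(-(u - v) ** 2 / 0.5)   # Gaussian kernel (bandwidth assumed)

# Simulated snapshot trajectories of the true system dx/dt = -x.
ts = np.linspace(0, 1, 50)
dt = ts[1] - ts[0]
trajs = [x0 * np.exp(-ts) for x0 in (0.5, 1.0, 1.5, 2.0)]

y = np.array([tr[-1] - tr[0] for tr in trajs])             # increments x(T) - x(0)
G = np.array([[k(ti[:, None], tj[None, :]).sum() * dt * dt # double trajectory
               for tj in trajs] for ti in trajs])          # integrals of k
alpha = np.linalg.solve(G + 1e-6 * np.eye(len(trajs)), y)  # regularized system

def f_hat(x):
    """Estimated vector field: a combination of occupation-kernel features."""
    return sum(a * (k(tr, x).sum() * dt) for a, tr in zip(alpha, trajs))

print(f_hat(1.0))   # the true field is f(x) = -x, so this should be negative
```

The key scaling point from the abstract: the linear system has one row per trajectory, not per state dimension, which is what lets the implicit formulation avoid the quadratic-in-$d$ cost of explicit bases.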


The Conditional Cauchy-Schwarz Divergence with Applications to Time-Series Data and Sequential Decision Making

Yu, Shujian, Li, Hongming, Løkse, Sigurd, Jenssen, Robert, Príncipe, José C.

arXiv.org Artificial Intelligence

The Cauchy-Schwarz (CS) divergence was developed by Pr\'{i}ncipe et al. in 2000. In this paper, we extend the classic CS divergence to quantify the closeness between two conditional distributions and show that the resulting conditional CS divergence can be estimated simply with a kernel density estimator from given samples. We illustrate the advantages (e.g., the rigorous faithfulness guarantee, the lower computational complexity, the higher statistical power, and the greater flexibility in a wide range of applications) of our conditional CS divergence over previous proposals, such as the conditional KL divergence and the conditional maximum mean discrepancy. We also demonstrate the compelling performance of the conditional CS divergence in two machine learning tasks related to time-series data and sequential inference, namely time-series clustering and uncertainty-guided exploration for sequential decision making.
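For intuition, the unconditional CS divergence already admits a simple plug-in kernel estimator. A sketch with Gaussian kernels and an illustrative bandwidth; the paper's contribution is the *conditional* extension, which is not shown here:

```python
import numpy as np

def cs_divergence(X, Y, sigma=1.0):
    """Plug-in kernel estimator of the (unconditional) Cauchy-Schwarz divergence
    D_CS(p, q) = -log( <p, q>^2 / (<p, p> <q, q>) ), with 1-D samples X, Y."""
    k = lambda A, B: np.exp(-((A[:, None] - B[None, :]) ** 2) / (2 * sigma ** 2))
    pq = k(X, Y).mean()   # estimates the cross inner product <p, q>
    pp = k(X, X).mean()   # estimates <p, p>
    qq = k(Y, Y).mean()   # estimates <q, q>
    return -np.log(pq ** 2 / (pp * qq))

rng = np.random.default_rng(0)
same = cs_divergence(rng.normal(0, 1, 500), rng.normal(0, 1, 500))
far = cs_divergence(rng.normal(0, 1, 500), rng.normal(3, 1, 500))

# Near zero for samples from the same distribution; grows as they separate.
assert same < far
```

By the Cauchy-Schwarz inequality the population divergence is nonnegative and vanishes only when $p = q$, which the empirical quantities reflect up to kernel and sampling bias.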


Learning low-dimensional state embeddings and metastable clusters from time series data

Sun, Yifan, Duan, Yaqi, Gong, Hao, Wang, Mengdi

arXiv.org Machine Learning

This paper studies how to find compact state embeddings from high-dimensional Markov state trajectories, where the transition kernel has a small intrinsic rank. In the spirit of the diffusion map, we propose an efficient method for learning a low-dimensional state embedding and capturing the process's dynamics. This idea also leads to a kernel reshaping method for more accurate nonparametric estimation of the transition function. The state embedding can be used to cluster states into metastable sets, thereby identifying the slow dynamics. Sharp statistical error bounds and misclassification rates are proved. Experiments on a simulated dynamical system show that the state clustering method indeed reveals metastable structures. We also experiment with time series generated by the layers of a Deep-Q-Network while it plays an Atari game. The embedding method identifies game states as similar if they share similar future events, even though their raw data are far apart.
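The embedding idea can be illustrated on a tiny Markov chain with two metastable blocks: estimate the transition matrix from a trajectory, take a low-rank SVD, and read off state embeddings. The chain, trajectory length, and rank below are illustrative stand-ins, not the paper's estimator:

```python
import numpy as np

# A 4-state chain with two metastable blocks {0,1} and {2,3}:
# transitions within a block are frequent, between blocks rare.
P = np.array([[0.49, 0.49, 0.01, 0.01],
              [0.49, 0.49, 0.01, 0.01],
              [0.01, 0.01, 0.49, 0.49],
              [0.01, 0.01, 0.49, 0.49]])

rng = np.random.default_rng(0)
states = [0]
for _ in range(20000):
    states.append(rng.choice(4, p=P[states[-1]]))

# Empirical transition matrix from the observed trajectory.
counts = np.zeros((4, 4))
for s, t in zip(states[:-1], states[1:]):
    counts[s, t] += 1
P_hat = counts / counts.sum(axis=1, keepdims=True)

# Rank-2 state embedding from the SVD of P_hat: states with similar futures
# (rows of the transition matrix) land close together.
U, s, Vt = np.linalg.svd(P_hat)
embed = U[:, :2] * s[:2]
d_same = np.linalg.norm(embed[0] - embed[1])    # same metastable block
d_cross = np.linalg.norm(embed[0] - embed[2])   # different blocks
assert d_same < d_cross
```

Clustering the embedded points (e.g., with k-means) then recovers the two metastable sets, which is the slow-dynamics structure the abstract describes.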


Learning Curves, Model Selection and Complexity of Neural Networks

Murata, Noboru, Yoshizawa, Shuji, Amari, Shun-ichi

Neural Information Processing Systems

Learning curves show how a neural network improves as the number of training examples increases and how this improvement is related to the network complexity. The present paper clarifies the asymptotic properties of, and the relation between, two learning curves: one concerning the predictive (generalization) loss and the other the training loss. The result gives a natural definition of the complexity of a neural network. Moreover, it provides a new criterion for model selection.