AITopics | persistent training

Collaborating Authors

persistent training

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Generative Modeling by Minimizing the Wasserstein-2 Loss

Huang, Yu-Jui, Malik, Zachariah

arXiv.org Machine LearningJul-14-2024

This paper approaches the unsupervised learning problem by minimizing the second-order Wasserstein loss (the $W_2$ loss) through a distribution-dependent ordinary differential equation (ODE), whose dynamics involves the Kantorovich potential associated with the true data distribution and a current estimate of it. A main result shows that the time-marginal laws of the ODE form a gradient flow for the $W_2$ loss, which converges exponentially to the true data distribution. An Euler scheme for the ODE is proposed and it is shown to recover the gradient flow for the $W_2$ loss in the limit. An algorithm is designed by following the scheme and applying persistent training, which naturally fits our gradient-flow approach. In both low- and high-dimensional experiments, our algorithm outperforms Wasserstein generative adversarial networks by increasing the level of persistent training appropriately.

equation, gradient flow, ode, (16 more...)

arXiv.org Machine Learning

2406.13619

Country:

North America > United States > Colorado > Boulder County > Boulder (0.14)
North America > United States > New York (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)

Add feedback

A Differential Equation Approach for Wasserstein GANs and Beyond

Malik, Zachariah, Huang, Yu-Jui

arXiv.org Machine LearningMay-25-2024

We propose a new theoretical lens to view Wasserstein generative adversarial networks (WGANs). In our framework, we define a discretization inspired by a distribution-dependent ordinary differential equation (ODE). We show that such a discretization is convergent and propose a viable class of adversarial training methods to implement this discretization, which we call W1 Forward Euler (W1-FE). In particular, the ODE framework allows us to implement persistent training, a novel training technique that cannot be applied to typical WGAN algorithms without the ODE interpretation. Remarkably, when we do not implement persistent training, we prove that our algorithms simplify to existing WGAN algorithms; when we increase the level of persistent training appropriately, our algorithms outperform existing WGAN algorithms in both low- and high-dimensional examples.

algorithm, persistent training, wgan algorithm, (16 more...)

arXiv.org Machine Learning

2405.16351

Country:

North America > United States > Colorado > Boulder County > Boulder (0.14)
Europe > Switzerland > Zürich > Zürich (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Persistently Trained, Diffusion-assisted Energy-based Models

Zhang, Xinwei, Tan, Zhiqiang, Ou, Zhijian

arXiv.org Artificial IntelligenceApr-20-2023

Maximum likelihood (ML) learning for energy-based models (EBMs) is challenging, partly due to non-convergence of Markov chain Monte Carlo.Several variations of ML learning have been proposed, but existing methods all fail to achieve both post-training image generation and proper density estimation. We propose to introduce diffusion data and learn a joint EBM, called diffusion assisted-EBMs, through persistent training (i.e., using persistent contrastive divergence) with an enhanced sampling algorithm to properly sample from complex, multimodal distributions. We present results from a 2D illustrative experiment and image experiments and demonstrate that, for the first time for image data, persistently trained EBMs can {\it simultaneously} achieve long-run stability, post-training image generation, and superior out-of-distribution detection.

artificial intelligence, energy function, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2304.10707

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.64)

Add feedback

Persistent Neurons

Min, Yimeng

arXiv.org Machine LearningJul-2-2020

Most algorithms used in neural networks(NN)-based leaning tasks are strongly affected by the choices of initialization. Good initialization can avoid sub-optimal solutions and alleviate saturation during training. However, designing improved initialization strategies is a difficult task and our understanding of good initialization is still very primitive. Here, we propose persistent neurons, a strategy that optimizes the learning trajectory using information from previous converged solutions. More precisely, we let the parameters explore new landscapes by penalizing the model from converging to the previous solutions under the same initialization. Specifically, we show that persistent neurons, under certain data distribution, is able to converge to more optimal solutions while initializations under popular framework find bad local minima. We further demonstrate that persistent neurons helps improve the model's performance under both good and poor initializations. Moreover, we evaluate full and partial persistent model and show it can be used to boost the performance on a range of NN structures, such as AlexNet and residual neural network. Saturation of activation functions during persistent training is also studied.

artificial intelligence, machine learning, persistent training, (19 more...)

arXiv.org Machine Learning

2007.01419

Country:

North America > Canada > Quebec (0.04)
North America > United States > New York > Tompkins County > Ithaca (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback