Reviews: Neural Ordinary Differential Equations
Given the ever-increasing importance of AD in both communities, extending the range of scientific computing primitives through which frameworks such as autograd can efficiently compute derivatives will hopefully spur more widespread use of gradient-based learning and inference methods with ODE models, and encourage other frameworks with AD capability, such as Stan, TensorFlow, and PyTorch, to implement adjoint sensitivity methods.

The suggested applications of the 'ODE solver modelling primitive' in ODE-Nets, CNFs, and L-ODEs are all interesting demonstrations of the computational and modelling advantages of a continuous-time ODE formulation. In particular, the memory savings made possible by recomputing trajectories backwards through time, rather than storing all intermediate states, could be a major gain given that device memory is currently often a bottleneck.

While 'reversing' the integration to recompute the reverse trajectory is an appealing idea, more discussion of when this would be expected to break down would have been helpful: highly chaotic dynamical systems, for example, seem likely to be problematic, since even small errors in the initial backward steps could quickly lead to large divergences between the reversed and forward trajectories. A useful sanity check in an implementation would be to compare the final state of the reversed trajectory to the initial state of the forward trajectory and check how closely they agree (a minimal version of this check is sketched below).

The submission is generally very well written and presented in a clear expository style, with useful illustrative examples in the experiments to support the claims made, and well-thought-out figures that help give visual intuition about the methods and results.
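A minimal sketch of that reversibility check, using SciPy's solve_ivp on a toy damped-oscillator system (the test system, tolerances, and names here are illustrative choices, not from the paper or the review):

```python
# Sanity check suggested in the review: integrate an ODE forward, then
# integrate backward from the final state, and compare the recovered
# initial state with the true one.
import numpy as np
from scipy.integrate import solve_ivp

def f(t, z):
    # Damped harmonic oscillator: a well-behaved, non-chaotic test system.
    x, v = z
    return [v, -x - 0.1 * v]

z0 = np.array([1.0, 0.0])
t0, t1 = 0.0, 10.0

# Forward pass: z(t0) -> z(t1).
fwd = solve_ivp(f, (t0, t1), z0, rtol=1e-8, atol=1e-8)
z1 = fwd.y[:, -1]

# Backward pass: integrate from t1 back to t0, starting at z(t1).
bwd = solve_ivp(f, (t1, t0), z1, rtol=1e-8, atol=1e-8)
z0_recovered = bwd.y[:, -1]

# For well-behaved dynamics this error should be tiny; for chaotic
# systems it can blow up, which is the failure mode the review warns about.
print("reversibility error:", np.linalg.norm(z0 - z0_recovered))
```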
Stochastic Variance-Reduced Iterative Hard Thresholding in Graph Sparsity Optimization
Fox, Derek, Hernandez, Samuel, Tong, Qianqian
Stochastic optimization algorithms are widely used for large-scale data analysis due to their low per-iteration costs, but they often suffer from slow asymptotic convergence caused by inherent variance. Variance-reduction techniques have therefore been used to address this issue in structured sparse models based on sparsity-inducing norms or $\ell_0$-norms. However, these techniques are not directly applicable to complex (non-convex) graph sparsity models, which are essential in applications such as disease outbreak monitoring and social network analysis. In this paper, we introduce two stochastic variance-reduced gradient-based methods for graph sparsity optimization: GraphSVRG-IHT and GraphSCSG-IHT. We provide a general framework for theoretical analysis, demonstrating that our methods enjoy linear convergence speed. Extensive experiments validate the effectiveness of our methods.
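To make the algorithmic skeleton concrete, here is a minimal sketch of SVRG-style iterative hard thresholding for least squares. The paper's GraphSVRG-IHT projects onto graph-structured supports; plain top-s hard thresholding is substituted below purely for illustration, and the function names and step size are assumptions, not the authors' implementation:

```python
# SVRG-IHT sketch: variance-reduced stochastic gradients plus a hard-
# thresholding projection after each step. NOT the paper's graph projection.
import numpy as np

def hard_threshold(w, s):
    """Keep the s largest-magnitude entries of w, zero out the rest."""
    out = np.zeros_like(w)
    idx = np.argsort(np.abs(w))[-s:]
    out[idx] = w[idx]
    return out

def svrg_iht(A, y, s, eta=0.01, outer=20, inner=100, seed=None):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    for _ in range(outer):
        w_snap = w.copy()
        full_grad = A.T @ (A @ w_snap - y) / n        # full gradient at snapshot
        for _ in range(inner):
            i = rng.integers(n)
            gi = A[i] * (A[i] @ w - y[i])             # stochastic gradient at w
            gi_snap = A[i] * (A[i] @ w_snap - y[i])   # same sample at snapshot
            g = gi - gi_snap + full_grad              # variance-reduced gradient
            w = hard_threshold(w - eta * g, s)        # IHT projection step
    return w
```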
A Computation and Communication Efficient Method for Distributed Nonconvex Problems in the Partial Participation Setting
Tyurin, Alexander, Richtárik, Peter
We present a new method that combines three key components of distributed optimization and federated learning: variance reduction of stochastic gradients, partial participation, and compressed communication. We prove that the new method has optimal oracle complexity and state-of-the-art communication complexity in the partial participation setting. Independently of the communication compression feature, our method successfully combines variance reduction and partial participation: we obtain the optimal oracle complexity, never need the participation of all nodes, and do not require the bounded gradients (dissimilarity) assumption.
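As a toy illustration of two of the three ingredients the abstract names, the sketch below runs one server round with partial participation and top-k gradient compression; variance reduction is omitted for brevity. The function names, sampling fraction, and compressor choice are illustrative assumptions, not the paper's algorithm:

```python
# One federated round: sample a subset of nodes (partial participation),
# collect top-k-compressed gradients, average, and take a server step.
import numpy as np

def topk_compress(g, k):
    """Top-k sparsification: keep the k largest-magnitude coordinates."""
    out = np.zeros_like(g)
    idx = np.argsort(np.abs(g))[-k:]
    out[idx] = g[idx]
    return out

def federated_round(w, grad_fns, participation=0.2, k=10, eta=0.1, seed=None):
    rng = np.random.default_rng(seed)
    n = len(grad_fns)                                 # grad_fns[i](w): node i's gradient
    m = max(1, int(participation * n))
    active = rng.choice(n, size=m, replace=False)     # partial participation
    msgs = [topk_compress(grad_fns[i](w), k) for i in active]  # compressed uplink
    return w - eta * np.mean(msgs, axis=0)            # server update
```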
A Theoretical Perspective of Machine Learning with Computational Resource Concerns
Conventional theoretical machine learning studies generally assume, explicitly or implicitly, that computational resources are sufficient or even infinitely supplied. In real practice, however, computational resources are usually limited, and the performance of machine learning depends not only on how much data has been received, but also on how much data can be handled with the available computational resources. Note that most current "intelligent supercomputing" facilities work like exclusive operating systems, where a fixed amount of resources is allocated to a machine learning task without adaptive scheduling strategies that consider important factors such as learning performance demands and learning process status. In this article, we introduce the notion of machine learning throughput, define Computational Resource Efficient Learning (CoRE-Learning), and present a theoretical framework that takes the influence of computational resources into account in learning theory. This framework can be naturally applied to stream learning, where the incoming data streams can be potentially endless and of overwhelming size, so it is impractical to assume that all received data can be handled in time. It may also provide a theoretical perspective on the design of intelligent supercomputing operating systems.
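A toy sketch of the resource-bounded stream setting the abstract describes: data arrive faster than they can be processed, so items beyond a per-step processing budget are dropped. The throughput ratio computed here is an assumed, informal reading of the article's notion, not its formal definition:

```python
# Illustrative only: "throughput" below is taken informally as the fraction
# of received items the learner actually manages to process.
def stream_learning(stream, budget_per_step, learner_step):
    received = processed = 0
    for batch in stream:                   # one batch of (x, y) pairs per time step
        received += len(batch)
        handled = batch[:budget_per_step]  # resource limit: process at most this many
        for x, y in handled:
            learner_step(x, y)             # one learning update per handled item
        processed += len(handled)
    return processed / received            # empirical "throughput"
```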