AITopics

2101.12176

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Mekkaoui, Khaoula El, Mesquita, Diego, Blomstedt, Paul, Kaski, Samuel

Distributed stochastic gradient MCMC for federated learning

arXiv.org Machine LearningJan-27-2021

Stochastic gradient MCMC methods, such as stochastic gradient Langevin dynamics (SGLD), enable large-scale posterior inference by leveraging noisy but cheap gradient estimates. However, when federated data are non-IID, the variance of distributed gradient estimates is amplified compared to its centralized version, and delayed communication rounds lead chains to diverge from the target posterior. In this work, we introduce the concept of conducive gradients, zero-mean stochastic gradients that serve as a mechanism for sharing probabilistic information between data shards. We propose a novel stochastic gradient estimator that incorporates the conducive gradients, and we show that it improves convergence on federated data when compared to distributed SGLD (DSGLD). We evaluate, conducive gradient DSGLD (CG-DSGLD) on metric learning and deep MLPs tasks. Experiments show that it outperforms standard DSGLD for non-IID federated data.

cg-dsgld, dsgld, gradient, (13 more...)

2004.11231

Country: North America > United States > New York (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Hibat-Allah, Mohamed, Inack, Estelle M., Wiersema, Roeland, Melko, Roger G., Carrasquilla, Juan

Variational Neural Annealing

arXiv.org Artificial IntelligenceJan-25-2021

Many important challenges in science and technology can be cast as optimization problems. When viewed in a statistical physics framework, these can be tackled by simulated annealing, where a gradual cooling procedure helps search for groundstate solutions of a target Hamiltonian. While powerful, simulated annealing is known to have prohibitively slow sampling dynamics when the optimization landscape is rough or glassy. Here we show that by generalizing the target distribution with a parameterized model, an analogous annealing framework based on the variational principle can be used to search for groundstate solutions. Modern autoregressive models such as recurrent neural networks provide ideal parameterizations since they can be exactly sampled without slow dynamics even when the model encodes a rough landscape. We implement this procedure in the classical and quantum settings on several prototypical spin glass Hamiltonians, and find that it significantly outperforms traditional simulated annealing in the asymptotic limit, illustrating the potential power of this yet unexplored route to optimization.

artificial intelligence, ground state, machine learning, (16 more...)

doi: 10.1038/s42256-021-00401-3

2101.10154

Country:

North America > Canada > Ontario > Waterloo Region > Waterloo (0.14)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Styp-Rekowski, Kevin, Schmidt, Florian, Kao, Odej

Optimizing Convergence for Iterative Learning of ARIMA for Stationary Time Series

arXiv.org Machine LearningJan-25-2021

Forecasting of time series in continuous systems becomes an increasingly relevant task due to recent developments in IoT and 5G. The popular forecasting model ARIMA is applied to a large variety of applications for decades. An online variant of ARIMA applies the Online Newton Step in order to learn the underlying process of the time series. This optimization method has pitfalls concerning the computational complexity and convergence. Thus, this work focuses on the computational less expensive Online Gradient Descent optimization method, which became popular for learning of neural networks in recent years. For the iterative training of such models, we propose a new approach combining different Online Gradient Descent learners (such as Adam, AMSGrad, Adagrad, Nesterov) to achieve fast convergence. The evaluation on synthetic data and experimental datasets show that the proposed approach outperforms the existing methods resulting in an overall lower prediction error.

arima model, online arima model, optimizer, (14 more...)

2101.10037

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.91)
North America > United States (0.14)
Europe > Germany > Berlin (0.05)
(3 more...)

Genre: Research Report (0.64)

Industry: Education > Educational Setting > Online (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

arXiv.org Machine LearningJan-23-2021

On the Proof of Global Convergence of Gradient Descent for Deep ReLU Networks with Linear Widths

Nguyen, Quynh

This paper studies the global convergence of gradient descent for deep ReLU networks under the square loss. For this setting, the current state-of-the-art results show that gradient descent converges to a global optimum if the widths of all the hidden layers scale at least as $\Omega(N^8)$ ($N$ being the number of training samples). In this paper, we discuss a simple proof framework which allows us to improve the existing over-parameterization condition to linear, quadratic and cubic widths (depending on the type of initialization scheme and/or the depth of the network).

gradient descent, initialization, theorem 2, (11 more...)

2101.09612

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.83)

arXiv.org Artificial IntelligenceJan-21-2021

Noisy intermediate-scale quantum (NISQ) algorithms

Bharti, Kishor, Cervera-Lierta, Alba, Kyaw, Thi Ha, Haug, Tobias, Alperin-Lea, Sumner, Anand, Abhinav, Degroote, Matthias, Heimonen, Hermanni, Kottmann, Jakob S., Menke, Tim, Mok, Wai-Keong, Sim, Sukin, Kwek, Leong-Chuan, Aspuru-Guzik, Alán

A universal fault-tolerant quantum computer that can solve efficiently problems such as integer factorization and unstructured database search requires millions of qubits with low error rates and long coherence times. While the experimental advancement towards realizing such devices will potentially take decades of research, noisy intermediate-scale quantum (NISQ) computers already exist. These computers are composed of hundreds of noisy qubits, i.e. qubits that are not error-corrected, and therefore perform imperfect operations in a limited coherence time. In the search for quantum advantage with these devices, algorithms have been proposed for applications in various disciplines spanning physics, machine learning, quantum chemistry and combinatorial optimization. The goal of such algorithms is to leverage the limited available resources to perform classically challenging tasks. In this review, we provide a thorough summary of NISQ computational paradigms and algorithms. We discuss the key structure of these algorithms, their limitations, and advantages. We additionally provide a comprehensive overview of various benchmarking and software tools useful for programming and testing NISQ devices.

neural network, output generation problem, upstream oil & gas, (26 more...)

2101.08448

Country:

Europe > United Kingdom > England (0.45)
Europe > Netherlands (0.27)
Asia > Middle East (0.14)
(2 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.67)
Research Report > Experimental Study (0.45)

Industry:

Banking & Finance (1.00)
Energy > Oil & Gas > Upstream (0.67)
Government (0.67)

Technology:

Information Technology > Software (1.00)
Information Technology > Mathematics of Computing (1.00)
Information Technology > Hardware (1.00)
(6 more...)

Cao, Yongcan, Zhan, Huixin

Efficient Multi-objective Reinforcement Learning via Multiple-gradient Descent with Iteratively Discovered Weight-Vector Sets

Journal of Artificial Intelligence ResearchJan-20-2021

Solving multi-objective optimization problems is important in various applications where users are interested in obtaining optimal policies subject to multiple (yet often conflicting) objectives. A typical approach to obtain the optimal policies is to first construct a loss function based on the scalarization of individual objectives and then derive optimal policies that minimize the scalarized loss function. Albeit simple and efficient, the typical approach provides no insights/mechanisms on the optimization of multiple objectives due to the lack of ability to quantify the inter-objective relationship. To address the issue, we propose to develop a new efficient gradient-based multi-objective reinforcement learning approach that seeks to iteratively uncover the quantitative inter-objective relationship via finding a minimum-norm point in the convex hull of the set of multiple policy gradients when the impact of one objective on others is unknown a priori. In particular, we first propose a new PAOLS algorithm that integrates pruning and approximate optimistic linear support algorithm to efficiently discover the weight-vector sets of multiple gradients that quantify the inter-objective relationship. Then we construct an actor and a multi-objective critic that can co-learn the policy and the multi-objective vector value function. Finally, the weight discovery process and the policy and vector value function learning process can be iteratively executed to yield stable weight-vector sets and policies. To validate the effectiveness of the proposed approach, we present a quantitative evaluation of the approach based on three case studies.

algorithm, objective, optimization, (11 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.1.12270

AI Access Foundation

12270

Journal of Artificial Intelligence Research

Country:

North America > United States > Texas > Bexar County > San Antonio (0.14)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report (0.67)
Overview (0.46)

Industry:

Leisure & Entertainment > Games (0.67)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Artificial IntelligenceJan-20-2021

A New Knowledge Gradient-based Method for Constrained Bayesian Optimization

Chen, Wenjie, Liu, Shengcai, Tang, Ke

Complex systems optimization is a critical challenge in real production and also the hot spot of academic research. The key factors that raise systems' complexity include (but are not limited to): inestimable structures, computationally intensive evaluations, stochastic noise, and multiple key performance indicators (KPIs). A typical example is a simulation-based optimization for an emergency department. Suppose we aim to optimize the patients' flow cost and departments' closeness by determining the corridors' widths via a simulation model. Due to the characteristics of the simulation model, there exists no explicit expression of the input and output, and the estimations are time-consuming and noise-corrupted. Furthermore, the multilevel performance indicators also lay a burden on optimization problems.

acquisition function, bayesian optimization, optimization, (12 more...)

2101.08743

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.86)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.50)

arXiv.org Machine LearningJan-18-2021

Householder Dice: A Matrix-Free Algorithm for Simulating Dynamics on Gaussian and Random Orthogonal Ensembles

Lu, Yue M.

In the study of large random systems, researchers often need to simulate dynamics in the form of iterated matrix-vector multiplications interspersed with nonlinear operations. Examples include message passing algorithms, gradient descent, and matrix iterative methods for extremal eigenvalue calculations. This paper proposes a new algorithm, named Householder Dice (HD), for simulating such dynamics on several random matrix ensembles with translation-invariant properties. Examples include the Gaussian ensemble, the Haar-distributed random orthogonal ensemble, and their complex-valued counterparts. A "direct" approach to the simulation, where one first generates a dense $n \times n$ matrix from the ensemble, requires at least $\mathcal{O}(n^2)$ resource in space and time. The HD algorithm overcomes this $\mathcal{O}(n^2)$ bottleneck by using the principle of deferred decisions: rather than fixing the entire random matrix in advance, it lets the randomness unfold with the dynamics. Key to this matrix-free construction is an adaptive and recursive construction of (random) Householder reflectors. These orthogonal transformations exploit the group symmetry of the matrix ensembles, while simultaneously maintaining the statistical correlations induced by the dynamics. The memory and computation costs of the HD algorithm are $\mathcal{O}(nT)$ and $\mathcal{O}(nT^2)$, respectively, with $T$ being the number of iterations. When $T \ll n$, which is nearly always the case in practice, the HD algorithm leads to significant reductions in runtime and memory footprint. Numerical results demonstrate the promise of the new algorithm as a new computational tool in the study of high-dimensional random systems.

algorithm, ginibre, matrix, (16 more...)

2101.07464

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
(3 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

arXiv.org Artificial IntelligenceJan-17-2021

Guided parallelized stochastic gradient descent for delay compensation

Sharma, Anuraganand

Stochastic gradient descent (SGD) algorithm and its variations have been effectively used to optimize neural network models. However, with the rapid growth of big data and deep learning, SGD is no longer the most suitable choice due to its natural behavior of sequential optimization of the error function. This has led to the development of parallel SGD algorithms, such as asynchronous SGD (ASGD) and synchronous SGD (SSGD) to train deep neural networks. However, it introduces a high variance due to the delay in parameter (weight) update. We address this delay in our proposed algorithm and try to minimize its impact. We employed guided SGD (gSGD) that encourages consistent examples to steer the convergence by compensating the unpredictable deviation caused by the delay. Its convergence rate is also similar to A/SSGD, however, some additional (parallel) processing is required to compensate for the delay. The experimental results demonstrate that our proposed approach has been able to mitigate the impact of delay for the quality of classification accuracy. The guided approach with SSGD clearly outperforms sequential SGD and even achieves the accuracy close to sequential SGD for some benchmark datasets.

algorithm, artificial intelligence, machine learning, (16 more...)

doi: 10.1016/j.asoc.2021.107084

2101.07259

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Finland > Paijanne Tavastia > Lahti (0.04)
Oceania > Fiji (0.04)
(4 more...)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)